Question: Subtract
6.6 years ago, Xianrong Wong wrote:
Hello,

I am using the Subtract (whole dataset) tool. I converted my FASTQ file to tabular format with two columns: 1. identifier and 2. sequence. I then used "Select lines that match an expression" to pull a few lines from this initial tabular file, and I am trying to get a final dataset that no longer contains those selected reads. To do that, I subtract the dataset of selected lines from the initial dataset.

The tool works when I run the workflow on a relatively small file (1/50 the size of a whole sequencing experiment) but repeatedly fails when I input the full FASTQ file. Any idea why this is so?

Jose
6.6 years ago, Jennifer Hillman Jackson (United States) wrote:
Hello,

Using the 'Subtract' tool between FASTQ datasets can be memory intensive, since it involves sorting and then comparing each character between the two files. This is likely not necessary here. I have seen queries like yours run successfully on even very large datasets by eliminating the Subtract step and instead using 'Select' with "NOT Matching" on the original dataset.

Example:

Current dataflow:
1 - original file A
2 - select positive match expression 'X' to create file B
3 - subtract file B from file A to create file C

Better:
1 - original file A
2 - select negative match expression 'X' to create file C

If this failure is on the public main Galaxy server and you do not wish to change your query, then one option is to move to a cloud instance and experiment with larger memory options: http://usegalaxy.org/cloud

Hopefully this helps,

Jen
Galaxy team

--
Jennifer Jackson
http://galaxyproject.org
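To illustrate why the two dataflows give the same file C, here is a minimal Python sketch. It is not how the Galaxy tools are implemented; the `select` and `subtract` helpers, the sample rows, and the expression `TTTT` are all hypothetical stand-ins for the 'Select' and 'Subtract' steps.

```python
import re


def select(lines, pattern, keep_matching=True):
    """Yield lines whose regex match status equals keep_matching.

    keep_matching=True mimics 'Matching'; False mimics 'NOT Matching'.
    Only one line is examined at a time, so memory use stays small.
    """
    regex = re.compile(pattern)
    for line in lines:
        if bool(regex.search(line)) == keep_matching:
            yield line


def subtract(lines_a, lines_b):
    """Return lines of A that do not appear in B.

    Illustrative only: this version holds all of B in memory.
    """
    seen = set(lines_b)
    return [line for line in lines_a if line not in seen]


# Hypothetical two-column rows: identifier <TAB> sequence
file_a = [
    "read1\tACGTACGT",
    "read2\tTTTTGGGG",
    "read3\tACGTTTTT",
    "read4\tGGGGCCCC",
]
expression = "TTTT"  # assumed expression 'X'

# Current dataflow: positive select, then subtract
file_b = list(select(file_a, expression, keep_matching=True))
two_step = subtract(file_a, file_b)

# Better dataflow: a single negative select
one_step = list(select(file_a, expression, keep_matching=False))

print(two_step == one_step)
```

Because file B only ever contains lines of file A that match 'X', removing B from A leaves exactly the lines that do not match 'X', which is what the single negative select produces in one pass.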