Subtract



Hello,

I am using the Subtract (Whole Dataset) tool. I converted my FASTQ file to tabular format with two columns: 1. identifier and 2. sequence. I then used "Select lines that match an expression" to pull a few lines from this initial tabular file. I want a final dataset that contains none of the reads in those selected lines, so I subtract the dataset of selected lines from the initial dataset. The tool works when I run the workflow on a relatively small file (1/50 the size of a whole sequencing experiment), but it repeatedly fails when I input the full FASTQ file. Any idea why this is so?

Jose
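For readers reproducing this setup, the conversion described in the question (FASTQ to a two-column table of identifier and sequence) can be sketched in Python as follows. This is a minimal illustration, not Galaxy's own FASTQ-to-tabular converter: the function name `fastq_to_tabular` is hypothetical, and it assumes unwrapped four-line FASTQ records (header, sequence, "+" separator, quality).

```python
import io
from typing import Iterator, TextIO


def fastq_to_tabular(handle: TextIO) -> Iterator[str]:
    """Yield 'identifier<TAB>sequence' rows from a FASTQ stream.

    Hypothetical helper; assumes each record is exactly four unwrapped lines.
    """
    while True:
        header = handle.readline()
        if not header:
            break  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        handle.readline()  # quality line: not needed for the two-column table
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        # Drop the leading '@' and trailing newline from the identifier.
        yield f"{header[1:].rstrip()}\t{seq}"


if __name__ == "__main__":
    fastq = io.StringIO(
        "@read1\nACGTACGT\n+\nIIIIIIII\n"
        "@read2\nTTTTGGGG\n+\nIIIIIIII\n"
    )
    for row in fastq_to_tabular(fastq):
        print(row)
```

Because it is a generator reading one record at a time, the same function can be pointed at an open file handle for a full sequencing run without loading the whole file into memory.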



Written 6.6 years ago by Xianrong Wong; modified 6.6 years ago by Jennifer Hillman Jackson.


Jennifer Hillman Jackson wrote:

Hello,

Using the 'Subtract' tool between FASTQ datasets can be memory intensive, since it involves sorting both files and then comparing them character by character. This step is likely not necessary. I have seen queries such as yours run successfully on even very large datasets by eliminating the Subtract step and instead running 'Select' with "NOT Matching" on the original dataset.

Example:

Current dataflow:
1 - original file A
2 - select positive match expression 'X' to create file B
3 - subtract file B from file A to create file C

Better:
1 - original file A
2 - select negative match expression 'X' to create file C

If this failure is on the public main Galaxy server and you do not wish to change your query, one option is to move to a cloud instance and experiment with larger memory options: http://usegalaxy.org/cloud

Hopefully this helps,
Jen
Galaxy team

--
Jennifer Jackson
http://galaxyproject.org
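To illustrate why dropping the Subtract step helps, here is a minimal Python sketch of the two dataflows on a two-column tabular dataset like the one in the question. It is not Galaxy's implementation of either tool; the function names and the example expression are hypothetical. Both functions produce file C, but the subtract version materializes all of file A plus a set of the selected lines, while the negative select makes a single streaming pass.

```python
import re
from typing import Iterable, Iterator, List


def select_then_subtract(lines: List[str], pattern: str) -> List[str]:
    """Current dataflow (hypothetical sketch): B = matching lines, C = A minus B.

    Holds all of file A as a list and file B as a set, comparing whole lines.
    """
    regex = re.compile(pattern)
    selected = {line for line in lines if regex.search(line)}  # file B
    return [line for line in lines if line not in selected]    # file C


def select_not_matching(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Better dataflow (hypothetical sketch): keep only lines that do NOT match.

    Processes one line at a time, so a file handle can be passed in directly.
    """
    regex = re.compile(pattern)
    for line in lines:
        if not regex.search(line):
            yield line


if __name__ == "__main__":
    table = [
        "read1\tACGTACGT\n",
        "read2\tNNNNACGT\n",
        "read3\tTTTTGGGG\n",
    ]
    pattern = r"\tN+"  # hypothetical expression 'X': sequence starts with N
    print(select_then_subtract(table, pattern))
    print(list(select_not_matching(table, pattern)))
```

On a real dataset, `select_not_matching(open(path), pattern)` can be iterated and written out line by line, so memory use stays roughly constant regardless of file size. This mirrors the advice above: one negative-match pass replaces a positive select followed by a whole-dataset comparison.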

Written 6.6 years ago by Jennifer Hillman Jackson.
