Question: Limit to 500k reads
3.2 years ago by devillartay wrote:

Hello everyone,

I need to limit my sequence files to 500k reads for a downstream application. I could not find a way to do it in Galaxy. Any suggestions as to which tool I should use?

Thanks for your help.


I just found the way.


3.2 years ago by Jennifer Hillman Jackson (United States) wrote:

Glad this was worked out!

For everyone else, one way to do this is with the tool "Text manipulation -> Select first". Multiply the number of reads you want by 4 (each read takes 4 lines in a fastq file) and set the number of lines to that value. For 500k reads, that is 2,000,000 lines.
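For anyone working outside Galaxy, the same "reads times 4" arithmetic can be sketched in Python. This is only an illustration, not what the Galaxy tool does internally: the function names `head_records` and `limit_file` are made up for this example, and it assumes every fastq record is exactly 4 lines (no wrapped sequence or quality lines).

```python
from itertools import islice


def head_records(lines, n_reads, lines_per_record=4):
    """Return the lines of the first n_reads records.

    Assumes every record spans exactly lines_per_record lines
    (4 for an unwrapped fastq file).
    """
    return list(islice(lines, n_reads * lines_per_record))


def limit_file(src_path, dst_path, n_reads=500_000, lines_per_record=4):
    """Copy the first n_reads records from src_path to dst_path.

    Hypothetical helper: streams the input so the whole file is
    never held in memory.
    """
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(islice(src, n_reads * lines_per_record))
```

With the defaults, `limit_file("reads.fastq", "reads_500k.fastq")` would keep the first 2,000,000 lines (the file names are placeholders).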

If limiting a fasta file of short reads (the sequence text is on a single line), do the same but multiply by 2: one line for the identifier and one line for the sequence itself. For 500k reads, that is 1,000,000 lines.

If the fasta file has longer sequences that are wrapped across several lines, convert the fasta dataset to tabular, select the exact number of lines (one line per sequence), convert the tabular data back to fasta, and finish by wrapping the fasta lines. The extra tools are in the group "Fasta manipulation".
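The wrapped-fasta route above (unwrap, take the first N sequences, re-wrap) can also be sketched in Python for use outside Galaxy. This is a rough equivalent under stated assumptions, not the Galaxy tools themselves: `fasta_records` and `head_fasta` are hypothetical names, and the wrap width of 60 is an arbitrary choice.

```python
from itertools import islice


def fasta_records(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def head_fasta(lines, n_records, width=60):
    """Return the first n_records sequences, re-wrapped to `width` columns."""
    out = []
    for header, seq in islice(fasta_records(lines), n_records):
        out.append(header)
        # Re-wrap the joined sequence into fixed-width lines.
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return out
```

Because records are counted by their `>` header lines rather than by raw line count, this works whether or not the sequences are wrapped.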

Best, Jen, Galaxy team
