Hello everyone,
I need to limit my sequence files to 500k reads for a downstream application, but I could not find a way to do this in Galaxy. Any suggestions as to which tool I should use?
Thanks for your help.
I just found the way.
Glad this was worked out!
For everyone else, one way to do this is with the tool "Text manipulation -> Select first". Multiply the number of reads you want by 4 (there are 4 lines per sequence in a fastq file) and set the number of lines to that number. For 500k reads, that is 2,000,000 lines.
If limiting a fasta file of short reads, where each sequence is on a single line, do the same but multiply by 2 (one line for the identifier, one line for the sequence itself).
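Outside Galaxy, the same "take the first N × lines-per-record lines" idea can be sketched in a few lines of Python. This is only an illustration, not what the Galaxy tool runs internally. It assumes every fastq record is exactly 4 lines and every fasta record exactly 2 lines (no wrapped sequences). The function name `first_records` is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator


def first_records(lines: Iterable[str], n_records: int, lines_per_record: int) -> Iterator[str]:
    """Yield the lines belonging to the first n_records records.

    Assumes fixed-size records: 4 lines for fastq,
    2 lines for single-line (unwrapped) fasta.
    """
    return islice(lines, n_records * lines_per_record)


# Small in-memory demo: three fastq reads, keep the first two.
fastq = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "GGCC", "+", "IIII",
    "@read3", "TTAA", "+", "IIII",
]
subset = list(first_records(fastq, 2, 4))
print(len(subset))  # 8 (2 reads x 4 lines)
```

For a real file you would pass the open file handle instead of a list, e.g. `first_records(open("reads.fastq"), 500_000, 4)` (the filename is hypothetical). Because `islice` stops early, only the needed lines are read.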
If the fasta file has longer sequences that are wrapped across several lines, convert the fasta dataset to tabular (one sequence per line), select the exact number of lines, convert the tabular back to fasta, and finish by wrapping the fasta lines. The extra tools needed are in the group "Fasta manipulation".
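The wrapped-fasta workflow above (fasta to tabular, select, back to fasta, rewrap) can be mirrored in plain Python as a rough sketch: parse each record into a (header, sequence) pair, keep the first N, then rewrap. The function names are hypothetical, and the line width of 60 is an assumed default; use whatever width the downstream application expects.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Parse fasta lines into (header, sequence) pairs, joining wrapped lines.

    This plays the role of the fasta-to-tabular step: one item per record.
    Blank lines and anything before the first header are discarded.
    """
    header = None
    seq_parts: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_parts)
            header, seq_parts = line, []
        else:
            seq_parts.append(line)
    if header is not None:
        yield header, "".join(seq_parts)


def first_fasta_records(lines: Iterable[str], n_records: int, width: int = 60) -> List[str]:
    """Keep the first n_records sequences and rewrap them to `width` columns."""
    out: List[str] = []
    for i, (header, seq) in enumerate(read_fasta(lines)):
        if i >= n_records:
            break
        out.append(header)
        # Plain slicing rather than textwrap, so gap characters like '-' are not
        # treated as break points.
        out.extend(seq[j:j + width] for j in range(0, len(seq), width))
    return out
```

Counting whole records rather than lines is what makes this safe for wrapped sequences, where the number of lines per record varies.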
Best, Jen, Galaxy team