Question: Remove "Unpaired" Reads From Quality-Filtered Paired-End FASTQ Files
柴田 弘紀 wrote, 5.9 years ago:
Hi there, I obtained two FASTQ files from a GA paired-end run and filtered each file by quality using the FASTQ tool kit. As a result, some forward reads may have been removed for low quality while their reverse mates remain in the other file, or vice versa. I want to remove those "unpaired" reads from the filtered FASTQ files so that the two new files contain identical sets of reads. Is it possible to do this in Galaxy? Thank you very much. Hiroki
Carlos Borroto (Washington Metropolitan Area) wrote, 5.9 years ago:
Not the most convenient solution, but what I normally do in this situation is combine the two files, filter, then split them again. Galaxy has tools for combining and splitting paired FASTQ files. Hope it helps, Carlos
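A minimal sketch of the combine/filter/split idea in plain Python, not Galaxy's own joiner and splitter tools. It assumes both original (unfiltered) files list mates in the same order and use Phred+33 qualities, and it uses a hypothetical mean-quality threshold as a stand-in for whatever filter you actually run. The point it shows is that the filter is applied once per joined pair, so both mates are kept or dropped together:

```python
from typing import Iterable, List, Tuple

# A FASTQ record simplified to (header, sequence, quality).
Record = Tuple[str, str, str]


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 assumed; older GA data may use +64)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def combine_filter_split(
    fwd: Iterable[Record], rev: Iterable[Record], min_mean_q: float = 20.0
) -> Tuple[List[Record], List[Record]]:
    """Join each mate pair, filter the joined read, then split it back into two mates."""
    kept_fwd: List[Record] = []
    kept_rev: List[Record] = []
    # Assumes both unfiltered files list mates in the same order.
    for (f_head, f_seq, f_qual), (r_head, r_seq, r_qual) in zip(fwd, rev):
        # Combine: one record holding both mates end to end.
        joined_seq, joined_qual = f_seq + r_seq, f_qual + r_qual
        # Filter: the hypothetical mean-quality test is applied once per pair.
        if mean_phred(joined_qual) < min_mean_q:
            continue
        # Split: cut at the forward mate's length to recover the two reads.
        cut = len(f_seq)
        kept_fwd.append((f_head, joined_seq[:cut], joined_qual[:cut]))
        kept_rev.append((r_head, joined_seq[cut:], joined_qual[cut:]))
    return kept_fwd, kept_rev


fwd = [("@r1/1", "ACGT", "IIII"), ("@r2/1", "ACGT", "####")]
rev = [("@r1/2", "TTGG", "IIII"), ("@r2/2", "TTGG", "IIII")]
kept_f, kept_r = combine_filter_split(fwd, rev, min_mean_q=30)
print(kept_f)  # [('@r1/1', 'ACGT', 'IIII')]
print(kept_r)  # [('@r1/2', 'TTGG', 'IIII')]
```

Pair r2 is dropped as a whole even though its reverse mate is high quality, which is exactly what keeps the two output files in sync. Note that a pair-level filter like this can give different results than filtering each mate on its own.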
Jennifer Hillman Jackson (United States) wrote, 5.9 years ago:
Hi Hiroki,

This question has come up before, and the best advice our team can offer is that in most cases filtering the data this way is unnecessary. Still, there are a few ways to do it, though they are tedious. One is to convert everything to tabular format, extract the read IDs, compare them and join the datasets against the ID lists, then convert back to FASTQ. Another is the one Carlos describes (joining, filtering, splitting), but that has not worked for all sequence formats in the past. Neither method is recommended, but you are of course welcome to test whatever tools or methods you wish.

With most analysis pipelines, it is fine to leave in the extra reads and proceed with the mapping step. After mapping, you have the next opportunity to filter, for example if you want to retain only properly paired reads. Even this is not always necessary; it depends on the analysis you are doing (e.g. it is not required for RNA-seq analysis). These tool groups manipulate SAM/BAM datasets or provide statistics on them:

NGS: SAM Tools (be sure to see "Filter SAM")
NGS: Picard (beta)

Hopefully this helps!
Jen, Galaxy team

--
Jennifer Hillman-Jackson
Galaxy Support and Training
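The ID-matching route (tabular conversion, ID comparison, join, convert back) can be sketched outside Galaxy in plain Python. This is a minimal illustration rather than a Galaxy workflow, and it runs on the already-filtered files. It assumes unwrapped 4-line FASTQ records, unique read names, and mate names that differ only by a /1 or /2 suffix or by text after the first space; the helper names are hypothetical:

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def fastq_to_rows(text: str) -> List[Record]:
    """'Tabular' step: one (header, seq, qual) row per 4-line FASTQ record (no wrapping assumed)."""
    lines = text.strip().splitlines()
    return [(lines[i], lines[i + 1], lines[i + 3]) for i in range(0, len(lines), 4)]


def pair_id(header: str) -> str:
    """Pair-level read ID: drop '@', any description after whitespace, and a /1 or /2 suffix."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def keep_common_pairs(fwd: List[Record], rev: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Keep only reads whose pair ID survives in both files, with mates in matching order."""
    rev_by_id: Dict[str, Record] = {pair_id(rec[0]): rec for rec in rev}
    kept_fwd = [rec for rec in fwd if pair_id(rec[0]) in rev_by_id]
    # Reverse mates are emitted in forward-file order so line N pairs with line N.
    kept_rev = [rev_by_id[pair_id(rec[0])] for rec in kept_fwd]
    return kept_fwd, kept_rev


def rows_to_fastq(rows: List[Record]) -> str:
    """Convert rows back to 4-line FASTQ text."""
    return "".join(f"{h}\n{s}\n+\n{q}\n" for h, s, q in rows)


# r1 lost its reverse mate and r3 lost its forward mate during filtering.
fwd_text = "@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCC\n+\nIIII\n"
rev_text = "@r2/2\nTTAA\n+\nIIII\n@r3/2\nCCGG\n+\nIIII\n"
f, r = keep_common_pairs(fastq_to_rows(fwd_text), fastq_to_rows(rev_text))
print(rows_to_fastq(f), end="")
print(rows_to_fastq(r), end="")
```

Only pair r2 survives in both outputs. This version holds one file's reads in memory, so for a full GA run you would likely want to work with ID lists or sorted files instead.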