Remove "Unpaired" Reads From Quality-Filtered Paired-End Fastq Files.

Question: Remove "Unpaired" Reads From Quality-Filtered Paired-End Fastq Files.


Hi there, I obtained two fastq files from a GA paired-end run and filtered each file by quality using the fastq toolkit. As a result, some forward reads may have been removed for low quality while their reverse counterparts remain in the other file, or vice versa. I want to remove those "unpaired" reads from the filtered fastq files so that the two new fastq files contain identical sets of reads. Is it possible to do this in Galaxy? Thank you very much. Hiroki

galaxy • 3.9k views


modified 5.9 years ago by Jennifer Hillman Jackson ♦ 25k • written 5.9 years ago by 柴田弘紀 • 20

5.9 years ago by

Carlos Borroto • 390

Washington Metropolitan Area

Carlos Borroto • 390 wrote:

Not the most convenient solution, but what I normally do in this situation is combine the two files, filter, then split them again. There are tools for combining and splitting paired fastq files in Galaxy. Hope it helps, Carlos
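For readers working outside Galaxy, the combine, filter, split idea can be sketched in Python roughly as follows. This is only an illustration, not what the Galaxy tools do internally: the function names (`read_fastq`, `mean_quality`, `filter_pairs`) are made up, the mean-quality threshold is a stand-in for whatever quality filter you actually use, and it assumes the two unfiltered files list mates in the same order, use plain 4-line records, and encode qualities as Phred+33 (older GA data may need `offset=64`).

```python
def read_fastq(handle):
    """Yield (header, seq, plus, qual) from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline()
        if not header:
            return
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        yield tuple(line.rstrip("\n") for line in (header, seq, plus, qual))


def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (offset is an assumption)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_pairs(fwd_in, rev_in, fwd_out, rev_out, min_mean_q=20, offset=33):
    """Walk both files in step and keep a pair only if BOTH mates pass.

    Because the decision is made per pair, the two outputs always
    contain the same set of reads in the same order.
    """
    kept = 0
    for fwd, rev in zip(read_fastq(fwd_in), read_fastq(rev_in)):
        if (mean_quality(fwd[3], offset) >= min_mean_q
                and mean_quality(rev[3], offset) >= min_mean_q):
            fwd_out.write("\n".join(fwd) + "\n")
            rev_out.write("\n".join(rev) + "\n")
            kept += 1
    return kept
```

The handles can be ordinary files opened with `open(...)`; the point of the design is that filtering happens on the pair rather than on each file separately, so no "unpaired" reads are ever produced.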

written 5.9 years ago by Carlos Borroto • 390

5.9 years ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hi Hiroki,

This question has come up before, and the best advice our team has to offer is that in most cases, filtering the data this way is unnecessary.

Still, there are a few methods to do this, but they are tedious. One is to convert everything to tabular format, extract the IDs, compare and join the datasets with the ID lists, then convert back to fastq. Another is the one Carlos brings up (joining, filtering, splitting), but that has not worked for all sequence formats in the past. Neither of these is recommended, but you are of course welcome to test out whatever tools/methods you wish.

With most analysis pipelines, it is fine to leave in the extra reads and proceed with the mapping step. After mapping comes the next opportunity to do some filtering if you want to retain only properly paired reads, etc. However, even this is not always necessary; it depends on what analysis you are doing (e.g. it is not required for RNA-seq analysis).

These tool groups manipulate/provide stats on SAM/BAM datasets:

NGS: SAM Tools (be sure to see Filter SAM)
NGS: Picard (beta)

Hopefully this helps!

Jen
Galaxy team

--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org
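For anyone who still wants to try the ID-comparison route outside Galaxy, the idea (extract the IDs, compare the two lists, keep only the shared reads) could be sketched in Python like this. It is a minimal sketch under stated assumptions, not a Galaxy tool: the function names are hypothetical, records are assumed to be plain 4-line fastq, mate names are assumed to differ only by a trailing `/1` or `/2` or by a comment after whitespace, and the input handles must be seekable (regular files are).

```python
def read_fastq(handle):
    """Yield (header, seq, plus, qual) from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline()
        if not header:
            return
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        yield tuple(line.rstrip("\n") for line in (header, seq, plus, qual))


def read_id(header):
    """Reduce a header line to an ID shared by both mates (assumed naming)."""
    name = header[1:].split()[0]          # drop '@' and any trailing comment
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]                  # drop the old-style mate suffix
    return name


def keep_common_pairs(fwd_in, rev_in, fwd_out, rev_out):
    """Write only reads whose ID occurs in both inputs; return the pair count."""
    rev_ids = {read_id(rec[0]) for rec in read_fastq(rev_in)}
    common = set()
    for rec in read_fastq(fwd_in):
        rid = read_id(rec[0])
        if rid in rev_ids:
            common.add(rid)
            fwd_out.write("\n".join(rec) + "\n")
    rev_in.seek(0)                        # second pass over the reverse file
    for rec in read_fastq(rev_in):
        if read_id(rec[0]) in common:
            rev_out.write("\n".join(rec) + "\n")
    return len(common)
```

Each output keeps the order of its own input, so the two results stay in sync as long as the filtered inputs still follow the original run order. Only the IDs are held in memory, not the reads themselves.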

written 5.9 years ago by Jennifer Hillman Jackson ♦ 25k

