I obtained two fastq files from a GA paired-end run. I filtered each
file by quality using the fastq tool kit. As a result, some forward reads
may have been removed for low quality while their reverse counterparts
remain in the other file, or vice versa.
I want to remove those "unpaired" reads from the filtered fastq files so
that the two new fastq files contain identical sets of reads.
Is it possible to do this in Galaxy?
Thank you very much.
Not the most convenient solution, but what I normally do in this
situation is to combine the two files, filter, then split again. There
are tools for combining and splitting paired fastq files in Galaxy.
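Outside Galaxy, the same combine/filter/split idea can be sketched in a few lines of Python. This is only an illustration, not what the Galaxy tools do internally: the function names are made up for the example, it assumes both files list mates in the same order, and it assumes Phred+33 quality encoding (older GA data may use Phred+64, in which case change the offset).

```python
from typing import Iterable, List, Tuple

# One FASTQ record: header, sequence, '+' line, quality string.
Record = Tuple[str, str, str, str]


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (offset 33 is an assumption)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_pairs(
    fwd: Iterable[Record],
    rev: Iterable[Record],
    min_mean_q: float = 20.0,
) -> Tuple[List[Record], List[Record]]:
    """Walk both files in lockstep and keep a pair only if BOTH mates pass.

    Because a pair is kept or dropped as a unit, the two outputs
    always contain the same set of reads in the same order.
    """
    kept_fwd: List[Record] = []
    kept_rev: List[Record] = []
    for f, r in zip(fwd, rev):
        if mean_quality(f[3]) >= min_mean_q and mean_quality(r[3]) >= min_mean_q:
            kept_fwd.append(f)
            kept_rev.append(r)
    return kept_fwd, kept_rev
```

The point of filtering on the pair rather than on each file separately is that the two outputs can never get out of sync.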
Hope it helps,
This question has come up before, and the best advice our team has to
offer is that in most cases, filtering the data this way is not necessary.
Still, there are a few methods to do this, but they are tedious to do.
One is to basically convert everything to tabular format, extract the
IDs, compare and join the datasets with the ID lists, then convert back
to fastq. Another is the one Carlos brings up (joining, filtering,
splitting), but that has not worked for all sequence formats in the
past. Neither of these is recommended, but you are of course welcome to
test out and try whatever tools/methods you wish to.
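For anyone who wants to try the ID-based approach outside Galaxy, here is a rough Python sketch of the same steps (extract the IDs, compare, keep only reads present in both files). It is not the Galaxy workflow itself: the helper names are hypothetical, it assumes plain 4-line fastq records, and it assumes mates are distinguished by a trailing /1 or /2 or by a description after a space in the header.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# One FASTQ record: header, sequence, '+' line, quality string.
Record = Tuple[str, str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Group non-empty lines into 4-line FASTQ records."""
    clean = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(clean) - 3, 4):
        yield (clean[i], clean[i + 1], clean[i + 2], clean[i + 3])


def read_id(header: str) -> str:
    """Drop the leading '@', any description, and a /1 or /2 mate suffix."""
    fields = header[1:].split()
    name = fields[0] if fields else ""
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def keep_common(
    fwd: Iterable[Record], rev: Iterable[Record]
) -> Tuple[List[Record], List[Record]]:
    """Keep only reads whose ID appears in both files, in forward-file order."""
    rev_by_id: Dict[str, Record] = {read_id(r[0]): r for r in rev}
    kept_fwd: List[Record] = []
    kept_rev: List[Record] = []
    for f in fwd:
        mate = rev_by_id.get(read_id(f[0]))
        if mate is not None:
            kept_fwd.append(f)
            kept_rev.append(mate)
    return kept_fwd, kept_rev
```

Typical use would be something like `keep_common(parse_fastq(open("fwd.fastq")), parse_fastq(open("rev.fastq")))`, where the file names are placeholders. Note that this holds the whole reverse file in memory, so it is only practical for files that fit in RAM.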
With most analysis pipelines, it is fine to leave in the extra reads and
proceed with the mapping step. Then, after mapping, this would be the
next opportunity to do some filtering if you wanted to only retain
paired reads, etc. However, even this is not always necessary - it
depends on what analysis you are doing (e.g. not required for RNA-seq).
These tool groups manipulate/provide stats on SAM/BAM datasets:
NGS: SAM Tools (be sure to see -> Filter SAM)
NGS: Picard (beta)
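To give an idea of what post-mapping filtering for paired reads looks like, here is a small Python sketch that works on the bitwise FLAG field of SAM lines. It is an illustration only, not the code behind the Galaxy tools: the function names are made up, and "paired" is taken here to mean the read was sequenced as part of a pair and neither it nor its mate is unmapped.

```python
from typing import Iterable, Iterator

# FLAG bits as defined in the SAM format specification.
PAIRED = 0x1         # read was sequenced as part of a pair
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the mate is unmapped


def both_mates_mapped(sam_line: str) -> bool:
    """True if the alignment line is paired and both mates are mapped."""
    flag = int(sam_line.split("\t")[1])  # FLAG is the second column
    if not flag & PAIRED:
        return False
    return not flag & (UNMAPPED | MATE_UNMAPPED)


def filter_sam(lines: Iterable[str]) -> Iterator[str]:
    """Pass header lines through and keep only fully mapped pairs."""
    for line in lines:
        if line.startswith("@") or both_mates_mapped(line):
            yield line
```

A stricter filter could also require the "properly paired" bit (0x2); whether that is appropriate depends on the mapper and the analysis.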
Hopefully this helps!
Galaxy Support and Training