Question: Are there any tools for separating unpaired reads in paired-end sequencing FASTQ files?
3.8 years ago by
zhhxu9 wrote:


I would like to know whether there is a tool that can do the following job.

I have some data files in FASTQ format generated by paired-end sequencing. The reads have already been trimmed to remove low-quality reads and adaptor contamination. However, the R1 and R2 files do not contain equal numbers of reads: some R1 files have more reads than their R2 counterparts and some have fewer. As a result, each file contains some reads that do not have a mate in the corresponding file.

So I want to know whether any tool can separate the unpaired reads from the paired ones and output files such as R1paired.fastq, R1unpaired.fastq, R2paired.fastq, and R2unpaired.fastq. No other trimming is needed.

Thanks for your help.


qc prep reads unpaired paired-end • 3.7k views
modified 3.8 years ago by Jennifer Hillman Jackson • written 3.8 years ago by zhhxu9
3.8 years ago by
United States
Jennifer Hillman Jackson wrote:

Hi Zhenhua,

This is not needed for most analyses. It is generally more straightforward to go ahead and map the sequences as pairs (the statistics from the tool run will report mapping success rates), then filter afterwards for properly paired reads (and optionally remove unmapped reads). You would want to do this in preparation for variant analysis workflows. For RNA-seq workflows using the Tuxedo tools, however, no filtering is required: Cufflinks/Cuffdiff will only consider mapped pairs that pass the criteria set in the tool form parameters.
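To illustrate what "filter after for properly paired reads" means, here is a minimal Python sketch that works on SAM text output. It is not one of the Galaxy tools, just an illustration. It relies on the SAM FLAG field (column 2), where bit 0x2 marks a properly aligned pair and bit 0x4 marks an unmapped read. The function name `keep_properly_paired` and the `drop_unmapped` option are made up for this example.

```python
PROPER_PAIR = 0x2  # SAM FLAG bit: each segment properly aligned
UNMAPPED = 0x4     # SAM FLAG bit: segment unmapped


def keep_properly_paired(sam_lines, drop_unmapped=True):
    """Yield SAM header lines plus alignments flagged as properly paired.

    sam_lines is any iterable of SAM text lines (for example an open file).
    """
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        flag = int(line.split("\t")[1])
        if not flag & PROPER_PAIR:
            continue
        if drop_unmapped and flag & UNMAPPED:
            continue
        yield line
```

It could be used along the lines of `out.writelines(keep_properly_paired(open("mapped.sam")))`, where `mapped.sam` is a placeholder file name.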

However, if you wish to do this at the start anyway (it does make for simpler statistics), one way is the Published workflow 'Create matched paired end datasets' below (created by Dave Clements). I uploaded it to Main just now and have so far only tested it on a CloudMan Galaxy, but I wouldn't expect any problems using it on the public Main Galaxy instance (the tools included are identical between the two). Any feedback about issues would be appreciated. You can of course modify the workflow yourself after importing it, but we'd love to know about any changes so we can fix ours, too (we will be testing it on Main very soon, this week; I am publishing it early for you!).
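For anyone who prefers to do the same thing outside Galaxy, the following Python sketch separates paired from unpaired reads by matching read IDs. It is not the workflow itself, and it makes several assumptions: each record is exactly four lines, mates share the same ID once an optional trailing /1 or /2 is removed, and everything after the first whitespace in the header line is ignored. All function names are made up for this example. The set of IDs from each file is held in memory, so very large files may need a different approach.

```python
import itertools


def read_fastq(handle):
    """Yield (read_id, record_text) for each 4-line FASTQ record."""
    while True:
        record = list(itertools.islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        # Drop the leading '@', keep the first token, strip /1 or /2.
        name = record[0][1:].split()[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name, "".join(record)


def read_ids(path):
    """Return the set of normalized read IDs found in a FASTQ file."""
    with open(path) as handle:
        return {name for name, _ in read_fastq(handle)}


def split_file(path, mate_ids, paired_path, unpaired_path):
    """Write reads whose ID is in mate_ids to paired_path, others to unpaired_path."""
    counts = {"paired": 0, "unpaired": 0}
    with open(path) as src, open(paired_path, "w") as paired, \
            open(unpaired_path, "w") as unpaired:
        for name, record in read_fastq(src):
            if name in mate_ids:
                paired.write(record)
                counts["paired"] += 1
            else:
                unpaired.write(record)
                counts["unpaired"] += 1
    return counts


def separate_pairs(r1_path, r2_path, prefix1="R1", prefix2="R2"):
    """Create <prefix>paired.fastq and <prefix>unpaired.fastq for both files."""
    r1_ids = read_ids(r1_path)
    r2_ids = read_ids(r2_path)
    counts1 = split_file(r1_path, r2_ids,
                         prefix1 + "paired.fastq", prefix1 + "unpaired.fastq")
    counts2 = split_file(r2_path, r1_ids,
                         prefix2 + "paired.fastq", prefix2 + "unpaired.fastq")
    return counts1, counts2
```

Each output keeps the read order of its input file, so the two paired files stay in sync only if the surviving mates appear in the same relative order in R1 and R2, which is normally the case after trimming.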

Hopefully one of these alternatives works out for you! Best, Jen, Galaxy team

modified 3.8 years ago • written 3.8 years ago by Jennifer Hillman Jackson

Thanks Jennifer, I appreciate your kind help. I will try to use the new tool you provided.

written 3.8 years ago by zhhxu9

Hi Jennifer,

When I try to use the workflow, do I have to input the raw data? I only have data that has already been trimmed for low quality and adaptor sequence, so I do not have the raw data. Also, the first two drop-down lists offer nothing to select. Any idea about this? Thanks.


written 3.8 years ago by zhhxu9