Any tools for separating unpaired reads in paired-end sequencing FASTQ files?


zhhxu9 (Canada) wrote, 4.2 years ago:

Hi,

I would like to know whether there is a tool that can do the following job.

I have some data files in FASTQ format generated by paired-end sequencing. The reads have already been trimmed to remove low-quality reads and adaptor contamination. But the R1 and R2 files do not have equal numbers of reads: in some pairs the R1 file has more reads than the R2 file, and in others it has fewer. As a result, each file contains some reads that have no mate in the corresponding file.

So I want to know whether there is a tool that can separate those unpaired reads from the paired ones and output files like R1paired.fastq, R1unpaired.fastq, R2paired.fastq, and R2unpaired.fastq. No other trimming is needed.

Thanks for your help.

Zhenhua



Jennifer Hillman Jackson (United States) wrote, 4.2 years ago:

Hi Zhenhua,

This is not needed for most analyses. It is generally more straightforward to go ahead and map the sequences as pairs (the statistics from the tool run will report mapping success rates), then filter afterwards for properly paired reads (and optionally remove unmapped reads). You would want to do this in preparation for variant analysis workflows. For RNA-seq workflows using the Tuxedo tools, however, no filtering is required: Cufflinks/Cuffdiff will only consider mapped pairs that pass the criteria set in the tool form parameters.
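To make the "filter after mapping" idea concrete, here is a minimal Python sketch of what a properly-paired filter does to SAM records. It is only an illustration, not one of the Galaxy tools: the function names `keep_record` and `filter_sam` are made up for this example, and it relies only on the standard SAM FLAG bits (0x2 for "properly paired", 0x4 for "read unmapped").

```python
PROPER_PAIR = 0x2  # SAM FLAG bit: the aligner considers the pair properly aligned
UNMAPPED = 0x4     # SAM FLAG bit: this read itself is unmapped


def keep_record(sam_line, drop_unmapped=True):
    """Return True for header lines and for properly paired alignment records."""
    if sam_line.startswith("@"):
        return True  # keep header lines untouched
    flag = int(sam_line.split("\t")[1])  # FLAG is the second SAM column
    if drop_unmapped and flag & UNMAPPED:
        return False
    return bool(flag & PROPER_PAIR)


def filter_sam(lines, drop_unmapped=True):
    """Keep only headers and properly paired records from an iterable of SAM lines."""
    return [line for line in lines if keep_record(line, drop_unmapped)]
```

In practice the same filter is usually applied with an existing SAM/BAM filtering tool rather than hand-written code; the sketch just shows which records survive.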

However, if you wish to do this at the start anyway (it does make for simpler statistics), one way is the published workflow 'Create matched paired end datasets' linked below (created by Dave Clements). I have just uploaded it to Main and have so far only tested it on a CloudMan Galaxy, but I wouldn't expect any problems on the public Main Galaxy instance, since the tools included are identical between the two. Any feedback about issues would be appreciated. You can of course modify the workflow yourself after importing it, but we'd love to hear about any changes so we can fix ours, too. We will be testing it on Main very soon (this week); I am publishing it early for you!

http://usegalaxy.org/u/galaxyproject/w/re-pair-paired-ends-after-qc-may-have-broken-them-imported-from-uploaded-file
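For anyone working outside Galaxy, the re-pairing step itself can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the workflow's actual implementation: it assumes plain 4-line FASTQ records, read names that match between R1 and R2 once a trailing /1 or /2 (or a description after whitespace) is removed, and enough memory to hold all read names. The function names are hypothetical; the output file names follow the ones requested in the question.

```python
import os


def read_id(header):
    """Strip '@', any description, and a trailing /1 or /2 from a FASTQ header."""
    fields = header[1:].split()
    name = fields[0] if fields else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def read_fastq(path):
    """Yield (read_id, record_text) for each 4-line FASTQ record in a file."""
    with open(path) as handle:
        while True:
            lines = [handle.readline() for _ in range(4)]
            if not lines[0].strip():
                break  # end of file (or a trailing blank line)
            record = "".join(lines)
            if not record.endswith("\n"):
                record += "\n"  # last record may lack a final newline
            yield read_id(lines[0]), record


def split_pairs(r1_path, r2_path, out_dir="."):
    """Write R1paired/R1unpaired/R2paired/R2unpaired FASTQ files; return counts."""
    # Read names present in both files are the ones that still have a mate.
    shared = ({rid for rid, _ in read_fastq(r1_path)}
              & {rid for rid, _ in read_fastq(r2_path)})
    counts = {}
    for label, path in (("R1", r1_path), ("R2", r2_path)):
        paired_name, unpaired_name = label + "paired", label + "unpaired"
        counts[paired_name] = counts[unpaired_name] = 0
        paired_path = os.path.join(out_dir, paired_name + ".fastq")
        unpaired_path = os.path.join(out_dir, unpaired_name + ".fastq")
        with open(paired_path, "w") as paired, open(unpaired_path, "w") as unpaired:
            for rid, record in read_fastq(path):
                if rid in shared:
                    paired.write(record)
                    counts[paired_name] += 1
                else:
                    unpaired.write(record)
                    counts[unpaired_name] += 1
    return counts
```

Each output file keeps the read order of its input, so the two paired files stay in sync only if the surviving mates appear in the same relative order in R1 and R2, which is normally the case after trimming.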

Hopefully one of these alternatives works out for you! Best, Jen, Galaxy team


Thanks Jennifer, I appreciate your kind help. I will try to use the new tool you provided.

written 4.2 years ago by zhhxu9

Hi Jennifer,

When I try to use the workflow, do I have to input the raw data? I only have data that has already been trimmed for low quality and adaptor sequence, so I do not have the raw data. In the first two drop-down lists, there is nothing available to select. Any idea about this? Thanks.

Zhenhua

written 4.2 years ago by zhhxu9
