Question: Multi Fastq joining
4.5 years ago by
United States
ghliu83 wrote:

Hi all, how can I join multiple FASTQ files of left-hand reads (or of right-hand reads) together? I need to map sequence data from different experiments to a common reference, but first I need to put all left-hand reads into one file and all right-hand reads into another. Thank you.

assembly tool textmanipulation • 2.7k views
modified 4.5 years ago by Jennifer Hillman Jackson • written 4.5 years ago by ghliu83

Thank you. I will try.

written 4.5 years ago by ghliu83

You state that you want to map paired-end reads to a reference. If the reference genome exists, you would probably be well advised to map them as paired-end reads to take advantage of the pairing. If the reference genome does not exist, you would have to wonder about the sanity of whoever chose paired-end reads - you'd be better off with lots of very long single-end reads to assemble.

As you probably know, most mappers (e.g. bowtie or bwa) understand paired data, and you will gain mapping precision if the mapper treats each pair as a pair and ensures that both ends of the pair map correctly. There may be good reasons to construct a joined file, but mapping to a known reference is not one of them IMHO.

OTOH, if you have no reference genome, you might be trying to create a de novo reference from all available sequence - in which case you may need to try assembling (velvet/abyss etc.) all the sequences (all pairs) from all samples. Concatenation will work as described below, but unless you have a huge amount of sequence, your de novo reference may consist of lots of short contigs, which don't allow the pairs to map properly.

modified 4.5 years ago • written 4.5 years ago by fubar

I guess what the OP is trying to do is not to merge the two reads of each pair into one file - his wording is just slightly misleading. Instead, I think, he simply has fastq input files like this:

r1.01.fastq, r1.02.fastq, r1.03.fastq

r2.01.fastq, r2.02.fastq, r2.03.fastq

and he would like to join all r1 files into one r1.fastq file and all r2 files into one r2.fastq to then pass these two files to an aligner.

@ghliu83: correct me if I'm wrong.

written 4.5 years ago by Wolfgang Maier

Thank you. That is what I want. I used the Concatenate datasets tool to join all r1 reads and, separately, all r2 reads, and it worked. Now the mapping job is waiting to run.

written 4.5 years ago by ghliu83
4.5 years ago by
Wolfgang Maier wrote:

If I understand you correctly, you have a multi-part fastq for the r1 reads of a paired-end run that you want to join into a single file, and another multi-part fastq for the r2 reads that you also want to join into a single, but separate, file. In that case, the Concatenate datasets tool under Text Manipulation is what you need.

Just make sure that you keep the same order of the parts when joining the r1 files and the r2 files so that reads with the same names end up on the same lines in the two files.
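
For anyone doing the same thing outside Galaxy, here is a rough Python sketch of the idea. It is not what the Concatenate datasets tool does internally; it only illustrates joining the parts in the same order for both mates. The function names and the r1./r2. file naming (taken from the example file names in the comments above) are assumptions, and the files are assumed to be uncompressed and to end with a newline.

```python
import shutil
from pathlib import Path


def concatenate_parts(parts, destination):
    """Append each part to destination, in exactly the order given."""
    with open(destination, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


def join_read_files(directory, r1_prefix="r1.", r2_prefix="r2."):
    """Join r1.NN.fastq parts into r1.fastq and r2.NN.fastq parts into r2.fastq.

    Hypothetical helper: the prefixes and output names are illustrative only.
    """
    directory = Path(directory)
    # Sorting by name gives both mates the same part order (01, 02, 03, ...).
    r1_parts = sorted(directory.glob(f"{r1_prefix}*.fastq"))
    r2_parts = sorted(directory.glob(f"{r2_prefix}*.fastq"))

    # Every r1 part needs a matching r2 part, otherwise the pairs go out of sync.
    r1_suffixes = [p.name[len(r1_prefix):] for p in r1_parts]
    r2_suffixes = [p.name[len(r2_prefix):] for p in r2_parts]
    if r1_suffixes != r2_suffixes:
        raise ValueError("r1 and r2 parts do not match one-to-one")

    concatenate_parts(r1_parts, directory / "r1.fastq")
    concatenate_parts(r2_parts, directory / "r2.fastq")
    return r1_parts, r2_parts
```

Calling join_read_files on the directory holding the parts should leave r1.fastq and r2.fastq next to them, ready to pass to the aligner as a pair.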



4.5 years ago by
United States
Jennifer Hillman Jackson wrote:


If you want to just merge together all left reads into one file and then all right reads into one file, use the tool "Text Manipulation -> Concatenate datasets tail-to-head".
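
After merging, one quick sanity check is that the read names still line up record by record in the two files. The sketch below is only an illustration with hypothetical function names; it assumes uncompressed FASTQ with strict 4-line records (no wrapped sequence lines) and read names that are identical between mates apart from an optional trailing /1 or /2.

```python
from itertools import zip_longest


def base_name(header_line):
    """Read name without the leading '@', any description, or a /1 or /2 suffix."""
    fields = header_line[1:].split()
    name = fields[0] if fields else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def first_mismatch(r1_path, r2_path):
    """Return the 1-based index of the first record whose names differ, else None."""
    with open(r1_path) as r1, open(r2_path) as r2:
        # Assumption: every record is exactly 4 lines, so headers are lines 0, 4, 8, ...
        headers1 = (line for i, line in enumerate(r1) if i % 4 == 0)
        headers2 = (line for i, line in enumerate(r2) if i % 4 == 0)
        pairs = zip_longest(headers1, headers2)
        for index, (h1, h2) in enumerate(pairs, start=1):
            # A missing header means one file has more records than the other.
            if h1 is None or h2 is None or base_name(h1) != base_name(h2):
                return index
    return None
```

A result of None suggests the two merged files are in sync; any number points at the first record where they diverge, which usually means the parts were joined in a different order.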

I believe I misunderstood originally and have removed that comment, sorry!

Jen, Galaxy team
