Hi all, how can I join multiple FASTQ files of left-hand reads or right-hand reads together? I need to map sequence data from different experiments to a common reference, but first I need to put all the left-hand reads or all the right-hand reads together as one file. Thank you.
If I understand you correctly, you have a multi-part FASTQ for the r1 reads of a paired-end run that you want to join into a single file, and another multi-part FASTQ for the r2 reads that you also want to join into a single but separate file. If so, the Concatenate datasets tool under Text Manipulation is what you need.
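If you would rather do the same thing outside Galaxy, here is a minimal Python sketch of tail-to-head concatenation. It assumes plain (uncompressed) FASTQ files; the function name `concatenate_fastq` and the file names in the comment are hypothetical. The only extra step beyond copying bytes is adding a newline when a part does not end with one, so that its last record cannot fuse with the next file's first header line.

```python
import os
import shutil


def concatenate_fastq(parts, out_path):
    """Join FASTQ files tail-to-head, in exactly the order given in `parts`."""
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
                # After the copy we are at EOF; if the part is non-empty and its
                # last byte is not a newline, add one so the next record's
                # "@" header starts on its own line.
                if src.tell() > 0:
                    src.seek(-1, os.SEEK_END)
                    if src.read(1) != b"\n":
                        out.write(b"\n")


# Hypothetical usage, keeping the same part order for both mates:
# concatenate_fastq(["r1.01.fastq", "r1.02.fastq", "r1.03.fastq"], "r1.fastq")
# concatenate_fastq(["r2.01.fastq", "r2.02.fastq", "r2.03.fastq"], "r2.fastq")
```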
Just make sure that you join the r1 parts and the r2 parts in the same order, so that reads with the same name end up on the same lines in the two files.
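To confirm the order was kept, you can walk both joined files in lockstep and compare read names record by record. This is a rough sketch that assumes unwrapped 4-line FASTQ records (typical for Illumina output) and treats `@name/1` vs `@name/2`, or `@name 1:N:...` vs `@name 2:N:...`, as the same pair; the helper names are hypothetical.

```python
from itertools import islice, zip_longest


def read_ids(path):
    """Yield a normalised read ID for each 4-line FASTQ record in `path`."""
    with open(path) as fh:
        while True:
            record = list(islice(fh, 4))
            if not record:
                return
            if len(record) != 4 or not record[0].startswith("@"):
                raise ValueError(f"{path}: malformed FASTQ record near {record[0]!r}")
            # Keep only the first whitespace-separated token of the header,
            # then drop a trailing /1 or /2 mate suffix if present.
            fields = record[0][1:].split()
            token = fields[0] if fields else ""
            if token.endswith(("/1", "/2")):
                token = token[:-2]
            yield token


def check_pairing(r1_path, r2_path):
    """Return (record_number, r1_id, r2_id) for the first mismatch, or None.

    A file that runs out of records early shows up as None on that side.
    """
    pairs = zip_longest(read_ids(r1_path), read_ids(r2_path))
    for n, (a, b) in enumerate(pairs, start=1):
        if a != b:
            return n, a, b
    return None
```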
If you want to just merge together all left reads into one file and then all right reads into one file, use the tool "Text Manipulation -> Concatenate datasets tail-to-head".
I believe I misunderstood originally and have removed that comment, sorry!
Jen, Galaxy team
Thank you. I will try.
You state that you want to map paired-end reads to a reference. If the reference genome exists, you would probably be well advised to map them as paired-end reads to take advantage of the pairing. If the reference genome does not exist, you would have to wonder about the sanity of whoever chose paired-end reads; you'd be better off with lots of very long single-end reads to assemble.
As you probably know, most mappers (e.g. Bowtie or BWA) understand paired data, and you will gain mapping precision if the mapper treats each pair as a pair and checks that both ends map consistently. There may be good reasons to construct a joined file, but mapping to a known reference is not one of them IMHO.
OTOH, if you have no reference genome, you might be trying to create a de novo reference from all available sequence, in which case you may need to try assembling (Velvet, ABySS, etc.) all the sequences (all pairs) from all samples. Concatenation will work as described in the other answers, but unless you have a huge amount of sequence, your de novo reference may consist of lots of short contigs that don't allow the pairs to map properly.
I guess what the OP is trying to do is not to merge the two mates of each pair into one file; his wording is just slightly misleading. Instead, I think he simply has FASTQ input files like these:
r1.01.fastq, r1.02.fastq, r1.03.fastq
r2.01.fastq, r2.02.fastq, r2.03.fastq
and he would like to join all r1 files into one r1.fastq file and all r2 files into one r2.fastq file, and then pass these two files to an aligner.
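When there are many parts, the easiest way to get the order wrong is to sort file names as text, which puts `r1.10.fastq` before `r1.2.fastq`. A small sketch, assuming the hypothetical `r1.<part>.fastq` / `r2.<part>.fastq` naming from the example, that groups the parts by mate, sorts them by numeric part number, and refuses to continue if one mate is missing a part:

```python
import re
from pathlib import Path

# Hypothetical naming scheme: r1.<part>.fastq and r2.<part>.fastq
PART_RE = re.compile(r"^(r[12])\.(\d+)\.fastq$")


def group_parts(directory):
    """Return (r1_parts, r2_parts), both sorted by numeric part number."""
    found = {"r1": {}, "r2": {}}
    for path in Path(directory).iterdir():
        match = PART_RE.match(path.name)
        if match:
            mate, part = match.group(1), int(match.group(2))
            found[mate][part] = path
    # Both mates need exactly the same part numbers, otherwise the joined
    # r1 and r2 files cannot line up record for record.
    if found["r1"].keys() != found["r2"].keys():
        missing = sorted(found["r1"].keys() ^ found["r2"].keys())
        raise ValueError(f"parts present for only one mate: {missing}")
    order = sorted(found["r1"])  # numeric sort, so part 2 comes before part 10
    return [found["r1"][n] for n in order], [found["r2"][n] for n in order]
```

The two lists it returns can then be concatenated separately, one output file per mate, using the same order for both.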
@ghliu83: correct me if I'm wrong.
Thank you. That is what I want. I used the Concatenate datasets tool to join all the r1 reads into one file and all the r2 reads into another, and it worked. Now the mapping is waiting to run.