Question: Joining multiple fastq files that are spread across two lanes
2.4 years ago by
Here is a screenshot of my data

I know how to concatenate files. I'm just not sure which files I should link together. As you can see from the screenshot, there are 5 R1's from lane one and 5 R1's from lane two. Same scenario for all of the R2's. Do I combine every R1 from both lanes? Do I combine every R2 from both lanes? How should I run bowtie/BWA with this data? Any help will be greatly appreciated. Thanks!

2.4 years ago by
United States
It appears (this is an assumption) that replicates were concatenated together in the shared file. Try keeping the replicates as distinct datasets instead. Within each one, first join the fastq files across the two lanes for each read of the pair (R1 with R1, R2 with R2, in the same lane order), then proceed.
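Outside of Galaxy, the lane-joining step could be sketched roughly as below. This is only an illustration, and it assumes Illumina-style file names of the form `<sample>_L<lane>_R<read>_001.fastq` (optionally `.gz`), which may not match the names in the screenshot. The function names `group_by_sample_and_read` and `merge_lanes` are made up for this example. R1 and R2 are written in the same lane order, so the read pairs stay in sync for a paired-end mapper.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme (hypothetical): <sample>_L<lane>_R<read>_001.fastq[.gz]
LANE_FILE = re.compile(
    r"^(?P<sample>.+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq(\.gz)?$"
)


def group_by_sample_and_read(paths):
    """Map (sample, 'R1' or 'R2') to that sample's files, sorted by lane number."""
    groups = defaultdict(list)
    for path in map(Path, paths):
        match = LANE_FILE.match(path.name)
        if match is None:
            continue  # skip files that do not follow the assumed naming scheme
        key = (match["sample"], "R" + match["read"])
        groups[key].append((int(match["lane"]), path))
    return {key: [p for _, p in sorted(items)] for key, items in groups.items()}


def merge_lanes(paths, out_dir):
    """Concatenate lanes per sample and per read; return the merged file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = {}
    for (sample, read), lane_files in group_by_sample_and_read(paths).items():
        # Assumes every file in a group is compressed the same way.
        # Gzip files can be joined byte for byte; the result is still valid gzip.
        suffix = ".fastq.gz" if lane_files[0].name.endswith(".gz") else ".fastq"
        target = out_dir / f"{sample}_{read}{suffix}"
        with open(target, "wb") as out:
            for lane_file in lane_files:
                with open(lane_file, "rb") as src:
                    shutil.copyfileobj(src, out)
        merged[(sample, read)] = target
    return merged
```

Each sample then ends up with one merged R1 file and one merged R2 file, which can be given to the mapper as a pair. R1 is never concatenated with R2 here.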

Depending on the analysis, you may want to combine R1 and R2 into a single read, either by merging them where they overlap or by joining them directly (sometimes with a buffer region between the two). Other analyses work best with R1 and R2 entered as distinct inputs on the tool form. The tools under Text Manipulation offer many options for this type of manipulation.

One example of doing this is in a training session that was presented just a few days ago for a Metagenomics pipeline at GCC 2016. See it here:

All training sessions from the conference are available online, complete with a video recording of each session. Review the others to find those that match your analysis goals. More training resources are linked at the top of the Galaxy Support wiki:

Good luck! Jen, Galaxy team

Thank you for your help!

