Hi,
To merge multiple fastq files from distinct runs that belong to the same sample, use the tool Text Manipulation: Concatenate datasets. This tool will actually merge many datatypes in any way you want, so keeping the data grouping logical is for you to organize (and label).
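Outside of Galaxy, the same concatenation can be sketched in a few lines of Python. This is only a minimal illustration, not what the Galaxy tool does internally: the function names `concatenate_fastq` and `count_records` are made up for this example, and it assumes uncompressed, four-line-per-record fastq files.

```python
import os
import shutil
from pathlib import Path


def concatenate_fastq(run_files, merged_path):
    """Append each run's fastq file to merged_path, in the order given.

    Assumes plain-text (not gzipped) fastq input.
    """
    merged_path = Path(merged_path)
    with merged_path.open("wb") as out:
        for run_file in run_files:
            with Path(run_file).open("rb") as src:
                shutil.copyfileobj(src, out)
                # If a file does not end with a newline, add one so the next
                # file's first header does not get glued onto this file's
                # last quality line.
                if src.tell() > 0:
                    src.seek(-1, os.SEEK_END)
                    if src.read(1) != b"\n":
                        out.write(b"\n")
    return merged_path


def count_records(fastq_path):
    """Return the number of records, assuming four lines per record."""
    with Path(fastq_path).open("rb") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{fastq_path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

A quick sanity check after merging is that the record count of the merged file equals the sum of the record counts of the inputs.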
Then run the FastQC tool on each sample. Or you can run it on each run independently first and merge by sample afterwards. This may provide more useful stats when the data come from different runs, as some runs may have had problems and others not. You definitely want to keep each sample's datasets together and distinct from other samples when going forward from here, with fwd and rev reads in distinct datasets, through QC and mapping. At the end, once the data are merged and quality scores adjusted (as needed), run FastQC as a final QA step to gather metrics on the final inputs. If you do more QC, run it again. Then proceed to mapping.
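The bookkeeping described here (one dataset per sample and read direction, with runs kept in a consistent order) can be sketched as a small planning step. This is a hypothetical helper, not a Galaxy feature: `plan_merges`, the "fwd"/"rev" labels, and the file names in the usage note are all assumptions made for illustration.

```python
from collections import defaultdict


def plan_merges(run_records):
    """Group run files by (sample, direction), preserving the run order.

    run_records is an iterable of (sample, direction, path) tuples, where
    direction is "fwd" or "rev" (labels chosen for this sketch).
    """
    groups = defaultdict(list)
    for sample, direction, path in run_records:
        if direction not in ("fwd", "rev"):
            raise ValueError(f"unknown read direction: {direction!r}")
        groups[(sample, direction)].append(path)

    # Paired data only stay in sync if each sample has the same number of
    # fwd and rev files, listed in the same run order.
    for sample in {key[0] for key in groups}:
        n_fwd = len(groups.get((sample, "fwd"), []))
        n_rev = len(groups.get((sample, "rev"), []))
        if n_rev and n_fwd != n_rev:
            raise ValueError(f"{sample}: {n_fwd} fwd files but {n_rev} rev files")
    return dict(groups)
```

For example, `plan_merges([("s1", "fwd", "run1_s1_R1.fastq"), ("s1", "rev", "run1_s1_R2.fastq"), ("s1", "fwd", "run2_s1_R1.fastq"), ("s1", "rev", "run2_s1_R2.fastq")])` gives one list of fwd files and one list of rev files for sample s1, each to be concatenated into its own dataset.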
This changes after mapping, once read groups have been added to the SAM/BAM datasets. For RNA-Seq analysis, keep the datasets distinct by sample throughout. But for variant analysis, you often want to merge the BAM datasets before performing the calls. See the Picard tool group to merge BAM datasets.
See our Support wiki for help with fastq formatting and other common analysis questions:
https://wiki.galaxyproject.org/Support
Thanks! Jen, Galaxy team