I want to assemble a genome de novo and I am very new to it. The Illumina sequencing produced three runs (https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=SRP064196) and I don't know which one should be used as the input file for assembly. As I learned from the Galaxy tutorials, there must be two files for paired-end data (forward and reverse) to be used in an assembly with Velvet. But there are three files in this study (SRR2518214, SRR2518215 and SRR2518216). Please advise me which ones are the right ones and what pipeline is suitable for assembling these data.
Each of the accessions in this study derives from paired-end sequencing: https://www.ebi.ac.uk/ena/data/view/PRJNA238063
Depending on which tool you use to get the data into Galaxy, each of the three accessions might appear as a single dataset (SRA archive or interleaved fastq) or as two datasets (forward + reverse fastq reads). So you'll have either three or six fastq datasets. These can be organized into one, three, or six datasets to enter on the tool form for a single assembly run.
- If an SRA archive, extract the fastq data with the tool NCBI SRA Tools > Download and Extract Reads in FASTA/Q format from NCBI SRA. The first setting on the tool form has an option to extract fastq data from an SRA archive already in your history.
- If an interleaved fastq, Velvet/VelvetOptimiser can use this data directly (it is an input type on that tool's form: "Paired Interleaved"), but many other tools will not work with it. So if you want to do some QA on the data first, you should split it into distinct forward + reverse fastqsanger datasets. FAQ: Reformatting fastq data loaded with NCBI SRA https://galaxyproject.org/support/ncbi-sra-fastq/
- If you don't have the data in Galaxy yet, it can be retrieved with the NCBI tool above or with Get Data > EBI SRA.
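For anyone curious what splitting an interleaved fastq into forward + reverse actually involves, here is a minimal Python sketch for use outside Galaxy. It is not the Galaxy tool's implementation, and the function names (`read_fastq_records`, `deinterleave`) are made up for illustration. It assumes every record is exactly four lines (no wrapped sequence lines) and that mates strictly alternate: forward, reverse, forward, reverse.

```python
import io
from itertools import islice


def read_fastq_records(handle):
    """Yield one FASTQ record at a time as a list of 4 lines.

    Assumption: records are exactly 4 lines each (unwrapped FASTQ).
    """
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def deinterleave(interleaved, forward_out, reverse_out):
    """Write alternating records to the forward and reverse handles.

    Assumption: mates alternate (forward first, then its reverse mate).
    Returns the number of read pairs written.
    """
    pairs = 0
    records = read_fastq_records(interleaved)
    for fwd in records:
        rev = next(records, None)
        if rev is None:
            raise ValueError("odd number of records; input is not interleaved pairs")
        forward_out.writelines(fwd)
        reverse_out.writelines(rev)
        pairs += 1
    return pairs


# Tiny in-memory demonstration with one made-up read pair.
demo = io.StringIO(
    "@read1/1\nACGT\n+\nIIII\n"
    "@read1/2\nTGCA\n+\nIIII\n"
)
fwd, rev = io.StringIO(), io.StringIO()
n_pairs = deinterleave(demo, fwd, rev)
```

With real files you would pass open file handles instead of the `io.StringIO` objects. The forward and reverse outputs keep the mates in the same order, which is what paired-end tools expect.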
Whatever format you plan to use, once the data is in the working history, multiple datasets can be selected on the Velvet/VelvetOptimiser tool form at the same time. There are three small icons next to the dataset selection to toggle how the selection is to be made based on the input organization. The data can be in individual datasets (three interleaved or three forward + three reverse) or dataset collection(s) (one Paired Collection or two List Collections). FAQ: Understanding the Analysis History https://galaxyproject.org/tutorials/histories/
Support FAQs: https://galaxyproject.org/support/
Galaxy tutorials: https://galaxyproject.org/learn/
Thanks! Jen, Galaxy team