Hello,
Questions 1 & 3 are large in scope, but below I give some general advice and resources. If you need help with usage - please let us know.
1 - This Q&A includes a link to a publication and a summary of differences for these tools: https://www.quora.com/How-is-Bowtie-different-from-BWA-BWA-MEM. There are other resources - a web search on the tool names will bring up much prior Q&A, summaries, opinions, publications, and various discussions around specific use cases. You might also consider mapping your own data with each tool to compare the results. Reviewing what others have determined is a useful place to start, yet running a few tests yourself is ultimately the best way to determine the optimal tools/settings that fit your specific data and goals.
2 - Bowtie2 will accept either two distinct paired fastq datasets (forward and reverse) or one interleaved fastq dataset. The select menu allows you to choose the type of input. If your data is already in two datasets, simply enter it in that format. The output from FASTQ Joiner is not the same content as an interleaved fastq dataset.
Joined = the forward and reverse sequence content is joined (merged) into a single sequence and quality string (single fastq record). All sequences that had a matched pair (based on the sequence identifier) will be included in the output from FASTQ Joiner. This would be entered into tools as an unpaired dataset. There are use cases for this type of input, but in general, if you have paired-end data, it is best to input it as two matched paired-end datasets.
Interleaved = the forward and reverse sequence content is concatenated (stacked) into a single dataset with the distinct fastq records retained. The forward read will be included (all original 4 fastq lines) followed by the reverse read (all 4 original fastq lines) - for all records. Interleaved fastq data comes from the data source - it is not produced by a Galaxy tool.
3 - These two tools use different algorithms to make calls and output slightly different VCF content. The NVC tool is capable of outputting all calls, including stranded calls, and the major/minor alleles can be reviewed/filtered using the tool Variant Annotator. More details are here (also linked from the tool wrapper): https://genomebiology.biomedcentral.com/articles/10.1186/gb4161. For more about FreeBayes processing details, the manual is the best place to start.
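To make the joined vs. interleaved distinction from point 2 concrete, here is a small Python sketch. It is only an illustration and not the code behind any Galaxy tool: the function names (parse_fastq, read_id, interleave, join) and the toy reads are made up for this example, and the "join" step is assumed to be a plain end-to-end concatenation of sequence and quality strings, which may differ from what FASTQ Joiner does in detail.

```python
from typing import List, Tuple

# One fastq record = (header, sequence, separator, quality). Hypothetical helper type.
Record = Tuple[str, str, str, str]


def parse_fastq(text: str) -> List[Record]:
    """Split plain fastq text into 4-line records."""
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("fastq text must contain a multiple of 4 lines")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def read_id(header: str) -> str:
    """Return the identifier without the leading '@' or a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def check_pair(fwd: Record, rev: Record) -> None:
    """Raise if the two records do not share the same sequence identifier."""
    if read_id(fwd[0]) != read_id(rev[0]):
        raise ValueError(f"unmatched pair: {fwd[0]} vs {rev[0]}")


def interleave(fwd: List[Record], rev: List[Record]) -> List[Record]:
    """Interleaved: both records are kept intact, forward then reverse."""
    out: List[Record] = []
    for f, r in zip(fwd, rev):
        check_pair(f, r)
        out.extend([f, r])
    return out


def join(fwd: List[Record], rev: List[Record]) -> List[Record]:
    """Joined (assumed simple concatenation): one record per pair."""
    out: List[Record] = []
    for f, r in zip(fwd, rev):
        check_pair(f, r)
        out.append(("@" + read_id(f[0]), f[1] + r[1], "+", f[3] + r[3]))
    return out


# Toy data, invented for illustration only.
forward = parse_fastq("@read1/1\nACGT\n+\nIIII\n@read2/1\nTTGA\n+\nHHHH\n")
reverse = parse_fastq("@read1/2\nGGCC\n+\nFFFF\n@read2/2\nCATG\n+\nGGGG\n")

interleaved = interleave(forward, reverse)
joined = join(forward, reverse)

print(len(interleaved), "records in the interleaved dataset")
print(len(joined), "records in the joined dataset")
```

With two read pairs as input, the interleaved result still holds four distinct fastq records (so a mapper can treat them as pairs), while the joined result holds two longer records that a tool would see as unpaired reads.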
Thanks! Jen, Galaxy team