Also, I see unknown contigs that are not present in my chosen reference hg38 gtf file (Gencode.v21.annotation.gtf) in my .BAM files.
Any idea what's going on here?
Hello,
The output is not coordinate sorted. It is generally a good idea to include a post-alignment sort step in your workflow for BAM dataset outputs (from any mapping tool) before using them as inputs to certain other tools. In short, if a downstream tool fails, try re-sorting the BAM dataset and rerunning as a first-pass solution. Most downstream tools expect a coordinate sort; however, some work better with a queryname sort (these tools note this requirement on the tool form).
Tools that perform a sort as a side-process option on the tool form work best with very small genomes/BAM datasets. For larger genomes/BAM datasets, sorting first in an independent step splits out the resources needed per job and can avoid errors where jobs exceed cluster resources.
How-to: https://galaxyproject.org/support/sort-your-inputs/
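If you want a quick check outside Galaxy before deciding to re-sort, here is a minimal sketch that reads the sort order declared in a SAM/BAM header. It assumes you already have the header as text (for example from `samtools view -H`). The function name and the example header are made up for illustration. Note that the `SO:` tag is only what the writing tool declared, so treat a missing or `unsorted` value as a hint to sort, not as proof either way.

```python
def sort_order_from_header(header_text):
    """Return the SO value from the @HD header line, or None if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


# Illustrative header text, not taken from the dataset in this thread.
example_header = "@HD\tVN:1.6\tSO:unsorted\n@SQ\tSN:chr1\tLN:248956422\n"

# Anything other than "coordinate" suggests a sort step is worth adding.
needs_sort = sort_order_from_header(example_header) != "coordinate"
print(needs_sort)
```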
The BAM alignments will contain hits to the target reference genome by default, so contigs that are in the genome but not in your GTF can still appear in the BAM. Any included reference annotation is only used as a guide for the identification and inclusion of known splice sites. The tool has many advanced settings for filtering the output (including splice sites) on the tool form.
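To see which contigs are involved, a rough sketch like the following lists sequence names that appear in the BAM header but not in the annotation. It assumes the header and the GTF are available as plain text; the function names and the example records are hypothetical and only show the idea.

```python
def contigs_in_header(header_text):
    """Collect sequence names (SN) from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def contigs_in_gtf(gtf_text):
    """Collect the first column (seqname) of every non-comment GTF line."""
    names = set()
    for line in gtf_text.splitlines():
        if line and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


# Illustrative records only; lengths and IDs are placeholders.
example_header = (
    "@SQ\tSN:chr1\tLN:248956422\n"
    "@SQ\tSN:chrUn_example\tLN:1000\n"
)
example_gtf = (
    "##description: example annotation\n"
    'chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id "example_gene";\n'
)

# Contigs present in the genome index but absent from the annotation.
only_in_bam = sorted(contigs_in_header(example_header) - contigs_in_gtf(example_gtf))
print(only_in_bam)
```

A non-empty result here is expected whenever the annotation covers fewer sequences than the genome, and does not by itself indicate a problem with the mapping.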
Thanks! Jen, Galaxy team