Question: Has the RNA STAR aligned file already been sorted?
sung-jo wrote (15 months ago):

Also, my .BAM files contain unknown contigs that are not present in my chosen hg38 reference GTF file (Gencode.v21.annotation.gtf).

Any idea what's going on here?

Jennifer Hillman Jackson wrote (15 months ago):

Hello,

The output is not coordinate-sorted. It is generally a good idea to add a post-alignment sort step to your workflow for BAM outputs (from any mapping tool) before using them as inputs to other tools. In short, if a downstream tool fails, the first thing to try is re-sorting the BAM dataset and rerunning the tool. Most downstream tools expect a coordinate sort; however, some work better with a queryname sort (these tools note this requirement on their tool form).
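To make "coordinate-sorted" concrete, here is a minimal Python sketch that works on SAM text (for example the output of `samtools view -h`). It assumes the usual coordinate order: records ordered by the position of their RNAME among the header's @SQ lines, then by POS, with reads that have no reference (RNAME `*`) last. All function names are hypothetical, and this is not how Galaxy or samtools implement sorting. For real data, use a dedicated sort step such as `samtools sort` (or `samtools sort -n` for a queryname sort).

```python
# Minimal sketch (hypothetical helper names): check and apply a coordinate sort
# on SAM text lines, e.g. from `samtools view -h in.bam`. Everything is held in
# memory, so this only illustrates the ordering; it is not a replacement for
# `samtools sort` or the Galaxy sort tools.


def _tag_values(lines, record_type, tag):
    """Yield the value of `tag` (e.g. "SN:") from header lines of `record_type`."""
    for line in lines:
        if line.startswith(record_type):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith(tag):
                    yield field[len(tag):]


def declared_sort_order(lines):
    """SO: value on the @HD line ("coordinate", "queryname", ...), or None."""
    return next(_tag_values(lines, "@HD", "SO:"), None)


def reference_order(lines):
    """Map each @SQ reference name to its index in header order."""
    return {name: i for i, name in enumerate(_tag_values(lines, "@SQ", "SN:"))}


def _records(lines):
    """Non-empty alignment lines (everything that is not a header line)."""
    return [line for line in lines if line.strip() and not line.startswith("@")]


def coordinate_key(record, order):
    """(header index of RNAME, 1-based POS); reads with RNAME '*' sort last."""
    cols = record.rstrip("\n").split("\t")
    rname, pos = cols[2], int(cols[3])
    if rname == "*":
        return (len(order), 0)
    return (order[rname], pos)


def is_coordinate_sorted(lines):
    """True if alignment records never go backwards by coordinate_key."""
    order = reference_order(lines)
    keys = [coordinate_key(r, order) for r in _records(lines)]
    return all(a <= b for a, b in zip(keys, keys[1:]))


def coordinate_sort(lines):
    """Header lines, then records in coordinate order (stable sort).

    Note: this sketch does not rewrite the @HD SO: tag.
    """
    header = [line for line in lines if line.startswith("@")]
    order = reference_order(header)
    return header + sorted(_records(lines), key=lambda r: coordinate_key(r, order))


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6\tSO:unsorted",
        "@SQ\tSN:chr1\tLN:1000",
        "@SQ\tSN:chr2\tLN:1000",
        "r1\t0\tchr2\t50\t255\t10M\t*\t0\t0\tACGTACGTAC\t*",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\t*",
        "r3\t0\tchr1\t300\t255\t10M\t*\t0\t0\tACGTACGTAC\t*",
    ]
    print(declared_sort_order(sam))  # unsorted
    print(is_coordinate_sorted(sam))  # False
    print([r.split("\t")[0] for r in coordinate_sort(sam)[3:]])  # ['r3', 'r1', 'r2']
```

Note that the @HD `SO:` tag is only a declaration; checking the record order, as `is_coordinate_sorted` does, tells you what the file actually contains.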

Tools that offer sorting as a side option on the tool form work best with very small genomes/BAM datasets. For larger genomes/BAM datasets, sorting first in an independent step splits the resources needed across separate jobs and can avoid errors where a job exceeds cluster resources.

How-to: https://galaxyproject.org/support/sort-your-inputs/

By default, the BAM alignments contain hits to every sequence in the target reference genome, including contigs that have no entries in the annotation. Any included reference annotation (GTF) is used only as a guide for identifying and including known splice sites; it does not restrict which sequences reads can align to. The tool form has many advanced settings for filtering the output (including splice sites).
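You can check this directly with a minimal Python sketch. It assumes you have the BAM header as text (for example from `samtools view -H`) plus the GTF you supplied, and it lists @SQ reference names that never appear as a seqname (column 1) in the GTF. The function names and the example contig name are illustrative only. A non-empty result is expected whenever the genome FASTA contains sequences, such as unplaced contigs, that the annotation does not cover.

```python
# Minimal sketch (hypothetical helper names): find reference sequences in a
# BAM header that have no features in a GTF annotation. Inputs are any
# iterables of text lines (lists or open files).


def header_contigs(header_lines):
    """Reference names (SN:) from @SQ lines, in header order."""
    names = []
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def gtf_seqnames(gtf_lines):
    """Distinct values of column 1 (seqname) in GTF feature lines."""
    seen = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment/header lines and blank lines
        seen.add(line.split("\t", 1)[0])
    return seen


def unannotated_contigs(header_lines, gtf_lines):
    """Contigs reads can align to that the GTF never mentions."""
    annotated = gtf_seqnames(gtf_lines)
    return [name for name in header_contigs(header_lines) if name not in annotated]


if __name__ == "__main__":
    header = [
        "@HD\tVN:1.4",
        "@SQ\tSN:chr1\tLN:248956422",
        "@SQ\tSN:chrUn_example\tLN:2274",  # illustrative unplaced contig
    ]
    gtf = [
        "##description: example",
        'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id "g1";',
    ]
    print(unannotated_contigs(header, gtf))  # ['chrUn_example']
```

With real files, the same call works on open file objects, e.g. `unannotated_contigs(open("header.sam"), open("Gencode.v21.annotation.gtf"))`, where `header.sam` is a hypothetical file holding the `samtools view -H` output.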

Thanks! Jen, Galaxy team
