Hi,
I renamed the subject to reflect the question better (to link it to prior Q&A).
If the annotation came from the tool Get Data > UCSC Main, with the output format in the Table Browser set to gtf, then that annotation will technically work. Just be aware that counts will be summarized by transcript, not by gene. Why? Annotation extracted from UCSC with this method has gene_id and transcript_id set to the same value: the transcript identifier. This is how the UCSC Table Browser works.
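If you want to confirm whether a file is affected before counting, a small script can compare the two attributes on every line. This is only a minimal sketch: the function names and the sample lines (with made-up identifiers such as TX_0001) are hypothetical, and it assumes the usual 9-column, tab-separated GTF layout with attributes written as key "value"; pairs.

```python
import re


def parse_attributes(attr_field):
    """Parse the 9th GTF column into a dict of attribute name -> value."""
    return dict(re.findall(r'(\S+)\s+"([^"]*)"', attr_field))


def ids_are_identical(gtf_lines):
    """Return True if every feature line has gene_id equal to transcript_id.

    Lines that are blank, comments, or missing either attribute are skipped.
    Returns False if no usable line is found.
    """
    seen_any = False
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = parse_attributes(fields[8])
        if "gene_id" not in attrs or "transcript_id" not in attrs:
            continue
        seen_any = True
        if attrs["gene_id"] != attrs["transcript_id"]:
            return False
    return seen_any


# Hypothetical sample lines, for illustration only.
transcript_only = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "TX_0001"; transcript_id "TX_0001";',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "TX_0002"; transcript_id "TX_0002";',
]
gene_level = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_0001";',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_0002";',
]

print(ids_are_identical(transcript_only))
print(ids_are_identical(gene_level))
```

To run it on a real file, pass an open file handle instead of the sample lists, for example ids_are_identical(open("annotation.gtf")), where the file name is a placeholder. If the function returns True, counts from that annotation will be per transcript.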
The datatype gff would not be assigned to data retrieved this way, that is, when importing directly into Galaxy from the UCSC Table Browser.
Perhaps the data was loaded using another method? You can either accept transcript-level summaries for your counts and reassign the datatype to gtf (if the file matches that datatype specification and the genome build/version of the mapping database), or choose another annotation source where gene_id is summarized by gene (strongly recommended).
Wherever you choose to obtain the annotation, check it against the genome used for mapping. You may need to assign the correct database to BAM inputs due to a small bug fixed last week.
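One part of that check can be scripted: comparing the sequence (chromosome) names in the annotation with those in the reference genome. The sketch below assumes that a naming mismatch, such as "chr1" versus "1", is the kind of conflict you are looking for. The function names and sample values are hypothetical, and the reference names are passed in as a plain list that you would collect yourself (for example from the BAM header).

```python
def gtf_seqnames(gtf_lines):
    """Collect the distinct values of the first GTF column (sequence name)."""
    names = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_from_reference(gtf_lines, reference_names):
    """Return annotation sequence names that do not occur in the reference."""
    return sorted(gtf_seqnames(gtf_lines) - set(reference_names))


# Hypothetical sample data, for illustration only.
annotation = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_0001";',
    'chr2\tsrc\texon\t300\t400\t.\t-\t.\tgene_id "GENE_B"; transcript_id "TX_0002";',
]
reference_with_prefix = ["chr1", "chr2", "chrM"]
reference_without_prefix = ["1", "2", "MT"]

print(missing_from_reference(annotation, reference_with_prefix))
print(missing_from_reference(annotation, reference_without_prefix))
```

An empty result only shows that the names agree; it does not prove that the annotation and the BAM come from the same genome build, so the database assignment still needs to be checked in Galaxy.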
The FAQs here explain in more detail: https://galaxyproject.org/support/#troubleshooting. You can also review prior Q&A in the right sidebar (or search for the term "htseq") to see how others have resolved annotation/database/datatype conflicts within the context of their RNA-seq analysis.
TopHat is considered deprecated, with HISAT2 as the replacement. Please see the Galaxy RNA-seq tutorials for example workflows: https://galaxyproject.org/learn/
Thanks! Jen, Galaxy team