Question: Salmon human GTF or tabular annotation reference input dataset
9 months ago, sebylouis wrote:

Hello, I would like to use Salmon for RNA-seq analysis. I ran it successfully with Ensembl reference files, but I would prefer to use NCBI/UCSC files. Any suggestions? Please help me find an appropriate human reference transcriptome and GTF file. Thanks, Seby

Tags: annotation, ucsc, gtf, salmon, rna-seq
9 months ago, Jennifer Hillman Jackson (United States) wrote:


Annotation GTF datasets can be extracted from the UCSC Table Browser directly into Galaxy (Get Data > UCSC Main). The problem is that the gene_id and transcript_id attributes will have the same content from this source (both will hold the transcript_id value). This is true for all GTF datasets extracted from the UCSC Table Browser, regardless of the track chosen, the genome, or whether the "Send to Galaxy" option is used.
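To check whether a GTF you already have is affected, a quick scan of the attribute column is enough. The following is a minimal sketch, assuming a standard nine-column, tab-separated GTF with attributes written as key "value"; pairs. The function names are made up for illustration and are not part of Galaxy or Salmon.

```python
import re

# Matches attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Return the GTF attribute column (column 9) as a dict of key -> value."""
    return dict(ATTR_RE.findall(field))


def count_identical_ids(gtf_lines):
    """Count records carrying both IDs, and how many have gene_id == transcript_id.

    Returns a tuple (records_with_both_ids, records_with_identical_ids).
    """
    total = 0
    same = 0
    for line in gtf_lines:
        # Skip comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = parse_attributes(cols[8])
        if "gene_id" in attrs and "transcript_id" in attrs:
            total += 1
            if attrs["gene_id"] == attrs["transcript_id"]:
                same += 1
    return total, same
```

Call it with an open file handle, for example `with open("genes.gtf") as fh: print(count_identical_ids(fh))`. If the two numbers are equal, every record has a gene_id that is just a copy of the transcript_id, and the file will not give Salmon a usable gene-level grouping.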

Salmon needs distinct values for transcript and gene, whether the input is a GTF or a tabular transcript-to-gene annotation mapping dataset. It is possible to extract other datasets from UCSC (the gene value is included in other linked tables) and use them to replace the gene_id value in the GTF, but the processing is not straightforward.
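For the tabular option, a two-column transcript-to-gene table can be derived from any GTF whose gene_id and transcript_id really differ. The sketch below assumes a standard tab-separated GTF and writes one "transcript<TAB>gene" line per transcript. The function names are hypothetical, and the transcript names must still match those in the transcriptome FASTA that Salmon quantifies against.

```python
import re

# Matches attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcript_gene_pairs(gtf_lines):
    """Collect a transcript_id -> gene_id dict from GTF lines.

    The first gene_id seen for a transcript wins; later records for the
    same transcript (for example additional exons) are ignored.
    """
    pairs = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(cols[8]))
        tx = attrs.get("transcript_id")
        gene = attrs.get("gene_id")
        if tx and gene:
            pairs.setdefault(tx, gene)
    return pairs


def format_mapping(pairs):
    """Render the mapping as tab-separated text, one transcript per line."""
    return "".join(f"{tx}\t{gene}\n" for tx, gene in sorted(pairs.items()))
```

Writing the result of `format_mapping(transcript_gene_pairs(fh))` to a file gives a tabular dataset that can be uploaded in place of the GTF. Note that this does nothing to repair a UCSC Table Browser GTF: if gene_id is a copy of transcript_id in the input, it will be in the output too.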

A better alternative is the iGenomes version of the reference annotation. This is based on the UCSC RefSeq Genes track. Find these linked under Homo sapiens >> UCSC/hg38 or UCSC/hg19 at their website. Pick the genome that you are using in other steps. The data will be a match for the built-in genome indexes available across all tools at Galaxy main that are named hg38 or hg19.


How to upload: Download the target iGenomes tar.gz archive to your computer, uncompress it locally, then upload just the genes.gtf dataset to Galaxy. This version of the annotation also includes extra attributes that are utilized by HISAT2, Cufflinks, Cuffmerge, Cuffdiff -- specifically: tss_id, p_id, and gene_name -- making it the best option if those are also part of your analysis workflow.
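If you would rather not unpack the whole archive just to get one file, the extraction step can be scripted. This is a rough sketch using Python's standard tarfile module. It assumes the archive contains a regular file named genes.gtf somewhere in its directory tree and takes the first one it finds; the function name and the file paths in the usage line are placeholders, not the actual iGenomes names.

```python
import shutil
import tarfile


def extract_genes_gtf(archive_path, out_path, member_name="genes.gtf"):
    """Copy the first regular file called `member_name` out of a tar archive.

    Returns the member's path inside the archive, or None if no match exists.
    Symbolic links are skipped so that only real file content is copied.
    """
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz, or none).
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            if member.isfile() and member.name.split("/")[-1] == member_name:
                with tar.extractfile(member) as src, open(out_path, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                return member.name
    return None
```

Usage would look like `extract_genes_gtf("downloaded_archive.tar.gz", "genes.gtf")`. The archives are large, so scanning one can take a while. The returned path shows which copy of genes.gtf was picked, which is worth checking before uploading the file to Galaxy.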

Hope that helps! Jen, Galaxy team
