Salmon human GTF or tabular annotation reference input dataset

sebylouis • 30 wrote:

Hello, I would like to use Salmon for RNA-seq analysis. I ran it successfully with Ensembl reference files, but I would prefer to use NCBI/UCSC files. Any suggestions? Please help me find the appropriate human reference transcriptome and GTF file. Thanks, Seby

annotation ucsc gtf salmon rna-seq • 577 views

modified 9 months ago by Jennifer Hillman Jackson ♦ 25k • written 9 months ago by sebylouis • 30

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

Annotation GTF datasets can be extracted from the UCSC Table browser directly into Galaxy (Get Data > UCSC main). The problem is that the gene_id and transcript_id attributes will have the same content from this source (both will be the transcript_id value). This is true for all GTF datasets extracted from the UCSC Table browser and is not related to the track chosen, the genome, or whether the "Send to Galaxy" option is used.
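
If you want to confirm whether a GTF you already have shows this problem, a small check along these lines may help. This is only a sketch in Python: the function names are made up for this example, and the two sample records are invented for illustration rather than copied from real UCSC or iGenomes files.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";?')

def parse_attributes(line):
    """Return the attribute column (9th field) of one GTF line as a dict."""
    fields = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(fields) < 9:
        return {}
    return dict(ATTR_RE.findall(fields[8]))

def ids_are_identical(gtf_lines):
    """True if every line carrying both IDs has gene_id == transcript_id."""
    seen_any = False
    for line in gtf_lines:
        attrs = parse_attributes(line)
        if "gene_id" in attrs and "transcript_id" in attrs:
            seen_any = True
            if attrs["gene_id"] != attrs["transcript_id"]:
                return False
    return seen_any

# Invented records: one with duplicated IDs, one with a distinct gene_id.
duplicated_ids = [
    'chr1\tsource\texon\t100\t200\t0\t+\t.\t'
    'gene_id "NM_000001"; transcript_id "NM_000001";\n',
]
distinct_ids = [
    'chr1\tsource\texon\t100\t200\t.\t+\t.\t'
    'gene_id "GENE1"; transcript_id "NM_000001"; gene_name "GENE1";\n',
]

print(ids_are_identical(duplicated_ids))  # True
print(ids_are_identical(distinct_ids))    # False
```

To run it on a real file, pass an open file handle instead of the sample lists, for example `ids_are_identical(open("my_annotation.gtf"))`, where the file name is a placeholder.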

Salmon needs distinct values for transcript and gene, whether the input is a GTF or a tabular transcript-to-gene annotation mapping dataset. There are ways to extract other datasets from UCSC (the gene value is included in other linked tables) and replace the gene_id value in the GTF, but the processing is not straightforward.
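
For the tabular option, a two-column mapping can be derived from any GTF that already has distinct gene_id values. The Python sketch below is one way to do that, not an official Galaxy or Salmon utility. It assumes the mapping should be tab-separated with the transcript in the first column and the gene in the second; please confirm the column order against the help text of the Salmon tool form you are using. The function names and sample records are invented for illustration.

```python
import csv
import io
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";?')

def transcript_to_gene(gtf_lines):
    """Collect one gene_id per transcript_id from GTF lines."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        tx, gene = attrs.get("transcript_id"), attrs.get("gene_id")
        if tx and gene:
            # Keep the first gene seen for each transcript.
            mapping.setdefault(tx, gene)
    return mapping

def write_mapping(mapping, handle):
    """Write transcript<TAB>gene rows, sorted by transcript."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for tx, gene in sorted(mapping.items()):
        writer.writerow([tx, gene])

# Invented records: two transcripts that belong to the same gene.
sample = [
    'chr1\tsource\texon\t100\t200\t.\t+\t.\tgene_id "GENE1"; transcript_id "NM_000001";\n',
    'chr1\tsource\texon\t300\t400\t.\t+\t.\tgene_id "GENE1"; transcript_id "NM_000001";\n',
    'chr1\tsource\texon\t100\t250\t.\t+\t.\tgene_id "GENE1"; transcript_id "NM_000002";\n',
]

buffer = io.StringIO()
write_mapping(transcript_to_gene(sample), buffer)
print(buffer.getvalue(), end="")
```

Writing to a real file instead of the in-memory buffer gives a tabular dataset that can be uploaded to Galaxy alongside the transcriptome.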

A better alternative is the iGenomes version of the reference annotation. This is based on the UCSC RefSeq Genes track. Find these linked under Homo sapiens >> UCSC/hg38 or UCSC/hg19 at their website. Pick the genome that you are using in other steps. The data will be a match for the built-in genome indexes available across all tools at Galaxy main https://usegalaxy.org that are named hg38 or hg19.

Source: https://support.illumina.com/sequencing/sequencing_software/igenome.html

How to upload: Download the target iGenomes tar.gz archive to your computer, uncompress it locally, then upload just the genes.gtf dataset to Galaxy. This version of the annotation also includes extra attributes that are utilized by HISAT2, Cufflinks, Cuffmerge, Cuffdiff -- specifically: tss_id, p_id, and gene_name -- making it the best option if those are also part of your analysis workflow.
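
If you prefer not to unpack the whole archive, the genes.gtf file can be pulled out on its own. The Python sketch below uses only the standard library. It assumes the archive contains a regular file named genes.gtf somewhere in its directory tree; the exact internal path is not stated here, so the code searches for it instead of hard-coding one. The function names and the archive file name in the usage comment are placeholders.

```python
import posixpath
import shutil
import tarfile

def find_genes_gtf(archive_path):
    """List regular-file members named genes.gtf inside a tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return [
            member.name
            for member in tar.getmembers()
            # Skip symlinks and directories; keep only real files.
            if member.isfile() and posixpath.basename(member.name) == "genes.gtf"
        ]

def extract_genes_gtf(archive_path, member_name, out_path):
    """Copy one member of the archive to out_path without unpacking the rest."""
    with tarfile.open(archive_path, "r:gz") as tar:
        source = tar.extractfile(member_name)
        if source is None:
            raise ValueError(f"{member_name} is not a regular file")
        with source, open(out_path, "wb") as destination:
            shutil.copyfileobj(source, destination)

# Usage (the archive name is a placeholder for whatever you downloaded):
#   names = find_genes_gtf("downloaded_igenomes_archive.tar.gz")
#   extract_genes_gtf("downloaded_igenomes_archive.tar.gz", names[0], "genes.gtf")
```

Scanning a large compressed archive can take a while, since the whole file has to be read to list its members. If more than one genes.gtf is listed, check the paths and pick the one for the genome build you are using.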

Galaxy tutorials: https://galaxyproject.org/learn/

Support FAQs: https://galaxyproject.org/support/

Hope that helps! Jen, Galaxy team

written 9 months ago by Jennifer Hillman Jackson ♦ 25k
