Question: Need help with RNA-seq quantification
3.1 years ago, dyn1982 (United States) wrote:

Hi Galaxy Support,

I want to use Cufflinks to quantify my own BAM file, which was not produced by TopHat in Galaxy. On my first several attempts I got FPKM values but no gene IDs. I then imported the hg19 GTF file from UCSC. This time I got gene IDs, but every FPKM is 0. I am really frustrated with this. Please give me some guidance on how to set the parameters.

Modified 3.1 years ago • written 3.1 years ago by dyn1982

These are my latest settings. I still get CUFF IDs instead of RefSeq IDs or gene IDs. Can anyone help me?

Input Parameter | Value
SAM or BAM file of aligned RNA-Seq reads | 10: run0542_lane12_index711=14749_1_IPD_001_14~human_align.bam
Max Intron Length | 300000
Min Isoform Fraction | 0.1
Pre MRNA Fraction | 0.15
Use Reference Annotation | Use reference annotation guide
Reference Annotation | 19: UCSC Main on Human: refGene (genome)
3prime overhang tolerance | 600
Intronic overhang tolerance | 50
Disable tiling of reference transcripts | No
Perform Bias Correction | Yes
Reference sequence data | cached
Using reference genome | Homo_sapiens_nuHg19_mtrCRS
Use multi-read correct | Yes
Apply length correction | Cufflinks Effective Length Correction
Global model (for use in Trackster) | No dataset
Set advanced Cufflinks options | Yes
Library prep used for input reads | Auto Detect
Mask File | 19: UCSC Main on Human: refGene (genome)
Inner mean distance | 45
Inner distance standard deviation | 20
Max MLE iterations | 5000
Alpha value for the binomial test used during false positive spliced alignment filtration | 0.001
percent read overhang taken as suspiciously small | 0.09
Intronic overhang tolerance | 8
Maximum genomic length of a given bundle | 3500000
Maximum number of fragments per locus | 1000000
Minimal allowed intron size | 50
Minimum average coverage required to attempt 3prime trimming. | 10
The fraction of average coverage below which to trim the 3prime end of an assembled transcript. | 0.1
Job Resource Parameters | no

Written 3.1 years ago by dyn1982

Was the reference GTF from UCSC? If so, it does not contain the attributes that Cuffdiff requires for certain key functions.

The most direct way to obtain a GTF reference annotation file that contains the key attributes is to use the version from iGenomes. Download the tar file locally, unpack it, and then upload the genes.gtf dataset to Galaxy for use with these tools. 

A description of these attributes and how they are used by Cuffdiff is in the manual here: http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/index.html#cuffdiff-input-files
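
Before re-running, it can help to check which attributes a GTF actually carries. The following is a minimal sketch, not part of Cufflinks or Galaxy: the function names and the sample record are hypothetical, and the attribute names in `EXPECTED` are the ones described in the Cuffdiff manual linked above (`tss_id` and `p_id` are only needed for some analyses).

```python
import re
from collections import Counter

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')

# Attribute names described in the Cuffdiff manual (assumed list for this check).
EXPECTED = ("gene_id", "transcript_id", "gene_name", "tss_id", "p_id")


def gtf_attribute_keys(lines):
    """Count how many GTF records carry each attribute key."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        for key, _value in ATTR_RE.findall(fields[8]):
            counts[key] += 1
    return counts


def missing_attributes(lines, expected=EXPECTED):
    """Return the expected attribute keys that never appear in the GTF."""
    counts = gtf_attribute_keys(lines)
    return [key for key in expected if counts[key] == 0]


# Hypothetical record, for illustration only.
sample = [
    'chr1\thg19_refGene\texon\t11874\t12227\t0.000000\t+\t.\t'
    'gene_id "NR_046018"; transcript_id "NR_046018";\n',
]
print(missing_attributes(sample))  # ['gene_name', 'tss_id', 'p_id']
```

To run it on a real file, pass an open file handle instead of `sample`, for example `missing_attributes(open("genes.gtf"))`, where `genes.gtf` is whatever GTF you plan to upload.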

 

Written 3.1 years ago by Jennifer Hillman Jackson
3.1 years ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

There is probably a mismatch between the chromosome identifiers in the inputs. These Galaxy wiki sections can help:

Reference_genomes

GalaxyNGS101#Reference-based_RNA-seq

Hope this helps, Jen, Galaxy team

Written 3.1 years ago by Jennifer Hillman Jackson

My settings are below. The FPKM column has values, but the gene ID column shows CUFF IDs.

Do you know how to map these to gene names? What else do I need to set up?

Input Parameter | Value
SAM or BAM file of aligned RNA-Seq reads | myown.bam
Max Intron Length | 300000
Min Isoform Fraction | 0.1
Pre MRNA Fraction | 0.15
Use Reference Annotation | Use reference annotation guide
Reference Annotation | 19: UCSC Main on Human: refGene (genome)
3prime overhang tolerance | 600
Intronic overhang tolerance | 50
Disable tiling of reference transcripts | No
Perform Bias Correction | No
Use multi-read correct | Yes
Apply length correction | Cufflinks Effective Length Correction
Global model (for use in Trackster) | No dataset
Set advanced Cufflinks options | No
Job Resource Parameters | no
Written 3.1 years ago by dyn1982

See the comment above. The settings look fine - the incomplete results are almost certainly linked to the contents of the GTF input.

But note: the chromosome identifiers must still match between the BAM and the GTF. The GTF file is uncompressed text, so its identifiers can be viewed directly. For the BAM, you can convert to SAM and take a look at the header. When the identifiers match between the two files and the GTF has the key attributes, Cuffdiff functions optimally. When there is a mismatch or attributes are missing, odd errors or incomplete calculations can result.
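
The identifier comparison can also be scripted. This is a rough sketch under stated assumptions: it expects the SAM header as text (for example from converting the BAM to SAM, or from `samtools view -H`), the function names are hypothetical, and the sample lines are made up to show a typical `1` versus `chr1` mismatch.

```python
def sam_header_chroms(lines):
    """Collect reference names from @SQ header lines (the SN: tag)."""
    names = set()
    for line in lines:
        if not line.startswith("@"):
            break  # the header ends at the first alignment record
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_chroms(lines):
    """Collect the values of the first (seqname) column of a GTF."""
    names = set()
    for line in lines:
        # Skip comments and any line without tab-separated columns.
        if line.startswith("#") or "\t" not in line:
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_chroms(sam_lines, gtf_lines):
    """Report identifiers shared by both inputs and those unique to each."""
    bam_names = sam_header_chroms(sam_lines)
    gtf_names = gtf_chroms(gtf_lines)
    return {
        "shared": sorted(bam_names & gtf_names),
        "only_in_bam": sorted(bam_names - gtf_names),
        "only_in_gtf": sorted(gtf_names - bam_names),
    }


# Hypothetical inputs, for illustration only.
sam_header = ["@HD\tVN:1.0\tSO:coordinate\n", "@SQ\tSN:1\tLN:249250621\n"]
gtf = ['chr1\thg19_refGene\texon\t11874\t12227\t.\t+\t.\tgene_id "NR_046018";\n']
print(compare_chroms(sam_header, gtf))
```

An empty `shared` list means no chromosome name appears in both files, which is the kind of mismatch described above.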

The wiki section below has more about this tool, including additional troubleshooting help and links to the manual and supporting publications. That said, the fixes I have shared address the most common issues users encounter. These are requirements not just when using this tool suite in Galaxy, but also when running the tools on the command line.

Support#Tools_on_the_Main_server:_RNA-seq

 

Modified 3.1 years ago • written 3.1 years ago by Jennifer Hillman Jackson