Question: Cufflinks output file with many 0 FPKM values
4.2 years ago by
puneetd10
United States
puneetd10 wrote:

Hi,

I have run Cufflinks on my zebrafish RNA-seq reads mapped with TopHat2. The Cufflinks output file has several issues that I cannot seem to resolve:

1. Many genes are identified by CUFF IDs instead of gene IDs.

2. Many genes have FPKM values of 0 despite showing numerous alignments in the UCSC browser. While this can be a result of multi-mapping reads, some genes that we know should be expressed also show FPKM values of 0.

3. FPKM status shows 'low data' for many genes.

I have run the jobs twice using different genome and annotation files but the results are the same. Are these issues likely to be a result of inconsistent files used for alignment and annotation? If so:

1. Which zebrafish gene annotation file should we use together with the UCSC danRer7 genome used by TopHat for Illumina?

2. Which zebrafish genome and annotation files can I use from the Ensembl Zv9 assembly?

I would appreciate any help and advice you may have on this.

Thank you.

Puneet

rna-seq zebrafish cufflinks • 5.0k views
4.2 years ago by
United States
Jennifer Hillman Jackson25k wrote:

Hello,

There may be a few issues going on - some scientific, some technical.

A "LOW DATA" result indicates that coverage was low for that gene/transcript. This could reflect the actual expression of that transcript in the input conditions, or it could result from problems upstream. Too much or too little read QC, poor mapping, etc. can all lead to data loss. Double check that the data was in .fastqsanger format when you began the analysis (incorrectly scaled quality scores are problematic) and review the steps you ran prior to Cuffdiff to see whether they maximized mapping success. Testing different parameters against the tool documentation (TopHat?) will show you which choices work best.
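To get a sense of how widespread the problem is, it can help to tally the status column of the Cufflinks tracking output. Below is a minimal Python sketch, assuming the file is tab-delimited with `FPKM` and `FPKM_status` columns (as in a typical `genes.fpkm_tracking` file). The example rows and IDs are made up for illustration.

```python
import csv
import io
from collections import Counter

def summarize_fpkm_status(handle):
    """Count FPKM_status values and rows with FPKM == 0 in a tracking file."""
    reader = csv.DictReader(handle, delimiter="\t")
    status_counts = Counter()
    zero_fpkm = 0
    for row in reader:
        status_counts[row["FPKM_status"]] += 1
        if float(row["FPKM"]) == 0.0:
            zero_fpkm += 1
    return status_counts, zero_fpkm

# Made-up rows with a reduced set of columns, for illustration only.
example = (
    "tracking_id\tgene_id\tlocus\tFPKM\tFPKM_status\n"
    "CUFF.1\tCUFF.1\tchr1:100-900\t0\tLOWDATA\n"
    "geneA\tgeneA\tchr1:2000-5000\t12.5\tOK\n"
    "CUFF.2\tCUFF.2\tchr2:300-700\t0\tOK\n"
)

status_counts, zero_fpkm = summarize_fpkm_status(io.StringIO(example))
print(dict(status_counts), zero_fpkm)
```

With a real file, pass `open("genes.fpkm_tracking")` (a placeholder path) instead of the `io.StringIO` example.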

If you think there is a potential data mismatch problem (the chromosome identifiers do not match exactly), you can compare the inputs to determine that. Ensembl's identifiers differ from UCSC's. Help with translating one to the other is in the RNA-seq resources link below. But note that direct conversion by simply adding a "chr" prefix doesn't work in all cases.
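One way to make that comparison concrete is to collect the sequence names from each input and look at the set differences. The sketch below is only an illustration: it assumes the genome names come from a tab-delimited file whose first column is the sequence name (for example a `samtools faidx` index) and that the annotation is a GTF. The function names and example lines are hypothetical.

```python
def first_column_names(lines):
    """Collect sequence names from the first tab-delimited column."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        names.add(line.rstrip("\n").split("\t", 1)[0])
    return names

def compare_seqnames(genome_names, annotation_names):
    """Report which names are shared and which appear on one side only."""
    return {
        "shared": sorted(genome_names & annotation_names),
        "only_in_genome": sorted(genome_names - annotation_names),
        "only_in_annotation": sorted(annotation_names - genome_names),
    }

# Made-up inputs: UCSC-style names in the index, Ensembl-style names in the GTF.
fai_lines = ["chr1\t1000\t6\t60\t61\n", "chr2\t800\t1030\t60\t61\n", "chrM\t100\t1850\t60\t61\n"]
gtf_lines = [
    '1\tsrc\texon\t10\t90\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    '2\tsrc\texon\t20\t80\t.\t-\t.\tgene_id "g2"; transcript_id "t2";\n',
    'MT\tsrc\texon\t5\t50\t.\t+\t.\tgene_id "g3"; transcript_id "t3";\n',
]

genome = first_column_names(fai_lines)
annotation = first_column_names(gtf_lines)
report = compare_seqnames(genome, annotation)

# Naively adding "chr" fixes the numbered chromosomes but not the mitochondrion.
prefixed_report = compare_seqnames(genome, {"chr" + name for name in annotation})
print(report)
print(prefixed_report)
```

If `shared` is empty, the annotation and genome almost certainly use different naming schemes, and no reads will be assigned to the annotated genes.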

I am not sure whether you are running Cufflinks, but if so, you need to run Cuffmerge afterwards to pull all the reference annotation together before running Cuffdiff. Only the transcripts included in the reference annotation provided will be considered by Cuffdiff. An example workflow for this is also in the RNA-seq resources link below.

Cuffdiff also makes use of special attributes in the reference annotation to fully populate all statistics. The files from iGenomes contain these, and the Cuffdiff manual describes what they are (see the inputs section). They are absent from many reference annotation files (UCSC, Ensembl, etc.). iGenomes has not created a reference annotation file for zebrafish in this format, but other sources may provide one. Perhaps someone will post a known source, or once you know what to look for, you can review the files you are considering. You can run Cuffdiff without these attributes, but the manual explains what is excluded.
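As a rough way to review a candidate annotation file, you could count how many GTF records carry each attribute. The sketch below assumes the attributes in question are `tss_id`, `p_id`, and `gene_name`, which is my reading of the Cufflinks manual; please confirm against the manual itself. The function name and example records are hypothetical.

```python
import re
from collections import Counter

# Matches quoted GTF attributes such as: gene_id "g1"
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)"')

def count_gtf_attributes(lines, wanted=("gene_name", "tss_id", "p_id")):
    """Return the number of records and, per wanted attribute, how many carry it."""
    counts = Counter()
    total = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        total += 1
        keys = {key for key, _ in ATTRIBUTE_PATTERN.findall(fields[8])}
        for name in wanted:
            if name in keys:
                counts[name] += 1
    return total, counts

# Made-up records: only the first one carries the extra attributes.
gtf_lines = [
    'chr1\tsrc\texon\t10\t90\t.\t+\t.\tgene_id "g1"; transcript_id "t1"; gene_name "abc"; tss_id "TSS1"; p_id "P1";\n',
    'chr1\tsrc\texon\t200\t300\t.\t+\t.\tgene_id "g2"; transcript_id "t2";\n',
]

total, counts = count_gtf_attributes(gtf_lines)
print(total, dict(counts))
```

A file where these counts are zero can still be used, but the statistics that depend on those attributes will not be populated.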

Here are some tips for prepping data with respect to quality scores:
http://wiki.galaxyproject.org/Support#FASTQ_Datatype_QA
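If you want a quick sanity check before reading through those tips, the quality encoding can often be guessed from the range of characters on the quality lines. The following is a heuristic sketch only, assuming plain four-line FASTQ records (no wrapped sequences). The thresholds reflect the commonly described ASCII ranges for Phred+33 and Phred+64 data, and the labels returned are my own.

```python
def guess_quality_encoding(fastq_lines, max_records=10000):
    """Guess the quality offset from the ASCII range of the quality lines."""
    lowest, highest = 255, 0
    for index, line in enumerate(fastq_lines):
        if index // 4 >= max_records:
            break
        if index % 4 != 3:
            continue  # only every fourth line holds quality characters
        codes = [ord(char) for char in line.rstrip("\n")]
        if codes:
            lowest = min(lowest, min(codes))
            highest = max(highest, max(codes))
    if highest == 0:
        return "unknown"  # no quality characters were seen
    if lowest < 59:
        return "phred33"  # consistent with fastqsanger
    if highest > 74:
        return "phred64-like"  # would need conversion to fastqsanger
    return "ambiguous"

# Made-up records for illustration.
sanger_like = ["@r1\n", "ACGTACGT\n", "+\n", "!''*((((\n"]
illumina_like = ["@r1\n", "ACGTACGT\n", "+\n", "hhhhggff\n"]

print(guess_quality_encoding(sanger_like))
print(guess_quality_encoding(illumina_like))
```

An "ambiguous" result simply means the sampled reads do not contain characters that distinguish the two encodings, so a larger sample or the sequencing provider's documentation would be needed.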

And RNA-seq resources:
http://wiki.galaxyproject.org/Support#Tools_on_the_Main_server:_RNA-seq

Best, Jen, Galaxy team

4.2 years ago by
puneetd10
United States
puneetd10 wrote:

Thanks Jen. That is a very helpful post. I am looking into the mappings and the source annotation files carefully to figure out where exactly the problem is. 
