Cufflinks output file with many 0 FPKM values

4.2 years ago by

United States

puneetd • 10 wrote:

Hi,

I have run Cufflinks on my zebrafish RNA-seq reads, which were mapped using TopHat2. The Cufflinks output file has several issues that I cannot seem to resolve:

1. Many genes are identified by CUFF IDs instead of gene IDs.

2. Many genes have FPKM values of 0 despite showing numerous alignments in the UCSC browser. While this can be a result of multi-mapping reads, some of the genes that we know should be expressed also show FPKM values of 0.

3. FPKM status shows 'low data' for many genes.

I have run the jobs twice using different genome and annotation files but the results are the same. Are these issues likely to be a result of inconsistent files used for alignment and annotation? If so:

1. Which zebrafish gene annotation file can we use together with the UCSC danRer7 genome that is used by Tophat for Illumina?

2. Which zebrafish genome and annotation files can I use from the Ensembl Zv9 assembly?

I would appreciate any help and advice you may have on this.

Thank you.

Puneet

rna-seq zebrafish cufflinks • 5.0k views


4.2 years ago by

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

There may be a few issues going on: some scientific, some technical.

A "LOW DATA" result indicates that coverage was low for that gene/transcript. This could reflect the actual expression of that transcript under the input conditions, or it could result from problems upstream: too much or too little read QC, poor mapping, and so on can all lead to data loss. Double-check that the data was in .fastqsanger format when you began the analysis (incorrectly scaled quality scores are problematic) and review the steps you ran prior to Cuffdiff to see whether they maximized mapping success. Testing different parameters against the tool documentation (Tophat?) will show you which choices work best.
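As a rough way to check the quality-score scaling before uploading, the sketch below guesses the Phred offset from the ASCII range of the quality lines. It is only a heuristic and assumes plain four-line FASTQ records; the function names and the 59/74 cutoffs are my own illustration, not part of any Galaxy tool. Galaxy's .fastqsanger datatype expects Phred+33.

```python
def guess_phred_offset(quality_lines):
    """Guess the Phred offset (33 or 64) from FASTQ quality strings.

    Returns 33, 64, or None when the observed range fits both encodings.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None
    lowest, highest = min(codes), max(codes)
    # Phred+64 style encodings never use characters below ASCII 59.
    if lowest < 59:
        return 33
    # Phred+33 (Sanger / Illumina 1.8+) data rarely goes above ASCII 74 ('J').
    if highest > 74:
        return 64
    return None


def quality_lines_from_fastq(path, max_records=10000):
    """Yield quality strings from the first max_records four-line records."""
    with open(path) as handle:
        for index, line in enumerate(handle):
            if index // 4 >= max_records:
                break
            if index % 4 == 3:  # every fourth line holds the quality string
                yield line.rstrip("\n")
```

For a file on disk, `guess_phred_offset(quality_lines_from_fastq("reads.fastq"))` (a hypothetical file name) returning 64 would suggest the reads need converting before they are treated as .fastqsanger.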

If you think there is a potential data mismatch problem (the chromosome identifiers do not match exactly), you can compare the inputs to determine that. Ensembl's identifiers differ from UCSC's. Help with translating one to the other is in the RNA-seq resources link below. But note that direct conversion by simply adding a "chr" prefix doesn't work in all cases.
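One way to compare the inputs outside Galaxy is to collect the sequence names from the annotation and from the genome and look at the differences. This is a minimal sketch, assuming a tab-separated GTF and a plain-text FASTA on local disk; the function names are hypothetical.

```python
def gtf_seqnames(path):
    """Collect the chromosome names from column 1 of a GTF file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def fasta_seqnames(path):
    """Collect the sequence names from the header lines of a FASTA file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:
                    names.add(parts[0])
    return names


def compare_seqnames(annotation_names, genome_names):
    """Report which names are shared and which appear on one side only."""
    return {
        "shared": sorted(annotation_names & genome_names),
        "annotation_only": sorted(annotation_names - genome_names),
        "genome_only": sorted(genome_names - annotation_names),
    }
```

If "shared" comes back empty or much shorter than expected, the annotation and the genome use different naming schemes. A name such as Ensembl's "MT" against UCSC's "chrM" is the kind of case where simply adding "chr" is not enough.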

I am not sure whether you are running Cufflinks, but if so, Cuffmerge needs to be run afterward to pull all the reference annotation together before running Cuffdiff. Only the transcripts included in the reference annotation provided will be considered by Cuffdiff. An example workflow for this is also in the RNA-seq resources link below.

Cuffdiff also makes use of special attributes in the reference annotation to fully populate all statistics. The files from iGenomes contain these, and the Cuffdiff manual (inputs section) describes what they are. They are generally not present in reference annotation files from other sources (UCSC, Ensembl, etc.). iGenomes has not created a reference annotation file for zebrafish in this format, but other sources may provide one (perhaps someone will post a known source, or, once you know what to look for, you can review the files you are considering). You can run Cuffdiff without these attributes, but the manual explains what is excluded.
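To review a candidate annotation file, you can count how many transcripts carry the attributes in question. The sketch below assumes the attributes meant here are tss_id and p_id, which is how I read the Cuffdiff manual (please confirm against it), and that the file is a standard nine-column GTF with quoted attribute values. The function names are hypothetical.

```python
import re
from collections import Counter

# Matches GTF attribute pairs written as: key "value";
ATTRIBUTE_PATTERN = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_attributes(field):
    """Turn the ninth GTF column into a dict of quoted attributes."""
    return dict(ATTRIBUTE_PATTERN.findall(field))


def count_attribute_coverage(gtf_lines, wanted=("tss_id", "p_id")):
    """Return (number of transcripts, {attribute: transcripts carrying it})."""
    transcripts = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue
        attributes = parse_gtf_attributes(columns[8])
        transcript_id = attributes.get("transcript_id")
        if transcript_id is None:
            continue
        seen = transcripts.setdefault(transcript_id, set())
        seen.update(name for name in wanted if name in attributes)
    counts = Counter()
    for seen in transcripts.values():
        counts.update(seen)
    return len(transcripts), {name: counts[name] for name in wanted}
```

Passing an open file handle works, since it iterates line by line: `count_attribute_coverage(open("annotation.gtf"))` (a hypothetical file name). Counts of zero for both attributes would mean the file lacks them.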

Here are some tips for prepping data with respect to quality scores:
http://wiki.galaxyproject.org/Support#FASTQ_Datatype_QA

And RNA-seq resources:
http://wiki.galaxyproject.org/Support#Tools_on_the_Main_server:_RNA-seq

Best, Jen, Galaxy team

4.2 years ago by

United States

puneetd • 10 wrote:

Thanks Jen. That is a very helpful post. I am looking into the mappings and the source annotation files carefully to figure out where exactly the problem is.
