There may be a few issues going on, some scientific and some technical.
A "LOW DATA" result indicates that coverage was low for that gene/transcript. This could reflect the actual expression of that transcript under the input conditions, or it could result from problems upstream. Too much or too little read QC, poor mapping, etc. can all lead to data loss. Double check that the data was in .fastqsanger format when you began the analysis (incorrectly scaled quality scores are problematic) and review the steps you ran before Cuffdiff to see whether they maximized mapping success. Testing different parameters against the tool documentation (Tophat?) will help you find the best choices.
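As a rough illustration of the quality-score check, here is a minimal Python sketch that guesses the score offset from the quality lines of a FASTQ file. The function name and the thresholds are my own assumptions, not part of any Galaxy tool: it assumes plain four-line FASTQ records and relies on the heuristic that characters below ';' (ASCII 59) do not occur in Phred+64 data, while characters above 'J' (ASCII 74) are unusual in typical Phred+33 (.fastqsanger) data.

```python
def guess_quality_offset(fastq_lines, max_records=10000):
    """Guess the quality-score offset from an iterable of FASTQ lines.

    Heuristic only: assumes unwrapped four-line records.
    Returns "phred33", "phred64", "ambiguous", or "unknown".
    """
    lowest, highest = 255, 0
    records = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # the 4th line of each record holds the quality string
            continue
        quals = line.rstrip("\r\n")
        if not quals:
            continue
        codes = [ord(c) for c in quals]
        lowest = min(lowest, min(codes))
        highest = max(highest, max(codes))
        records += 1
        if records >= max_records:
            break
    if records == 0:
        return "unknown"
    if lowest < 59:
        return "phred33"  # consistent with .fastqsanger scaling
    if highest > 74:
        return "phred64"  # older Illumina/Solexa scaling, needs conversion
    return "ambiguous"
```

To try it on a real file, pass an open file handle, e.g. `guess_quality_offset(open("reads.fastq"))`, where the file name is a placeholder. An "ambiguous" result means the sampled reads did not contain characters that separate the two scalings.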
If you think there is a potential data mismatch problem (the chromosome identifiers do not match exactly), you can compare the inputs to confirm it. Ensembl's identifiers differ from UCSC's. Help with translating one to the other is in the RNA-seq resources link below. But note that direct conversion by simply adding a "chr" prefix doesn't work for all cases.
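One way to compare the inputs is to collect the sequence names from the reference annotation (first column of the GTF) and from the genome (FASTA header lines) and look at the differences. The following is a minimal Python sketch; the function names are hypothetical and it assumes a tab-delimited GTF and a standard FASTA file.

```python
def gtf_seqnames(gtf_lines):
    """Collect sequence names from the first column of GTF lines."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def fasta_seqnames(fasta_lines):
    """Collect sequence names (first word after '>') from FASTA lines."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def compare_seqnames(annotation_names, genome_names):
    """Report which names are shared and which occur on one side only."""
    return {
        "shared": sorted(annotation_names & genome_names),
        "annotation_only": sorted(annotation_names - genome_names),
        "genome_only": sorted(genome_names - annotation_names),
    }
```

If "shared" comes back empty or very short, the identifiers do not match. The mitochondrial chromosome is a typical case where adding "chr" is not enough: Ensembl uses "MT" while UCSC uses "chrM".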
I am not sure whether you are running Cufflinks, but if so, running Cuffmerge afterwards is needed to pull all of the reference annotation together before running Cuffdiff. Cuffdiff only considers the transcripts included in the reference annotation you provide. An example workflow for this is also in the RNA-seq resources link below.
Cuffdiff also makes use of special attributes in the reference annotation to fully populate all statistics. The files from iGenomes contain these, and the Cuffdiff manual (see the section on inputs) describes what they are. They are absent from many reference annotation files (UCSC, Ensembl, etc.). iGenomes has not created a reference annotation file for Zebrafish in this format, but there may be other sources that do (perhaps someone will post a known source, or once you know what to look for, you can review the ones you are considering). You can run Cuffdiff without these attributes, and the manual explains what is excluded in that case.
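To review a candidate annotation file, you can count how many records carry the attributes in question. The Python sketch below defaults to tss_id and p_id, which is my recollection of the attribute names from the Cufflinks/Cuffdiff manual; please confirm them against the manual and adjust the `wanted` argument if needed. The function name is hypothetical, and only attributes with quoted values are detected.

```python
import re

# Matches attribute keys followed by a quoted value, e.g. tss_id "TSS1"
ATTRIBUTE_KEY = re.compile(r'(\S+)\s+"[^"]*"')


def count_attributes(gtf_lines, wanted=("tss_id", "p_id")):
    """Count GTF records and how many of them carry each wanted attribute.

    The default attribute names are an assumption; check the manual.
    Returns (total_records, {attribute: records_with_attribute}).
    """
    counts = {key: 0 for key in wanted}
    total = 0
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        total += 1
        keys = set(ATTRIBUTE_KEY.findall(fields[8]))
        for key in wanted:
            if key in keys:
                counts[key] += 1
    return total, counts
```

Counts of zero for the wanted attributes suggest the file will still run through Cuffdiff, but with the outputs that depend on those attributes left out.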
Here are some tips for prepping data with respect to quality scores:
And RNA-seq resources:
Best, Jen, Galaxy team