Question: Blank Cufflinks output
sv112 (United States) wrote, 3.5 years ago:

Hi!

I am conducting an RNA-seq analysis with Saccharomyces cerevisiae-derived data. In my workflow, I am using the 'TopHat accepted hits' BAM output files as input for a downstream Cufflinks analysis on the main Galaxy server online. All of my Cufflinks outputs, however, are coming out blank (i.e., nothing in the skipped or assembled transcript files, or in the transcript/gene expression files)! Curiously, when I do a reference-free analysis, I get an output; however, it isn't very useful for an RNA-seq analysis. To me, this suggests that the BAM files are OK, and that there is some issue with the reference annotation GFF file. However, I have previously used the same GFF file successfully for a similar analysis. The GFF file was downloaded from the Saccharomyces Genome Database, and the only change I made was to keep just the annotations and remove the FASTA sequence for all the chromosomes.

Any suggestions?

Thanks!

Sri

rna-seq s.cerevisiae cufflinks • 1.4k views
sv112 (United States) wrote, 3.5 years ago:

Hi Jen,

Thanks for the response. I went back and replaced my .gff annotation file with a different one bearing a .gtf extension. Strangely, that seems to have fixed the problem! Maybe the GFF file was corrupted, or wasn't being read properly by the newer version of Cufflinks. I am a little surprised, because I did not think there were many fundamental differences between the GFF and GTF formats.

Regards,

Sri


Hello,

This tool suite requires a reference annotation file/dataset in either GTF or GFF3 format.

Was your original file really in GFF format, or was it in GFF3 format? (GTF can be thought of as GFF2, but no one calls it that.) If you removed FASTA sequences from the end, then the format was GFF3, although such files are often labeled with a ".gff" extension. The GFF and GTF formats are very closely aligned, but with important differences in the last field (attributes). GFF/GTF and GFF3 differ in many more ways.

This is slightly confusing, but searching for the file format specifications or reviewing the common datatypes in the Galaxy wiki can help clarify. (I expect that you know this already, but others reading may not, so it seems worth explaining.)
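For readers who want to check which syntax a file actually uses, regardless of its extension, here is a minimal Python sketch. It only looks at the ninth (attributes) column: GTF writes `key "value";` pairs, while GFF3 writes `key=value` pairs. The function name `guess_annotation_format` and the sample feature lines are made up for illustration, and the check is a rough heuristic, not a format validator.

```python
import re

# GTF attributes look like:  gene_id "YAL069W"; transcript_id "YAL069W_mRNA";
# GFF3 attributes look like: ID=YAL069W;Name=YAL069W
GTF_ATTR = re.compile(r'^\s*\w+ "[^"]*"\s*;')
GFF3_ATTR = re.compile(r'^\s*[^\s=;]+=[^;]*')


def guess_annotation_format(line):
    """Heuristically classify one feature line as 'gtf', 'gff3', or 'unknown'."""
    if line.startswith("#") or not line.strip():
        return "unknown"  # comments, directives, and blank lines carry no attributes
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return "unknown"  # both formats use exactly nine tab-separated columns
    attributes = fields[8]
    if GTF_ATTR.match(attributes):
        return "gtf"
    if GFF3_ATTR.match(attributes):
        return "gff3"
    return "unknown"
```

Running this over the first few feature lines of a dataset should be enough to tell whether a ".gff" file is really GFF3.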

If reviewed, the original GFF3 dataset and the GTF dataset used later would likely turn out to include different attributes. Or the original file did not follow the format specification strictly, or it was corrupted in transfer or during the manipulations to strip off the FASTA content, as you suggest. Certain attributes in reference annotation datasets are needed to generate the full complement of statistics with these tools, in particular with Cuffdiff. iGenomes is a good resource for GTF files that include them. The manual for the tools explains which attributes to pay the most attention to.
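One way to review the attributes is to tally which keys appear in the ninth column of a GTF file. The sketch below is a hypothetical helper (`count_attribute_keys` is not part of Cufflinks or Galaxy) and assumes GTF-style `key "value";` pairs. The GTF specification requires `gene_id` and `transcript_id` on every feature line; the Cufflinks documentation also describes optional attributes such as `tss_id` and `p_id`, so check the manual for the current list.

```python
import re
from collections import Counter

# Matches one GTF-style attribute pair: key "value";
GTF_PAIR = re.compile(r'(\w+) "([^"]*)"\s*;')


def count_attribute_keys(lines):
    """Count how many feature lines carry each GTF attribute key."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed feature line
        # Use a set so a key repeated on one line is counted once for that line.
        keys = {key for key, _value in GTF_PAIR.findall(fields[8])}
        counts.update(keys)
    return counts
```

If a key such as `gene_id` shows up on fewer lines than the file has features, or not at all, that points to the annotation rather than the BAM input.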

Other things to watch for: 
* The chromosome identifiers must match exactly between all inputs (reference genome and reference annotation).
* Use the same genome and annotation for all steps in your workflow.
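The identifier check in the first point can be sketched in Python as follows. The function names are hypothetical, and the reference names are assumed to come from the genome FASTA headers or the BAM header. The sketch collects the first column of every feature line, stops at a GFF3 `##FASTA` directive if one is present, and reports identifiers the reference does not contain.

```python
def annotation_seqids(lines):
    """Collect the sequence identifiers (column 1) used by feature lines."""
    seqids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # GFF3 files may embed sequences after this directive
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives, and blank lines
        seqids.add(line.split("\t", 1)[0])
    return seqids


def unmatched_seqids(annotation_lines, reference_names):
    """Return annotation identifiers that are absent from the reference genome."""
    return annotation_seqids(annotation_lines) - set(reference_names)
```

An empty result means every identifier in the annotation is also present in the reference. Anything returned (for example, an annotation using "chrmt" where the genome uses "chrM") is a mismatch worth fixing before rerunning the tool.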

Glad you have this worked out so far, and maybe this helps to pinpoint where the root issue was! Jen, Galaxy team

Jennifer Hillman Jackson (United States) wrote, 3.5 years ago:

Hi Sri,

I am wondering if there is an issue with the Cufflinks parameters. Start by examining the settings under "Use Reference Annotation". Each is described on the tool form itself, with more details in the Cufflinks manual.

Given the rest of the information provided, this seems like the most likely root cause of the problem, but please post back if testing with alternative parameters does not work. A few more details in that case would help - and if needed, I can examine your history and provide specific feedback (privately if it cannot be generalized).

Best, Jen, Galaxy team
