Question: Issue running Cuffdiff using Cuffmerge GTF output
mat12 (United States) wrote, 3.5 years ago:

I am trying to use Cuffdiff to find significant changes in mRNA transcript expression between two conditions (1. silenced gene, 2. overexpressed gene), but I have been having difficulty using the Cuffmerge output GTF as input to Cuffdiff. This is probably a basic question with an easy answer, but I have not been able to figure it out on my own and have tried several things without success, so I was hoping someone could help.

This is what I have done so far:

1. Mapped the reads for the two samples individually using Tophat.

2. Ran Cufflinks on each Tophat output to assemble the reads, using a reference annotation downloaded from UCSC.

3. Ran Cuffmerge on the two "assembled transcripts" GTF files from the two samples. The single output file contained the merged transcripts from the two Cufflinks runs.

4. Attempted to run Cuffdiff with the Cuffmerge GTF output, assigning Conditions 1 and 2 their respective Tophat alignment files. Cuffdiff failed to run.
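For anyone reproducing step 3 outside Galaxy, here is a minimal Python sketch of what the Cuffmerge input looks like. It only builds the manifest text and the argument list (nothing is executed), and every file name in it (`silenced/transcripts.gtf`, `hg19_genes.gtf`, `assemblies.txt`) is a hypothetical placeholder. Cuffmerge reads a plain-text file listing one assembly GTF per line; `-g` supplies the reference annotation to merge in, and `-o` sets the output directory, where the merged GTF used by Cuffdiff is written as `merged.gtf`.

```python
import shlex

def assembly_list_text(gtf_paths):
    """Contents of the Cuffmerge manifest: one assembly GTF path per line."""
    return "".join(f"{path}\n" for path in gtf_paths)

def cuffmerge_cmd(list_path, ref_gtf, out_dir="merged_asm"):
    """Argument list for cuffmerge; this function does not run anything."""
    # -g: reference annotation merged together with the assemblies
    # -o: output directory; the merged GTF ends up in <out_dir>/merged.gtf
    return ["cuffmerge", "-g", ref_gtf, "-o", out_dir, list_path]

if __name__ == "__main__":
    # Hypothetical paths: substitute your own Cufflinks transcripts.gtf files.
    assemblies = ["silenced/transcripts.gtf", "overexpressed/transcripts.gtf"]
    with open("assemblies.txt", "w") as fh:
        fh.write(assembly_list_text(assemblies))
    print(shlex.join(cuffmerge_cmd("assemblies.txt", "hg19_genes.gtf")))
```

The printed command can then be run in a shell; in Galaxy the same choices appear as the Cuffmerge form fields.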


Instead, I gave Cuffdiff the reference annotation I downloaded from UCSC (in place of the Cuffmerge output file) and assigned both Condition 1 and Condition 2 the same Tophat output from condition #1 (silenced). Cuffdiff produced a tabular file with identical values (value_1 and value_2) for the two conditions for every gene, as expected. As a result, log2(fold_change) = 0 and the significance test found no difference.
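Why identical inputs give log2(fold_change) = 0: Cuffdiff reports the base-2 log of value_2 / value_1, so giving both conditions the same file makes the ratio exactly 1. A minimal sketch with made-up FPKM values; the handling of zero expression here is this sketch's own choice, not necessarily what Cuffdiff prints.

```python
import math

def log2_fold_change(value_1, value_2):
    """Base-2 log of value_2 / value_1 (condition 2 relative to condition 1)."""
    if value_1 <= 0 or value_2 <= 0:
        # Undefined for zero expression; this sketch simply returns None.
        return None
    return math.log2(value_2 / value_1)

# Same file for both conditions -> identical values -> ratio 1 -> log2 is 0.
print(log2_fold_change(12.5, 12.5))  # 0.0
print(log2_fold_change(12.5, 25.0))  # 1.0
```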

I then repeated this with the same UCSC reference annotation, assigning both conditions the Tophat output from condition #2 (overexpressed). A tabular file was produced, and again the values 1 and 2 were identical for each gene.

Since I could not get Cuffdiff to run with both conditions, I was planning to compare the two tabular output files myself in Excel. However, I would prefer to use Cuffdiff's significance test and follow the simple Tophat --> Cufflinks --> Cuffmerge --> Cuffdiff workflow.
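If the two same-versus-same outputs are compared by hand, the rows have to be joined by gene before taking a ratio. A minimal Python sketch of that join, as an alternative to Excel; it assumes both files are Cuffdiff's tab-separated gene-level output with `test_id` and `value_1` columns (as in `gene_exp.diff`). Note that it produces only a fold change, with no variance estimate or significance test, which is exactly what running Cuffdiff on both conditions provides.

```python
import csv
import io
import math

def read_values(tsv_text, key="test_id", column="value_1"):
    """Map each test_id to its expression value from one Cuffdiff table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[key]: float(row[column]) for row in reader}

def compare_runs(silenced_tsv, overexpressed_tsv):
    """log2(overexpressed / silenced) for genes present and nonzero in both runs."""
    a = read_values(silenced_tsv)
    b = read_values(overexpressed_tsv)
    result = {}
    for gene in sorted(a.keys() & b.keys()):
        if a[gene] > 0 and b[gene] > 0:  # skip genes where the ratio is undefined
            result[gene] = math.log2(b[gene] / a[gene])
    return result
```

Pass the file contents as strings, e.g. `compare_runs(open("silenced.tabular").read(), open("overexpressed.tabular").read())` (hypothetical file names).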

Can someone identify what I did wrong? Please let me know if I should clarify any of my steps.

Thank you so much. I really appreciate the help.

Written 3.5 years ago by mat12, modified 3.4 years ago.

Jennifer Hillman Jackson (United States) wrote, 3.5 years ago:

Hello,

Where are you using Galaxy? At http://usegalaxy.org or another public server (please share URL) or perhaps your own local/cloud?

There are no known issues with Cuffdiff at this time, and your protocol looks fine.

If anything, I would advise including an iGenomes GTF reference dataset along with the Cufflinks GTFs when running Cuffmerge, as this will help Cuffdiff produce the full complement of statistics. Not all reference GTFs contain the attributes that Cuffdiff uses for some calculations; see the manual for details.
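One quick way to see whether a reference GTF carries the attributes in question is to count them in column 9. A minimal sketch; the Cuffdiff manual is the authority on which attributes matter, and the three checked here (`gene_name` for gene names in the output, `tss_id` and `p_id` for the promoter, splicing, and CDS tests) are an assumption based on it rather than a complete list.

```python
import re
from collections import Counter

# key "value" pairs in the GTF attribute column
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')
WANTED = ("gene_name", "tss_id", "p_id")

def attribute_coverage(gtf_lines, wanted=WANTED):
    """Return (feature_count, {attribute: number of features carrying it})."""
    counts = Counter()
    total = 0
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF feature line
        total += 1
        keys = {key for key, _ in ATTR_RE.findall(fields[8])}
        counts.update(name for name in wanted if name in keys)
    return total, {name: counts[name] for name in wanted}
```

Run it over `open("genes.gtf")` (hypothetical path); a zero count for `gene_name` would be consistent with output that shows loci but no gene names.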

We can help more when we know where you are working. Thanks, Jen, Galaxy team

Written 3.5 years ago by Jennifer Hillman Jackson.

Hi Jen,

After uploading my replicates, I no longer have an issue with Cuffdiff. Now I am having issues with my reference annotation: my Cuffdiff output file lists chromosome locations but not gene names.

Were you suggesting that I input an iGenomes GTF reference dataset as a "reference annotation file" for Cuffmerge or rather include it amongst my other Cufflink assembly files to be merged with the others?

Do you have any other suggestions for hg19 reference annotation files that would let me see gene names? Does it have to be one of the iGenomes GTF reference datasets? They are huge and are taking me a few days to download and transfer to my history.

Thanks so much.

Written 3.4 years ago by mat12.

For the iGenomes data, download the tar file locally, unpack it, then just upload the genes.gtf file (use FTP). This file is much smaller than the entire tar bundle and Galaxy does not load tar files anyway (it is not a supported compression datatype).
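A minimal Python sketch for pulling just `genes.gtf` out of a downloaded iGenomes bundle locally, before the FTP upload. The bundle name and internal layout are assumptions: `genes.gtf` may appear in more than one directory, and some entries may be symlinks, so the sketch lists every regular file with that name and lets you choose one rather than hard-coding a path. Each call scans the archive, which can take a while on a large bundle.

```python
import shutil
import tarfile
from pathlib import PurePosixPath

def find_gene_gtfs(tar_path):
    """Archive paths of regular files named genes.gtf."""
    with tarfile.open(tar_path) as tar:  # mode "r" auto-detects gzip/bz2/xz
        return [m.name for m in tar.getmembers()
                if m.isfile() and PurePosixPath(m.name).name == "genes.gtf"]

def extract_member(tar_path, member_name, out_path):
    """Stream a single member to out_path without unpacking the whole bundle."""
    with tarfile.open(tar_path) as tar:
        src = tar.extractfile(member_name)
        if src is None:
            raise ValueError(f"{member_name} is not a regular file")
        with src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return out_path
```

For example, call `find_gene_gtfs("Homo_sapiens_UCSC_hg19.tar.gz")` (hypothetical file name), inspect the candidates, then pass the chosen one to `extract_member`.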

These really are the best for these tools, although others exist. Watch out for genome mismatch problems - the genome build AND the chromosome identifiers must be an exact match for all inputs.
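A common form of this mismatch is UCSC-style names (`chr1`) in one input and Ensembl-style names (`1`) in another. A minimal sketch that compares the sequence names in a GTF (column 1) with the `@SQ SN:` names from a SAM/BAM header, e.g. the text printed by `samtools view -H aligned.bam` (hypothetical file name). It flags only the `chr` prefix case; other differences such as `chrM` versus `MT` show up in the "only in" lists.

```python
def gtf_seqnames(gtf_lines):
    """Sequence names from column 1 of a GTF, skipping comments and blanks."""
    return {line.split("\t", 1)[0] for line in gtf_lines
            if line.strip() and not line.startswith("#")}

def sam_header_seqnames(header_lines):
    """Reference names from @SQ lines (SN: tag) of a SAM header."""
    names = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names

def _drop_chr(name):
    return name[3:] if name.startswith("chr") else name

def chrom_mismatch(gtf_names, bam_names):
    shared = gtf_names & bam_names
    shared_without_chr = {_drop_chr(n) for n in gtf_names} & {_drop_chr(n) for n in bam_names}
    return {
        "only_in_gtf": sorted(gtf_names - bam_names),
        "only_in_bam": sorted(bam_names - gtf_names),
        # nothing matches exactly, but names match once "chr" is removed
        "chr_prefix_mismatch": not shared and bool(shared_without_chr),
    }
```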

And yes, use Cuffmerge if you are performing discovery. If not, the iGenomes file can be used on its own. See the tool manual for protocol details (and expected Cuffdiff inputs, including the GTF/GFF attributes the tools make use of). GitHub is having problems right now, but once that is cleared up on their side, this is the manual link: http://cole-trapnell-lab.github.io/

Jen

Written 2.1 years ago by Jennifer Hillman Jackson.

mat12 (United States) wrote, 3.4 years ago:

Hi Jennifer,

Thanks for the reply. I am using the public server http://usegalaxy.org. If you have any more advice I would greatly appreciate it!

I will try including the reference dataset that you recommended.

Thanks!

Written 3.4 years ago by mat12.

Hi, I have a similar problem and used the same pipeline. Were you able to debug it and get your differential expression results?

I would appreciate your input.

Michelle

Written 3.4 years ago by michimaurin.

Did you upload replicates for your conditions? Run Cufflinks on each of your replicates and merge them all together in Cuffmerge. This way Cuffdiff has more alignments to work with. Let me know if that works for you!
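On the command line, Cuffdiff takes the replicates of one condition as a single comma-separated argument, one such argument per condition after the merged GTF, with `-L` naming the conditions. A minimal sketch that builds that argument list (nothing is executed); the labels, BAM names, and `merged_asm/merged.gtf` path are hypothetical.

```python
def cuffdiff_replicate_args(merged_gtf, replicates, out_dir="diff_out"):
    """replicates: dict of condition label -> list of BAM paths (order kept)."""
    if len(replicates) < 2:
        raise ValueError("Cuffdiff needs at least two conditions")
    groups = []
    for label, bams in replicates.items():
        if not bams:
            raise ValueError(f"condition {label!r} has no alignment files")
        if any("," in path for path in bams):
            # commas separate replicates, so they cannot appear inside a path
            raise ValueError(f"comma in a path for condition {label!r}")
        groups.append(",".join(bams))
    labels = ",".join(replicates)
    return ["cuffdiff", "-o", out_dir, "-L", labels, merged_gtf, *groups]

if __name__ == "__main__":
    print(cuffdiff_replicate_args(
        "merged_asm/merged.gtf",
        {"silenced": ["sil_rep1.bam", "sil_rep2.bam"],
         "overexpressed": ["oe_rep1.bam", "oe_rep2.bam"]},
    ))
```

The same grouping applies to Cuffmerge: each replicate's Cufflinks assembly goes on its own line of the manifest.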

Written 3.4 years ago by mat12.
Powered by Biostar version 16.09