Cuffcompare runs successfully but generates empty files


19 months ago by

roxy.zhang • 0 wrote:

Hello friends

I am currently analyzing RNA-seq data from 3 datasets (obtained publicly through NCBI SRA). Each dataset comes from one individual, and the three individuals have different conditions (c9ALS, sALS, Control). I have successfully run TopHat and Cufflinks, and I am now trying to use Cuffcompare/Cuffmerge to examine all the transcripts together. However, I am having some difficulties.

Cuffmerge will not run and reports: "Fatal error: Matched on Error Error running cuffmerge. The output file is empty, there may be an error with your input file or settings." Cuffcompare, on the other hand, runs successfully (the dataset turns green), but its output files are empty. This is strange to me, because my TopHat and Cufflinks outputs are large and I can view the splice junctions in IGB. (Looking at the splice junctions, I expected multiple assembled transcripts, but there is only one. Again, this is from viewing the files in IGB, so maybe something is wrong there as well?)

I am not very familiar with RNA-seq and am quite confused. Thank you very much in advance for all your help!

Thank you, Roxy Zhang

rna-seq cufflinks cuffcompare galaxy


modified 18 months ago by Jennifer Hillman Jackson ♦ 25k • written 19 months ago by roxy.zhang • 0

18 months ago by

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

The Cuffmerge error is most likely caused by unsorted inputs; both the BAM and GTF datasets need to be sorted. This is how to sort them: https://galaxyproject.org/support/sort-your-inputs/

Cuffcompare should also be given sorted inputs, and none of those inputs can be empty.
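For anyone who wants to pre-sort a GTF locally before uploading, a minimal sketch in Python follows. It assumes the usual "chromosome, then start coordinate" order (roughly what `sort -k1,1 -k4,4n` produces, with chromosome names compared as plain strings); it is not Galaxy's Sort tool, and BAM files still need a coordinate sort with a BAM-aware tool such as `samtools sort`. The function names are hypothetical.

```python
def sort_gtf_lines(lines):
    """Sort GTF records by seqname (column 1), then numeric start (column 4).

    Comment lines starting with '#' are kept at the top in their original
    order; blank lines are dropped. Chromosome names sort as plain strings
    (chr1, chr10, chr2, ...), which is an assumption of this sketch.
    """
    headers = [ln for ln in lines if ln.startswith("#")]
    records = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    # Guarantee every record ends with a newline so lines never merge
    # when the last line of the file (often missing "\n") moves up.
    records = [ln if ln.endswith("\n") else ln + "\n" for ln in records]

    def key(line):
        fields = line.split("\t")
        return (fields[0], int(fields[3]))

    return headers + sorted(records, key=key)


def sort_gtf_file(in_path, out_path):
    """Read a GTF from in_path and write the sorted copy to out_path."""
    with open(in_path) as src:
        lines = src.readlines()
    with open(out_path, "w") as dst:
        dst.writelines(sort_gtf_lines(lines))
```

Calling `sort_gtf_file("genes.gtf", "genes.sorted.gtf")` writes a sorted copy that can then be uploaded in place of the original.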

Any reference annotation used with these tools must have exactly the same chromosome identifiers as the reference genome used for mapping (the annotation is also an optional input for TopHat or HISAT). In your case, all inputs should be based on hg19; iGenomes is one source for a matching annotation.
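A quick way to spot identifier mismatches outside Galaxy is to compare the chromosome names in column 1 of the GTF with the names in the genome. The sketch below assumes you have a samtools `.fai` index of the genome FASTA (first column = sequence name); the helper names are hypothetical, and the "add a `chr` prefix" hint only covers the common `1` vs `chr1` naming difference, not cases such as `MT` vs `chrM`.

```python
def gtf_seqnames(lines):
    """Collect the distinct chromosome names (column 1) used in a GTF."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def fai_seqnames(fai_lines):
    """Chromosome names from a samtools .fai index (its first column)."""
    return {line.split("\t", 1)[0] for line in fai_lines if line.strip()}


def report_mismatches(gtf_names, genome_names):
    """Map each GTF chromosome missing from the genome to a suggested name.

    The suggestion is "chr" + name when that exists in the genome
    (e.g. "1" -> "chr1"); otherwise it is None.
    """
    missing = sorted(gtf_names - genome_names)
    return {
        name: ("chr" + name if "chr" + name in genome_names else None)
        for name in missing
    }
```

An empty result from `report_mismatches` means every chromosome in the annotation also exists in the genome; any entries point to records that Cufflinks-family tools would not be able to match.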

iGenomes reference annotation for hg19 (its chromosome identifiers match the hg19 genome):
http://cole-trapnell-lab.github.io/cufflinks/getting_started/#using-pre-built-annotation-packages
or http://cole-trapnell-lab.github.io/cufflinks/igenome_table/index.html.
Download the .tar file, uncompress it locally, and upload just the genes.gtf file to Galaxy. Compressed data in .tar format cannot be loaded directly. For this genome the archive is also very large, and it includes data you won't need that would use up much of your quota.
Checking for mismatched identifiers: https://galaxyproject.org/support/chrom-identifiers/
Galaxy RNA-seq tutorials: https://galaxyproject.org/learn/
Manual with sample protocols and a link to the Google forum for the tool suite: http://cole-trapnell-lab.github.io/cufflinks/manual/
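If uncompressing the whole iGenomes archive locally is inconvenient, Python's standard `tarfile` module can pull out only the annotation. This is a sketch under assumptions: the annotation is stored as a regular file named `genes.gtf` somewhere in the archive (typically under a path like `.../Annotation/Genes/genes.gtf`, though the exact layout is not guaranteed), and the flattened output naming is a hypothetical choice so that several `genes.gtf` members cannot overwrite each other.

```python
import shutil
import tarfile
from pathlib import Path


def extract_genes_gtf(tar_path, dest_dir="."):
    """Copy every regular file named 'genes.gtf' out of a tar archive.

    Returns the list of written paths. Each output filename encodes the
    member's original directory (slashes become '__'), which also keeps
    writes inside dest_dir.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    # "r:*" lets tarfile detect gzip/bzip2/xz compression transparently.
    with tarfile.open(tar_path, "r:*") as archive:
        for member in archive:
            if not member.isfile() or Path(member.name).name != "genes.gtf":
                continue
            out_path = dest / member.name.strip("/").replace("/", "__")
            source = archive.extractfile(member)
            with source, open(out_path, "wb") as out:
                shutil.copyfileobj(source, out)
            written.append(out_path)
    return written
```

Only the returned `genes.gtf` file(s) then need to be uploaded to Galaxy, which keeps the rest of the archive out of your quota.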

Thanks! Jen, Galaxy team

modified 18 months ago • written 18 months ago by Jennifer Hillman Jackson ♦ 25k

Hi Jen

Thank you very much for your prompt replies. However, I have run into another problem: I permanently deleted some datasets from my history, but they still seem to count toward my workspace/quota. Could you take a look and perhaps recalculate it? Thanks a lot!

Best, Roxy

written 18 months ago by roxy.zhang • 0
