Question: Cufflinks assembled transcripts (format gtf) returns "empty" file; why?
mmccra4 (United States) wrote, 4.4 years ago:

I have three separate .bam files created in the Ion Torrent Suite, all aligned to the same genome file.  I ran Cufflinks v0.0.7 on all three files with default parameters in Galaxy. One of the files is 16.9GB and cufflinks returned an assembled transcripts file with ~120,000 lines.  The second file is 22.3GB and cufflinks returned an assembled transcripts file with ~170,000 lines. 

The third file is 26.4GB but this time cufflinks returned an assembled transcripts file labeled "empty" and in the preview line it says "no peek".  When I run cufflinks on this file again, this time using a reference annotation as guide, it returns 90,888 lines; however, while these lines have unique start and end positions, the data that follows is the same for all lines:

{ FPKM "0.0000000000"; frac "0.000000"; conf_lo "179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000"; conf_hi "0.000000"; cov "-nan"; full_read_support "no"; }. 

Lastly, I aligned this data set against a different genome file (but same organism) using the Ion Torrent Suite, and cufflinks still returns an empty assembled transcripts file.

I'm fairly new to this, so any information as to why this could be happening, what it means, and what I might do to fix it would be greatly appreciated.

rna-seq gtf cufflinks
modified 4.4 years ago by Jennifer Hillman Jackson • written 4.4 years ago by mmccra4

Are you using http://usegalaxy.org or your own Galaxy instance?

written 4.4 years ago by Bjoern Gruening

I am using http://usegalaxy.org

written 4.4 years ago by mmccra4

Jennifer Hillman Jackson (United States) wrote, 4.4 years ago:

Hi, 

Great question to get this started, Bjoern Gruening, and thanks mmccra4 for clarifying.

I am wondering now if there is some format issue with the BAM file (not technical, but content). Confirming that the GTF is a match for the reference genome is secondary (and unlikely to be the root cause given the other info you provided, but worth a look). The line you shared has that odd long string in the middle.

This may be solved quickest by sharing the history. We can also post back a generalized "what caused the issue" once resolved, preserving data privacy.

To do this, generate a history share link and paste it into an email sent to galaxy-bugs@bx.psu.edu (our team's internal, private, help list). Include this Galaxy Biostar link so the email can be matched up. Here is how to generate the share link:
https://wiki.galaxyproject.org/Learn/Share

Thanks! Jen, Galaxy team

modified 4.4 years ago by Daniel Blankenberg • written 4.4 years ago by Jennifer Hillman Jackson
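
The odd long string in the conf_lo field of the line quoted in the question appears to be the largest finite double-precision number (about 1.8e308) printed in fixed-point notation. That would suggest a placeholder or a failed calculation rather than a real confidence bound, though this is an inference from the digits alone and not something confirmed in the thread. A minimal Python sketch of the comparison, assuming the value was printed with six decimal places:

```python
import sys

# Largest finite IEEE 754 double, written out in fixed-point form
# with six decimal places (no exponent).
dbl_max_text = format(sys.float_info.max, ".6f")

# Leading digits of the conf_lo value quoted in the question.
reported_prefix = "17976931348623157081452742373170"

integer_digits = len(dbl_max_text.split(".")[0])
same_leading_digits = dbl_max_text.startswith(reported_prefix)

print(integer_digits)
print(same_leading_digits)
```

Taken together with FPKM 0, cov "-nan", and conf_hi 0 on every line, a bound like this is consistent with no usable reads having been assigned to the transcripts.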

Hey just checking back in...

I emailed the share link to the bug address on July 1, but I haven't heard anything back yet. Can you verify for me if the email was received?

Thanks much,

--Michael

written 4.4 years ago by mmccra4

We have received the email. However, a number of Galaxy Team members are on vacation, so please bear with us; we will get back to you.

written 4.4 years ago by Martin Čech

Hey guys, it's been nine months without a reply. What gives?

written 3.6 years ago by mmccra4

Hello,

The bug report was sent in some time after this post, while our team was at our annual conference. When I came back and reviewed it, the data was the same as that reported through a bug report on 4/11/14 (but at a later stage), where a fairly long reply explained the quality issues and potential ways to resolve them. This result is almost certainly related to quality as well, and to the settings used in this other tool (Cufflinks, instead of TopHat as in the first question). Because of this, I thought that you already understood what was going on.

If it is still unclear, I see FastQC runs in the history; these are a good place to start. In short: review the quality of the mapping versus the minimum quality set in the tool parameters. Also check for properly mapped pairs (are there any?) and confirm that the reference GTF dataset is a match for the reference genome used (the chromosome identifiers must be an exact match).

If you have a specific question that is not about the items we have already covered, please let us know.

Thanks, Jen, Galaxy team

written 3.6 years ago by Jennifer Hillman Jackson
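
The chromosome-identifier check mentioned above can be sketched in Python. This is only an illustration: the function names and the sample records are hypothetical, and it assumes the BAM header has already been exported as SAM text (for example with `samtools view -H`).

```python
def sam_header_names(sam_lines):
    """Collect reference names from the SN: field of @SQ header lines."""
    names = set()
    for line in sam_lines:
        if not line.startswith("@"):
            break  # the header ends at the first alignment record
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_seqnames(gtf_lines):
    """Collect the first column (seqname) of non-comment GTF lines."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


# Hypothetical sample data: the genome uses "chr1"/"chr2",
# while one GTF record uses a bare "2".
sam_header = [
    "@HD\tVN:1.4\tSO:coordinate\n",
    "@SQ\tSN:chr1\tLN:1000\n",
    "@SQ\tSN:chr2\tLN:2000\n",
]
gtf = [
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    '2\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g2"; transcript_id "t2";\n',
]

# Identifiers used in the GTF that the alignment reference does not define.
missing = gtf_seqnames(gtf) - sam_header_names(sam_header)
print(sorted(missing))
```

Any name left in `missing` marks annotation records that cannot be matched to the aligned genome by exact identifier.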

Small update: I ran Bam->Sam on the uploaded dataset used as input, and it produced an error stating that the dataset is truncated. This means that either the upload was not successful or the file was corrupted before it was loaded into Galaxy. This is a very large dataset, close to the upper limit on the size of a dataset that can be uploaded (50 GB). If it is larger than 50 GB locally, then you will need to move to a local or cloud Galaxy, as suggested in the original bug report reply.

I should also mention that Cufflinks expects a particular tag for spliced datasets, specifically "XS:". I cannot check this for you, but when you examine the data locally (after obtaining a complete BAM dataset, if needed), convert it to SAM format (using samtools) and confirm that the tag is present. Or you can upload again (if the BAM is confirmed to be complete locally, and upload is possible) and do the conversion in Galaxy to examine the tags.

Hopefully this advice helps, Jen, Galaxy team

written 3.6 years ago by Jennifer Hillman Jackson
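
Both local checks described above can be sketched in Python. This is a rough illustration, not the method Galaxy or samtools uses internally: the helper names and sample records are hypothetical. It relies on two assumptions: a complete BAM file ends with the 28-byte empty BGZF block defined in the SAM/BAM specification, and the strand tag on spliced alignments has the form `XS:A:+` or `XS:A:-`.

```python
import os
import tempfile

# 28-byte empty BGZF block that marks the end of a complete BAM file.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF end-of-file marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF


def spliced_without_xs(sam_lines):
    """Yield names of spliced reads (CIGAR contains N) lacking an XS:A: tag."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a full alignment record
        cigar, tags = fields[5], fields[11:]
        if "N" in cigar and not any(tag.startswith("XS:A:") for tag in tags):
            yield fields[0]


# Hypothetical demo: one file ending in the marker, one cut short.
with tempfile.TemporaryDirectory() as tmp:
    complete = os.path.join(tmp, "complete.bam")
    truncated = os.path.join(tmp, "truncated.bam")
    with open(complete, "wb") as out:
        out.write(b"placeholder payload" + BGZF_EOF)
    with open(truncated, "wb") as out:
        out.write(b"placeholder payload" + BGZF_EOF[:-5])
    complete_ok = has_bgzf_eof(complete)
    truncated_ok = has_bgzf_eof(truncated)

# Hypothetical SAM records: r1 spliced with XS, r2 spliced without, r3 unspliced.
records = [
    "r1\t0\tchr1\t100\t50\t20M500N30M\t*\t0\t0\t*\t*\tXS:A:+\n",
    "r2\t0\tchr1\t900\t50\t25M300N25M\t*\t0\t0\t*\t*\tNM:i:0\n",
    "r3\t0\tchr1\t1500\t50\t50M\t*\t0\t0\t*\t*\tNM:i:0\n",
]
missing_xs = list(spliced_without_xs(records))
print(complete_ok, truncated_ok, missing_xs)
```

A missing end-of-file marker points to truncation, but its presence alone does not prove that the rest of the file is intact, so a full BAM to SAM conversion remains the more thorough test.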

Hello again,

Thanks so much for the quick reply. I will follow your advice in investigating this file.

Also, could you please tell me how to access the long reply you mentioned above, or possibly resend it? If it was an email to me, I did not receive it. Thanks again for all of your help.

written 3.6 years ago by mmccra4

The email was sent on 4/11/14 in reply to a bug report with your email in the subject line. Best, Jen, Galaxy team

written 3.6 years ago by Jennifer Hillman Jackson