Cufflinks and inner mean distance

2.1 years ago, CF wrote:

Greetings,

I am attempting to use Cufflinks on alignments produced by running Tophat on some paired-end data. When I try to set advanced Cufflinks options and change the inner mean distance to -20 (estimated previously using Bowtie and Picard, etc.), Cufflinks returns the following error:

Fatal error: Exit code 1 () Error running cufflinks. return code = 1 -m/ --frag-len-mean arg must be at least 0

When I set the distance to 0, Cufflinks runs just fine. I know that Tophat is able to accommodate negative inner distances, but I don't know about Cufflinks. Does anyone know how I should incorporate this into my analysis?

Thanks for your time.

2.1 years ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

Cufflinks does not accept negative values for this setting (a negative inner mean distance describes paired reads that overlap). The good news is that setting the value manually is no longer necessary: recent versions of Cufflinks estimate it at run time from the properly paired reads in the input BAM alignments. See the Cufflinks manual: http://cole-trapnell-lab.github.io/cufflinks/cufflinks/index.html#advanced-abundance-estimation-options

For every case I can think of, estimating the insert size/inner distance from the actual alignment data is preferable. This is only my opinion, and it rests mainly on the fact that even carefully executed library construction protocols do not always produce the targeted insert sizes or read lengths.
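To make the idea of estimating from the alignments concrete, here is a minimal Python sketch. It is not how Cufflinks itself does the calculation; it only illustrates the arithmetic. It assumes plain-text SAM records, uses the SAM "properly paired" flag bit (0x2) and the TLEN column, and relies on the usual relation inner distance = fragment length - 2 x read length. The read length of 100 and the toy records are hypothetical values chosen so that the result matches the -20 from the question. Note that the error message names the option -m/--frag-len-mean, which the Cufflinks manual describes as the expected mean fragment length, so a negative inner distance can still correspond to a positive fragment length. One caveat: TLEN is measured in genomic coordinates, so pairs that span an intron would inflate this simple average.

```python
from statistics import mean

PROPER_PAIR = 0x2  # SAM flag bit: both mates aligned as a proper pair


def fragment_lengths(sam_lines):
    """Yield TLEN once per properly paired fragment."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        # The leftmost mate carries the positive TLEN, so keeping only
        # positive values counts each pair exactly once.
        if flag & PROPER_PAIR and tlen > 0:
            yield tlen


def estimate(sam_lines, read_length):
    """Return (mean fragment length, mean inner distance)."""
    mean_fragment = mean(fragment_lengths(sam_lines))
    return mean_fragment, mean_fragment - 2 * read_length


# Toy records (hypothetical data): two proper pairs of 100 bp reads,
# plus one read whose pairing is not "proper" and is therefore ignored.
TOY_SAM = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t99\tchr1\t1000\t50\t100M\t=\t1070\t170\t*\t*",
    "r1\t147\tchr1\t1070\t50\t100M\t=\t1000\t-170\t*\t*",
    "r2\t99\tchr1\t2000\t50\t100M\t=\t2090\t190\t*\t*",
    "r2\t147\tchr1\t2090\t50\t100M\t=\t2000\t-190\t*\t*",
    "r3\t65\tchr1\t3000\t50\t100M\t=\t8000\t5100\t*\t*",
]

mean_fragment, inner_distance = estimate(TOY_SAM, read_length=100)
print(mean_fragment, inner_distance)
```

With these toy records the mean fragment length is 180 and the inner distance is -20, so overlapping mates still give a fragment length well above 0.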

If you wish to test and compare, run one job with the value set manually (inner mean distance = 0) and one job where Cufflinks estimates it from the BAM alignments (the advanced setting left unused). Then visualize a few example gene regions to see which run produces results that better suit your analysis goals. I suggest reviewing at least one well-characterized region and at least one region that contains novel data from your samples (novel transcripts from the "discovery" protocol, e.g. an analysis that includes a Cuffmerge GTF as the reference annotation). Including both reference annotation GTFs in the visualization, that is, the baseline known transcripts (public GTF) and the known+novel transcripts that Cufflinks identified from your reads (the Cuffmerge output), adds context for the regions you examine.
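For anyone running the comparison outside Galaxy, the two jobs could be set up roughly as follows. This is only a sketch: the helper `cufflinks_command`, the file name `accepted_hits.bam`, and the output directory names are placeholders, and the only Cufflinks options used are -o (output directory) and -m (fragment length mean). The snippet builds the command lines but does not run them.

```python
def cufflinks_command(bam_path, output_dir, frag_len_mean=None):
    """Build a cufflinks argument list (hypothetical helper, not part of Cufflinks)."""
    cmd = ["cufflinks", "-o", output_dir]
    if frag_len_mean is not None:
        # Mirror the check reported in the error message from the question.
        if frag_len_mean < 0:
            raise ValueError("-m/--frag-len-mean arg must be at least 0")
        cmd += ["-m", str(frag_len_mean)]
    cmd.append(bam_path)
    return cmd


# Job 1: value set manually to 0, as in the question.
fixed_run = cufflinks_command("accepted_hits.bam", "cufflinks_fixed", frag_len_mean=0)

# Job 2: option omitted, so Cufflinks estimates the value from the alignments.
learned_run = cufflinks_command("accepted_hits.bam", "cufflinks_learned")

print(" ".join(fixed_run))
print(" ".join(learned_run))
```

Either list can be passed to `subprocess.run` once the paths point at real data; keeping the two output directories separate makes it easy to load both result GTFs side by side in a genome browser.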

Others are welcome to offer their opinions and/or experiment advice!

Take care, Jen, Galaxy team
