2.1 years ago by
The tool does not accept negative inner mean distance values (describing overlapping paired reads). However, the good news is that setting these values manually is no longer necessary. The latest versions of the tool now interpret the properly paired reads from the input BAM dataset alignments to calculate the values at run-time.
For all cases I can think of, using the actual alignment data to estimate the insert size/inner distance would be preferred. This is only my opinion, based primarily on the fact that even carefully executed library construction protocols do not always produce the targeted/expected insert sizes/read lengths.
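To make the idea of a negative inner distance concrete, here is a minimal sketch of the arithmetic, not the tool's actual implementation. It assumes the inner distance is the observed template length (the TLEN field of a SAM/BAM record) minus the two read lengths. The function names and the example values are hypothetical. For spliced RNA-seq alignments, TLEN is measured in genome coordinates and can span introns, so treat this purely as an illustration.

```python
from statistics import mean, stdev


def inner_distance(template_length, read1_length, read2_length):
    """Inner distance = template (fragment) length minus both read lengths.

    A negative result means the two mates overlap.
    TLEN is signed in SAM records, so the absolute value is used.
    """
    return abs(template_length) - read1_length - read2_length


def summarize_inner_distances(pairs):
    """Return (mean, sample standard deviation) of inner distances.

    `pairs` is an iterable of (TLEN, read 1 length, read 2 length) tuples,
    assumed to come from properly paired reads only.
    """
    distances = [inner_distance(t, r1, r2) for t, r1, r2 in pairs]
    return mean(distances), stdev(distances)


# Hypothetical values, not taken from any real dataset.
example_pairs = [
    (250, 100, 100),  # 50 bp gap between mates
    (180, 100, 100),  # mates overlap by 20 bp
    (300, 100, 100),  # 100 bp gap
    (190, 100, 100),  # mates overlap by 10 bp
]

mean_inner, sd_inner = summarize_inner_distances(example_pairs)
print(f"mean inner distance: {mean_inner}, std dev: {sd_inner:.2f}")
```

Individual pairs can have a negative inner distance even when the mean is positive, which is one reason an estimate drawn from the alignments themselves tends to be more informative than a single manually entered value.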
If you wish to test and compare, run one job with the values set manually (with inner mean == 0) and another where Cufflinks interprets the BAM alignments (that is, without using this advanced setting). Then visualize a few example gene bounds to see which produces results that better suit your analysis goals. I suggest reviewing at least one well-characterized region and at least one region that contains novel data from your samples (novel transcripts from the "discovery" protocol, e.g. an analysis that includes a Cuffmerge GTF as the reference annotation). Including the reference annotation GTFs in the visualization will add context for the examined regions: both the baseline known transcripts (public GTF) and the known+novel transcripts identified by Cufflinks that include your reads (the output from Cuffmerge).
Others are welcome to offer their opinions and/or experiment advice!
Take care, Jen, Galaxy team