Question: Can TopHat run with a negative r?
4.1 years ago by
hudiejie20 wrote:

Regarding the Mean Inner Distance between Mate Pairs,

I tried running ~1,000,000 reads twice:

(1) if r = 'fragment length - paired-end read length' = 280 - 100*2 = 80 bp.

I got 350,000 lines in the TopHat accepted hits output.

(2) if r = 'fragment size - paired-end read length - adaptor length' = 280 - 202 - 121 (58 bp + 63 bp) = -42.

I got 81,000 lines in the TopHat accepted hits output.


Does this mean that (1) is better?

4.1 years ago by
United States
Jennifer Hillman Jackson wrote:


Option one is better - a reported insert size (from a protocol) generally would not include sequencing artifacts such as adaptors unless that is specifically stated (and/or the artifacts are retained). The insert is what lies between the biological "tools" used to isolate and sequence the sample RNA/DNA (any source: genome, transcriptome, exome). That said, there are many newer technologies, publications describe a great variety of methods, and there is nearly always some difference between the reported/expected insert size and the actual insert size.

As far as I know, overlapping reads are OK in TopHat2. But as you found out, if the reads are not truly overlapping and the parameters are set as if they were, not many pairs will pass the criteria for "successful paired mapping".
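
For anyone who wants to check the arithmetic, here is a minimal sketch of how the value is usually worked out, assuming the reported fragment length already excludes adaptors. The function name is made up for illustration; the result is what would go into TopHat's -r/--mate-inner-dist setting. A negative result simply means the two mates are expected to overlap.

```python
def mate_inner_distance(fragment_length: int, read_length: int) -> int:
    """Expected inner distance between mates, in bp.

    Assumes fragment_length is the insert only (no adaptors) and that
    both mates have the same read_length. Hypothetical helper, not part
    of TopHat itself.
    """
    return fragment_length - 2 * read_length


# Case (1) from the question: 280 bp fragment, 2 x 100 bp reads.
print(mate_inner_distance(280, 100))  # 80

# Hypothetical library where the mates truly overlap:
# 150 bp fragment, 2 x 100 bp reads gives a negative value.
print(mate_inner_distance(150, 100))  # -50
```

Adaptor length is left out on purpose here: if the adaptors are not part of the reported fragment length, subtracting them again makes the reads look overlapping when they are not.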

Corrections or more about this are welcome from the community! I have not personally worked much with overlapping reads using Tophat2. Love to hear what has worked for others.

Jen, Galaxy team

