Option one is better. A reported insert size (from a protocol) generally does not include sequencing artifacts unless the protocol specifically states otherwise (or the artifact sequence was retained for some reason). The insert is what lies between the biological "tools" used to isolate and sequence the sample RNA/DNA, whatever the source: genome, transcriptome, or exome. That said, there are many newer technologies, published methods vary widely, and there is nearly always some difference between the reported/expected insert size and the actual insert size.
As far as I know, overlapping reads are acceptable in Tophat2. But as you found out, if the reads do not actually overlap and the parameters are set as if they did, few pairs will pass the criteria for "successful paired mapping".
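To connect the insert size to the Tophat2 settings: the Tophat2 manual describes `-r/--mate-inner-dist` as the expected mean inner distance between mates (its example is 300bp fragments with 50bp reads, giving `-r 200`). The sketch below does that arithmetic, assuming the reported insert excludes adapters and both mates have the same length. The function names are hypothetical, for illustration only. A negative result means the mates are expected to overlap.

```python
def mate_inner_distance(insert_size: int, read_length: int) -> int:
    """Expected gap between mates (hypothetical helper).

    Assumes insert_size excludes adapters and both mates are read_length long.
    """
    return insert_size - 2 * read_length


def reads_overlap(insert_size: int, read_length: int) -> bool:
    """True when the mates are expected to overlap (negative inner distance)."""
    return mate_inner_distance(insert_size, read_length) < 0


if __name__ == "__main__":
    # Example from the Tophat2 manual: 300bp fragments, 50bp reads.
    print(mate_inner_distance(300, 50))   # 200
    # 2x150bp reads on a 250bp insert: the mates overlap by 50bp.
    print(mate_inner_distance(250, 150))  # -50
    print(reads_overlap(250, 150))        # True
```

If the actual inserts are longer than the value you assumed, the mates will not really overlap, and a negative `-r` will work against you.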
Corrections or more about this are welcome from the community! I have not personally worked much with overlapping reads in Tophat2, and I'd love to hear what has worked for others.
Jen, Galaxy team