Question: Paired-end insert size upper limit
4.0 years ago by
United States
lukacdm0 wrote:

I am mapping paired-end reads using Bowtie2 and setting "maximum insert size for valid paired-end alignments" to 500 bases. However, when I calculate the insert sizes of the resulting BAM files using Picard Insert Size Metrics, I frequently see one or two inserts per file that exceed 20000 bases. Is the Bowtie2 mapping incorrectly returning very large inserts, or is the Picard software mis-analyzing the BAM file? If these large inserts really come from incorrect mapping, how can I remove them from the BAM files so that my downstream analyses are not affected?

4.0 years ago by
United States
Jennifer Hillman Jackson25k wrote:


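This is just how the data mapped on the first pass: Bowtie2 can still report pairs that span more than the maximum insert size, but it does not mark them as valid (properly paired) alignments. Downstream analysis and summary tools will not consider these pairs valid, so you can just leave them in the input.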
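If you would still rather drop such pairs explicitly, one option is to filter on the SAM proper-pair flag (0x2) and on the template length (TLEN, column 9). The sketch below is only an illustration, not a Galaxy tool: it works on SAM text (for example, the output of `samtools view -h`), the function names are made up for this example, and the 500-base cutoff is assumed to match the Bowtie2 setting from the question.

```python
import io

PROPER_PAIR = 0x2  # SAM FLAG bit: pair aligned "properly" according to the aligner
MAX_INSERT = 500   # assumption: same value as the Bowtie2 maximum insert size


def keep_record(line, max_insert=MAX_INSERT):
    """Keep header lines and properly paired reads with 0 < |TLEN| <= max_insert."""
    if line.startswith("@"):
        return True  # header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2: bitwise FLAG
    tlen = int(fields[8])  # column 9: signed template length
    return bool(flag & PROPER_PAIR) and 0 < abs(tlen) <= max_insert


def filter_stream(src, dst, max_insert=MAX_INSERT):
    """Copy SAM lines from src to dst, dropping records that fail keep_record."""
    for line in src:
        if keep_record(line, max_insert):
            dst.write(line)


# Small in-memory demo with made-up records: r1 is a proper pair 250 bases apart,
# r2 is a discordant pair roughly 25 kb apart.
demo_sam = (
    "@HD\tVN:1.0\tSO:unsorted\n"
    "r1\t99\tchr1\t100\t42\t50M\t=\t300\t250\t*\t*\n"
    "r1\t147\tchr1\t300\t42\t50M\t=\t100\t-250\t*\t*\n"
    "r2\t97\tchr1\t100\t42\t50M\t=\t25000\t24950\t*\t*\n"
    "r2\t145\tchr1\t25000\t42\t50M\t=\t100\t-24950\t*\t*\n"
)
filtered = io.StringIO()
filter_stream(io.StringIO(demo_sam), filtered)
kept_lines = filtered.getvalue().splitlines()
print("\n".join(kept_lines))
```

Both mates of a pair carry the same flag bit and the same absolute TLEN, so a pair is either kept or dropped as a whole. To apply this to a real file, `filter_stream` could be called with `sys.stdin` and `sys.stdout` in a small script placed between `samtools view -h` and `samtools view -b`.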

Best, Jen, Galaxy team

4.0 years ago by
United States
lukacdm0 wrote:

Thanks, Jennifer.  Will peak calling programs also ignore the long inserts?

The two peak calling tools on the public Main Galaxy instance only accept single-end input, so you will be running a tool on a local or cloud instance for this analysis.

There should be an option for this with most tools, but this is more of an educated guess than fact. If you are in doubt, there is usually a link to the third-party tool documentation on the execution/help form that will help you determine the proper usage for the tool you are using.

Best, Jen, Galaxy team

4.0 years ago by
United States
lukacdm0 wrote:

Thanks so much. I usually use other Galaxy instances that have appropriate peak calling programs.

