Question: How to set 'mean inner distance' for paired-end reads?
hudiejie (Singapore), 3.2 years ago, wrote:

Hi all,

If I use paired-end reads with 'Tophat for Illumina' or 'Tophat 2', I need to enter a 'mean inner distance'. The parameter is defined as follows:

-r/--mate-inner-dist <int> This is the expected (mean) inner distance between mate pairs. For example, for paired end runs with fragments selected at 300bp, where each end is 50bp, you should set -r to be 200. The default is 50bp.
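The arithmetic in this definition can be sketched in a few lines of Python. The function name is hypothetical and not part of TopHat; it simply subtracts both read lengths from the fragment length.

```python
def mate_inner_distance(fragment_len: int, read1_len: int, read2_len: int) -> int:
    """Expected gap between the two mates: fragment length minus both read lengths.

    A negative result means the two reads overlap within the fragment.
    """
    return fragment_len - read1_len - read2_len


# Example from the TopHat description: 300 bp fragments with 50 bp ends.
print(mate_inner_distance(300, 50, 50))  # 200
```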

I only know that the average read length is 101 bp; I am not sure what fragment size was selected for sequencing (I have asked the sequencing company and am waiting for their answer).

And what about the standard deviation of the distribution of inner distances between mate pairs?

--mate-std-dev <int> The standard deviation for the distribution on inner distances between mate pairs. The default is 20bp.

I tried the default parameters in my local Galaxy instance, but the job does not seem to work. I cannot attach an image of the error here.

Tags: rna-seq, paired-end

The error:


Settings: Output files: "/tmp/5447.1.batch/tmpqSneiZ/dataset_1720.*.ebwt" Line rate: 6 (line is 64 bytes) Lines per side: 1 (side is 64 bytes) Offset rate: 5 (one in 32) FTable chars: 10 Strings: unpacked Max bucket size: default Max buck

(Reply by hudiejie, 3.2 years ago)
Jennifer Hillman Jackson (United States), 3.2 years ago, wrote:

Hello,

The general rule is to take the insert (fragment) length and subtract the lengths of both reads in the pair to obtain the "mean inner distance". How much the values deviate from that mean tells you how to set the standard deviation. When the insert size is unknown, people sometimes test a few values, review the resulting alignments to find the best settings, and then do a full re-run.
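Given a sample of observed fragment lengths (for example from a pilot alignment), the mean and standard deviation of the inner distance could be estimated as below. This is a minimal sketch assuming both mates have the same length; the function name and the example lengths are hypothetical, not real data.

```python
import statistics


def inner_distance_stats(fragment_lengths, read_len):
    """Return (mean, standard deviation) of inner distances, rounded to integers.

    fragment_lengths: observed fragment (insert) sizes.
    read_len: length of each mate, assumed equal for both ends.
    """
    inner = [f - 2 * read_len for f in fragment_lengths]
    mean = statistics.mean(inner)
    sd = statistics.stdev(inner)  # sample standard deviation; needs at least 2 values
    return round(mean), round(sd)


# Hypothetical fragment lengths with 100 bp reads.
print(inner_distance_stats([260, 280, 300], 100))  # (80, 20)
```

Both values are rounded because -r and --mate-std-dev take integers.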

That said, at first look this error does not appear to be associated with this parameter setting, unless I am misreading it. Have you tried re-running the job to rule out a cluster error? Have you reviewed all inputs to make sure they are correctly formatted? These are good places to start when troubleshooting.

Best, Jen, Galaxy team

hudiejie (Singapore), 3.2 years ago, wrote:

Hi all,

I have contacted the TopHat team but have not received any reply so far.

I found this in the TopHat FAQ (http://ccb.jhu.edu/software/tophat/faq.shtml#mate_inner_dist):

'If you want to find a good approximation of this distance for your reads you can try running Bowtie2 on a small sample (subset) of the paired reads (both mates) and taking a look at their mapped positions. The SAM output of Bowtie2 for paired reads is especially helpful as the 9th field in the SAM alignment lines should show the estimated fragment length, from which you should subtract twice the read length to get the value of the "inner distance" that can be used with the -r parameter (obviously large absolute values for that field should be ignored as for this estimate we only want to consider mates aligned to the same exon).'
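The FAQ's procedure can be sketched in Python as below, assuming both mates have the same read length. The function name and the 1000 bp cutoff for "large absolute values" are hypothetical choices, and the proper-pair filter (SAM FLAG bit 0x2) is an extra step not mentioned in the FAQ. In SAM, the 9th field (TLEN) is signed: the two mates of one pair report the same size with opposite signs, so only positive values are kept here to count each pair once.

```python
import statistics


def estimate_inner_distance(sam_lines, read_len, max_fragment=1000):
    """Estimate mean and SD of the mate inner distance from SAM TLEN values.

    sam_lines: iterable of SAM text lines; header lines starting with '@' are skipped.
    read_len: length of each mate, assumed equal.
    max_fragment: hypothetical cutoff; larger TLEN values (e.g. mates spanning
        an intron) are ignored, as the TopHat FAQ suggests.
    """
    inner = []
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment line
        flag = int(fields[1])
        tlen = int(fields[8])  # 9th field: signed template (fragment) length
        # Keep properly paired alignments (bit 0x2) with positive TLEN only,
        # so each pair is counted once, from its leftmost mate.
        if not flag & 0x2 or tlen <= 0 or tlen > max_fragment:
            continue
        inner.append(tlen - 2 * read_len)
    if len(inner) < 2:
        raise ValueError("need at least two usable pairs")
    return statistics.mean(inner), statistics.stdev(inner)
```

To use it, open the SAM file produced by Bowtie2 on the read subset and pass the open file object together with the read length; the rounded results are candidate values for -r and --mate-std-dev.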

So it should be 'fragment length - 2 × read length' = 280 - 2 × 100 = 80 bp.

However, my sequencing company replied:

fragment size - paired end read length - adaptor length = insert size (mean inner distance) = 280 - 202 - 150 = -72.

I also tried Bowtie2 to estimate the fragment length, which confused me even more: the values in the 9th column vary widely, from -369 to 369.

Powered by Biostar version 16.09