Insert sizes in Galaxy

Question: Insert sizes in Galaxy

16 months ago by

devbt15 • 30 wrote:

I am new to RNA-seq analysis. I used TopHat to generate BAM files from FASTQ files, and in the parameter options I set "Mean Inner Distance between Mate Pairs" to 150, assuming an insert size of around 300-400 and a read length of 100. I then ran the BAM file through "CollectInsertSizeMetrics" from Picard in Galaxy, which reported an average insert size of 195.129108 with a standard deviation of 80.053801. Do I need to run TopHat again and, if so, with what value of Mean Inner Distance between Mate Pairs?

Or should I first map the forward and reverse FASTQ files with Bowtie, use that alignment to estimate the insert size, and then use that value for TopHat?

Thank you in advance, D Das.

rna-seq tophat bowtie galaxy • 584 views

modified 15 months ago • written 16 months ago by devbt15 • 30

15 months ago by

Jennifer Hillman Jackson ♦ 25k (United States) wrote:

Hello,

TopHat has been deprecated in favor of HISAT, a newer RNA-seq mapping tool. If you still want to use TopHat, the "Mean inner distance" is the total insert size minus the lengths of both reads.
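As a rough illustration of that subtraction, here is a minimal Python sketch. It is not part of TopHat or Galaxy; the function name `mean_inner_distance` is made up for this example, and it assumes the "insert size" figure covers the whole fragment, including both reads.

```python
def mean_inner_distance(fragment_size: float, read_length: int) -> float:
    """Inner distance = total insert (fragment) size minus both read lengths.

    Assumption: fragment_size includes the two reads themselves.
    """
    return fragment_size - 2 * read_length


# Nominal 300 bp fragments sequenced as 2 x 100 bp reads.
print(mean_inner_distance(300, 100))

# Mean insert size reported by Picard in the question above.
print(round(mean_inner_distance(195.129108, 100), 2))
```

A result below zero would mean that, on average, the two mates overlap rather than leaving a gap between them.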

That said, please review HISAT. Its tool form has many more options for controlling how paired-end reads are handled. Expand the option groups to view the available settings. Help for each option is on the tool form, and the options correspond to those in the HISAT manual if you need more detail about what each one does. You can also test options and compare the different runs afterwards to find the optimal settings for your particular input.

Tutorials: https://galaxyproject.org/learn/

Thanks, Jen, Galaxy team

written 15 months ago by Jennifer Hillman Jackson ♦ 25k

15 months ago by

devbt15 • 30 wrote:

Dear Jen, thank you for responding to my query. I have already gone through the answers to similar questions posted on Biostar, and also tried them on my dataset, which is a DRA dataset from Japan.

1) The DRA record gives the nominal length as 300, which their database tutorial treats as equivalent to insert size. The data are paired-end with 2 x 100 bp reads. Using these parameters, I calculated the gap as 100 and kept the standard deviation at the default of 20.

2) In addition, I mapped the reads of 2 different samples to the reference genome and to the transcriptome separately using Bowtie in Galaxy, and then used the Picard tool to check insert size statistics. This gave a mean insert size of around 180-190, which puts the gap between the ends of the reads at -20 to -10 (a negative value for TopHat).

Unfortunately, this problem has been around for a long time, and I could not find any consensus solution to it.
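To make the negative gap concrete, the sketch below converts a mean insert size into the number of bases the two mates would share. This is only an illustration under the same assumption as above (insert size means the whole fragment, reads included); `mate_overlap` is a hypothetical helper, not a Picard or TopHat function.

```python
def mate_overlap(fragment_size: float, read_length: int) -> float:
    """Bases covered by both mates of a pair.

    Assumptions: fragment_size includes both reads and is at least as long
    as a single read. Returns 0 when there is a gap instead of an overlap.
    """
    return max(0, 2 * read_length - fragment_size)


# Measured means of 180-190 versus the nominal 300, all with 100 bp reads.
for fragment in (180, 190, 300):
    print(fragment, mate_overlap(fragment, 100))
```

With 100 bp reads, any mean insert size under 200 implies overlapping mates, which is why the measured 180-190 gives a negative inner distance while the nominal 300 gives a gap of 100.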

I tried using RNA-STAR, but it gave low memory errors because my machine has only 8 GB of RAM. Anyway, I am now using HISAT2 with featureCounts, followed by DESeq2 or edgeR. I will experiment with the HISAT2 parameters once I have this pipeline working, as I think alignment is the most important step.

Regards, D Das.

written 15 months ago by devbt15 • 30
