Question: TopHat align summary
14 months ago, Diana Afonso wrote:


I am using Galaxy to analyze RNA-seq data: 100 bp single-end reads, sequenced on an Illumina 2500. I started by uploading BAM files and using the tool "Convert from BAM to FastQ" to convert my data into FASTQ format. Next I used "FASTQ Groomer" to convert the data into Sanger format, i.e. ASCII-encoded, Sanger-scaled quality values. Finally I ran TopHat with the default settings, and the align summary is the following:

Reads:
    Input     : 14520796
    Mapped    :  6298973 (43.4% of input)
    of these  :  1080332 (17.2%) have multiple alignments (0 have >20)
43.4% overall read mapping rate.
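For reference, the percentages in the summary follow directly from the three read counts. The snippet below is only a minimal sketch of that arithmetic in Python; the function name `parse_counts` and the regular expressions are illustrative assumptions, not part of TopHat or Galaxy.

```python
import re

# The align summary quoted in the question, pasted as plain text.
summary = (
    "Reads: Input : 14520796 Mapped : 6298973 (43.4% of input) "
    "of these: 1080332 (17.2%) have multiple alignments (0 have >20) "
    "43.4% overall read mapping rate."
)

def parse_counts(text):
    """Pull the input, mapped and multi-mapped read counts out of the text.

    Hypothetical helper: the patterns only assume the labels seen above.
    """
    input_reads = int(re.search(r"Input\s*:\s*(\d+)", text).group(1))
    mapped = int(re.search(r"Mapped\s*:\s*(\d+)", text).group(1))
    multi = int(re.search(r"of these\s*:\s*(\d+)", text).group(1))
    return input_reads, mapped, multi

input_reads, mapped, multi = parse_counts(summary)

# Mapping rate is relative to all input reads.
mapping_rate = 100 * mapped / input_reads
# The multi-mapping rate is relative to the mapped reads, not the input.
multi_rate = 100 * multi / mapped
# Reads that TopHat could not place at all.
unmapped = input_reads - mapped

print(f"mapped:       {mapping_rate:.1f}% of input")
print(f"multi-mapped: {multi_rate:.1f}% of mapped")
print(f"unmapped:     {unmapped} reads")
```

Note that the 17.2% figure is a fraction of the mapped reads, so more than half of the input reads (8221823 of 14520796) were left unaligned.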

I think the percentage of mapped sequences is very low. Could you please give me some tips on how I should alter the TopHat parameters to improve my results?

Thank you, Diana

14 months ago, Jennifer Hillman Jackson (United States) wrote:


Double check the fastq quality score formatting. This is how:

It would be a good idea to run FastQC again if you make adjustments to the reads. That report will alert you to quality problems within the data.
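As a rough cross-check of the quality score encoding outside Galaxy, one common heuristic is to look at the lowest ASCII character used in the quality lines. The sketch below assumes plain 4-line FASTQ records; the function names and the returned labels are illustrative, and the thresholds are the usual rule of thumb rather than anything taken from FastQC or FASTQ Groomer.

```python
import io

def iter_quality_strings(handle):
    """Yield the quality line (every 4th line) of 4-line FASTQ records."""
    for i, line in enumerate(handle):
        if i % 4 == 3:
            yield line.rstrip("\r\n")

def guess_encoding(quality_strings):
    """Guess the quality encoding from the lowest ASCII code observed.

    Heuristic only: a Sanger file containing nothing but high scores
    can look like an older Illumina encoding.
    """
    lowest = min(min(map(ord, q)) for q in quality_strings if q)
    if lowest < 59:
        return "phred33"   # Sanger / Illumina 1.8+
    if lowest < 64:
        return "solexa64"  # Solexa scores, offset 64
    return "phred64"       # Illumina 1.3 to 1.7

# Tiny in-memory examples standing in for real files.
sanger_like = io.StringIO("@r1\nACGTACGT\n+\nIIII#III\n")
old_illumina_like = io.StringIO("@r1\nACGTACGT\n+\nhhhhfffe\n")

sanger_guess = guess_encoding(iter_quality_strings(sanger_like))
old_guess = guess_encoding(iter_quality_strings(old_illumina_like))
print(sanger_guess, old_guess)
```

For a real file, the same functions could be fed an open file handle, ideally limited to the first few thousand records. If the guess is not `phred33` after grooming, the quality scores were probably labelled incorrectly before TopHat was run.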

After that, this RNA-seq tutorial might be useful, along with the TopHat manual:

Thanks, Jen, Galaxy team

