TopHat splice junctions

athbal • 10 wrote, 19 months ago:

Dear all,

I'm trying to detect alternative splicing events from RNA-seq experiments. I used the NCBI SRA Tools to extract the single-end reads I want to analyse in fastqsanger format, and then used TopHat (Galaxy version 2.1.0) to map these reads to the Arabidopsis thaliana TAIR10 genome. However, the TopHat splice junctions output file contains no data and, as a result, the software I run downstream cannot detect any alternative splicing events from the output .bam file. Do I need to change something in the TopHat input parameters? I would appreciate any help.

The input parameters of the TopHat run are:

Use a built in reference genome or own from your history: indexed
Select a reference genome: Arabidopsis_thaliana_TAIR10
TopHat settings to use: full
Max realign edit distance: 1000
Max edit distance: 2
Library Type: FR Unstranded
Final read mismatches: 2
Use bowtie -n mode: No
Anchor length (at least 3): 4
Maximum number of mismatches that can appear in the anchor region of spliced alignment: 0
The minimum intron length: 40
The maximum intron length: 2000
Allow indel search: Yes
Max insertion length: 3
Max deletion length: 3
Maximum number of alignments to be allowed: 20
Minimum intron length that may be found during split-segment (default) search: 40
Maximum intron length that may be found during split-segment (default) search: 2000
Number of mismatches allowed in each segment alignment for reads mapped independently: 2
Minimum length of read segments: 25
Output unmapped reads: False
Do you want to supply your own junction data: No
Use coverage-based search for junctions: Yes
Minimum intron length that may be found during coverage search: 40
Maximum intron length that may be found during coverage search: 2000
Use Microexon Search: No
Do Fusion Search: No
Set Bowtie2 settings: No
Specify read group: no
Job Resource Parameters: no

Best,

Thanasis

rna-seq tophat unexpected-results mapping • 704 views

modified 19 months ago by Jennifer Hillman Jackson ♦ 25k • written 19 months ago by athbal • 10

Jennifer Hillman Jackson ♦ 25k (United States) wrote, 19 months ago:

Hello,

I wasn't able to find your account at Galaxy Main http://usegalaxy.org, so I cannot check your history for input and other possible usage problems. If you are working at Galaxy Main, or can reproduce the issue there, and want to share the history for feedback, this is how:

Before sending in a bug report, please first check these items on your own. Other issues may be present, but eliminating these common problems as factors is where to begin troubleshooting.

Are the TopHat mapping rates OK? Check the output dataset "align_summary". If rates are low, there may be something wrong with the input format, or the parameters may not suit the data content. See the next items for the most common usage problems others report when using this tool (and other tools that accept fastq input).
Make sure the data is in fastqsanger format (not just assigned this datatype, but actually has the quality scores scaled appropriately). This is how to check: https://galaxyproject.org/support/fastqsanger/
Are you certain the reads are RNA (not DNA), are NGS reads, and are from an Arabidopsis source? TopHat is designed to map RNA-seq reads to the genome of the same species. Full-length transcript mapping will be problematic, as will cross-species alignments. Poor quality sequence data will also be problematic. Run FastQC on the input fastq data (after confirming or converting to fastqsanger format) to interpret base call quality and to learn what may need to be trimmed (example: low quality ends), whether sequencing artifacts are present, and the like. This is the how-to: https://galaxyproject.org/tutorials/ngs/#fastq-manipulation-and-quality-control
Are the reads less than 50 bases long? If so, you'll want to change the parameter "Minimum length of read segments" (currently "25") to one-half of the minimum read length; otherwise there can be mapping bias plus reduced mapping rates overall. This option is under the section "TopHat settings to use > Full parameter list" (click to expand and review all options on the tool form).
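
Two of the checks above (quality score encoding and minimum read length) can be roughed out with a few lines of Python outside of Galaxy. This is only a sketch under stated assumptions: the function names and the 10,000-read sample size are made up for illustration, the FASTQ is assumed to use exactly four lines per record, and the encoding test is a heuristic (any quality character with an ASCII code below 59 cannot occur in the older +64 encodings, so it implies Phred+33, which is what fastqsanger expects). It does not replace FastQC or the linked help page.

```python
from itertools import islice


def fastq_records(lines):
    """Yield (sequence, quality) pairs, assuming 4 lines per FASTQ record."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip stray blank lines between records
        seq = next(it, "").strip()
        next(it, "")  # the '+' separator line
        qual = next(it, "").strip()
        yield seq, qual


def summarize_fastq(lines, max_reads=10000):
    """Sample reads and report min length, a guessed encoding, and a segment length."""
    n = 0
    min_len = None
    lowest = None  # lowest ASCII code seen in any quality string
    for seq, qual in islice(fastq_records(lines), max_reads):
        n += 1
        min_len = len(seq) if min_len is None else min(min_len, len(seq))
        if qual:
            q = min(ord(c) for c in qual)
            lowest = q if lowest is None else min(lowest, q)
    if n == 0:
        raise ValueError("no FASTQ records found")
    if lowest is None:
        encoding = "unknown"
    elif lowest < 59:
        encoding = "phred33"  # consistent with fastqsanger
    else:
        encoding = "ambiguous"  # could be +64 scaled, or high-quality +33
    # Rule of thumb from the post: half the minimum read length for reads < 50 bases.
    segment_length = min_len // 2 if min_len < 50 else 25
    return {
        "reads": n,
        "min_length": min_len,
        "encoding": encoding,
        "segment_length": segment_length,
    }


# Example with a hypothetical file name:
# with open("reads.fastq") as handle:
#     print(summarize_fastq(handle))
```

An "ambiguous" result does not mean the data is wrong, only that the sampled reads contain no low quality characters that would settle the question, so the check on the help page is still needed.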

How to interpret and adjust Tophat parameters is in the manual here: https://ccb.jhu.edu/software/tophat/index.shtml
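
After changing parameters and rerunning, one quick way to see whether the change helped is to count the records in the splice junctions output. The following is a minimal sketch, assuming the file has been downloaded from the history and follows the BED layout described in the TopHat manual: an optional track line, then tab-separated records whose fifth column (the score) is the number of alignments spanning the junction. The function name `junction_stats` and the file name in the usage comment are hypothetical.

```python
def junction_stats(lines):
    """Return (junction_count, total_spanning_alignments) for a junctions BED file."""
    count = 0
    spanning = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and BED header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # not a usable BED record
        count += 1
        spanning += int(fields[4])  # score column: alignments spanning the junction
    return count, spanning


# Example with a hypothetical file name:
# with open("junctions.bed") as handle:
#     print(junction_stats(handle))
```

A result of `(0, 0)` corresponds to the situation in the question: the file holds at most a track line and no junction records.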

If these all appear to be correct and you do send in a shared history link, please include what you checked along with a link to this post in the email so we can link the two. Make sure that all inputs and outputs in the analysis history leading up to the Tophat mapping job with problems are undeleted.

Thanks! Jen, Galaxy team

written 19 months ago by Jennifer Hillman Jackson ♦ 25k
