Question: TopHat aligns only half of the reads
vit.filippo wrote (9 weeks ago):

Since I am working on a laptop, I'd like to use usegalaxy to run the alignment with TopHat. The alignment completed, but Bowtie2 aligned only half of the reads. The reads were generated on an Illumina HiSeq 2000; the protocol was not strand-specific and it was a single-end experiment.

  • I ran FastQC and the quality is OK;
  • I tried both the built-in index and one of my own .fasta files, to rule out an index problem;

Do I have to change some setting to get all my reads aligned? Thanks in advance.

Tags: rna-seq, tophat, alignment
Jennifer Hillman Jackson wrote (9 weeks ago):


The current recommendation is to use HISAT instead of TopHat. Many groups are deprecating TopHat in favor of HISAT for RNA-seq mapping. You might want to try HISAT and avoid the issue entirely.

But if you want to stick with TopHat, we may be able to help. Could you explain more? Is the job completing, with only some (about half) of the reads mapping? Or does the job fail, with error messages in the log showing that it aborted at a point that seems related to reading the inputs?

If the latter, sharing the error message would help. Post it as a comment, either as a link to a gist (or similar) or as text in block formatting, so it is easy to read.

For reference, some versions of TopHat itself (not the Galaxy wrapper) have a known software bug that triggers a failure, which sounds a bit like the latter problem.

Thanks and let us know, Jen, Galaxy team

vit.filippo replied (9 weeks ago):

Thank you, Ms Hillman, for your valuable help. I've just tried HISAT but I had the very same problem. The total number of reads is 20 × 10^6 (20 million), but the aligner maps only about half of them. There are no errors, which suggests that everything else is fine. Since quality is not the problem, I thought the library might come from a strand-specific protocol without my knowing it. Is there any way to find out the "strandedness"? Thank you in advance for your help.

Jennifer Hillman Jackson replied (9 weeks ago):

If only half of the data aligns and there are no errors, the most likely cause is an input problem.

Start by double-checking that the FASTQ files are actually in fastqsanger format (Sanger/Illumina 1.8+ Phred+33 quality encoding). An incorrect format is the most common reason for low mapping rates.
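As a rough local check before uploading, the sketch below streams a FASTQ file, verifies that each record has the four-line structure with matching sequence and quality lengths, and guesses the quality offset from the range of quality characters seen. The function name `check_fastq` and the ASCII cut-offs (59 and 74) are illustrative assumptions, not Galaxy's own format detection; an "ambiguous" result only means the observed characters fit both encodings.

```python
from typing import Iterable, Tuple


def check_fastq(lines: Iterable[str]) -> Tuple[int, str]:
    """Check 4-line FASTQ records and guess the quality encoding.

    Returns (record_count, guess) where guess is "phred33" (what Galaxy
    calls fastqsanger), "phred64", or "ambiguous". Heuristic sketch only.
    """
    it = iter(lines)
    count = 0
    lowest, highest = 255, 0
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records (e.g. at end of file)
        rest = [next(it, None) for _ in range(3)]
        if None in rest:
            raise ValueError(f"record {count + 1} is truncated")
        seq, plus, qual = (s.rstrip("\r\n") for s in rest)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"record {count + 1} has a malformed header or '+' line")
        if len(seq) != len(qual):
            raise ValueError(f"record {count + 1}: sequence and quality lengths differ")
        if qual:
            lowest = min(lowest, ord(min(qual)))
            highest = max(highest, ord(max(qual)))
        count += 1
    if count == 0:
        return 0, "ambiguous"
    if lowest < 59:
        # ';' (59) is the lowest character any +64 scheme uses, so this must be +33
        return count, "phred33"
    if highest > 74:
        # 'J' (74) = Q41, the top of Illumina 1.8+ Phred+33; higher implies +64
        return count, "phred64"
    return count, "ambiguous"
```

Usage, with a hypothetical file name: `with open("reads.fastq") as fh: print(check_fastq(fh))`. The record count it returns can also be compared with the 20 million reads you expect, which helps rule out a truncated upload.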

How to check inputs:

Strandedness is unlikely to be the issue, since the library prep protocol was not strand-specific, so it is probably fine to leave it unspecified. But you can also check these other help areas. In particular, examine the tutorial for NGS Logistics here:
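If you want to confirm strandedness empirically rather than rely on the protocol description, one common approach is to map a subset of reads and count how many land on the same strand as the annotated gene versus the opposite strand; RSeQC's infer_experiment.py reports fractions of this kind from a BAM file and a gene model. The hypothetical helper below only shows how such counts are interpreted: a roughly 50/50 split suggests an unstranded library, a strong skew one way suggests a stranded one. The 0.8 cut-off and the ±0.1 window around 0.5 are illustrative assumptions, not established thresholds.

```python
def classify_strandedness(sense: int, antisense: int, threshold: float = 0.8) -> str:
    """Interpret single-end read counts relative to annotated gene strand.

    sense:     reads aligned to the same strand as the overlapping gene
    antisense: reads aligned to the opposite strand
    threshold: hypothetical cut-off for calling a library stranded
    """
    total = sense + antisense
    if total == 0:
        raise ValueError("no reads overlap annotated genes")
    frac_sense = sense / total
    if frac_sense >= threshold:
        return "stranded (forward)"
    if frac_sense <= 1 - threshold:
        return "stranded (reverse)"
    if abs(frac_sense - 0.5) <= 0.1:
        return "unstranded"
    return "unclear"
```

For example, `classify_strandedness(510, 490)` returns `"unstranded"`, which is what a non-strand-specific protocol like the one described here should give. Only reads that overlap a single annotated gene should be counted, otherwise overlapping genes on opposite strands blur the split.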

Other possible problems to consider:

  • There is a mismatch between the genome the RNA-seq reads were derived from and the genome chosen as a mapping target. For example, mouse RNA-seq reads will not map well to a human genome.
  • The target genome is a custom genome but the format is incorrect (see input help above).
  • The target genome has a problematic index. This is possible on a local Galaxy, but none are currently known to have issues with TopHat or HISAT.
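For the custom-genome case, a few basic FASTA problems can be checked locally before uploading. The hypothetical `check_fasta` helper below flags sequence lines before the first header, empty or duplicate sequence names (taking the text up to the first whitespace as the name, which is how many aligners report it), empty sequences, blank lines, and characters outside the IUPAC nucleotide codes. It is a sketch of common pitfalls, not Galaxy's own validation; treating blank lines as an error is an assumption.

```python
from typing import Iterable, List, Optional

# IUPAC nucleotide codes; lower case (soft-masked repeats) is accepted via .upper()
IUPAC = set("ACGTURYSWKMBDHVN")


def check_fasta(lines: Iterable[str]) -> List[str]:
    """Return a list of problems found in a FASTA genome (empty list = looks OK)."""
    problems: List[str] = []
    names = set()
    current: Optional[str] = None
    has_seq = False
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            # assumption: flagged because some indexing tools reject blank lines
            problems.append(f"line {lineno}: blank line")
            continue
        if line.startswith(">"):
            if current is not None and not has_seq:
                problems.append(f"sequence '{current}' is empty")
            parts = line[1:].split()
            name = parts[0] if parts else ""
            if not name:
                problems.append(f"line {lineno}: header has no name")
            elif name in names:
                problems.append(f"line {lineno}: duplicate name '{name}'")
            names.add(name)
            current, has_seq = name, False
        elif current is None:
            problems.append(f"line {lineno}: sequence before first header")
        else:
            bad = set(line.upper()) - IUPAC
            if bad:
                problems.append(f"line {lineno}: unexpected characters {sorted(bad)}")
            has_seq = True
    if current is None:
        problems.append("no FASTA header found")
    elif not has_seq:
        problems.append(f"sequence '{current}' is empty")
    return problems
```

An empty result does not prove the genome matches the species the reads came from (the first point in the list above); it only rules out these formatting errors.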

Thanks! Jen
