TopHat aligns only half of the reads



Since I am working on a laptop, I'd like to use usegalaxy to align my reads with TopHat. The alignment completes, but Bowtie2 reads only half of the reads. The reads were generated on an Illumina HiSeq 2000; the protocol was not strand-specific, and it was a single-end experiment.

I ran FastQC and the quality is OK.
I tried both the default index and an index built from one of my own .fasta files, to rule out an index problem.

Do I need to change some setting so that all of my reads are used? Thanks in advance.

rna-seq tophat alignment


written 18 months ago by vit.filippo

Jennifer Hillman Jackson wrote:

Hello,

HISAT is now recommended instead of TopHat. Many groups are deprecating TopHat in favor of HISAT for mapping RNA-seq data, so you may want to try HISAT and avoid the issue entirely.

But if you want to stick with TopHat, we may be able to help. Could you explain more? Is the job completing, with only some (about half) of the reads mapping? Or is the job ending with an error, with log messages showing that it aborted at some point while reading the inputs?

If the latter, sharing the error message could help. Post it as a comment, either as a link to a gist (or similar) or as text in block formatting, so it is easy to read.

For reference, some versions of TopHat itself (not the Galaxy wrapper) have a known software bug that triggers a failure resembling the second scenario.

Thanks and let us know, Jen, Galaxy team

written 18 months ago by Jennifer Hillman Jackson

vit.filippo wrote:

Thank you, Ms Hillman, for your valuable help. I've just tried HISAT, but I have exactly the same problem. The total number of reads is 20 × 10^6 (20 million), but only about half of them are aligned. There are no errors, which suggests that everything else is fine. Since quality is not the problem, I thought the library might come from a strand-specific protocol without my knowing it. Is there any way to determine the "strandedness"? Thanks in advance for your help.

written 18 months ago by vit.filippo
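One way to confirm the 20 × 10^6 figure independently is to count the records in the FASTQ file yourself and compare that with the number of input reads the aligner reports in its summary. If the two match, the missing half was read but not aligned. Below is a minimal Python sketch, assuming a plain or gzip-compressed FASTQ with exactly four lines per record (no wrapped sequence lines); the function names are hypothetical, not part of any Galaxy tool.

```python
import gzip
import io
from typing import Iterator, TextIO, Tuple


def iter_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ.

    Assumption: no line wrapping inside records. Raises ValueError on a
    malformed or truncated record instead of silently skipping it.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header.strip()!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header.strip()!r}")
        yield header.rstrip("\n"), seq, qual


def count_reads(path: str) -> int:
    """Count FASTQ records, opening .gz files transparently."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in iter_fastq(handle))


if __name__ == "__main__":
    # Tiny in-memory example with two records.
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n")
    print(sum(1 for _ in iter_fastq(demo)))
```

On a real dataset you would call something like `count_reads("reads.fastq.gz")` (hypothetical file name). Validating each record rather than dividing the line count by four means a truncated upload fails loudly instead of producing a plausible-looking number.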

If only half of the data is aligning, with no errors, this suggests an input problem.

Start by double-checking that the fastq files are actually in fastqsanger format. This is the most common reason for low mapping rates.
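Galaxy's fastqsanger datatype means Phred+33 quality encoding. FastQC's Basic Statistics module already reports an "Encoding" line, which is the quickest check. The underlying idea can be sketched in Python by inspecting the lowest quality character in a sample of reads. This is a heuristic under stated assumptions (uncompressed FASTQ, four lines per record), and the function names are hypothetical.

```python
import itertools
from typing import Iterable, Iterator


def guess_quality_encoding(qualities: Iterable[str]) -> str:
    """Guess the ASCII offset of FASTQ quality strings from their lowest character."""
    lowest = None
    for qual in qualities:
        if qual:
            q_min = min(map(ord, qual))
            lowest = q_min if lowest is None else min(lowest, q_min)
    if lowest is None:
        raise ValueError("no quality strings supplied")
    if lowest < 59:
        # ';' (59) is the lowest symbol used by any +64 scheme, so anything
        # below it can only be Phred+33 (Sanger / Illumina 1.8+, i.e. fastqsanger).
        return "phred33"
    if lowest >= 64:
        # '@' (64) or above throughout: consistent with Phred+64 (Illumina 1.3-1.7),
        # though a small, uniformly high-quality Phred+33 sample could also look like this.
        return "phred64"
    # 59-63 fits Solexa+64 and, less often, high-quality Phred+33 data.
    return "ambiguous"


def sample_qualities(path: str, n_records: int = 10000) -> Iterator[str]:
    """Yield the quality line of the first n_records records of an uncompressed FASTQ."""
    with open(path, "rt") as handle:
        for i, line in enumerate(itertools.islice(handle, 4 * n_records)):
            if i % 4 == 3:  # every 4th line (index 3, 7, ...) is the quality string
                yield line.rstrip("\n")
```

For example, `guess_quality_encoding(sample_qualities("reads.fastq"))` (hypothetical file name) should return `"phred33"` for data that is correctly labelled fastqsanger. Only the lowest character is used because high-quality symbols overlap between the two schemes, while low ones do not.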

How to check inputs:

Strand seems unlikely to be the issue if the library prep protocol was not strand-targeted, so it is probably OK to leave it unspecified. But you can check these other help areas, in particular the tutorial for NGS Logistics here:
