quality score length differs from sequence length

I have a fastq file that tophat2 can't process because one of the reads seems to have a sequence length far shorter than the corresponding quality scores (154 for sequence, 434 for quality scores). The entry with the mismatch doesn't seem to have a name so I can't find it and get rid of it or mask it from the analysis that way. Is there any way that I can instruct tophat2 to ignore any reads where this kind of mismatch occurs? Or some other solution?

Thanks

rna-seq tophat alignment • 1.8k views

written 3.6 years ago by eloise.greenland • modified 3.6 years ago by Jennifer Hillman Jackson

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

First check the end of your file with the tool "Text manipulation: Select last lines from a dataset". Usually this problem is due to a truncated dataset upload and the problem will be in the last few lines. You will almost certainly want to reload the data. But if you do want to remove this line and keep the rest for some reason, there are other line manipulation functions in this same tool group.
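Outside Galaxy, the same end-of-file check can be sketched in a few lines of Python. This is only an illustration, assuming a plain-text (uncompressed) FASTQ file with exactly four lines per record; the function names `tail_check` and `looks_truncated` are made up for this example.

```python
from collections import deque


def tail_check(path, n=8):
    """Return (total_line_count, last_n_lines) for a plain-text file."""
    last = deque(maxlen=n)  # keeps only the n most recent lines
    total = 0
    with open(path) as handle:
        for line in handle:
            total += 1
            last.append(line.rstrip("\r\n"))
    return total, list(last)


def looks_truncated(total_lines):
    """True if the line count cannot be whole 4-line FASTQ records.

    Assumption: sequence and quality strings are not wrapped across lines.
    """
    return total_lines % 4 != 0
```

A line count that is a multiple of four does not prove the file is intact; it only rules out the simplest kind of truncation, so it is still worth reading the last few lines by eye.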

This could be more complicated if you uploaded files, merged them, etc. The truncation could also come from a transfer that happened upstream of Galaxy.

Tools like "Fastq Groomer" will report problems by line numbers plus print out the contents of those lines. So that is also an option for finding where problems are within a dataset.
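If you want to look for the problem read locally, a rough Python sketch along the same lines is below. It is not how Fastq Groomer works internally; it simply assumes a plain-text file with strict four-line records (header, sequence, "+", quality) and reports every record whose sequence and quality lengths differ, along with the line number where that record starts. The function name `find_length_mismatches` is hypothetical.

```python
def find_length_mismatches(path):
    """Yield (first_line_number, header, seq_len, qual_len) for bad records.

    Assumption: each record occupies exactly four lines, so a missing or
    extra line earlier in the file will shift every record after it.
    """
    with open(path) as handle:
        first_line = 1  # 1-based line number of the current record's header
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0]:  # empty string means end of file
                break
            header, seq, _plus, qual = (s.rstrip("\r\n") for s in record)
            if len(seq) != len(qual):
                yield first_line, header, len(seq), len(qual)
            first_line += 4
```

Looping over `find_length_mismatches("reads.fastq")` (the file name is a placeholder) and printing each tuple gives the line numbers to inspect. A read with no name shows up with an empty header, so the line number is the way to locate it. If many consecutive records are reported, the file has probably lost or gained a line somewhere before the first hit, and re-downloading the data is the safer fix.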

More troubleshooting help:
http://wiki.galaxyproject.org/Support#Error_from_tools

Thanks, Jen, Galaxy team

written 3.6 years ago by Jennifer Hillman Jackson ♦ 25k

Hi Jen,

Thanks for your suggestions. I checked the last lines of the file and it doesn't appear to be truncated. I get the same problem using tophat2 locally on the command line. I haven't merged the data or anything; it is the raw sequencing file I received from our sequencing provider.

I did try the Fastq Groomer in Galaxy and it failed to execute on this set of reads. I deleted that run from my history when things were getting a bit messy. With any luck I haven't purged it yet and I can go back and check the info from the failed run. Otherwise I will run it again and get back to you.

Thanks,

Eloise.

written 3.6 years ago by eloise.greenland
