Question: Invalid quality scores
Asked 4.0 years ago by sarahs (United States):

I'm trying to demultiplex some older 454 sequences (fasta + qual) so that I have individual fastq files for each barcode. I'm not looking to do a lot of quality filtering. I combined my fasta and qual files into a fastq file and then used the barcode splitter. However, the number of bases doesn't match the number of quality scores. When I try to filter the fastq file to remove low quality scores and limit the length of the sequences, I get the following error:

AssertionError: Invalid FASTQ file: quality score length (361) does not match sequence length (360)

I then thought maybe I needed to trim the adaptors (though I really just wanted to demultiplex the data), but when I try that, I get the following error:

fastx_clipper: Error: invalid quality score data on line 4 (quality_tok = "FFFFFFFFFFFHHHHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFF~FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEEEFFFFFF~FFFFFFFFFFFFFDDD;;;;;FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFCCCC<~DDDD???DD:::555555/==::<?DDF=???FFFFFFFFC???DDAAAAAB999946~;44/24<>>>>>>??955223??D66::DDGCCAAA??>><<001419<???=???=?~"

Any suggestions as to what I'm doing wrong? This is an older data file, so I suppose it's also possible that it didn't transfer from the hard drive properly.

 

Thanks,

Sarah

Answered 4.0 years ago by Jennifer Hillman Jackson (United States):

Hello,

You could start by reviewing the upload of the original datasets, but this would be an odd error for an incomplete upload: the sequence dataset is exactly one base short, and a truncated upload would show up on the last line of the file.
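One way to review the original files before combining them is to compare, record by record, the number of bases in the FASTA file with the number of scores in the QUAL file. The following is only a minimal sketch under stated assumptions: the QUAL file uses the usual 454 layout (a `>id` header followed by whitespace-separated integer scores), and the helper names `read_records` and `length_mismatches` are made up for illustration and are not part of any Galaxy tool.

```python
import io


def read_records(handle):
    """Yield (record_id, body_lines) from a FASTA-style handle ('>' headers)."""
    record_id, body = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, body
            parts = line[1:].split()
            record_id, body = (parts[0] if parts else ""), []
        else:
            body.append(line)
    if record_id is not None:
        yield record_id, body


def length_mismatches(fasta_handle, qual_handle):
    """Return (id, n_bases, n_scores) for records whose lengths disagree.

    n_scores is None when the record is missing from the QUAL file.
    """
    # Bases are counted as characters; scores as whitespace-separated tokens.
    seq_lengths = {rid: sum(len(part) for part in body)
                   for rid, body in read_records(fasta_handle)}
    score_counts = {rid: sum(len(part.split()) for part in body)
                    for rid, body in read_records(qual_handle)}
    mismatches = []
    for rid, n_bases in seq_lengths.items():
        n_scores = score_counts.get(rid)
        if n_scores != n_bases:
            mismatches.append((rid, n_bases, n_scores))
    return mismatches


# Tiny in-memory example: r1 is consistent, r2 has one extra score.
fasta = io.StringIO(">r1\nACGT\nAC\n>r2\nACGT\n")
qual = io.StringIO(">r1\n40 40 40 40\n40 40\n>r2\n40 40 40 40 40\n")
print(length_mismatches(fasta, qual))  # [('r2', 4, 5)]
```

If every record in the original files is consistent, the mismatch was most likely introduced at a later step rather than during upload.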

When running the tool "Combine FASTA and QUAL", did you use the defaults, and did you run it as the first step? If not, please see if doing so corrects the issue.

One less quality score is not related to the adaptor, unless it was trimmed when it shouldn't have been. When an adaptor base is present, the sequence string would be one longer, not the quality score string. And that would be unexpected in your case anyway, if I am understanding the description correctly.
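To see which records in the combined file are affected, and whether the sequence or the quality string is the longer one, the FASTQ file itself can be scanned. This is a rough sketch that assumes plain four-line FASTQ records (no wrapped sequence or quality lines); `fastq_length_mismatches` is a hypothetical helper name, not a function from FASTX-Toolkit or Galaxy.

```python
import io
from itertools import islice


def fastq_length_mismatches(handle):
    """Return (record_number, header, seq_len, qual_len) for each bad record.

    Assumes four lines per record. A truncated final record is reported
    with None for both lengths.
    """
    bad = []
    record_number = 0
    while True:
        # Read the next four lines; an empty chunk means end of file.
        chunk = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not chunk:
            break
        record_number += 1
        if len(chunk) < 4:
            bad.append((record_number, chunk[0], None, None))
            break
        header, seq, _plus, qual = chunk
        if len(seq) != len(qual):
            bad.append((record_number, header, len(seq), len(qual)))
    return bad


# Tiny in-memory example: the second record has one extra quality character.
data = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACG\n+\nIIII\n")
print(fastq_length_mismatches(data))  # [(2, '@r2', 3, 4)]
```

For a real file, the same function can be called on an open handle, for example `with open(path) as handle: fastq_length_mismatches(handle)`, where `path` is wherever the combined FASTQ was saved.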

Best, Jen, Galaxy team

 
