Question: Fastq Groomer
7.1 years ago by
arabidopsis 20 wrote:
Hi all, FASTQ Groomer offers Solexa or Illumina 1.3+ as input quality formats. I asked the sequencing facility about their machine and output, and they said their format is Illumina 1.8+ (the newest). I tried to convert my FASTQ file to Sanger with FASTQ Groomer, using Illumina 1.3+ as the input option, and all reads came out with a quality of around 10. Does this mean that Galaxy cannot be used on a dataset with 1.8+ encoding, or was something else wrong? Thanks, Slon
galaxy • 5.0k views
modified 7.1 years ago by Jennifer Hillman Jackson 25k • written 7.1 years ago by arabidopsis 20
7.1 years ago by
European Union
Peter Cock 1.4k wrote:
Illumina 1.8+ is already using the Sanger FASTQ encoding, so you don't need to convert it with the groomer. I think the Galaxy team might still recommend it as it doubles as a sanity test for corrupt FASTQ files. Peter
written 7.1 years ago by Peter Cock 1.4k
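To make the encoding question concrete, here is a rough Python sketch that guesses the quality encoding from the lowest ASCII character seen in the quality lines. The function name `guess_fastq_encoding` and the thresholds are illustrative assumptions based on the usual offsets (33 for Sanger and Illumina 1.8+, 64 for Solexa and Illumina 1.3+); it is a heuristic, not what FASTQ Groomer does internally.

```python
def guess_fastq_encoding(quality_strings):
    """Guess the quality encoding from the lowest ASCII code seen.

    Heuristic only: a file that happens to contain nothing but very
    high Sanger scores could be misclassified. Raises ValueError if
    no quality characters are supplied.
    """
    lowest = min(ord(ch) for q in quality_strings for ch in q)
    if lowest < 59:
        return "sanger"        # offset 33; includes Illumina 1.8+
    if lowest < 64:
        return "solexa"        # Solexa scores, offset 64
    return "illumina1.3+"      # PHRED scores, offset 64


# Made-up quality strings for illustration.
print(guess_fastq_encoding(["@@CFFFDDHHHHHJJJ", "#1=DDFFFHHHHHJJJ"]))
print(guess_fastq_encoding(["hhhhhhgffeedcba`", "ffffeeddccbbaa``"]))
```

Characters below `;` (ASCII 59) can only occur with offset 33, which is why a single low-quality base is usually enough to identify Sanger-style data.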
If Illumina 1.8+ already uses the Sanger FASTQ encoding, the file should be recognized by downstream applications such as the quality statistics computer, the quality filter, etc. However, my file is not visible to those programs, and when I click on it, only "uploaded fastq file" is displayed, without encoding details. S.
written 7.1 years ago by arabidopsis 20
Have you told Galaxy it is fastqsanger? My guess is the upload tool has defaulted to the generic fastq. Look with the "pencil" icon to edit the attributes of the uploaded FASTQ file in your Galaxy history. Peter
written 7.1 years ago by Peter Cock 1.4k
Actually, Illumina 1.8+ has one more quality value at the top than fastqsanger. My question now, I guess, is: if I use fastqsanger, would anything break when it encounters the 'J' in the quality values?
written 7.1 years ago by Kevin 150
The Sanger FASTQ format has always allowed J (PHRED 41); the issue is that some tools might treat it as an error because it is unusually high for a raw read. For instance, you need at least FASTX v0.0.13 to cope with this; older versions didn't like it. Peter
written 7.1 years ago by Peter Cock 1.4k
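The arithmetic behind "J (PHRED 41)" can be checked with a few lines of Python. This is a minimal sketch; `decode_qualities` is a hypothetical helper, and the offsets 33 and 64 are the usual values for Sanger/Illumina 1.8+ and Illumina 1.3+ respectively.

```python
def decode_qualities(quality_string, offset=33):
    """Convert a FASTQ quality string to a list of integer scores."""
    return [ord(ch) - offset for ch in quality_string]


# 'I' is ASCII 73 and 'J' is ASCII 74.
print(decode_qualities("IJ"))             # read as Sanger, offset 33
print(decode_qualities("IJ", offset=64))  # misread as Illumina 1.3+, offset 64
```

With offset 33 the two characters decode to 40 and 41. Misreading the same characters with offset 64 gives 9 and 10, which would be consistent with the "quality of around 10" reported in the original question when Illumina 1.8+ data was groomed as Illumina 1.3+.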
7.1 years ago by
United States
Jennifer Hillman Jackson 25k wrote:
Hello Slon, In case you are still having issues, the best use case for Illumina 1.8+ data is to run the FASTQ Groomer tool with the option "Sanger". As Peter noted, this assigns the expected datatype plus verifies content before investing time in downstream analysis. Please let us know if more help is needed, Best, Jen Galaxy team -- Jennifer Jackson
written 7.1 years ago by Jennifer Hillman Jackson 25k
Hi, I am getting a FASTQ Groomer error on some Illumina data, with the following message. Any ideas?
There was an error reading your input file. Your input file is likely malformed. It is suggested that you double-check your original input file for errors -- helpful information for this purpose has been provided below. However, if you think that you have encountered an actual error with this tool, please do tell us by using the bug reporting mechanism.
The reported error is: 'Invalid fastq header: lab/solexa_public/Zon/111021_WICMT-SOLEXA_64KF7AAXX/QualityScore/s_3_1_sequence.txt
Rich
written 7.1 years ago by Richard Mark White 240
Howdy, Rich, My interpretation of the error report is that the fastq file you are trying to groom contains the indicated text (lab/solexa_public/Zon/111021_WICMT-SOLEXA_64KF7AAXX/QualityScore/s_3_1_sequence.txt) on a line where it expects a valid fastq header. I believe a valid header line would begin with an at sign ("@"). So perhaps somewhere along the way, your fastq file's contents were replaced by a filename. Bob H
written 7.1 years ago by Bob Harris 190
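The kind of check Bob describes can be sketched in Python as follows. This is an illustrative stand-in, not the FASTQ Groomer's actual code: `find_first_bad_record` is a hypothetical name, the error text only mimics the message quoted above, and the sketch assumes plain four-line records with no wrapped sequence lines.

```python
import io


def find_first_bad_record(handle):
    """Return (line_number, message) for the first problem found, or None.

    Assumes plain four-line FASTQ records (no wrapped sequence lines).
    """
    line_number = 0  # number of lines consumed so far
    while True:
        header = handle.readline()
        if not header:
            return None  # reached end of file without problems
        if not header.strip():
            line_number += 1  # tolerate blank lines between records
            continue
        seq, plus, qual = (handle.readline().rstrip("\r\n") for _ in range(3))
        header = header.rstrip("\r\n")
        if not header.startswith("@"):
            return (line_number + 1, "Invalid fastq header: " + header)
        if not plus.startswith("+"):
            return (line_number + 3, "Expected '+' separator line")
        if len(seq) != len(qual):
            return (line_number + 4, "Sequence and quality lengths differ")
        line_number += 4


# Made-up inputs: one valid record, and a "file" that holds only a filename.
good = io.StringIO("@read1\nACGT\n+\nIIII\n")
bad = io.StringIO("path/to/s_3_1_sequence.txt\n")
print(find_first_bad_record(good))
print(find_first_bad_record(bad))
```

Running something like this on the first few lines of an upload is a quick way to confirm whether the file really contains reads or, as suspected here, just a path.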