Question About Fastq Groomer



To Whom It May Concern:

I am a new Galaxy user. In the "FASTQ Groomer" tool, I noticed an option called "Input FASTQ quality scores type". What different conversions are performed when I choose "Solexa" versus "Illumina 1.3+"?

I ask because I used to use Maq's sol2sanger (which I assume is similar to your "Solexa" option) to convert all data generated by Illumina 1.5. Based on your options, it seems I should have chosen a different conversion (e.g., your "Illumina 1.3+") for data generated by Illumina 1.5.

It also looks like "Solexa" and "Illumina 1.3+" differ only in how the quality scores are calculated. However, when I use BWA and SAMtools to map reads and call SNPs, the sizes of the BAM and pileup files differ considerably between the two conversions. Even the coverage at some bases differs depending on which conversion I choose. Can you tell me how the conversion can affect the final result in terms of coverage?

All your help will be greatly appreciated!

-Jianchao Yao

bwa alignment samtools bam • 1.2k views


modified 8.6 years ago by Daniel Blankenberg ♦♦ 1.7k • written 8.6 years ago by Yao, Jianchao • 10


Daniel Blankenberg ♦♦ 1.7k wrote:

Hi Jianchao Yao,

The FASTQ Groomer follows the format variants described in Cock et al, 2009 (http://www.ncbi.nlm.nih.gov/pubmed/20015970). You are correct that the difference in the Groomer's output when choosing Solexa versus Illumina 1.3+ comes from converting the quality score values to the PHRED scale and the Sanger ASCII range. Solexa scores go from -5 to 62 (decimal) and are on the Solexa score scale, whereas Illumina 1.3+ scores go from 0 to 62 (decimal) and are on a PHRED score scale (e.g. a decimal Solexa score of 0 corresponds to a PHRED score of 3). You will need to check with your data generator/provider to determine the source encoding of your FASTQ files.

It seems reasonable that reads with different quality score values (a direct result of what you specify as the source encoding of your FASTQ file during grooming) could produce different mapping and SNP calling results, depending on the algorithms used by the mapping software. I would suggest consulting the BWA (http://bio-bwa.sourceforge.net/) and SAMtools (http://samtools.sourceforge.net/) documentation to determine how quality scores could affect mapping and SNP calling results.

Thanks for using Galaxy,

Dan
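To make the difference between the two input types concrete, here is a minimal sketch of the score conversion described in Cock et al. It is not the Groomer's actual code: the function names `solexa_to_phred` and `to_sanger` and the `source` labels are made up for illustration. It assumes both input variants store scores with an ASCII offset of 64 and that the Sanger output uses an offset of 33.

```python
import math

# Assumed ASCII offsets, following Cock et al.: Solexa and Illumina 1.3+
# both use 64, Sanger uses 33.
INPUT_OFFSET = 64
SANGER_OFFSET = 33


def solexa_to_phred(q_solexa):
    """Map a Solexa-scaled score onto the PHRED scale."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def to_sanger(quality_string, source):
    """Re-encode a quality string as Sanger (PHRED scale, offset 33).

    `source` is a hypothetical label: "solexa" or "illumina1.3+".
    """
    if source not in ("solexa", "illumina1.3+"):
        raise ValueError("unknown source encoding: " + source)
    converted = []
    for char in quality_string:
        score = ord(char) - INPUT_OFFSET
        if source == "solexa":
            # Solexa scores need rescaling; they may be as low as -5.
            score = round(solexa_to_phred(score))
        elif score < 0:
            # Illumina 1.3+ scores are already PHRED and cannot be negative.
            raise ValueError("negative score; input may be Solexa-encoded")
        converted.append(chr(score + SANGER_OFFSET))
    return "".join(converted)


# The same input character yields different Sanger output per source type.
print(to_sanger("@", "solexa"))        # Solexa 0 becomes PHRED 3
print(to_sanger("@", "illumina1.3+"))  # PHRED 0 stays PHRED 0
```

The two scales converge at high scores, so under these assumptions the choice of input type mainly changes low-quality bases, which is where quality-aware downstream steps are most likely to behave differently.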

written 8.6 years ago by Daniel Blankenberg ♦♦ 1.7k
