Question: Question About Fastq Groomer
Yao, Jianchao wrote (8.6 years ago):
To Whom It May Concern:

I am a new user of Galaxy. In the "FASTQ Groomer" tool, I noticed there is an option for "Input FASTQ quality scores type". My question is what different conversions are performed when I choose "Solexa" versus "Illumina 1.3+".

I am asking because I used Maq's sol2sanger (which I guess is similar to your "Solexa" option) to convert all of my data generated by Illumina 1.5. Based on your options, it seems I should have chosen a different conversion (e.g., your "Illumina 1.3+") for data generated by Illumina 1.5.

Also, it looks like "Solexa" and "Illumina 1.3+" differ only in how the quality scores are calculated. But when I use BWA and SAMtools to map the reads and call SNPs, the sizes of the BAM and pileup files are very different between the two conversions, and even the coverage of some bases differs. Can you tell me how the conversion can affect the final result in terms of coverage?

All your help will be greatly appreciated!
-Jianchao Yao
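To make the question concrete: "Solexa" and "Illumina 1.3+" FASTQ files both encode each score with an ASCII offset of 64, so the same quality character decodes to the same raw number. What differs is whether that number is read as a Solexa-scale score or as a PHRED score. The minimal Python sketch here shows the PHRED value each option would assign to a few characters, using the Solexa-to-PHRED mapping Q_PHRED = 10 * log10(10^(Q_Solexa/10) + 1) and assuming rounding to the nearest integer. The function names are illustrative only and are not part of Galaxy.

```python
import math
from typing import List, Tuple

ASCII_OFFSET = 64  # Solexa and Illumina 1.3+ FASTQ both use an ASCII offset of 64


def solexa_to_phred(q_solexa: int) -> int:
    """Map a Solexa-scale score to the PHRED scale, rounded to the nearest integer."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def compare_interpretations(qual: str) -> List[Tuple[str, int, int]]:
    """For each quality character, return
    (char, PHRED if read as Illumina 1.3+, PHRED if read as Solexa)."""
    rows = []
    for ch in qual:
        raw = ord(ch) - ASCII_OFFSET
        as_illumina13 = raw               # Illumina 1.3+ values are already PHRED
        as_solexa = solexa_to_phred(raw)  # Solexa values must be mapped to PHRED
        rows.append((ch, as_illumina13, as_solexa))
    return rows


if __name__ == "__main__":
    for ch, illumina13, solexa in compare_interpretations("@BJh"):
        print(ch, illumina13, solexa)
    # @ 0 3
    # B 2 4
    # J 10 10
    # h 40 40
```

The two readings agree at high scores and diverge only near the low end, so any downstream step that makes decisions based on low quality values may behave differently depending on which option was chosen.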
Daniel Blankenberg wrote (8.6 years ago):
Hi Jianchao Yao,

The FASTQ Groomer follows the format variants described in Cock et al., 2009 (http://www.ncbi.nlm.nih.gov/pubmed/20015970). You are correct that the difference in the Groomer's output between choosing Solexa and Illumina 1.3+ comes from how the quality score values are converted to the PHRED scale and the Sanger ASCII range. Solexa scores run from -5 to 62 (decimal) and are on the Solexa scale, whereas Illumina 1.3+ scores run from 0 to 62 (decimal) and are already on the PHRED scale (for example, a Solexa score of 0 corresponds to a PHRED score of 3).

You will need to check with your data generator/provider to determine the source encoding of your FASTQ files.

It is reasonable that reads with different quality score values (a direct result of the source encoding you specify during grooming) could give different mapping and SNP calling results, depending on the algorithms used by the mapping software. I would suggest consulting the BWA (http://bio-bwa.sourceforge.net/) and SAMtools (http://samtools.sourceforge.net/) documentation to see how quality scores can affect mapping and SNP calling results.

Thanks for using Galaxy,
Dan
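A minimal sketch of the conversion described in this answer, assuming that grooming means re-encoding each quality character into the Sanger range (ASCII offset 33, PHRED 0 to 93) and, for Solexa input, first mapping Solexa scores to PHRED with Q_PHRED = 10 * log10(10^(Q_Solexa/10) + 1), rounded to the nearest integer. The function names and the "solexa" / "illumina1.3" labels are hypothetical and are not the Groomer's actual code; the ASCII ranges checked are the ones given in Cock et al., 2009.

```python
import math

SOLEXA_OFFSET = 64      # Solexa: ASCII 59-126, Solexa scores -5..62
ILLUMINA13_OFFSET = 64  # Illumina 1.3+: ASCII 64-126, PHRED 0..62
SANGER_OFFSET = 33      # Sanger: ASCII 33-126, PHRED 0..93


def solexa_to_phred(q_solexa: int) -> int:
    """Map a Solexa-scale score to PHRED, rounded to the nearest integer."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def to_sanger(qual: str, source: str) -> str:
    """Re-encode a quality string as Sanger FASTQ.

    source is a hypothetical label: "solexa" or "illumina1.3".
    """
    out = []
    for ch in qual:
        if source == "solexa":
            q = ord(ch) - SOLEXA_OFFSET
            if not -5 <= q <= 62:
                raise ValueError(f"{ch!r} is outside the Solexa range")
            phred = solexa_to_phred(q)  # change of scale, then re-offset
        elif source == "illumina1.3":
            phred = ord(ch) - ILLUMINA13_OFFSET
            if not 0 <= phred <= 62:
                raise ValueError(f"{ch!r} is outside the Illumina 1.3+ range")
        else:
            raise ValueError(f"unknown source encoding: {source}")
        out.append(chr(phred + SANGER_OFFSET))  # only the ASCII offset changes
    return "".join(out)


def possible_encodings(qual: str) -> list:
    """List the source encodings whose ASCII range contains every character."""
    lo, hi = min(map(ord, qual)), max(map(ord, qual))
    candidates = []
    if 33 <= lo and hi <= 126:
        candidates.append("sanger")
    if 59 <= lo and hi <= 126:
        candidates.append("solexa")
    if 64 <= lo and hi <= 126:
        candidates.append("illumina1.3")
    return candidates


if __name__ == "__main__":
    print(to_sanger("@BJh", "illumina1.3"))  # !#+I
    print(to_sanger("@BJh", "solexa"))       # $%+I
    print(possible_encodings("@BJh"))        # ['sanger', 'solexa', 'illumina1.3']
```

Note that possible_encodings can only rule encodings out: a file whose quality characters all fall between ASCII 64 and 126 is valid under all three variants, which is why the source encoding has to be confirmed with whoever generated the data.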