Question: What Is The Quality Score Type For The SOLiD Datasets Downloaded From The SRA Of NCBI?
5.7 years ago by
Gene Genome20
United States
Gene Genome20 wrote:
Hi all,

Please help with the quality score type for downloaded SOLiD datasets. I downloaded RNA-seq datasets, which were generated on the AB SOLiD system, in base space and in FASTQ format from the SRA of NCBI. I uploaded the datasets to the online Galaxy server, changed the datatype directly to "fastqsanger", and then tested the quality by running FastQC.

The "per base quality" output for the SOLiD dataset (please take a look at the attached figure "per_base_quality-Solid") is quite different from the "per base quality" output for the Illumina dataset (please compare with the attached figure "per base quality-Illumina"). The top score for the SOLiD dataset is about 31, whereas the top score for the Illumina dataset is 38.

What is the quality score type for SOLiD datasets when they are downloaded in base space and in FASTQ format from the SRA of NCBI? Please help me solve this problem.

Thanks.
Best regards,
Jianguang Du
galaxy • 1.7k views
5.7 years ago by
United States
Jennifer Hillman Jackson25k wrote:
Hello Jianguang,

The tool "NGS: QC and manipulation -> FASTQ Groomer" has some information about this, including a link to a Wikipedia entry with more details specifically about the SRA:

http://en.wikipedia.org/wiki/FASTQ_format
http://en.wikipedia.org/wiki/FASTQ_format#NCBI_Sequence_Read_Archive

And here is the SRA submission form, although the experimental record you downloaded from is the best place to find details:

https://www.ebi.ac.uk/ena/about/sra_data_format

SRA accepts CS and Fastq. In Galaxy these translate to:

Color space reads:
- datatype "Color Space Sanger" - annotated as "fastqcssanger"

Fastq reads:
- datatype with Phred quality offset 64 "Illumina 1.3-1.7" - annotated as "fastqillumina"
- datatype with Phred quality offset 33 "Illumina 1.8+" - annotated as "fastqsanger"

Many tools require "fastqsanger". Use the "FASTQ Groomer" to transform as needed, but double check with FastQC just like you are doing. I have seen data labeled as Illumina 1.5 that was really already scaled to Phred+33, or at least appeared to be. In the end this is a judgement call, or you can try to contact SRA/data authors for a definitive answer if there are no processing notes in the experiment (often the case).

Hopefully this helps,

Jen
Galaxy team

--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org
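As a rough complement to checking with FastQC, the offset can also be guessed directly from the quality characters in the file. The sketch below is only a heuristic, not part of Galaxy or FASTQ Groomer: the function names `quality_lines` and `guess_phred_offset` are made up for illustration, the file is assumed to use plain four-line FASTQ records, and the thresholds (characters below ASCII 59 only occur with offset 33, characters above ASCII 74 suggest offset 64) are taken from the encoding ranges described in the Wikipedia entry linked above.

```python
import io


def quality_lines(handle):
    """Yield the quality string (4th line) of each four-line FASTQ record."""
    for index, line in enumerate(handle):
        if index % 4 == 3:
            yield line.rstrip("\n")


def guess_phred_offset(qualities):
    """Return 33, 64, or None (ambiguous) based on the observed ASCII range.

    Heuristic only: a small or heavily trimmed file may not contain enough
    low- or high-quality characters to decide.
    """
    codes = [ord(ch) for qual in qualities for ch in qual]
    if not codes:
        raise ValueError("no quality characters found")
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Characters below ';' cannot occur in the offset-64 encodings.
        return 33
    if highest > 74:
        # Characters above 'J' would mean Phred > 41 under offset 33.
        return 64
    return None


# Hypothetical single-record example, not taken from the datasets in question.
sample = "@read1\nACGT\n+\n!5?@\n"
offset = guess_phred_offset(quality_lines(io.StringIO(sample)))
print(offset)
```

If the guess comes back as `None`, the observed range fits both encodings, and the experiment record or the data authors remain the better source.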