Hi All,

I downloaded some RNA-seq datasets from NCBI. The datasets were generated on an Illumina HiSeq 2000. I am not sure which "Input FASTQ quality scores type" I should choose when running FASTQ Groomer. Below are the scores for two reads from one dataset; I renamed them "read 1" and "read 2".

1) Sequence and quality scores as displayed in Galaxy:

@read 1 length=51
NTGAGATTCTTGACTAGTTATTTCTGCTTTCAGGGAAGAAATCAGCTGGGC
+read 1 length=51
#1=ADADEHHHHHIIGIHJGJJJHJIIJJJH@HEGBFH;FHEH>@HIJJJJ
@read 2 length=51
NGAAGAGTCAGTTTTTTGTTTCCCTCATAACTTGCTAGATTCCGGATTGCT
+read 2 length=51
#1=DDDEDHHFHHJJJJJIJJHIIIJJJIJJJJJJJIJIJJJJJJIJJJJI

2) Sequence and one-channel quality scores as shown in the NCBI SRA when I downloaded the dataset:

NTGAGATTCTTGACTAGTTATTTCTGCTTTCAGGGAAGAAATCAGCTGGGC
One channel quality score
2 16 28 32 35 32 35 36 39 39 39 39 39 40 40 38 40 39 41 38 41 41 41 39 41 40 40 41 41 41 39 31 39 36 38 33 37 39 26 37 39 36 39 29 31 39 40 41 41 41 41

NGAAGAGTCAGTTTTTTGTTTCCCTCATAACTTGCTAGATTCCGGATTGCT
One channel quality score
2 16 28 35 35 35 36 35 39 39 37 39 39 41 41 41 41 41 40 41 41 39 40 40 40 41 41 41 40 41 41 41 41 41 41 41 40 41 40 41 41 41 41 41 41 40 41 41 41 41 40

It looks like the dataset was generated with an Illumina pipeline of version 1.8 or later, because some of the bases have a quality score of 41. Can I choose "sanger" as the "Input FASTQ quality scores type" when I run FASTQ Groomer?

Thanks.
Jianguang Du
Hi Jianguang,

I agree: the quality scores are already in Sanger format (Phred+33 offset), so the datatype you want is almost certainly .fastqsanger. To double check, take a sample and run "FastQC" on it, or run FastQC on the entire dataset if you plan on doing quality checks anyway (potential trimming, etc.).

You also don't need to run the groomer. Just assign the datatype directly by clicking on the dataset's pencil icon. Help is here and the screencast FASTQ Prep walks through a how-to (using SRA data as an example):

Hope this helps, but you are really already on the right track. I'm just agreeing!

Jen
Galaxy
--
Jennifer Hillman-Jackson
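
The Phred+33 conclusion can also be checked by hand. The following is a minimal Python sketch, not part of Galaxy or FastQC: it decodes the "read 1" quality string from the question with an offset of 33 and compares the result to the scores listed in the SRA. The function names `decode` and `guess_offset` are made up for this illustration, and the offset guess is only a rough heuristic (characters below ';' cannot occur in the older +64 encodings, and characters above 'J' are not expected in typical Illumina 1.8+ Phred+33 data).

```python
# Quality string and SRA scores for "read 1", copied from the question above.
QUAL = "#1=ADADEHHHHHIIGIHJGJJJHJIIJJJH@HEGBFH;FHEH>@HIJJJJ"
SRA_SCORES = [
    2, 16, 28, 32, 35, 32, 35, 36, 39, 39, 39, 39, 39, 40, 40, 38, 40,
    39, 41, 38, 41, 41, 41, 39, 41, 40, 40, 41, 41, 41, 39, 31, 39, 36,
    38, 33, 37, 39, 26, 37, 39, 36, 39, 29, 31, 39, 40, 41, 41, 41, 41,
]


def decode(qual, offset=33):
    """Convert a FASTQ quality string to integer scores for a given ASCII offset."""
    return [ord(ch) - offset for ch in qual]


def guess_offset(qual):
    """Heuristic guess of the ASCII offset (hypothetical helper, not a Galaxy API).

    Returns 33, 64, or None when the string alone is ambiguous.
    """
    lowest = min(ord(ch) for ch in qual)
    highest = max(ord(ch) for ch in qual)
    if lowest < ord(";"):   # too low for Solexa/Illumina +64 encodings
        return 33
    if highest > ord("J"):  # above Q41 under +33, so more likely +64
        return 64
    return None             # a single read is not enough to decide


if __name__ == "__main__":
    print("guessed offset:", guess_offset(QUAL))
    print("matches SRA scores:", decode(QUAL) == SRA_SCORES)
```

A single read can be ambiguous, so in practice it is safer to apply this kind of check to many reads, which is effectively what FastQC does when it reports the encoding.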
