Question: Filtering fastq (by quality score and length), optimum criteria?
3.1 years ago, dina.hesham139 wrote:

When filtering FASTQ files of RNA-seq data by quality score and length in Galaxy, what are the optimum criteria?

That is, what should I set for the minimum and maximum read size, the minimum and maximum quality, and the maximum number of bases allowed outside of the quality range?

My datasets are from human samples, sequenced on a HiSeq 2000 in a paired-end experiment (two separate files).

rna-seq galaxy • 1.4k views
3.1 years ago, Jennifer Hillman Jackson (United States) wrote:


For RNA-seq data, the goal is to apply only the minimum QA needed to map the data successfully, so that filtering does not introduce bias into the experiment.

Run the tool FastQC first to determine whether there are regions of the sequence that would benefit from trimming (for example, low-quality ends that would interfere with mapping success). Then use the filtering tool you describe, or one of the others in the same tool group (NGS: QC and manipulation) that directly clip or trim regions of sequence.
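To make the parameters in the question concrete, here is a minimal Python sketch of what end trimming and a length/quality filter do to a single read. It is not the code behind any Galaxy tool. The function names, the Phred+33 offset, and every threshold (min_quality=20, min_length=25, max_outside=0) are illustrative assumptions, not recommended settings; check the FastQC report for the encoding and the quality profile of your own data.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    offset=33 is an assumption (Sanger / Illumina 1.8+ encoding).
    """
    return [ord(ch) - offset for ch in quality_line]


def trim_low_quality_end(seq, quals, min_quality=20):
    """Drop bases from the 3' end until the last base meets min_quality."""
    end = len(quals)
    while end > 0 and quals[end - 1] < min_quality:
        end -= 1
    return seq[:end], quals[:end]


def passes_filter(quals, min_length=25, min_quality=20, max_outside=0):
    """Keep a read if it is long enough and no more than max_outside
    bases fall below min_quality."""
    if len(quals) < min_length:
        return False
    outside = sum(1 for q in quals if q < min_quality)
    return outside <= max_outside


# Toy read: eight high-quality bases followed by two low-quality ones.
seq = "ACGTACGTAC"
quals = phred_scores("IIIIIIII##")

trimmed_seq, trimmed_quals = trim_low_quality_end(seq, quals)
kept_untrimmed = passes_filter(quals, min_length=5)
kept_trimmed = passes_filter(trimmed_quals, min_length=5)
```

In this toy case the untrimmed read fails the filter because of its two poor 3' bases, while the trimmed read passes, which is why trimming first can save reads that a filter alone would discard. With paired-end data in two separate files, any read dropped from one file also needs its mate handled in the other so the files stay in sync.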

Then map the data. To find the best QA for your particular datasets, you could take a sample of the reads, run the QA a few different ways, map each version, and then compare the mapping rates.
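The comparison could be tabulated along these lines. This is only a sketch: the strategy names and all read counts below are made-up placeholders, and in practice the numbers would come from your mapper's summary output. It reports the mapping rate among reads that survived QA and, as an extra assumption of this sketch, the mapped fraction of the original sample, since a strict filter can raise the first number simply by discarding reads.

```python
def summarize(raw_total, reads_after_qa, reads_mapped):
    """Return the mapping rate among QA survivors and the mapped
    fraction of the original (pre-QA) sample."""
    rate = reads_mapped / reads_after_qa if reads_after_qa else 0.0
    overall = reads_mapped / raw_total if raw_total else 0.0
    return {"mapping_rate": rate, "mapped_of_raw": overall}


# Placeholder counts for one subsample processed three ways:
# strategy name -> (reads left after QA, reads that mapped)
raw_total = 1000
runs = {
    "no_trimming": (1000, 850),
    "trim_q20": (980, 900),
    "trim_q30": (700, 680),
}

summary = {
    name: summarize(raw_total, after_qa, mapped)
    for name, (after_qa, mapped) in runs.items()
}

# Two ways to rank the strategies; which one matters is a judgment call.
best_by_rate = max(summary, key=lambda n: summary[n]["mapping_rate"])
best_by_yield = max(summary, key=lambda n: summary[n]["mapped_of_raw"])

for name, stats in summary.items():
    print(f"{name}: rate={stats['mapping_rate']:.3f} "
          f"mapped_of_raw={stats['mapped_of_raw']:.3f}")
```

With these placeholder numbers the strictest setting has the highest mapping rate but keeps the fewest mapped reads overall, so looking at both columns helps identify the lightest QA that still maps well.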

Also see: GalaxyNGS101#Fastq_manipulation_and_quality_control

Thanks, Jen, Galaxy team 
