Filtering fastq (by quality score and length), optimum criteria?

Question: Filtering fastq (by quality score and length), optimum criteria?

dina.hesham139 • 0 wrote:

When filtering FASTQ files (RNA-seq data) by quality score and length in Galaxy, what are the optimum criteria?

That is, the minimum and maximum size, the minimum and maximum quality, and the maximum number of bases allowed outside of the quality range.

My datasets are from human samples, sequenced on a HiSeq 2000 in a paired-end experiment (2 separate files).

rna-seq galaxy • 1.4k views

modified 3.1 years ago by Jennifer Hillman Jackson ♦ 25k • written 3.1 years ago by dina.hesham139 • 0

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

For RNA-seq data, the goal is to apply only the minimum QA needed to map the data successfully. Filtering more than that risks introducing bias into the experiment.

Run the tool FastQC first to determine whether there are regions of the sequence that would benefit from trimming (for example, low-quality ends that would interfere with mapping success). Then use the filtering tool you describe, or one of the other tools in the same tool group (NGS: QC and manipulation) that directly clip/trim regions of sequence.
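To make the criteria from the question concrete, here is a minimal Python sketch of what end trimming and a size/quality filter do to a read. It is not the code of any Galaxy tool. The function names, the `None` means "no limit" convention, and the threshold of 20 in the usage lines are all assumptions for illustration, and it assumes Phred+33 (Sanger) quality encoding, so check your files' encoding first. For paired-end data in two files, `keep_pair` keeps a pair only when both mates pass, so the two files stay in sync.

```python
PHRED_OFFSET = 33  # assumption: Sanger / Phred+33 encoded qualities


def phred_scores(quality_string, offset=PHRED_OFFSET):
    """Decode a FASTQ quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(seq, qual, min_quality, offset=PHRED_OFFSET):
    """Drop bases from the 3' end until a base with at least min_quality is found."""
    scores = phred_scores(qual, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return seq[:end], qual[:end]


def passes_filter(qual, min_size=None, max_size=None,
                  min_quality=None, max_quality=None,
                  max_outside=0, offset=PHRED_OFFSET):
    """Apply size limits, then count bases outside the quality range.

    None means "no limit" (a convention of this sketch only).
    """
    length = len(qual)
    if min_size is not None and length < min_size:
        return False
    if max_size is not None and length > max_size:
        return False
    outside = 0
    for score in phred_scores(qual, offset):
        too_low = min_quality is not None and score < min_quality
        too_high = max_quality is not None and score > max_quality
        if too_low or too_high:
            outside += 1
    return outside <= max_outside


def keep_pair(qual_forward, qual_reverse, **criteria):
    """Keep a read pair only if both mates pass the same criteria."""
    return (passes_filter(qual_forward, **criteria)
            and passes_filter(qual_reverse, **criteria))


# Usage with illustrative values ('I' decodes to 40 and '#' to 2 under Phred+33)
seq, qual = trim_3prime("ACGTACGT", "IIIIII##", min_quality=20)
print(seq, qual)  # ACGTAC IIIIII
print(passes_filter("IIIIII##", min_size=5, min_quality=20, max_outside=1))  # False
```

The thresholds here are placeholders, not recommendations. What works best depends on what FastQC shows for your data and on how each choice affects mapping.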

Then map the data. You could take a sample, run the QA a few different ways, map each version, and then compare mapping rates to determine the best QA for your particular datasets.
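The comparison could be tabulated along these lines. This is only a sketch: the function name and the input layout are hypothetical, and the read counts would come from your own mapping runs. It reports two numbers per QA variant, because the fraction of retained reads that map can look better after aggressive filtering simply because fewer reads remain, while the number of mapped reads relative to the raw sample shows how much data was kept overall.

```python
def rank_qa_variants(raw_total, variants):
    """Rank QA variants run on the same raw sample.

    raw_total: number of reads in the raw sample before any QA.
    variants:  dict mapping a variant name to (retained_reads, mapped_reads).
    Returns a list of (name, mapped / retained, mapped / raw_total),
    sorted by mapped / raw_total, highest first.
    """
    if raw_total <= 0:
        raise ValueError("raw_total must be positive")
    rows = []
    for name, (retained, mapped) in variants.items():
        if retained <= 0 or mapped < 0 or mapped > retained or retained > raw_total:
            raise ValueError(f"inconsistent counts for variant {name!r}")
        rows.append((name, mapped / retained, mapped / raw_total))
    return sorted(rows, key=lambda row: row[2], reverse=True)


def format_report(rows):
    """Render the ranking as aligned plain-text lines."""
    lines = [f"{'variant':<20}{'mapped/retained':>18}{'mapped/raw':>14}"]
    for name, rate_retained, rate_raw in rows:
        lines.append(f"{name:<20}{rate_retained:>18.1%}{rate_raw:>14.1%}")
    return "\n".join(lines)
```

Call `rank_qa_variants` with the raw read count of the sample and one `(retained, mapped)` pair per QA setting, then print `format_report(...)` to compare the settings side by side.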

Also see: GalaxyNGS101#Fastq_manipulation_and_quality_control

Thanks, Jen, Galaxy team

written 3.1 years ago by Jennifer Hillman Jackson ♦ 25k
