Filter Fastq By Percentage Of Ambiguous (N) Bases

Question: Filter Fastq By Percentage Of Ambiguous (N) Bases

5.3 years ago by

Anto Praveen Rajkumar Rajamani • 80

Denmark

Anto Praveen Rajkumar Rajamani • 80 wrote:

Hello, I would like to filter my FASTQ files (50 bp single-end Illumina RNA-seq reads) by a maximum threshold (10%) of ambiguous (N) bases. I can see that the "CLIP" tool removes all reads with one or more N bases. Is there a way to remove only the reads with five or more N bases using Galaxy? Thank you. Best wishes, Anto

modified 5.3 years ago by Jennifer Hillman Jackson ♦ 25k • written 5.3 years ago by Anto Praveen Rajkumar Rajamani • 80

5.3 years ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello Anto,

There is no specific tool that I know of that does this based on read content, but you could take advantage of the very low quality score (2) assigned to ambiguous bases and use the 'Filter by quality' tool to filter by percentage. Be aware that other bases may also have been assigned this low score, but these would very likely not be of practical use anyway. You could clip these ends first, then run the filter, discarding any reads that have very little usable sequence left.

If the data is Illumina, this is likely a sign of a sequence that failed vendor quality checks, and these are no longer removed by default as of Casava 1.8+.

Creating a regular expression with the Select tool is another option, but this is probably more effort to construct than it is worth. But, your choice. A Google search will bring up syntax advice.

Ideally the first option will do the job,

Jen
Galaxy team

--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org
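For anyone who can run a script outside Galaxy, the content-based filter asked about in the question can be sketched in a few lines of Python. This is only an illustration and not a Galaxy tool: the function names (`read_fastq`, `n_fraction`, `filter_by_n`) are made up for this example, and it assumes a plain four-lines-per-record FASTQ file with no wrapped sequence lines. A read sitting exactly at the threshold is kept here; to drop reads with five or more Ns out of 50 bp, change `<=` to `<`.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

# One FASTQ record: (header, sequence, separator, quality)
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ file that uses exactly four lines per read."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) stops the iteration
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def n_fraction(seq: str) -> float:
    """Fraction of bases in the read that are N; empty reads count as all-N."""
    return seq.upper().count("N") / len(seq) if seq else 1.0


def filter_by_n(records: Iterable[Record], max_n_fraction: float = 0.10) -> Iterator[Record]:
    """Keep reads whose N fraction is at most max_n_fraction."""
    for rec in records:
        if n_fraction(rec[1]) <= max_n_fraction:
            yield rec


if __name__ == "__main__":
    # Tiny in-memory example with 10 bp reads; a real run would open files instead.
    example = io.StringIO(
        "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
        "@read2\nNNGTACGTAC\n+\n##IIIIIIII\n"
    )
    for header, seq, plus, qual in filter_by_n(read_fastq(example)):
        print(header, seq, plus, qual, sep="\n")
```

The filtered file could then be uploaded back into Galaxy for the rest of the workflow.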
