How to count unique short sequences in FASTQ


Question: How to count unique short sequences in FASTQ

Asked 2.3 years ago by yjoechen:

Hi, is there a tool I could use to output the number of unique reads? I don't need any alignment to a reference genome. Thanks!


Answered 2.3 years ago by Devon Ryan:

FastQC does that as part of its output. There's a graph of the number of reads that exist only once, twice, and so on.
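For readers who want the same kind of summary outside FastQC, here is a minimal Python sketch of a duplication-level tally: how many distinct sequences occur once, twice, and so on. It is not FastQC's own implementation. The function names `read_sequences` and `duplication_levels` are made up for illustration, and the code assumes plain 4-line FASTQ records with no wrapped sequence lines.

```python
from collections import Counter
from itertools import islice
import io


def read_sequences(handle):
    """Yield the sequence line of each FASTQ record.

    Assumption: every record is exactly 4 lines (header, sequence, '+', quality).
    """
    for line in islice(handle, 1, None, 4):
        yield line.strip()


def duplication_levels(handle):
    """Map an occurrence count to the number of distinct sequences seen that many times."""
    per_sequence = Counter(read_sequences(handle))
    return Counter(per_sequence.values())


# Tiny in-memory example: ACGT appears twice, TTTT once.
example = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nTTTT\n+\nIIII\n"
levels = duplication_levels(io.StringIO(example))
for occurrences in sorted(levels):
    print(f"{levels[occurrences]} sequence(s) seen {occurrences} time(s)")
```

For a real file, pass an open file handle instead of the `io.StringIO` object. All distinct sequences are held in memory, so very large files may need a different approach.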


Thanks for the advice, Devon Ryan! FastQC does look helpful. However, with the default settings it only reports overrepresented sequences with p > 0.01. Is it possible to extract the counts of ALL unique sequences? Thanks.

Reply written 2.3 years ago by yjoechen
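If a full table of every unique sequence and its count is what is needed, one option is a short script. The sketch below is only an illustration and not part of any Galaxy tool. The names `count_unique_sequences` and `write_counts` are hypothetical, and it assumes 4-line FASTQ records and enough memory to hold every distinct sequence.

```python
from collections import Counter
import io


def count_unique_sequences(lines):
    """Count how often each read sequence occurs.

    Assumption: 4-line FASTQ records, so the sequence is the 2nd line of each record.
    """
    counts = Counter()
    for index, line in enumerate(lines):
        if index % 4 == 1:
            counts[line.strip()] += 1
    return counts


def write_counts(counts, out):
    """Write one 'sequence<TAB>count' row per unique sequence, most frequent first."""
    for sequence, n in counts.most_common():
        out.write(f"{sequence}\t{n}\n")


# Tiny in-memory example standing in for a FASTQ file.
records = ["@r1", "ACGT", "+", "IIII",
           "@r2", "TTTT", "+", "IIII",
           "@r3", "ACGT", "+", "IIII"]
counts = count_unique_sequences(records)
table = io.StringIO()
write_counts(counts, table)
print(f"{len(counts)} unique sequences in {sum(counts.values())} reads")
print(table.getvalue(), end="")
```

To run it on a real file, iterate over an open file handle (or `gzip.open(path, "rt")` for a compressed file) instead of the `records` list, and write to a file or `sys.stdout` instead of the `io.StringIO` buffer.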

You're looking in the wrong section. You're looking for something that looks like this:

[Image: FastQC duplication levels example]

You can get more details about that plot here

Reply written 2.3 years ago by Devon Ryan
