Question: How to count unique short sequences in FASTQ
2.3 years ago
yjoechen wrote:

Hi, is there a tool that outputs the number of unique reads in a FASTQ file? I don't need any alignment to a reference genome. Thanks!

modified 2.3 years ago by Devon Ryan1.9k • written 2.3 years ago by yjoechen0
2.3 years ago
Devon Ryan wrote:

FastQC does that as part of its output. There's a graph of the number of reads that exist only once, twice, and so on.

written 2.3 years ago by Devon Ryan1.9k

Thanks for the advice, Devon Ryan! FastQC looks helpful. However, with the default settings it only reports overrepresented sequences with p > 0.01. Is it possible to extract the counts of ALL unique sequences? Thanks.

written 2.3 years ago by yjoechen0

You're looking in the wrong section. The overrepresented sequences table isn't what you want; the duplication levels plot is, and it looks like this:

[Image: FastQC duplication levels example]

You can get more details about that plot here

written 2.3 years ago by Devon Ryan1.9k
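
If the goal is the count of every unique sequence rather than a plot, a short script may be enough. The following is a minimal sketch, not part of FastQC: it assumes a standard four-line-per-record FASTQ file (optionally gzip-compressed), treats two reads as identical only when their sequences match exactly, and holds all distinct sequences in memory. The function names `count_sequences` and `duplication_levels` and the file name `reads.fastq.gz` are made up for illustration.

```python
import gzip
from collections import Counter
from itertools import islice


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_sequences(path):
    """Map each distinct read sequence to the number of times it occurs.

    Assumes four lines per record (no wrapped sequences), so the
    sequence is the 2nd line of every group of 4.
    """
    counts = Counter()
    with open_fastq(path) as handle:
        # Take lines at index 1, 5, 9, ... (the sequence lines).
        for line in islice(handle, 1, None, 4):
            seq = line.strip()
            if seq:
                counts[seq] += 1
    return counts


def duplication_levels(counts):
    """Map an occurrence count to how many distinct sequences have it.

    For example, levels[1] is the number of sequences seen exactly once.
    """
    return Counter(counts.values())


# Example usage (hypothetical file name):
# counts = count_sequences("reads.fastq.gz")
# print("distinct sequences:", len(counts))
# for seq, n in counts.most_common(10):
#     print(n, seq)
# for level, n_seqs in sorted(duplication_levels(counts).items()):
#     print(level, n_seqs)
```

`len(counts)` gives the number of distinct sequences, and `duplication_levels` gives the same kind of "seen once, twice, and so on" breakdown discussed above, computed over every read in the file. For very large files the in-memory `Counter` may get too big; sorting the sequence lines on disk and counting adjacent duplicates is a common alternative.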
