Question: FASTQC Overrepresented Sequences
4 months ago
wrote:

Hey all,

After running Trimmomatic and clipping Illumina adapters, I always run FastQC to have a look at the quality of my data.

This time FastQC reported overrepresented sequences for 80% of my samples. I BLASTed them; they are not adapters but pre-40s rRNA and mitochondrial sequences. Their abundance is around 0.17%.

My question is: do I have to remove these sequences from my RNA-Seq data before calculating differentially expressed genes? If yes, do I have to remove them from all samples, even those where they are not flagged as overrepresented?

Thanks for any help!

4 months ago
United States
wrote:


Much higher levels of rRNA sequence can indicate that something went wrong during library preparation. However, your rate is pretty low at only around 0.17% abundance; 17% abundance would be much more significant.
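
If you want to sanity-check a percentage like that yourself, here is a minimal sketch that counts exact duplicate reads in FASTQ text and reports the ones above a cutoff. The function names are hypothetical, and the 0.1% default is my assumption based on the level at which FastQC's documentation says this module raises a warning. FastQC's real implementation does more than this, so expect the numbers to differ somewhat.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def read_sequences(fastq_lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) of plain FASTQ text."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            yield line.strip()


def overrepresented(
    fastq_lines: Iterable[str], min_percent: float = 0.1
) -> List[Tuple[str, float]]:
    """Return (sequence, percent of all reads) for exact-duplicate reads
    whose share is above min_percent, most abundant first.

    min_percent=0.1 is an assumed cutoff mirroring FastQC's warning level.
    """
    counts = Counter(read_sequences(fastq_lines))
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        (seq, 100.0 * n / total)
        for seq, n in counts.most_common()
        if 100.0 * n / total > min_percent
    ]
```

You could call it as `overrepresented(open("sample.fastq"))` on an uncompressed file (the file name is just a placeholder). A sequence at 0.17% would be listed because it is just over the cutoff, which is a long way from the double-digit percentages that usually point to a library prep problem.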

Duplications in this module can be associated with contamination, but not always, and low rates shouldn't impact an analysis (most public annotation sources do not include rRNA, so these reads will drop out in later steps). For more details, please review:

Thanks! Jen, Galaxy team

