Question: FastQC Kmer in centre of sequence, quality trim?
2.2 years ago, reubenmcgregor88 wrote:

I have run FastQC on some FASTQ sequences and the report tells me I have contaminating Kmers. Normally this would be fine, as I could quality trim to get rid of them before downstream analysis, but these seem to be present at positions 10-18 in my sequences. See [1] below.

Does anyone know a) why this would happen and b) how they can be removed, or even whether they should be removed, before mapping etc.?

Thanks so much

Result of FastQC

fastq fastqc kmer quality galaxy
2.2 years ago, Jennifer Hillman Jackson (United States) wrote:


FastQC performs the analysis on a sample of the data (first 200k reads). The reads are pretty short to begin with and trimming these out would result in very short reads that could be difficult to map. So, it is probably OK to not worry about or remove these embedded Kmer regions.
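To make the read-length concern concrete, here is a minimal Python sketch of what trimming away the leading bases does to short reads. The function names, the 33-base read length, and the 15-base minimum are illustrative assumptions, not values taken from this dataset or from any trimming tool.

```python
def trim_leading(seq, qual, n_bases):
    """Drop the first n_bases from a read and from its quality string."""
    return seq[n_bases:], qual[n_bases:]


def surviving_reads(records, n_bases, min_length):
    """Trim each (seq, qual) pair and keep only reads still >= min_length."""
    kept = []
    for seq, qual in records:
        t_seq, t_qual = trim_leading(seq, qual, n_bases)
        if len(t_seq) >= min_length:
            kept.append((t_seq, t_qual))
    return kept


# Hypothetical reads: one of 33 bases and one of 20 bases.
records = [("A" * 33, "I" * 33), ("C" * 20, "I" * 20)]
# Removing positions 1-18 leaves 15 and 2 bases respectively.
kept = surviving_reads(records, n_bases=18, min_length=15)
```

Under these assumed numbers only the longer read survives, and what is left of it is already at the minimum length, which is why removing an embedded region can cost more than it gains.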

I would instead focus on detected adaptor sequence in the FastQC report. It is possible that adaptors are present in the first 10 bases and the kmers follow those, representing some other artifact from library prep. However, since this would consume so much of the read (about half), these reads would fall out during the alignment step anyway, even if trimmed, especially if the data is RNA and a spliced alignment tool like TopHat/HISAT is used. 15 bases is pretty short to map unless it is DNA and the mapper (BWA, etc.) has its parameters tuned well.
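If you want to check for yourself where the kmers sit relative to the start of the reads, a rough sketch in Python could look like the following. This is not how FastQC computes its Kmer Content module; it is a simple tally of kmer start positions, and the function names and thresholds (`min_total`, `min_fraction`) are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from itertools import islice


def read_sequences(fastq_lines, max_reads=None):
    """Yield the sequence line of each 4-line FASTQ record."""
    seqs = islice(fastq_lines, 1, None, 4)  # line 2 of every record
    if max_reads is not None:
        seqs = islice(seqs, max_reads)  # only look at a sample of reads
    for line in seqs:
        yield line.strip()


def kmer_positions(sequences, k=7):
    """Count how often each kmer starts at each 1-based read position."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]][i + 1] += 1
    return counts


def positional_hotspots(counts, min_total=3, min_fraction=0.8):
    """Return (kmer, position, fraction) for kmers concentrated at one position."""
    hits = []
    for kmer, by_pos in counts.items():
        total = sum(by_pos.values())
        pos, n = by_pos.most_common(1)[0]  # most frequent start position
        if total >= min_total and n / total >= min_fraction:
            hits.append((kmer, pos, n / total))
    return sorted(hits, key=lambda h: (h[1], h[0]))
```

Passing an open FASTQ file handle to `read_sequences` (with `max_reads` set to a sample size of your choosing) and then running `kmer_positions` and `positional_hotspots` gives a list of kmers that nearly always start at the same read position. On real data `min_total` would need to be much larger than the default used here. A kmer that always starts right after a fixed-length prefix would be consistent with the adaptor-then-artifact explanation above, although it would not prove it.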

Others are still welcome to offer their opinions on this.

Best, Jen, Galaxy team


Hello Jen,

Thanks for the quick reply. Sorry, I should have mentioned it is a ChIP-Seq experiment. So you would suggest just aligning the reads anyway after adaptor trimming? I realise it is hard to say without knowing more info, but I am fairly new to this so any advice is very welcome :)



written 2.2 years ago by reubenmcgregor88

OK - so DNA. I would go forward and give mapping the sequences a try. Thanks! Jen

written 2.2 years ago by Jennifer Hillman Jackson