2.2 years ago by
United States
Hello,
FastQC performs the analysis on a sample of the data (the first 200k reads). The reads are pretty short to begin with, and trimming out the flagged Kmer regions would leave very short reads that could be difficult to map. So it is probably OK not to worry about, or remove, these embedded Kmer regions.
I would instead focus on any adaptor sequence detected in the FastQC report. It is possible that adaptors are present in the first 10 bases and that the kmers follow those, representing some other artifact from library prep. However, since trimming both would consume so much of the read (about half), these reads would fall out during the alignment step anyway, even if trimmed, especially if the data is RNA and a spliced alignment tool like TopHat/HISAT is used. 15 bases is pretty short to map unless the data is DNA and the mapper (BWA, etc.) has its parameters tuned well.
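To put rough numbers on the trimming trade-off, here is a minimal Python sketch. It is not part of FastQC or Galaxy, and the function names are made up for illustration. The 30-base read length is an assumption inferred from "about half" of the read leaving 15 bases, and the 20-base minimum is an arbitrary cutoff chosen for the example, not a value recommended by any mapper.

```python
def remaining_lengths(read_lengths, trim_5prime):
    """Length of each read after removing trim_5prime bases from its start."""
    return [max(0, n - trim_5prime) for n in read_lengths]


def fraction_mappable(read_lengths, trim_5prime, min_length):
    """Fraction of reads still at least min_length bases long after trimming."""
    if not read_lengths:
        return 0.0
    remaining = remaining_lengths(read_lengths, trim_5prime)
    kept = [n for n in remaining if n >= min_length]
    return len(kept) / len(read_lengths)


# Hypothetical input: ten 30-base reads (assumed length, not from the report).
lengths = [30] * 10

# Trimming 15 bases (adaptor plus Kmer region) leaves 15 bases per read.
print(remaining_lengths(lengths, 15)[0])

# With an illustrative 20-base minimum, none of the trimmed reads survive.
print(fraction_mappable(lengths, 15, 20))
```

Swapping in the real read lengths from your own data, and the minimum length your mapper is actually set to accept, would show how many reads are worth keeping after trimming.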
Others are still welcome to offer their opinions on this.
Best, Jen, Galaxy team