Question: Removing sequence duplicates
21 months ago by
msumaira360 wrote:

I used TopHat in Galaxy to map my files. After mapping, I want to remove sequence duplicates from the mapped .bam files. For this I used samtools rmdup. My problem is that when I check the sequence duplication level of the .bam file (processed by rmdup) with FastQC, it still shows a high sequence duplication level.
The same happens with Picard MarkDuplicates: it also does not remove the duplicates (although remove_duplicates is set to yes).

Please help me understand what I am doing wrong.


tophat bowtie samtools • 2.3k views
21 months ago by
United States
Jennifer Hillman Jackson 25k wrote:


The duplicates do not meet the criteria for removal (they are not exact duplicates, and are potentially only partial). This can result from deep sequencing in targeted regions, depending on how the library was prepared, for example with targeted enrichment (common for RNA-seq). For QC, also check the overrepresented sequences, as this is where contamination is often identified. This prior post has good details about the "why":


Removing duplicates is explained here:

FastQC help is here:

RNA-seq tutorials can be found here:

Hopefully this helps! Jen, Galaxy team
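
To illustrate the point above, here is a minimal toy sketch in Python of why a position-based duplicate filter can leave the sequence-level duplication high. It assumes a simplified rule (keep one read per chromosome, start, and strand) and a simplified duplication metric; it is not the actual samtools, Picard, or FastQC implementation, and the reads and function names are made up for the example.

```python
# Toy aligned reads: (chrom, start, strand, sequence). Hypothetical data.
reads = [
    ("chr1", 100, "+", "ACGTACGT"),  # positional duplicate pair ...
    ("chr1", 100, "+", "ACGTACGT"),  # ... only this copy is removable
    ("chr2", 500, "+", "ACGTACGT"),  # same sequence, different locus
    ("chr3", 900, "+", "ACGTACGT"),  # same sequence, different locus
    ("chr1", 300, "-", "TTGGCCAA"),  # unique read
]


def remove_positional_duplicates(aligned_reads):
    """Keep the first read seen for each (chrom, start, strand) key.

    Simplified stand-in for position-based duplicate removal.
    """
    seen = set()
    kept = []
    for chrom, start, strand, seq in aligned_reads:
        key = (chrom, start, strand)
        if key not in seen:
            seen.add(key)
            kept.append((chrom, start, strand, seq))
    return kept


def sequence_duplication_fraction(aligned_reads):
    """Fraction of reads that are extra copies of an already seen sequence.

    Simplified stand-in for a sequence-based duplication check.
    """
    sequences = [seq for _, _, _, seq in aligned_reads]
    return 1 - len(set(sequences)) / len(sequences)


deduped = remove_positional_duplicates(reads)
print(f"reads before: {len(reads)}, after: {len(deduped)}")
print(f"sequence duplication before: {sequence_duplication_fraction(reads):.2f}")
print(f"sequence duplication after:  {sequence_duplication_fraction(deduped):.2f}")
```

In this toy data only one read is removed, because the other identical sequences sit at different positions, so a sequence-based check still reports substantial duplication afterwards.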

21 months ago by
msumaira360 wrote:

Thank you for your help.

