Question: Removing sequence duplicates
msumaira360 wrote (9 months ago):

I have used TopHat in Galaxy to map my files. After mapping, I want to remove sequence duplicates from the mapped .bam files, and for this I used samtools rmdup. My problem is that when I check the sequence duplication levels of the .bam file produced by rmdup using FastQC, it still shows a high sequence duplication level.
The same happens with Picard MarkDuplicates: it also does not remove the duplicates, although remove_duplicates has been set to yes.
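For context: samtools rmdup and Picard MarkDuplicates do not compare read sequences; they group alignments by where they map. The sketch below is a simplified model of that coordinate-based criterion for single-end reads, assuming a duplicate is defined by reference, 5' position and strand. The `Alignment` record and both functions are hypothetical illustrations, not pysam or samtools APIs, and real tools also handle mates, soft-clipping and other details ignored here.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical, simplified single-end alignment record (not a pysam/samtools type)."""
    name: str
    chrom: str
    pos: int       # 0-based leftmost aligned reference position
    end: int       # 0-based exclusive end of the alignment on the reference
    reverse: bool  # True if the read aligned to the reverse strand
    mapq: int
    seq: str       # read bases; never consulted by the duplicate test below


def five_prime_key(aln: Alignment) -> tuple:
    """Duplicate key: reference, 5' position and strand (simplified assumption)."""
    # For a reverse-strand read the 5' end is its rightmost aligned base.
    five_prime = aln.end - 1 if aln.reverse else aln.pos
    return (aln.chrom, five_prime, aln.reverse)


def remove_coordinate_duplicates(alignments):
    """Keep one alignment per key, preferring the highest mapping quality."""
    best = {}
    for aln in alignments:
        key = five_prime_key(aln)
        if key not in best or aln.mapq > best[key].mapq:
            best[key] = aln
    return list(best.values())


reads = [
    Alignment("r1", "chr1", 100, 110, False, 50, "ACGTACGTAC"),
    # Same start and strand as r1 but one base differs: still collapsed.
    Alignment("r2", "chr1", 100, 110, False, 30, "ACGTACGTAA"),
    # Hypothetical read reported at two loci: identical sequence, different keys, both kept.
    Alignment("r3", "chr1", 500, 510, False, 1, "TTGCATTGCA"),
    Alignment("r3", "chr2", 900, 910, False, 1, "TTGCATTGCA"),
]

kept = remove_coordinate_duplicates(reads)
print(sorted((a.name, a.chrom, a.pos) for a in kept))
# [('r1', 'chr1', 100), ('r3', 'chr1', 500), ('r3', 'chr2', 900)]
```

Because the sequence field is never compared, identical sequences recorded at different positions survive this step, and a sequence-based report can still count them as duplicates.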

Please help me understand what I am doing wrong.

Thanks

Tags: tophat, bowtie, samtools
Jennifer Hillman Jackson (United States) wrote (9 months ago):

Hello,

The reads that FastQC reports as duplicates do not meet the criteria these tools use for removal: they are not exact duplicates, and some are potentially only partial duplicates. High duplication can result from deep sequencing of targeted regions, depending on how the library was prepared, for example with targeted enrichment (common for RNA-seq). For QC, also check the overrepresented sequences report, as this is where contamination is often identified. This prior post has good details about the "why": https://www.biostars.org/p/150076/
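Unlike samtools rmdup and Picard MarkDuplicates, which group reads by alignment position, FastQC's duplication module compares read sequences. The sketch below is a simplified, sequence-only count loosely modeled on the FastQC help, which states that reads longer than 75 bp are truncated to 50 bp for this analysis; it omits FastQC's sampling and plotting details. The function names and example reads are hypothetical. The first two reads show one way a "partial" duplicate arises: their full sequences differ, yet they are counted together because only the first 50 bp are compared.

```python
from collections import Counter


def duplication_key(seq: str) -> str:
    # Following the FastQC help: reads longer than 75 bp are truncated to 50 bp
    # before duplicates are counted; shorter reads are compared in full.
    return seq[:50] if len(seq) > 75 else seq


def percent_in_duplicated_sequences(seqs):
    """Percentage of reads whose (possibly truncated) sequence occurs more than once."""
    counts = Counter(duplication_key(s) for s in seqs)
    duplicated = sum(n for n in counts.values() if n > 1)
    return 100.0 * duplicated / len(seqs)


seqs = [
    "A" * 80,             # 80 bp -> truncated to 50 A's
    "A" * 50 + "C" * 30,  # differs after base 50, so only a partial match
    "GATTACA" * 10,       # 70 bp, compared in full
    "GATTACA" * 10,       # exact duplicate of the previous read
    "T" * 80,             # unique
]

print(percent_in_duplicated_sequences(seqs))  # 80.0
```

No alignment coordinates appear anywhere in this calculation, which is why a file that has already been deduplicated by position can still show a high sequence duplication level.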

Resources:

Removing duplicates is explained here:

FastQC help is here:

RNA-seq tutorials can be found here:

Hopefully this helps! Jen, Galaxy team

msumaira360 wrote (9 months ago):

Thank you for your help.
