Question: Removing sequence duplicates
msumaira360 wrote (21 months ago):

I have used TopHat in Galaxy to map my files. After mapping, I want to remove sequence duplicates from the mapped BAM files, so I used samtools RmDup. My problem is that when I check the sequence duplication levels of the BAM file produced by RmDup using FastQC, it still shows a high sequence duplication level.
The same happens with Picard MarkDuplicates: it also does not remove the duplicates (although remove_duplicates has been set to yes).
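A quick way to confirm what MarkDuplicates actually did, assuming the BAM has first been converted to SAM text (for example with `samtools view -h`): MarkDuplicates marks duplicate records by setting FLAG bit 0x400, and when duplicates are only marked rather than removed, they stay in the file. The hypothetical helper `count_flagged_duplicates` below is a minimal sketch that counts such records, so output from a run with removal enabled would be expected to report zero.

```python
# Minimal sketch: count SAM records carrying the duplicate flag (bit 0x400).
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: "PCR or optical duplicate"


def count_flagged_duplicates(sam_lines):
    """Return (total_records, flagged_duplicates) for SAM text lines.

    Header lines (starting with '@') and blank lines are skipped.
    Column 2 of each alignment line is the integer FLAG field.
    """
    total = 0
    flagged = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        total += 1
        if flag & DUPLICATE_FLAG:
            flagged += 1
    return total, flagged


# Toy SAM records (hypothetical): r2 carries FLAG 1024 = 0x400.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "r2\t1024\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "r3\t16\tchr1\t250\t60\t10M\t*\t0\t0\tTTGCATTGCA\t*",
]
print(count_flagged_duplicates(example))  # (3, 1)
```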

Please help me figure out what I am doing wrong.

Thanks

Tags: tophat, bowtie, samtools
Jennifer Hillman Jackson (United States) wrote (21 months ago):

Hello,

The duplicates do not meet the tools' criteria for removal: they are not exact duplicates (they may be partial duplicates). This can result from deep sequencing of targeted regions, depending on how the library was prepared, for example with targeted enrichment (common for RNA-seq). For QC, also check the Overrepresented Sequences report, since this is where contamination is often identified. This prior post has good details about the "why": https://www.biostars.org/p/150076/
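One way to see why a deduplicated BAM can still report high duplication: coordinate-based tools decide what counts as a duplicate from alignment positions, whereas FastQC's duplication module compares read sequences. The toy sketch below assumes paired reads keyed on chromosome, strand, read-1 5' position and mate 5' position; this is a simplification, not the exact rules of samtools RmDup or Picard MarkDuplicates. It shows reads that share a start and a sequence but have different mates surviving deduplication while still counting as sequence duplicates. The records and function names are hypothetical.

```python
from collections import Counter

# Toy aligned pairs (hypothetical):
# (chrom, strand, read1_5prime_pos, mate_5prime_pos, read1_sequence)
reads = [
    ("chr1", "+", 100, 300, "ACGTACGTAC"),
    ("chr1", "+", 100, 300, "ACGTACGTAC"),  # same pair coordinates: coordinate duplicate
    ("chr1", "+", 100, 350, "ACGTACGTAC"),  # same read-1 start, different mate: kept
    ("chr1", "+", 100, 420, "ACGTACGTAC"),  # likewise kept
    ("chr1", "-", 500, 200, "TTGCATTGCA"),
]


def coordinate_dedup(records):
    """Keep the first record for each (chrom, strand, pos, mate_pos) key.

    Simplified model of coordinate-based deduplication; real tools also
    consider soft-clipping, base qualities and library information.
    """
    seen = set()
    kept = []
    for rec in records:
        key = rec[:4]
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


def sequence_duplicate_fraction(records):
    """Fraction of reads whose exact sequence already occurred earlier
    (a simplified sequence-identity duplication metric)."""
    counts = Counter(rec[4] for rec in records)
    total = sum(counts.values())
    return (total - len(counts)) / total


deduped = coordinate_dedup(reads)
print(len(reads), len(deduped))             # 5 4
print(sequence_duplicate_fraction(reads))   # 0.6
print(sequence_duplicate_fraction(deduped)) # 0.5
```

Only the exact coordinate duplicate is dropped, yet three of the four remaining reads still share a sequence, so a sequence-based report continues to show duplication.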

Resources:

Removing duplicates is explained here:

FastQC help is here:

RNA-seq tutorials can be found here:

Hopefully this helps! Jen, Galaxy team

msumaira360 wrote (21 months ago):

Thank you for your help.
