Question: when is repeatmasker used in data analysis pipelines?
6 weeks ago by
eforchielli0 wrote:

Hi everyone,

I am totally new to sequencing data analysis, and I apologize if this question has been asked and answered ad nauseam... but I can't seem to figure this out.

I know that repeatmasker is commonly used to remove reads that contain repetitive elements from sequencing datasets. My question is, at what stage in an analysis pipeline is it commonly used, if at all? Would you apply repeatmasker in ChIP- and RNA-seq analysis or just during de novo genome assembly? Am I completely missing the point and correct usage of repeatmasker? I've read the repeatmasker documentation and have a sense for what it does, but I'm not sure when it's actually used.

I'm asking because I'm specifically interested in these discarded reads, and I'm not sure how to tell if certain existing datasets in public repositories are likely to have had this information removed.

Thanks for your help! Elena

6 weeks ago by
United States
Jennifer Hillman Jackson23k wrote:


Genome data sources often use RepeatMasker to soft mask (lower-case bases) or hard mask (N replacement) the nucleotide databases they release. The file name and/or the readme associated with the data should tell you whether masking was used (including which database choices) and how (soft or hard masking). Sources often release several versions: unmasked, soft masked, and hard masked.
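
If the file name and readme leave it unclear, one rough check is to count base case directly in the FASTA. The following is a minimal Python sketch, not part of RepeatMasker or Galaxy; the function name `masking_summary` and the example record `chr_demo` are made up for illustration. It assumes the convention described above (lower case = soft masked, N = hard masked). Note that N runs also appear in assembly gaps, so N counts alone do not prove hard masking.

```python
import io
from collections import Counter


def masking_summary(fasta_handle):
    """Count upper-case, lower-case, and N bases in a FASTA stream.

    Hypothetical helper for illustration only. Lower-case bases suggest
    soft masking; N bases may be hard masking OR assembly gaps.
    """
    counts = Counter()
    for line in fasta_handle:
        line = line.strip()
        # Skip blank lines and FASTA header lines
        if not line or line.startswith(">"):
            continue
        for base in line:
            if base in "Nn":
                counts["n"] += 1
            elif base.islower():
                counts["lower"] += 1
            else:
                counts["upper"] += 1
    return counts


# Tiny made-up record: 8 upper-case, 8 lower-case, 4 N bases
example = io.StringIO(">chr_demo\nACGTacgtacgtNNNNACGT\n")
summary = masking_summary(example)
print(dict(summary))
```

For a real genome, pass an open file handle instead of the `io.StringIO` example. A file with no lower-case bases and no N runs outside known gaps is most likely the unmasked release.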

If your genome is at UCSC, they provide a RepeatMasker track for any genome with available RepeatMasker databases.

For example Galaxy workflows, please see our tutorials here:

Thanks! Jen, Galaxy team
