Question: When is RepeatMasker used in data analysis pipelines?
10 months ago, eforchielli wrote:

Hi everyone,

I am totally new to sequencing data analysis, and I apologize if this question has been asked and answered ad nauseam... but I can't seem to figure this out.

I know that repeatmasker is commonly used to remove reads that contain repetitive elements from sequencing datasets. My question is, at what stage in an analysis pipeline is it commonly used, if at all? Would you apply repeatmasker in ChIP- and RNA-seq analysis or just during de novo genome assembly? Am I completely missing the point and correct usage of repeatmasker? I've read the repeatmasker documentation and have a sense for what it does, but I'm not sure when it's actually used.

I'm asking because I'm specifically interested in these discarded reads, and I'm not sure how to tell if certain existing datasets in public repositories are likely to have had this information removed.

Thanks for your help! Elena

rna-seq repeatmasker chip-seq • 305 views
10 months ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

Genome data sources often use RepeatMasker to soft mask (repeats converted to lowercase bases) or hard mask (repeats replaced with Ns) the nucleotide databases they release. The file name and/or the README associated with the data should tell you whether masking was applied (including which database choices were used) and how (soft or hard masking). Sources often release several versions: unmasked, soft masked, and hard masked.
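
If the file name or README is unclear, a quick look at the sequence itself can hint at which version you have. Below is a minimal Python sketch, not an official RepeatMasker or Galaxy tool: the function names `read_fasta` and `masking_summary` are made up for illustration, and it assumes a plain-text FASTA file. Keep in mind that runs of N can also be assembly gaps, so an N count alone does not prove hard masking.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def masking_summary(sequence):
    """Count lowercase a/c/g/t (possible soft masking) and N/n (possible hard masking or gaps)."""
    soft = sum(1 for base in sequence if base in "acgt")
    n_bases = sum(1 for base in sequence if base in "Nn")
    return {"total": len(sequence), "soft_masked": soft, "n_bases": n_bases}


# Tiny in-memory example; with a real file, pass the open file handle instead.
example_lines = [">chrDemo", "ACGTacgtNNNN", "ACGT"]
summaries = {name: masking_summary(seq) for name, seq in read_fasta(example_lines)}
for name, summary in summaries.items():
    print(name, summary)
```

A file with no lowercase bases and few Ns is most likely unmasked, many lowercase bases suggest soft masking, and long N stretches in known repeat regions suggest hard masking.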

If your genome is at UCSC, they provide a RepeatMasker track for any genome that has RepeatMasker databases available.

For examples of Galaxy workflows, please see our tutorials here: https://galaxyproject.org/learn/

Thanks! Jen, Galaxy team
