Hi everyone,
I am totally new to sequencing data analysis, and I apologize if this question has been asked and answered ad nauseam... but I can't seem to figure this out.
My understanding is that RepeatMasker is commonly used to remove reads that contain repetitive elements from sequencing datasets. My question is: at what stage in an analysis pipeline is it commonly used, if at all? Would you apply RepeatMasker in ChIP-seq and RNA-seq analysis, or just during de novo genome assembly? Am I completely missing the point and the correct usage of RepeatMasker? I've read the RepeatMasker documentation and have a sense of what it does, but I'm not sure when it's actually used.
I'm asking because I'm specifically interested in these discarded reads, and I'm not sure how to tell if certain existing datasets in public repositories are likely to have had this information removed.
Thanks for your help!
Elena