Question: Get Only Repeatmasked Exons
7.1 years ago by
Dear Galaxy expert(s), I have a .BED file of regions from mouse. I guess many of them can span whole genes, i.e. many exons, and might even extend into the gene flanks. I need to get the REPEATMASKED sequences of only the annotated exons of these regions. I see that if I use the tool "Fetch Sequences->Extract Genomic DNA" on these regions, it returns sequences with mixed lowercase and capital letters. Question I: what are the lowercase letters and what are the capitals here? Are these already masked, exons/introns, or what? (I downloaded some of these sequences and repeatmasked them myself. My masked sequences overlap with some of "yours" written in lowercase letters.) Question II: Is the strand "honored" by this tool? I seem to remember from past experience that there was an issue, although I cannot recall what exactly. Thank you in advance, David
7.1 years ago by
Anton Nekrutenko wrote:
David: For mouse, the sequences are extracted from soft-masked genome builds retrieved from UCSC. So, lowercase letters = repeats, capital letters = no repeats. As for the strand: yes, it is honored if the strand is explicitly specified. If it is not specified, it is assumed to be +. Thanks for using Galaxy. anton
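To make the two points above concrete, here is a minimal Python sketch. It is not Galaxy's own code; the function names (`hard_mask`, `reverse_complement`, `extract`) are hypothetical, and it assumes 0-based, half-open BED coordinates and a chromosome sequence already loaded as a string. It shows how lowercase (soft-masked) repeats can be turned into N's, and how a region on the minus strand would be reverse-complemented while a region with no strand is treated as +.

```python
# Illustrative sketch only; names and behavior are assumptions, not Galaxy internals.

# Complement table that preserves case, so soft-masking survives strand flipping.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def hard_mask(seq: str, mask_char: str = "N") -> str:
    """Replace every lowercase (soft-masked repeat) base with mask_char."""
    return "".join(mask_char if base.islower() else base for base in seq)


def reverse_complement(seq: str) -> str:
    """Reverse-complement a sequence, keeping the upper/lower case of each base."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract(chrom_seq: str, start: int, end: int, strand: str = "+") -> str:
    """Extract a BED-style interval (0-based, half-open).

    A missing strand is treated as '+', as described in the answer above.
    """
    sub = chrom_seq[start:end]
    return reverse_complement(sub) if strand == "-" else sub


if __name__ == "__main__":
    chrom = "AACCggTT"  # toy soft-masked sequence: 'gg' stands for a repeat
    plus = extract(chrom, 2, 6)         # no strand given, treated as '+'
    minus = extract(chrom, 2, 6, "-")   # minus strand, reverse-complemented
    print(plus, hard_mask(plus))
    print(minus, hard_mask(minus))
```

Restricting the output to annotated exons would still require intersecting the regions with an exon annotation first; the sketch only covers the masking and strand handling.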
