Hello everyone,
I have some SAM/BAM files containing alignments of small RNA-seq reads to mm9, created with BWA. I'm interested in working out where the reads map in the genome. I noticed that a lot of reads map to multiple loci (multi-mappers). Therefore, I first separated the unique mappers from the multi-mappers and counted the intervals of unique mappers overlapping different regions of the genome using BEDTools in Galaxy. Now here comes the issue: because multi-mappers map non-uniquely to various regions of the genome, simply applying the same method I used for the unique mappers would overestimate the counts. How can I count normalized intervals of multi-mappers overlapping different regions of the genome? Is there any tool in Galaxy or an R package that I can use? Thank you in advance for your help!
The procedure you need generally depends on the biology you want to explore. For example, http://www.rna-seqblog.com/small-rna-read-alignment-for-accurate-quantification/ shows that different levels of mapping stringency give results suited to different uses. There's no one-size-fits-all approach, but allowing low multiplicity seems useful for tRNA, whereas unique mappings are good if you only care about microRNA.
I don't think there are any specific tools in Galaxy for this kind of exploration, but if you can find a working R script it can probably be turned into a Galaxy tool fairly easily, e.g. using the Tool Factory.
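To make "normalized" counting concrete, here is a minimal Python sketch of one common convention, fractional counting, where a read reported at N loci contributes 1/N to each locus. This is only an illustration under stated assumptions, not a Galaxy tool or an existing package: the function name `fractional_counts`, the tuple layouts, and the toy coordinates are all hypothetical, and it assumes you have already expanded every reported alignment of each read into one record per locus (how you get that list depends on your aligner output).

```python
from collections import Counter, defaultdict


def fractional_counts(alignments, regions):
    """Weight each alignment by 1/N, where N is the number of loci reported
    for its read, and sum the weights per region name.

    alignments: list of (read_id, chrom, start, end), 0-based half-open.
    regions:    list of (name, chrom, start, end), 0-based half-open.
    Both layouts are assumptions made for this sketch.
    """
    # How many loci were reported for each read.
    n_loci = Counter(read_id for read_id, _, _, _ in alignments)

    counts = defaultdict(float)
    for read_id, chrom, start, end in alignments:
        weight = 1.0 / n_loci[read_id]
        # Collect region names hit by this alignment; a set avoids counting
        # the same alignment twice when two intervals share a name.
        hit = {
            name
            for name, r_chrom, r_start, r_end in regions
            if chrom == r_chrom and start < r_end and r_start < end
        }
        for name in hit:
            counts[name] += weight
    return dict(counts)


# Toy data (made up): r1 is unique, r2 has 2 loci, r3 has 4 loci.
alignments = [
    ("r1", "chr1", 100, 122),
    ("r2", "chr1", 110, 131), ("r2", "chr2", 500, 521),
    ("r3", "chr2", 505, 527), ("r3", "chr3", 10, 32),
    ("r3", "chr3", 900, 922), ("r3", "chrX", 40, 62),
]
regions = [
    ("miRNA", "chr1", 90, 200),
    ("tRNA", "chr2", 480, 560),
    ("tRNA", "chr3", 0, 50),
]

result = fractional_counts(alignments, regions)
print(result)  # {'miRNA': 1.5, 'tRNA': 1.0}
```

With this weighting each read contributes at most 1 in total, so the multi-mappers no longer inflate the counts. Here the remaining 0.5 of r3 falls outside the annotated regions. Whether 1/N is the right weighting for your question is a biological judgment call, as noted above.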