Question: Count intervals of non-uniquely mapped reads overlapping the genome
0
gravatar for sunkissbean
3.6 years ago by
United States
sunkissbean0 wrote:

Hello everyone,

I have some SAM/BAM files containing the alignments of small RNA-seq reads to mm9 that are created by BWA. I'm interested in calculating where they are mapped to in the genome. I noticed that there are a lot of reads that are mapped to multiple loci (multi-mappers) in genome. Therefore, I first separated the unique-mappers from multi-mappers and counted intervals of unique-mappers overlapping different regions of the genome using the bedtool in the galaxy. Now here comes the issue: as multi-mappers are mapped non-uniquely to various regions in the genome, if I simply use the same method as the uniquely-mappers, I will overestimate the number of counts, how can I count the normalized intervals of multi-mappers overlapping different regions of the genome? Is there any tool in the galaxy or R package that I can use? Thank you for your help in advance!

bed galaxy bam • 2.0k views
ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by sunkissbean0

The procedure you need generally depends on the biology you want to explore - eg http://www.rna-seqblog.com/small-rna-read-alignment-for-accurate-quantification/ shows that different levels of mapping stringency give results with different uses - there's no one-size-fits-all approach but low multiplicity seems useful for t-rna whereas unique mappings are good if you only care about microRNA.

I don't think there are any specific tools in Galaxy for this kind of exploration but if you can find a working R script it can probably be easily turned into a Galaxy tool - eg using the tool factory.

ADD REPLYlink written 3.6 years ago by fubar1.1k
0
gravatar for Jennifer Hillman Jackson
3.6 years ago by
United States
Jennifer Hillman Jackson25k wrote:

Hello,

For RNA-seq data, BWA is probably not the optimal mapping tool, if working with a genome that contains spliced transcripts. Tophat/2 would be a better choice.

Have you examined the Tuxedo suite to see if it will meet your analysis needs? Cufflinks may provide you with the desired results. See the tool group "NGS: RNA-seq". Help for the tool package is here:
Section 2.13.2 http://wiki.galaxyproject.org/Support

Best, Jen, Galaxy team

ADD COMMENTlink written 3.6 years ago by Jennifer Hillman Jackson25k
0
gravatar for sunkissbean
3.6 years ago by
United States
sunkissbean0 wrote:

Thank you for your reply, Jen! Mine is 50-bp single-end sequences from small RNAs rather than RNA-seq. I mapped my sequences to mm9 using BWA and more than 80% of the reads can be mapped to the genome while the non-mappers cannot be mapped to the genome using Tophat2. The reason why I didn't use the Tuxedo suite is because for quantification of small RNA reads, RPKM is usually not a good option due to their small size (20-30 nt). 

ADD COMMENTlink written 3.6 years ago by sunkissbean0

Sorry, I didn't read the question carefully enough! Ross's advice above is good - and these tools are not on the public Main Galaxy instance. There are some other wrapped tools for Galaxy that may be a fit. These are for use in a cloud Galaxy. See the Tool Shed. Thanks! Jen, Galaxy Team

http://usegalaxy.org/toolshed
http://usegalaxy.org/cloud

ADD REPLYlink written 3.6 years ago by Jennifer Hillman Jackson25k
0
gravatar for sunkissbean
3.6 years ago by
United States
sunkissbean0 wrote:

Thank you all for your suggestions and advices! The link really shows the importance of dealing with the multi-mappers for accurate quantification of small RNA seq. I wonder whether I can use RSEM or eXpress, both of which can be found in the tool shed and seem to give more accurate quantification of multi-mappers by normalizing the number of times that they are mapped to the genome. 

ADD COMMENTlink written 3.6 years ago by sunkissbean0
0
gravatar for sunkissbean
3.6 years ago by
United States
sunkissbean0 wrote:

Update: I tried RSEM on the cloud but somehow running the RSEM but always got an error message reading "tool error". I feel that the error probably came from when I prepared the reference as the reference file that I generated was very small (<1 MB). Specifically, I used the "RSEM prepare reference" function to generate the reference from RefSeq DNA fasta file. Then, I used the RSEM calculate expression to quantitate the expression level of small RNAs by using a SAM/BAM file that was generated by BWA and the reference file. I am really confused what went wrong now.

ADD COMMENTlink written 3.6 years ago by sunkissbean0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 89 users visited in the last hour