using mPileup to examine allelic imbalance in hybrids

4.5 years ago by

Canada

Suzanne Gomes • 120 wrote:

Hi,

I have some RNA-seq data for two different species and their hybrids. I want to look at the expression levels of the allele from each parent in the hybrids. I am trying to use mPileup for this. First, I want to find all positions with fixed differences between the two parent species.

My problem is that I want to select only positions supported by a certain minimum number of reads. But mPileup outputs a lot of > and < symbols, which, from what I have read, represent reference skips (so presumably introns spanned by a read). These symbols count toward the reported read depth, so after filtering I still get many positions that actually have fewer reads covering them than the minimum I want. Is there a way to get mPileup not to report these gaps? Or can anyone think of a way to filter them out after the fact?
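For the "after the fact" route, one idea would be to recount the depth from the read-bases column while skipping the > and < symbols. Below is a minimal Python sketch, assuming standard single-sample pileup text (tab-separated: chromosome, 1-based position, reference base, depth, read bases, qualities). The function names `count_bases` and `sites_with_min_depth` are made up for illustration and are not part of any tool.

```python
def count_bases(read_bases, ref_base):
    """Count called bases in a pileup read-bases string.

    Reference skips ('>' and '<'), deleted bases ('*'), read-start and
    read-end markers, and indel descriptions are not counted.
    """
    ref = ref_base.upper()
    counts = {}
    i = 0
    n = len(read_bases)
    while i < n:
        c = read_bases[i]
        if c == "^":
            # '^' marks a read start and is followed by a mapping-quality character
            i += 2
        elif c in "+-":
            # indel: sign, length digits, then that many sequence characters
            j = i + 1
            while j < n and read_bases[j].isdigit():
                j += 1
            i = j + int(read_bases[i + 1:j])
        elif c in ".,":
            # match to the reference on the forward or reverse strand
            counts[ref] = counts.get(ref, 0) + 1
            i += 1
        elif c.upper() in "ACGTN":
            # mismatch on either strand
            counts[c.upper()] = counts.get(c.upper(), 0) + 1
            i += 1
        else:
            # '$' (read end), '*' (deleted base), '>' and '<' (reference skip)
            i += 1
    return counts


def sites_with_min_depth(pileup_lines, min_depth):
    """Yield (chrom, pos, counts) for sites with at least min_depth called bases."""
    for line in pileup_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip malformed or empty lines
        chrom, pos, ref, _reported_depth, bases = fields[:5]
        counts = count_bases(bases, ref)
        if sum(counts.values()) >= min_depth:
            yield chrom, int(pos), counts
```

The reported depth column is ignored on purpose here, since it is the value that includes the skipped reads. Base-quality filtering is not handled in this sketch.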

My plan after mPileup and filtering for quality and minimum read count is to filter the file again to find positions where only a single base is represented in the reads. I'll do this for each parent, and then compare which base occurs in each parent to find any fixed differences between the two.
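To make the comparison step concrete, here is a rough Python sketch of what I have in mind. It assumes each parent has already been reduced to a dictionary mapping (chromosome, position) to per-base read counts; that layout and the names `fixed_base` and `fixed_differences` are just my own illustration.

```python
def fixed_base(counts):
    """Return the single base observed at a site, or None if more than one base is seen."""
    bases = [base for base, n in counts.items() if n > 0]
    return bases[0] if len(bases) == 1 else None


def fixed_differences(parent_a, parent_b):
    """List sites where each parent shows exactly one base and the two bases differ.

    parent_a and parent_b map (chrom, pos) -> {base: read_count}.
    Only sites present in both parents are compared.
    """
    diffs = []
    for site in sorted(parent_a.keys() & parent_b.keys()):
        a = fixed_base(parent_a[site])
        b = fixed_base(parent_b[site])
        if a is not None and b is not None and a != b:
            diffs.append((site[0], site[1], a, b))
    return diffs
```

Requiring exactly one observed base is strict, so a single sequencing error at a site would exclude it. A tolerance threshold could be added, but I have left it out to keep the sketch short.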

After I have the list of fixed differences between the parents, I want to do a pileup of just those positions in the hybrids. I see there is an option 'List of regions or sites on which to operate' in mPileup, which seems to require a BED format file. How would I convert the list of positions in the parents to a BED file?
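As far as I understand the BED format, the start coordinate is 0-based and the end is exclusive, while pileup positions are 1-based, so a single site at position p would become the interval from p - 1 to p. A minimal Python sketch of that conversion follows; the function name `sites_to_bed` is hypothetical, and I am assuming the tool reads a plain three-column BED file.

```python
def sites_to_bed(sites):
    """Convert (chrom, 1-based position) pairs into three-column BED lines."""
    # BED uses a 0-based start and an exclusive end, so site p spans [p - 1, p)
    return ["{}\t{}\t{}".format(chrom, pos - 1, pos) for chrom, pos in sites]


if __name__ == "__main__":
    # Example positions only, not real data
    for bed_line in sites_to_bed([("chr1", 100), ("chr2", 1)]):
        print(bed_line)
```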

I'd appreciate any help/suggestions!

rna-seq snp • 1.5k views

modified 4.4 years ago • written 4.5 years ago by Suzanne Gomes • 120

4.4 years ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

This is probably not the best tool for the job. Have you considered the tools "Naive Variant Caller" and "Variant Annotator"? Other tools in the same group, NGS: Variant Analysis, may also be of interest. Working in VCF format, with tools that have the option to retain all data (including sites *without* variation) and provide depth information, is what you are looking for. These two will do that, so they are a good place to start.

Others are welcome to add to the advice & recommend alternate tools. There are almost certainly many solutions to this question, even just using the tools on the public Main Galaxy server (http://usegalaxy.org) and even more in the Tool Shed (http://usegalaxy.org/toolshed) for local/cloud use.

Jen, Galaxy team

written 4.4 years ago by Jennifer Hillman Jackson ♦ 25k
