I have some RNA-seq data for two different species and their hybrids. I want to look at the expression levels of the allele from each parent in the hybrids. I am trying to use mPileup for this. First, I want to find all positions with fixed differences between the two parent species.
My problem is, I want to select only positions supported by a certain minimum number of reads. But mPileup outputs a whole bunch of > and <, which I read represents large gaps (so presumably, introns that are spanned by a read). These symbols are counted towards the final read mapping total, which causes me to get a lot of extra positions reported after filtering, that actually have fewer reads mapping there than the minimum I want. Is there a way to get mPileup not to report these gaps? Or can anyone think of a way to filter them out after the fact?
My plan after mPileup and filtration for quality/min read count is to then filter the file to find positions where there is a unique base represented in the reads. I'll do this for each parent, and then compare what read occurs in each parent to find any fixed differences between the two.
After I have the list of fixed differences between the parents, I want to do a pileup of just those positions in the hybrids. I see there is an option 'List of regions or sites on which to operate' in mPileup, which seems to require a BED format file. How would I convert the list of positions in the parents to a BED file?
I'd appreciate any help/suggestions!