Hi, I am using "BWA for Illumina" on the Galaxy main server. I am looking for the frequency of reads matching a specific region that is present in multiple copies in hg19, on a few chromosomes. When I run BWA with in silico reads made from the original hg19 sequence, I get a MAPQ of 0 along with the XT:A:R tag, and generally two other locations listed in the XA tag. Therefore, when I run my actual reads, all the reads from that region are removed from the results: they end up treated as unmapped because they fall below the MAPQ filter I apply.
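As a quick check of the behavior described above, the sketch below flags SAM records that BWA reports as repeats. It assumes single-end output from BWA's aln/samse path, where MAPQ is column 5 and the optional tag XT:A:R marks a repeat; `optional_tags` and `is_bwa_repeat` are hypothetical helper names, not part of BWA or Galaxy.

```python
from typing import Dict, List


def optional_tags(fields: List[str]) -> Dict[str, str]:
    """Map TAG -> value for the SAM optional fields (columns 12 onward)."""
    tags: Dict[str, str] = {}
    for field in fields[11:]:
        # Optional fields look like TAG:TYPE:VALUE, e.g. "XT:A:R".
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def is_bwa_repeat(sam_line: str) -> bool:
    """True if the record has MAPQ 0 and BWA's repeat marker XT:A:R."""
    fields = sam_line.rstrip("\n").split("\t")
    mapq = int(fields[4])
    return mapq == 0 and optional_tags(fields).get("XT") == "R"
```

Counting how many records of the synthetic library satisfy `is_bwa_repeat` gives a rough idea of how much of the target region is affected.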
Is there a way to :
1. Extract all the XA locations obtained by running a synthetic library generated from the original target region sequence, and create a BED file covering all these regions. (It is a large region; I have a synthetic library of 100k reads, mimicking all the 30 bp or 80 bp reads that could be derived from it.)
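Step 1 could be sketched in Python roughly as follows, assuming the synthetic library's alignments were downloaded from Galaxy as a SAM file. It collects each read's primary position plus every alternative hit in the XA:Z tag (entries of the form chr,±pos,CIGAR,NM; where the sign gives the strand), converts them to 0-based, half-open BED coordinates, and merges overlapping intervals. All function names are hypothetical. One caveat, as I understand the BWA manual: if a read has more hits than the limit set by `bwa samse -n` (default 3), the XA tag is not written at all, so copies beyond that limit would be missing from the BED.

```python
import re
from typing import Iterable, List, Tuple

# (chrom, start, end) in BED convention: 0-based start, exclusive end.
Interval = Tuple[str, int, int]

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def ref_span(cigar: str) -> int:
    """Number of reference bases consumed by a CIGAR string (M, D, N, =, X)."""
    return sum(int(n) for n, op in _CIGAR_OP.findall(cigar) if op in "MDN=X")


def alignment_intervals(sam_line: str) -> List[Interval]:
    """Primary hit plus every alternative hit listed in the XA:Z tag."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
    out: List[Interval] = []
    if not flag & 0x4 and cigar != "*":  # skip unmapped records
        start = pos - 1  # SAM POS is 1-based
        out.append((chrom, start, start + ref_span(cigar)))
    for tag in fields[11:]:
        if not tag.startswith("XA:Z:"):
            continue
        for hit in tag[5:].split(";"):
            if not hit:
                continue  # trailing ';' leaves an empty entry
            xchrom, xpos, xcigar, _nm = hit.split(",")
            start = abs(int(xpos)) - 1  # drop the strand sign, convert to 0-based
            out.append((xchrom, start, start + ref_span(xcigar)))
    return out


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Sort intervals and merge those that overlap or touch on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, last_start, last_end = merged[-1]
            merged[-1] = (chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def sam_to_bed(lines: Iterable[str]) -> List[str]:
    """Turn SAM lines into merged BED lines covering all reported copies."""
    hits: List[Interval] = []
    for line in lines:
        if line.startswith("@"):
            continue  # header
        hits.extend(alignment_intervals(line))
    return [f"{c}\t{s}\t{e}" for c, s, e in merge(hits)]
```

Usage would be along the lines of `bed = sam_to_bed(open("synthetic.sam"))` (the file name is hypothetical), then writing the lines to a `.bed` file that can be uploaded back into Galaxy.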
That way, I guess, when running my actual library I could keep all the reads from the SAM output that fall within that BED and pool them back with all the other reads that matched other locations in hg19 with a passing MAPQ. (By the way, is there a general agreement on an acceptable MAPQ score?)
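The pooling described above (keep a read if its MAPQ passes the filter, or if it overlaps the BED regardless of MAPQ) might look like this, assuming the BED from step 1 and single-end SAM from the actual library. The `min_mapq=20` default is only a placeholder, not an agreed-upon threshold; `load_bed`, `keep_read` and `filter_sam` are hypothetical names.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

Regions = Dict[str, List[Tuple[int, int]]]  # chrom -> [(start, end), ...]

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def ref_span(cigar: str) -> int:
    """Number of reference bases consumed by a CIGAR string (M, D, N, =, X)."""
    return sum(int(n) for n, op in _CIGAR_OP.findall(cigar) if op in "MDN=X")


def load_bed(lines: Iterable[str]) -> Regions:
    """Read BED lines (0-based, half-open) into per-chromosome interval lists."""
    regions: Regions = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def keep_read(fields: List[str], regions: Regions, min_mapq: int) -> bool:
    """Keep mapped reads with MAPQ >= min_mapq, or any mapped read inside the BED."""
    flag, chrom, pos, mapq, cigar = (
        int(fields[1]), fields[2], int(fields[3]), int(fields[4]), fields[5])
    if flag & 0x4 or cigar == "*":
        return False  # unmapped
    if mapq >= min_mapq:
        return True
    start = pos - 1
    end = start + ref_span(cigar)
    # Half-open overlap test against every region on this chromosome.
    return any(s < end and start < e for s, e in regions.get(chrom, []))


def filter_sam(sam_lines: Iterable[str], regions: Regions,
               min_mapq: int = 20) -> Iterator[str]:
    """Yield header lines plus every record that passes keep_read."""
    for line in sam_lines:
        if line.startswith("@") or keep_read(
                line.rstrip("\n").split("\t"), regions, min_mapq):
            yield line
```

Because the BED covers every copy of the region, it should not matter which copy BWA happens to report as the primary position for a repeat read, and each read is still counted once since samse writes a single record per read.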
Do you think this is a good strategy, or is there a better way to deal with this issue?
Thank you for your help,
Edouard DD
Hello, I don't know the answer to this one. Perhaps someone else on this forum will answer, although most Q&A here concerns Galaxy usage rather than the details of third-party algorithms. Because of this, I would also suggest asking this question on the BWA help forum, since its sole focus is this tool and its use cases. Thanks! Jen