Question: Do the BED start coordinates generated from BWA alignments always correspond to the first base of the reads?
2.5 years ago, ed.dreuzy wrote:

I am currently using "Map with BWA for Illumina" (on the Galaxy main server) with default settings to map DNA PCR amplicons (sequenced on a MiSeq) that range in size from 20 nt to about 100 nt after adaptor trimming. I need my alignments to start precisely at base 1, because the information I am looking for is exactly the position of base 1 of each read on the genome (human/hg19). I have heard of "soft clipping" and of possible gap/mismatch extension at read ends, but I am not fully familiar with these concepts or with the advanced settings of BWA in general (NGS analysis is new to me). I have seen SAM outputs with results such as:

MAPQ: 37; CIGAR: 57M; MD:Z:0A0C55, meaning the first two nt are mismatches, if I am correct.

Can I consider the start coordinate I get in the BED file (generated from the SAM file) to be the first base of my read, or can it be wrong because of tolerated mismatches at the 5' end? If so, how many consecutive mismatches at the 5' end would BWA tolerate with default settings?

Thanks a lot for your help,

Edouard DD.

bed bwa galaxy
2.5 years ago, Jennifer Hillman Jackson (United States) wrote:


I do not think BWA can be told to align the start of the read without mismatches specifically, but the entire alignment can be required to be a perfect match by adjusting the aln -n value (maximum edit distance). If set to "0", no mismatches are permitted.

Another related parameter is aln -i, which disallows indels within the given number of bases of the read ends. The default is 5 bases, so indels at the very start of the read are already excluded in your case.
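Since there does not seem to be an option that targets the read start only, one alternative is to leave -n at its default and filter the SAM records afterwards using the MD tag. The following is only a rough sketch of that idea in Python, not something built into BWA or Galaxy: the function name `mismatches_at_read_start` is made up for illustration, and it assumes an end-to-end alignment with no soft clipping (as in the 57M example above). For reverse-strand reads the MD string is scanned from its end, because SAM stores the alignment in reference orientation.

```python
import re

# MD tokens: a run of matches (digits), a deletion (^ACGT...), or one mismatched base.
_MD_TOKEN = re.compile(r"(\d+)|\^([A-Z]+)|([A-Z])")


def mismatches_at_read_start(md, is_reverse=False):
    """Count consecutive mismatches at the 5' end of the read.

    md         -- value of the MD tag without the 'MD:Z:' prefix, e.g. '0A0C55'
    is_reverse -- True if SAM flag 0x10 is set (read aligned to the reverse strand)

    Hypothetical helper; assumes the CIGAR has no soft clipping.
    """
    tokens = [m.group(0) for m in _MD_TOKEN.finditer(md)]
    if is_reverse:
        # The read's first base is the last aligned base in reference orientation.
        tokens.reverse()
    count = 0
    for tok in tokens:
        if tok.isdigit():
            if int(tok) > 0:
                break          # a run of matching bases ends the mismatch stretch
        elif tok.startswith("^"):
            break              # a deletion is not a mismatch; stop counting
        else:
            count += 1         # a single mismatched base
    return count
```

For the example in the question, `mismatches_at_read_start("0A0C55")` counts the two leading mismatches; records above a chosen threshold could then be dropped before the BED conversion.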

The help at the bottom of the tool form has links to the BWA support forums where the fine details of the algorithm and parameters can be searched for in prior Q&A or a new question can be asked for feedback from the tool authors. 

For the BAM-to-BED conversion, there is no guarantee that the first aligned base is the first base of the read (unless aln -n is set to "0"). Just be aware that setting it to "0" could reduce the number of reads mapped, because reads with mismatches elsewhere in the sequence would also be rejected.
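To check this per read rather than rely on the BED start alone, the position of the read's first base can be worked out from the SAM POS, CIGAR and FLAG fields. The sketch below is an illustration only (the function name `five_prime_end` and the coordinates in the usage note are invented). It uses two facts from the SAM format: POS is the 1-based leftmost aligned reference base, and flag 0x10 marks a reverse-strand alignment, in which case the read's first base sits at the right-hand end of the alignment rather than at the BED start.

```python
import re

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def five_prime_end(pos, cigar, flag):
    """Return (site, clipped) for one SAM record.

    site    -- 1-based reference position of the 5'-most aligned base of the read
    clipped -- number of read bases soft-clipped before that base (0 means the
               first base of the read is itself aligned)

    Hypothetical helper for illustration.
    """
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]
    reverse = bool(flag & 0x10)
    ref_len = sum(n for n, op in ops if op in _REF_CONSUMING)

    # CIGAR is written in reference orientation, so for a reverse-strand read
    # the 5' end of the read corresponds to the last CIGAR operations.
    ordered = ops[::-1] if reverse else ops
    clipped = 0
    for n, op in ordered:
        if op == "S":
            clipped += n
        elif op == "H":
            continue           # hard-clipped bases are not present in SEQ
        else:
            break

    site = pos + ref_len - 1 if reverse else pos
    return site, clipped
```

For a forward-strand read at POS 1000 with CIGAR 57M, the 5' base is at position 1000 (BED start 999, since BED is 0-based); for the same record on the reverse strand it is at position 1056, which corresponds to the BED end coordinate, not the start. A non-zero `clipped` value flags reads whose first base was not aligned at all.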

Best, Jen, Galaxy team
