I am currently using "Map with BWA for Illumina" (on the Galaxy main server) with default settings to map DNA PCR amplicons (sequenced on a MiSeq) that range from 20 nt to about 100 nt after adaptor trimming. I need my alignments to start precisely at base 1 of each read, because the information I am looking for is the position of that first base on the genome (human/hg19). I have heard of "soft clipping" and of possible gap/MM extension at read ends, but I am not fully familiar with these concepts or with the advanced settings of BWA in general (NGS analysis is new to me). I have seen SAM output with results such as:
MQ score: 37; CIGAR: 57M; MD:Z:0A0C55, which, if I am correct, means the first two nt are mismatches.
Can I take the start coordinate I get in the BED file (generated from the SAM file) as the first base of my read, or can it be wrong because of tolerated mismatches at the 5' end? If so, how many consecutive mismatches at the 5' end would BWA tolerate with default settings?
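To make the question concrete, here is a minimal sketch of the check I have in mind, assuming my reading of the SAM fields is right: POS is the leftmost aligned base (1-based), soft-clipped bases are not counted in POS, and reverse-strand reads (flag 0x10) have their base 1 at the rightmost aligned position. The function names are my own, not part of BWA or Galaxy, and the MD and clipping helpers only look at the left end of the alignment, which corresponds to read base 1 for forward-strand reads only.

```python
import re

def leading_soft_clip(cigar):
    """Bases soft-clipped at the left end of the alignment (0 if none)."""
    m = re.match(r"(?:\d+H)?(\d+)S", cigar)
    return int(m.group(1)) if m else 0

def leading_mismatches(md):
    """Consecutive mismatches at the left end, from an MD value such as '0A0C55'."""
    tokens = re.findall(r"\d+|\^[A-Z]+|[A-Z]", md)
    count = 0
    i = 0
    # A mismatch at the very start shows up as a match run of length 0
    # followed by the reference base; '^...' marks a deletion instead.
    while i + 1 < len(tokens) and tokens[i] == "0" and not tokens[i + 1].startswith("^"):
        count += 1
        i += 2
    return count

def five_prime_position(pos, cigar, flag):
    """1-based reference coordinate of the aligned base nearest the read's 5' end."""
    # Only M, D, N, = and X consume reference bases.
    ref_len = sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
                  if op in "MDN=X")
    return pos + ref_len - 1 if flag & 16 else pos

print(leading_soft_clip("57M"))        # 0
print(leading_mismatches("0A0C55"))    # 2
```

If `leading_soft_clip` returned anything other than 0 for a forward-strand read, the reported start would be the first aligned base rather than base 1 of the read, which is exactly the case I am worried about.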
Thanks a lot for your help,
Edouard DD.