2.0 years ago by
United Kingdom
Hi Dannon
This is the process I followed:
1) I generated a SAM file by mapping my FASTQ files simultaneously against two different reference sequences, named RS16000389_V3_Ref_1 and Ref_2, and then removed unmapped reads and secondary/supplementary alignments.
2) I used the tool "Filter data on any column using simple expressions" to generate two different SAM files with the reads mapping to each reference sequence.
3) I compared these two SAM files to find the reads unique to each file (since I am not interested in the reads mapping to both reference sequences).
4) I converted the SAM files to BAM, but this only worked for Reference 1.
I have checked the Reference 1 SAM file and found these expressions in the OPT column for all the reads:
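To make steps 2 and 3 concrete, this is roughly what the filtering amounts to, written as a minimal Python sketch rather than the Galaxy tools I actually used. The function names, the toy read names and the full name of the second reference (`RS16000389_V3_Ref_2`) are assumptions for illustration only.

```python
from collections import defaultdict

REF_1 = "RS16000389_V3_Ref_1"
REF_2 = "RS16000389_V3_Ref_2"  # assumed full name of "Ref_2"


def reads_by_reference(sam_lines):
    """Map each reference name (RNAME, column 3) to the set of read names (QNAME, column 1)."""
    names = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        names[fields[2]].add(fields[0])
    return names


def unique_to_each(names, ref_a, ref_b):
    """Return (reads only on ref_a, reads only on ref_b), dropping reads seen on both."""
    a = names.get(ref_a, set())
    b = names.get(ref_b, set())
    shared = a & b
    return a - shared, b - shared


# Toy records with only the first mandatory SAM columns filled in (hypothetical data).
demo = [
    "@HD\tVN:1.6",
    "\t".join(["read_a", "0", REF_1, "10", "60", "151M"]),
    "\t".join(["read_b", "0", REF_1, "20", "60", "151M"]),
    "\t".join(["read_b", "16", REF_2, "531", "60", "92M"]),
    "\t".join(["read_c", "0", REF_2, "40", "60", "66M"]),
]

only_1, only_2 = unique_to_each(reads_by_reference(demo), REF_1, REF_2)
print(sorted(only_1), sorted(only_2))
```

Note that this compares read names only, so both mates of a pair count as the same read.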
NM:i:0 MD:Z:151 AS:i:151 XS:i:0
NM:i:0 MD:Z:151 AS:i:151 XS:i:113
However, I have found the following expressions very often in the Reference 2 SAM file:
NM:i:1 MD:Z:1A90 AS:i:90 XS:i:82 XA:Z:RS16000389_V3_Ref_1,+531,92M,2;
NM:i:1 MD:Z:1A90 AS:i:90 XS:i:82 XA:Z:RS16000389_V3_Ref_1,-531,92M,2;
and sometimes these:
NM:i:0 MD:Z:66 AS:i:66 XS:i:37
NM:i:0 MD:Z:66 AS:i:66 XS:i:37
I guess "XA:Z:RS16000389_V3_Ref_1,+531,92M,2;" means that the read also matches 92 nucleotides of Reference 1, but what do the other parameters mean?
Is this the reason I cannot generate a BAM file for Reference 2?
Is there any way of filtering unique reads for each reference sequence?
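In case it helps to show what I mean about the XA field: if I read the bwa manual correctly, XA lists alternative hits in the format (chr,pos,CIGAR,NM;)*, with the sign of pos giving the strand. Below is a minimal Python sketch of how such a field could be split and used to flag reads that also hit the other reference. The function names are hypothetical and I have not confirmed that this interpretation is right.

```python
def parse_xa(opt_field):
    """Split an 'XA:Z:...' field into (reference, strand, position, cigar, nm) tuples."""
    prefix = "XA:Z:"
    if not opt_field.startswith(prefix):
        return []
    hits = []
    for entry in opt_field[len(prefix):].split(";"):
        if not entry:
            continue  # the field ends with ';', which leaves an empty last piece
        # Split from the right so a comma inside the reference name would not break parsing.
        ref, pos, cigar, nm = entry.rsplit(",", 3)
        hits.append((ref, pos[0], int(pos[1:]), cigar, int(nm)))
    return hits


def has_alternative_hit_on(opt_fields, ref_name):
    """True if any XA entry among the optional fields points at ref_name."""
    return any(
        hit[0] == ref_name
        for field in opt_fields
        for hit in parse_xa(field)
    )


# Optional fields copied from one of my Reference 2 reads.
opts = "NM:i:1 MD:Z:1A90 AS:i:90 XS:i:82 XA:Z:RS16000389_V3_Ref_1,+531,92M,2;".split()
print(parse_xa(opts[-1]))
print(has_alternative_hit_on(opts, "RS16000389_V3_Ref_1"))
```

Under that reading, the example would be an alternative hit on RS16000389_V3_Ref_1, forward strand, position 531, CIGAR 92M, with an edit distance of 2.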
Thank you for your help
Juan