Question: error with sam to bam conversion
Juan Ledesma (United Kingdom) wrote, 2.0 years ago:

Hi

I am trying to convert my SAM file to a BAM file, but I always get this error message:

Error extracting alignments from (dataset_487178.dat), [sam_header_read2] 2 sequences loaded. Parse error at line 3: missing colon in auxiliary data

Does anyone know what it means?

Thanks in advance

Juan

Dannon Baker (United States) wrote, 2.0 years ago:

Hi Juan,

The additional header data in your SAM file seems to be malformed. The spec has the details (https://samtools.github.io/hts-specs/SAMv1.pdf), but in short, a well-formed line looks like this:

@SQ SN:ref LN:45

See the colons separating each attribute name from its value? That is what the error says your file is missing. What generated this SAM file? If you'd like to share the relevant lines, we might be able to give more advice about what to do next.
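For anyone who wants to locate the offending line outside Galaxy, here is a minimal Python sketch. It assumes the error refers to the optional fields of an alignment line (columns 12 and up), which the SAM spec lays out as TAG:TYPE:VALUE. The function name `find_bad_aux_fields` and the example records are made up for illustration, and the check only looks at the colon layout, not at whether each value is valid for its type.

```python
import re

# Optional fields follow TAG:TYPE:VALUE, e.g. NM:i:0 or MD:Z:151.
# This pattern only checks that layout; it does not validate the value itself.
AUX_FIELD = re.compile(r"^[A-Za-z][A-Za-z0-9]:[AifZHB]:.*$")


def find_bad_aux_fields(lines):
    """Return (line_number, field) pairs for optional fields that are not TAG:TYPE:VALUE."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # header lines are skipped in this sketch
        columns = line.split("\t")
        for field in columns[11:]:  # the first 11 columns are mandatory fields
            if not AUX_FIELD.match(field):
                problems.append((number, field))
    return problems


# Hypothetical records: the third line has spaces where the colons should be.
example = [
    "@SQ\tSN:ref\tLN:45",
    "read1\t0\tref\t1\t60\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:0\tMD:Z:5",
    "read2\t0\tref\t1\t60\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:0\tMD Z 5",
]
print(find_bad_aux_fields(example))  # → [(3, 'MD Z 5')]
```

The reported line number can be compared with the one in the error message ("Parse error at line 3" in this thread).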

Juan Ledesma (United Kingdom) wrote, 2.0 years ago:

Hi Dannon

This is the process I followed:

1) I generated a SAM file by mapping my FASTQ files to two different reference sequences simultaneously (named RS16000389_V3_Ref_1 and Ref_2), and removed unmapped reads and secondary/supplementary alignments.
2) I used the tool "Filter data on any column using simple expressions" to generate two separate SAM files, one with the reads mapping to each reference sequence.
3) I compared these two SAM files to find the reads unique to each file (I am not interested in reads that map to both reference sequences).
4) I converted SAM to BAM, but it only worked for Reference 1.
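As an illustration of step 2, here is a minimal Python sketch of splitting a SAM file by reference while keeping the header lines, since a SAM-to-BAM converter generally needs the @SQ lines to be present. It assumes tab-separated records with the reference name in column 3 (RNAME). The function name `filter_sam_by_reference` and the reference names `ref_A` and `ref_B` are hypothetical, and this is not how the Galaxy Filter tool itself works.

```python
def filter_sam_by_reference(lines, reference_name):
    """Keep every header line plus the alignments whose RNAME equals reference_name."""
    kept = []
    for line in lines:
        if line.startswith("@"):
            kept.append(line)  # header lines are always retained
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) >= 3 and columns[2] == reference_name:
            kept.append(line)
    return kept


# Hypothetical input with two references and one read on each.
sam_lines = [
    "@SQ\tSN:ref_A\tLN:45",
    "@SQ\tSN:ref_B\tLN:50",
    "read1\t0\tref_A\t1\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read2\t0\tref_B\t7\t60\t5M\t*\t0\t0\tTTGCA\tIIIII",
]
for line in filter_sam_by_reference(sam_lines, "ref_B"):
    print(line)
```

The sketch keeps both @SQ lines in each output. Whether to drop the header line of the other reference is a separate choice that is not covered here.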

I have checked the Reference 1 SAM file and found these values in the optional fields (the OPT column) for all the reads:

NM:i:0 MD:Z:151 AS:i:151 XS:i:0
NM:i:0 MD:Z:151 AS:i:151 XS:i:113

However, I have found the following expressions very often in the Reference 2 SAM file:

NM:i:1 MD:Z:1A90 AS:i:90 XS:i:82 XA:Z:RS16000389_V3_Ref_1,+531,92M,2;
NM:i:1 MD:Z:1A90 AS:i:90 XS:i:82 XA:Z:RS16000389_V3_Ref_1,-531,92M,2;

and sometimes these:

NM:i:0 MD:Z:66 AS:i:66 XS:i:37
NM:i:0 MD:Z:66 AS:i:66 XS:i:37

I guess "XA:Z:RS16000389_V3_Ref_1,+531,92M,2;" means that the read also matches 92 nucleotides of Reference 1, but what do the other parameters mean?
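On the XA question, here is a small Python sketch that may help. It assumes the reads were mapped with BWA, which the thread does not state. BWA documents XA as a list of alternative hits in the form (chr,pos,CIGAR,NM;)*, where the sign in front of the position gives the strand. The function name `parse_xa` and the dictionary keys are made up for illustration.

```python
def parse_xa(tag):
    """Split a BWA-style XA:Z: value into one dict per alternative hit."""
    prefix = "XA:Z:"
    if not tag.startswith(prefix):
        raise ValueError("not an XA:Z: field: " + tag)
    hits = []
    for entry in tag[len(prefix):].split(";"):
        if not entry:
            continue  # the trailing ';' leaves an empty piece
        # Split from the right so commas in a reference name would not break parsing.
        reference, signed_pos, cigar, edit_distance = entry.rsplit(",", 3)
        hits.append({
            "reference": reference,               # where the alternative hit lies
            "strand": signed_pos[0],              # '+' forward, '-' reverse
            "position": int(signed_pos[1:]),      # 1-based leftmost position
            "cigar": cigar,                       # e.g. 92M = 92 aligned bases
            "edit_distance": int(edit_distance),  # same meaning as the NM tag
        })
    return hits


print(parse_xa("XA:Z:RS16000389_V3_Ref_1,+531,92M,2;"))
```

Read this way, the example from the post describes an alternative hit on RS16000389_V3_Ref_1, forward strand, starting at position 531, with 92 aligned bases and an edit distance of 2.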

Is this the reason I cannot generate a BAM file for Reference 2?

Is there any way to filter the unique reads for each reference sequence?

Thank you for your help

Juan


Hi Juan, do you retain the SAM header after you use the Filter tool?

It seems like you are getting reads that map partially to both reference sequences, which may be the issue. You could try aligning your data in two separate runs (one for each reference) and then comparing the outputs by read ID with the 'Compare Two Datasets' tool to get the uniquely mapping reads.
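The comparison by read ID can be sketched in Python as follows. This is only an illustration of the idea, not what the 'Compare Two Datasets' tool does internally. It assumes each SAM output has already had its unmapped reads removed, so any read name present in a file counts as mapped to that reference. The function names and the example records are hypothetical.

```python
def read_ids(sam_lines):
    """Collect the read names (column 1) of all alignment lines, skipping the header."""
    ids = set()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        ids.add(line.split("\t", 1)[0])
    return ids


def unique_to_each(sam_a, sam_b):
    """Return (names found only in sam_a, names found only in sam_b)."""
    ids_a, ids_b = read_ids(sam_a), read_ids(sam_b)
    return ids_a - ids_b, ids_b - ids_a


# Hypothetical outputs of two separate alignment runs; read2 maps to both references.
run_a = [
    "@SQ\tSN:ref_A\tLN:45",
    "read1\t0\tref_A\t1\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read2\t0\tref_A\t9\t60\t5M\t*\t0\t0\tTTGCA\tIIIII",
]
run_b = [
    "@SQ\tSN:ref_B\tLN:50",
    "read2\t0\tref_B\t3\t60\t5M\t*\t0\t0\tTTGCA\tIIIII",
    "read3\t0\tref_B\t7\t60\t5M\t*\t0\t0\tGGCAT\tIIIII",
]
only_a, only_b = unique_to_each(run_a, run_b)
print(sorted(only_a), sorted(only_b))
```

The resulting name sets can then be used to pull the matching alignment lines out of each file.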

written 2.0 years ago by Mo Heydarian

Hi Mo, I have tried the tool that you suggested and it seems that I get uniquely mapping reads. Thank you. However, I think it will be very difficult to use this approach to analyse viral quasispecies or closely related viral populations in the same sample using Galaxy.

written 2.0 years ago by Juan Ledesma

Hi Juan, that is great to hear.

Feel free to expand on how performing your analysis will be difficult within Galaxy. We value the feedback of our users.

If you are concerned about having to manually launch each alignment job for an individual reference, don't be. If you have ten reference sequences and one FASTQ file you would like to have aligned, you can launch the one FASTQ file against all ten reference sequences from one tool form by entering your reference sequences in batch mode (the middle button to the left of the input box). This will launch one alignment job per reference provided. Once the jobs have run, you can capture this (potentially complex) array of alignment jobs in a workflow by using the 'Extract workflow' feature in the history menu.

From there, you could use a combination of tools to resolve the reads that align uniquely to each reference provided. Here is an example workflow (use this link to import the workflow into your Galaxy): https://raw.githubusercontent.com/MoHeydarian/Workshed/master/Galaxy-Workflow-2016.11.17_Alignment_and_resolution_of_uniquely_mapped_reads_from_viral_populations.ga

Hope this is helpful!

Cheers,

Mo Heydarian, Galaxy Team

written 2.0 years ago by Mo Heydarian