Question: Filter SAM or BAM: "unknown reference name" when trying to select multiple chromosomes
eurioste wrote (10 months ago):

Hello, I aligned my reads to the hg19 reference genome using BWA. I had to align to the full genome version because of errors with the hg19 canonical version. Now I want to filter out of my BAM all the reads aligned to non-canonical chromosomes, unplaced contigs, and the mitochondrial chromosome.

I want to use the "Filter SAM or BAM" tool, but I'm having a problem specifying the multiple chromosomes to be selected with the option "Select regions (only used when the input is in BAM format)".

The instructions for the option are as follows:

If regions are specified, only alignments overlapping the specified regions will be output. An alignment may be given multiple times if it is overlapping several regions. A region can be presented, for example, in the following format:

chr2 (the whole chr2)
chr2:1000000 (region starting from 1,000,000bp)
chr2:1,000,000-2,000,000 (region between 1,000,000 and 2,000,000bp including the end points).

Note: The coordinate is 1-based.

Multiple regions may be specified, separated by a space character:

chr2:1000000-2000000 chr2:1,000,000-2,000,000 chrX

If I fill the field with:

chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY

I get a small 1.8k file with just a header and the message:

[main_samview] region "chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY" specifies an unknown reference name. Continue anyway.

I googled it a little and also tried filling the field with each of the following:

"chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9" "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18" "chr19" "chr20" "chr21" "chr22" "chrX" "chrY"

chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX,chrY

1   2   3   4   5   6   7   8   9   10   11   12   13   14   15   16   17   18   19   20   21   22   X   Y

chr1: chr2: chr3: chr4: chr5: chr6: chr7: chr8: chr9: chr10: chr11: chr12: chr13: chr14: chr15: chr16: chr17: chr18: chr19: chr20: chr21: chr22: chrX: chrY:

1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: X: Y:

@_@

Is there any way I can work around this?

Jennifer Hillman Jackson (United States) wrote (10 months ago):

Hello,

This is odd - separating regions by a space is how to use that input field. I'll do some testing to see if I can replicate this. If it is a bug, we'll want to fix it. I'll post back about that. A tool fix would not be immediately available on the public server https://usegalaxy.org.
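For anyone running samtools directly on the command line rather than through Galaxy, here is a minimal sketch of how the same region list is normally passed. The error message in the question quotes the entire list as a single region, which suggests (an inference from the message, not something confirmed here) that the field reached samtools as one argument instead of 24. The file names `aligned.bam` and `canonical.bam` and the helper `build_view_command` are hypothetical.

```python
import shlex

def build_view_command(bam_path, region_field, out_path):
    """Build a `samtools view` argument list with one argument per region.

    `region_field` is the space-separated text a user would type into the
    "Select regions" box. shlex.split breaks it on whitespace and also
    strips any surrounding quotes.
    """
    regions = shlex.split(region_field)
    # -b writes BAM output, -o names the output file.
    return ["samtools", "view", "-b", "-o", out_path, bam_path, *regions]

# chr1 .. chr22, chrX, chrY as a single space-separated string.
CANONICAL = " ".join([f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"])

cmd = build_view_command("aligned.bam", CANONICAL, "canonical.bam")
print(" ".join(cmd))

# To actually run it (assumes samtools is installed and the BAM is
# coordinate-sorted and indexed, since region queries need an index):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the list as separate arguments is the point of the sketch: a single string such as "chr1 chr2 chr3" is not a valid reference name, so it cannot match anything in the BAM header.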

Meanwhile, this tool and a few others (example: NGS: Peak Calling > BAM filter) can filter BAM datasets using a BED dataset that contains the chromosome names, starts, and ends. You can extract and format the contents of the chromInfo table for the target genome (hg19) to create the proper BED input to filter with.

How-to: Go to Get Data > UCSC Main, select the hg19 genome, then "All Tables". The table chromInfo will be in the list. Extract the entire table. Once it is in Galaxy, the Cut tool can be used to isolate just the columns you need to prepare the BED input. Assign the datatype and fields by clicking on the pencil icon to reach the Edit Attributes forms.
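If you would rather prepare the BED file outside Galaxy, the same conversion can be sketched in a few lines of Python. This assumes the chromInfo export is tab-separated with the chromosome name in the first column and its size in the second, possibly preceded by a `#` header line; the function name `chrom_info_to_bed` and the sample rows are illustrative only.

```python
# Names to keep: chr1 .. chr22, chrX, chrY (no chrM, haplotypes, or unplaced contigs).
CANONICAL = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}

def chrom_info_to_bed(lines):
    """Turn chromInfo rows (chrom <TAB> size ...) into whole-chromosome BED rows."""
    bed = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.rstrip("\n").split("\t")
        chrom, size = fields[0], int(fields[1])
        if chrom in CANONICAL:
            # BED is 0-based and half-open, so 0..size spans the whole chromosome.
            bed.append(f"{chrom}\t0\t{size}")
    return bed

# Illustrative rows in the assumed chromInfo layout.
sample = [
    "#chrom\tsize\tfileName\n",
    "chr1\t249250621\t/gbdb/hg19/hg19.2bit\n",
    "chrM\t16571\t/gbdb/hg19/hg19.2bit\n",
    "chr6_apd_hap1\t4622290\t/gbdb/hg19/hg19.2bit\n",
    "chrX\t155270560\t/gbdb/hg19/hg19.2bit\n",
]

for row in chrom_info_to_bed(sample):
    print(row)
```

Filtering by an explicit set of names keeps the mitochondrial chromosome, alternate haplotypes, and unplaced contigs out of the BED file, which is what the original question asked for.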

Hope that helps and sorry you had problems with the other function - and glad you reported it!

Thanks, Jen, Galaxy team

Update: I can confirm the problem with this filter function and am creating a ticket to fix it. I'll post it back here once complete.

Reply written 10 months ago by Jennifer Hillman Jackson

Here is the link to the ticket: https://github.com/galaxyproject/tools-devteam/issues/501

I credited you with finding the problem. There were some recent changes to this function, and while it may have worked during original testing, it isn't now. How long a fix will take depends on a few factors.

Please follow the ticket above for updates on the fix, and this one for the update at https://usegalaxy.org: https://github.com/galaxyproject/usegalaxy-playbook/issues/77

Thanks again for reporting the problem!

Reply modified 10 months ago • written 10 months ago by Jennifer Hillman Jackson