Question: Faulty BAM/SAM files after Bowtie2, filtering and sorting
nash.claire (Canada) wrote, 2.9 years ago:

Hi,

I have some ChIP-seq data that I aligned with Bowtie2 on Galaxy, which gave me BAM files. I have since run a couple of filtering modules on Galaxy without any problems (in particular, when I used the Filter SAM/BAM module to keep mapped reads, I selected the option to include the header in the output). However, when I download these BAM files and use them at the command line, I keep getting error messages saying that the files are missing an EOF marker and may be truncated. So I went back to Galaxy, converted BAM to SAM, which is supposed to keep headers (in case that was the issue), and sorted by coordinates. In Galaxy the files look fine, but when I download them and try to use them at the command line, I again get error messages saying the files are truncated. On forums, people suggest that the BAM/SAM files may be damaged.
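
For context on the EOF message: a complete BAM file ends with a fixed 28-byte empty BGZF block, and tools warn when it is absent, which usually means the file was cut short. A minimal Python sketch for checking a downloaded file, assuming a local path; the function name `has_bgzf_eof` is only illustrative and not part of any tool:

```python
import os

# The 28-byte empty BGZF block that the SAM/BAM specification defines as the
# end-of-file marker for BAM files.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Read only the last 28 bytes of the file.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

Calling `has_bgzf_eof("my_download.bam")` (a hypothetical file name) returning False would point at an incomplete transfer rather than a Galaxy-specific format. Note that this only applies to BAM; a SAM file is plain text and has no such marker.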

So my question is: what is Galaxy doing to the files if they are fine and error-free in Galaxy but unusable outside Galaxy? Is there some sort of Galaxy-specific format for the files?

My specific reason for wanting to download the files and use them at the command line is that I have a nice command that very effectively gets rid of chrUn and random contig reads. When I have tried this in Galaxy in the past, I've found that I can't get rid of them. If there is an effective Galaxy solution for this, I'll happily stay in Galaxy for my analysis, and I guess the above isn't a problem. I'd still like to know what's going on, though.
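
The command mentioned above is not shown in the post, so the following is not that command. It is only a rough Python sketch of the same idea applied to SAM text, assuming UCSC-style contig names where unplaced contigs start with `chrUn` and unlocalized ones end with `_random`; all function names are illustrative:

```python
def is_unwanted_contig(name):
    """True for UCSC-style unplaced (chrUn...) or unlocalized (..._random) names."""
    return name.startswith("chrUn") or name.endswith("_random")


def keep_sam_line(line):
    """Decide whether one line of SAM text should be kept."""
    if line.startswith("@"):
        return True  # header lines are passed through unchanged
    fields = line.rstrip("\n").split("\t")
    # Column 3 (index 2) is RNAME, the reference the read aligned to.
    return len(fields) > 2 and not is_unwanted_contig(fields[2])


def filter_sam(lines):
    """Return only the SAM lines that pass keep_sam_line."""
    return [line for line in lines if keep_sam_line(line)]
```

This sketch leaves the header untouched and keeps unmapped reads (RNAME `*`). Reads whose mate maps to a removed contig still name that contig in the RNEXT column, so paired-end data may need extra handling.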

Tags: galaxy, samtools, bam
Jennifer Hillman Jackson (United States) wrote, 2.9 years ago:

Hello,

If a variant of the genome is available for the tools with the added label Canonical, then that is one way to avoid chrUn and random/haplotype alignments from the start. You could also upload a genome with this modification and use it with tools as a custom reference genome.

Filtering BAM files after mapping is also entirely possible within Galaxy. Use Samtools: Filter SAM or BAM (the last tool in the group) with the form option Select regions. Once it is configured and has run the first time, save it into a workflow for re-use. This avoids having to type in all the chromosomes on the form each time the tool is used.
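
Typing every chromosome by hand is error-prone, so the list can be generated once and pasted into the form. A small Python sketch, assuming a human UCSC-style genome (chr1 to chr22 plus chrX, chrY, chrM); the helper name is made up for illustration, and the separator the form expects is an assumption that may need adjusting:

```python
def canonical_chromosomes(autosomes=22, extras=("X", "Y", "M")):
    """Build UCSC-style names: chr1..chrN followed by the extra chromosomes."""
    names = ["chr%d" % i for i in range(1, autosomes + 1)]
    names += ["chr" + extra for extra in extras]
    return names


# Space-separated text to paste into the regions box (separator is an assumption).
region_text = " ".join(canonical_chromosomes())
print(region_text)
```

For other genomes, change `autosomes` and `extras` to match the names in the reference actually used for mapping.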

For downloading BAMs, I suspect the connection is failing at some point. Command-line tools can be more reliable than browser-initiated downloads; either wget or curl works. Remember to download both the .bam and the .bai for BAM datasets (both are listed in the disc-icon download pull-down menu). Adjust the dataset names to match if needed (and shorten them, since some downstream applications/tools might complain about the length and/or included hyphens).
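
One way to notice a dropped connection is to compare the size on disk with the size the server announced. A minimal Python sketch of that check using only the standard library; it assumes the server sends a `Content-Length` header, which is not guaranteed, and the function name is illustrative:

```python
import os
import shutil
import urllib.request


def download_and_check(url, dest):
    """Download url to dest; report whether the size matches Content-Length.

    Returns True or False when a Content-Length header was sent, and None
    when it was not (completeness cannot be judged by size in that case).
    """
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        expected = response.headers.get("Content-Length")
        shutil.copyfileobj(response, out)
    if expected is None:
        return None
    return os.path.getsize(dest) == int(expected)
```

A result of False suggests the transfer stopped early and should be repeated. The same check would be run separately for the .bam and the .bai.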

Download instructions are here: https://wiki.galaxyproject.org/Support#Downloading_data

Thanks, Jen, Galaxy team


Hi Jen,

As always, thanks for your reply. I tried using the Filter SAM/BAM module as you suggested, typing all the individual chromosomes into the Select regions box like this: chr1 chr2 chr3. However, I got an error saying that metadata could not be set for the datasets. When I looked a bit closer, it seems that the files could not be sorted and therefore a .bai file couldn't be produced. I then went back, coordinate-sorted the BAM files with samtools, and tried the filter module again, but got the same error. I'm not sure what's wrong. I ran Bowtie2 on Galaxy to get my BAM files, and after that I ran the same Filter SAM/BAM module to remove unmapped reads with a low MAPQ score and didn't have any problems at all. I wonder if there is something wrong with the filter module?
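
Since a .bai index can only be built from a coordinate-sorted file, one quick diagnostic is to look at the sort order declared in the header. A small Python sketch that reads the `SO` tag of the `@HD` line from SAM text; the function name is illustrative, and it works on the header lines as text (for example, after converting BAM to SAM):

```python
def sam_sort_order(lines):
    """Return the SO value from the @HD header line, or None if absent."""
    for line in lines:
        if not line.startswith("@"):
            break  # header is over; no @HD line was found
        if line.startswith("@HD"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
            return None  # @HD present but without an SO tag
    return None
```

A value of `coordinate` is what indexing expects. The tag only records what the producing tool declared, so it is a hint rather than proof that the records are actually in order.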

Reply written 2.9 years ago by nash.claire

Hi Claire,

Try entering the chromosomes in this format:

'chr1', 'chr2', 'chr3' ... etc.

If that is still problematic, please send in a bug report and I'll take a closer look. Include a link to this post in the comments, please. We can determine if this is a usage error or a true problem with the tool.

Thanks! Jen

Reply written 2.9 years ago by Jennifer Hillman Jackson

Hi Jen,

I tried it again and got the same error. I've submitted a bug report from one of the datasets in the same run, which gave me a new error. The other datasets did seem to run, so they didn't show the bug-report symbol.

Reply written 2.9 years ago by nash.claire

I'll be taking a look, thanks!

Reply written 2.9 years ago by Jennifer Hillman Jackson