Question: sam to bam - Galaxy version 17.01
19 months ago by
bryan.hepworth wrote:

Hi All

First of all, I am not a bioinformatician; I'm looking at someone else's test workflow. I have run it successfully to completion on usegalaxy. However, on the local instance I've set up, I've bumped into an error at the SAM-to-BAM conversion step.

Comparing the workflows side by side, they look identical, and each step's output goes to the next step's input.

The local instance is a git installation of Galaxy on CentOS 7.3 with PostgreSQL as the database. I've used the data managers to get the fasta reference genomes and built the bwa, bwa-mem, etc. indexes from them. I've installed all the tools, and all the dependencies appear to be fulfilled in Manage installed tools.

These are the data managers installed: -

data_manager_bwa_index_builder - blankenberg repo

data_manager_bowtie2_index_builder

data_manager_bwa-mem_index_builder

data_manager_fetch_genome_dbtags_all_fasta

data_manager_picard_index_builder

data_manager_sam_fasta_index_builder

The workflow has steps to take fasta files through FASTQ Groomer, FASTQ Summary Statistics, Map with BWA for Illumina and SAM-to-BAM, with each step taking the data produced by the previous step.

I can see the SAM file that the previous step, Map with BWA for Illumina, created as the input.

This is what the SAM-to-BAM information comes back as: -

ln -s /galaxy/tool-data/hg19/sam_indexes/hg19/hg19.fa input.fa && ln -s /galaxy/tool-data/hg19/sam_indexes/hg19/hg19.fa.fai input.fa.fai && samtools view -b -@ ${GALAXY_SLOTS:-1} -t input.fa.fai "/galaxy/database/files/000/dataset_59.dat" | samtools sort -O bam -@ ${GALAXY_SLOTS:-1} -o "/galaxy/database/files/000/dataset_60.dat" -T temp

Traceback (most recent call last):
  File "/galaxy/lib/galaxy/jobs/runners/local.py", line 130, in queue_job
    job_wrapper.finish( stdout, stderr, exit_code )
  File "/galaxy/lib/galaxy/jobs/__init__.py", line 1354, in finish
    dataset.datatype.set_meta( dataset, overwrite=False )
  File "/galaxy/lib/galaxy/datatypes/binary.py", line 391, in set_meta
    exit_code = subprocess.call( args=command, stderr=open( stderr_name, 'wb' ) )
  File "/usr/lib64/python2.7/subprocess.py", line 524, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

I'd be grateful if someone could tell me what I'm missing.

Thanks

Bryan

Tags: job-error, local, samtools, galaxy
modified 19 months ago by Jennifer Hillman Jackson • written 19 months ago by bryan.hepworth

Hi Devon

I just did cat ../../seq/hg19.fa from the symlink directory and got output as expected.

Not sure what to look for now.

Thank you.

Bryan

written 19 months ago by bryan.hepworth

19 months ago by
Devon Ryan (Germany) wrote:

Do the following files exist?

  • /galaxy/tool-data/hg19/sam_indexes/hg19/hg19.fa
  • /galaxy/tool-data/hg19/sam_indexes/hg19/hg19.fa.fai
  • /galaxy/database/files/000/dataset_59.dat

I expect the last one exists, but presumably the fasta and fasta index files don't and you need to update all_fasta.loc.

written 19 months ago by Devon Ryan
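
The existence check suggested above can also be scripted. The following is only a sketch (Python 3, standard library only); the helper name `describe_path` is made up for illustration, and the paths are the ones from the failing job in this thread. Note that `os.path.exists` follows symlinks, so a symlink whose target is missing is reported as not existing.

```python
import os


def describe_path(path):
    """Return (exists, is_symlink, resolved_target) for one path.

    os.path.exists() follows symlinks, so a dangling symlink gives
    exists=False even though the link itself is present.
    """
    return (os.path.exists(path), os.path.islink(path), os.path.realpath(path))


if __name__ == "__main__":
    # Paths copied from the SAM-to-BAM command line quoted in the question.
    for p in [
        "/galaxy/tool-data/hg19/sam_indexes/hg19/hg19.fa",
        "/galaxy/tool-data/hg19/sam_indexes/hg19/hg19.fa.fai",
        "/galaxy/database/files/000/dataset_59.dat",
    ]:
        exists, is_link, target = describe_path(p)
        status = "exists" if exists else "MISSING"
        link_note = " (symlink to %s)" % target if is_link else ""
        print("%s: %s%s" % (p, status, link_note))
```

Running it as the same user that runs Galaxy is probably the most informative, since permissions on the link target can differ between users.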

Hi Devon

I've had a quick look and they all exist, although the first one is a symlink to ../../seq/hg19.fa

I haven't changed any of the .loc locations from a default install and test with sh run.sh. The galaxy.ini has only been changed to allow admin users and to point to the PostgreSQL database.

Thanks for looking

Bryan

written 19 months ago by bryan.hepworth

Can you then ensure that ../../seq/hg19.fa exists? The error is for a file not existing, so at some point something is missing.

written 19 months ago by Devon Ryan

Hi Devon

I did cat ../../seq/hg19.fa in the directory containing the symlink and got the output from the file as expected. I'm not sure where else to look for clues.

Thank you

Bryan

written 19 months ago by bryan.hepworth

Hmm, I'm at a loss at this point then. Something is missing somewhere...but I don't know where to look. I do see stuff about metadata in the error message (btw, if you put 4 spaces at the beginning of lines the error messages will get formatted correctly) so my only guess is that some metadata file is missing. I've never really understood exactly how those work, though.

If you still don't get a reply in another day or so, then post this to the galaxy-dev email list (make sure to reference this post so they know you did your due diligence). That'll get more eyeballs.

written 19 months ago by Devon Ryan
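
One possible reading of the traceback in the question: its last frames are inside subprocess's `_execute_child`, which is where Python starts the external program. An `[Errno 2]` raised at that point usually means the program being launched could not be found, rather than one of its input files being missing. The sketch below reproduces that behaviour. It is written for Python 3 (the thread's Galaxy runs Python 2.7), and `try_run` is an illustrative name, not anything from Galaxy.

```python
import errno
import subprocess


def try_run(argv):
    """Run argv and return its exit code, or None if the program was not found.

    When the executable named in argv[0] cannot be located, subprocess
    raises OSError with errno ENOENT ("No such file or directory") before
    any child process has run.
    """
    try:
        return subprocess.call(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            return None
        raise


if __name__ == "__main__":
    # None here would point at the program lookup, not at the data files.
    print(try_run(["samtools", "--version"]))
```

If a missing input file were the problem, the program would normally start, print its own error message and return a non-zero exit code instead of raising in Python.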

Just double-checking: is samtools installed?
Sometimes 'samtools' even needs to be in the PATH.

Regards, Hans-Rudolf

written 19 months ago by Hotz, Hans-Rudolf

Hi Hans-Rudolf

Yes, package_samtools_1_2 is installed. It got pulled in as a dependency somewhere, I believe, and I can see it in the Manage installed tools menu, ticked and green.

Thanks Bryan

written 19 months ago by bryan.hepworth

...but is it in your path? Have a look at this similar (?) case:

https://biostar.usegalaxy.org/p/19938/

written 19 months ago by Hotz, Hans-Rudolf

Hi Hans

Interesting one, I've only just spied this reply. Earlier this afternoon I ran the commands from the command line directly: -

[root@chiti ~]# . /galaxy/database/dependencies/samtools/1.2/devteam/sam_to_bam/881e16ad05c6/env.sh
[root@chiti ~]# samtools --version 
samtools 1.2 Using htslib 1.2.1

[root@chiti ~]# ln -s /galaxy/tool-data/hg19/sam_indexes/hg19/hg19.fa input3.fa && ln -s /galaxy/tool-data/hg19/sam_indexes/hg19/hg19.fa.fai input3.fa.fai && samtools view -b -@ -1 input3.fa.fai "/galaxy/database/files/000/dataset_92.dat" | samtools sort -O bam -@ -l -o "/galaxy/database/files/000/dataset_93.dat" -T temp
[main_samview] random alignment retrieval only works for indexed BAM or CRAM files.
modified 19 months ago • written 19 months ago by bryan.hepworth
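
A note on retyping the job command by hand: in the Galaxy command line quoted in the question, `${GALAXY_SLOTS:-1}` is a shell default-value expansion (use `GALAXY_SLOTS` if it is set and non-empty, otherwise `1`), and the index is passed as `-t input.fa.fai`. In the hand-typed version above, the thread count became `-1` (and `-l`) and the `-t` flag is missing, which may be why samtools treated the arguments differently and printed the "random alignment retrieval" message. Building the argument lists programmatically avoids this kind of slip. The sketch below only constructs the two commands, using the flags that appear in the original job; the function name and the example paths are illustrative.

```python
import os


def sam_to_bam_commands(sam_path, bam_path, fai_path, environ=None):
    """Build argv lists mirroring the job's 'samtools view | samtools sort'."""
    environ = os.environ if environ is None else environ
    # Equivalent of ${GALAXY_SLOTS:-1}: fall back to 1 if unset or empty.
    threads = environ.get("GALAXY_SLOTS") or "1"
    view = ["samtools", "view", "-b", "-@", threads, "-t", fai_path, sam_path]
    sort = ["samtools", "sort", "-O", "bam", "-@", threads,
            "-o", bam_path, "-T", "temp"]
    return view, sort


if __name__ == "__main__":
    view, sort = sam_to_bam_commands("input.sam", "output.bam", "input.fa.fai")
    # Printed with a pipe only to show how the two commands are connected.
    print(" ".join(view), "|", " ".join(sort))
```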

BAM data (.bam) uploaded to Galaxy automatically has an index (.bam.bai) created by default. The BAM dataset is also sorted. However, Samtools and Picard, in particular, can be picky about that sort order/annotation, requiring a resort using Picard or Samtools from the tool panel.

Other tools that produce BAM datasets may output sorted BAMs or not, but a resort is sometimes necessary (for example, always resort Tophat output, even though it is annotated as "sorted"). Again, these tools and some others are very strict about the input formats. When in doubt or there is an error, sort. Errors from unsorted inputs can vary widely - some report a cluster error, some report missing inputs, some report the job exceeded resources (memory or walltime), and others.

Two troubleshooting tests related to these factors:

  • Have you tried sorting the BAM inputs before using the tool? If not, try that. Sort help: https://galaxyproject.org/support/sort-your-inputs/

  • Is the metadata for the dataset intact? Double check by clicking on the pencil icon for the input BAMs and on the Edit Attributes page use the button to "detect metadata". This can help for certain cases and is worth doing - it won't be a problem if no changes are made (this does not produce a new dataset that takes up quota and the like).

The option to input unsorted data and have the tool do the sorting is problematic on certain BAM datasets. I suggest not using that option, or if the job fails when used, go back to sorting as a distinct data preparation step.
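
As a rough aid for the sorting question, the sort order that a SAM file declares can be read from its header: the `@HD` line may carry an `SO:` tag with a value such as `coordinate`, `queryname`, `unsorted` or `unknown`. The sketch below (the function name is made up) only reports what the header claims. As noted above, the annotation is not always trustworthy, so this does not replace an actual sort.

```python
def sam_sort_order(header_lines):
    """Return the SO: value from the @HD header line, or None if absent."""
    for line in header_lines:
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
            return None
    return None


if __name__ == "__main__":
    example_header = [
        "@HD\tVN:1.4\tSO:queryname\n",
        "@SQ\tSN:chr1\tLN:1000\n",
    ]
    print(sam_sort_order(example_header))
```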

It would be great to let us know whether either of these troubleshooting tests resolves the current problem.

Cheers, Jen, Galaxy team

modified 19 months ago • written 19 months ago by Jennifer Hillman Jackson

Hi Jen

It might be I'm missing a trick here with not being a bioinformatician and not knowing some of the intricacies.

I have two fastq files that were generated from an Illumina run. I've brought these into the Galaxy instance with the Get Data tool. The workflow I was given just has steps that output SAM files. I've inserted a SortSam step, which can output either format by the looks of it. Given what you said about BAM data being indexed (.bam.bai) on upload, is there a step I should be doing between the Illumina import and running the data through the workflow? I know this will sound like a fairly basic question.

I can see the data each step has produced along the way up until the failure point by clicking the pencil icon.

Here's the workflow with the inserted SortSam, which wasn't there before. This step runs if it's set to Queryname, but SAM-to-BAM still fails.

Thanks

Bryan

modified 19 months ago • written 19 months ago by bryan.hepworth

Bryan - Sorry, I didn't look closely at the exact Samtools tool run or the content of the workflow. The problem is almost certainly technical, not the workflow or the tools used or their ordering.

First, I am wondering if the default paths in your loc files actually match where the data/indexes are located (that is a configurable option). Could you please send back the results of a grep for "hg19" against all_fasta.loc, fasta_indexes.loc, sam_fa_indexes and maybe even picard_indexes for comparison? Duplicated entries are possible and could cause problems, as can inconsistent paths. Fix those and test the workflow again (after restarting Galaxy, which is needed if the paths are adjusted in any loc files).
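
The grep-and-compare step can be sketched in Python as well. This is a minimal example under stated assumptions: .loc files are tab-separated with `#` comment lines, and the last column of each row is taken to be the path, which matches the tophat2_indexes listing later in this thread but should be checked for each file. The function names are illustrative.

```python
import os
from collections import Counter


def scan_loc(loc_path, dbkey):
    """Return the non-comment rows of a .loc file that contain dbkey as a field."""
    rows = []
    with open(loc_path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line.strip() or line.lstrip().startswith("#"):
                continue  # skip blank lines and commented-out examples
            fields = line.split("\t")
            if dbkey in fields:
                rows.append(fields)
    return rows


def report(rows):
    """Return (duplicated_rows, missing_paths).

    Assumption: the last column of each row is the filesystem path.
    """
    counts = Counter(tuple(r) for r in rows)
    duplicates = [list(r) for r, n in counts.items() if n > 1]
    missing = [r[-1] for r in rows if not os.path.exists(r[-1])]
    return duplicates, missing


if __name__ == "__main__":
    loc = "/galaxy/tool-data/all_fasta.loc"  # path mentioned later in the thread
    if os.path.exists(loc):
        print(report(scan_loc(loc, "hg19")))
```

Entries added through data managers may live in separate data-manager loc files rather than the stock ones, so an empty result from the stock file does not by itself mean the entry is missing.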

If the problem persists after that, these two quick tests might be informative. Run this directly in the History, not with a Workflow.

Test1

  1. Re-detect the metadata on one of the previously generated SAM BWA
    datasets (click on pencil icon, there is a button for the function).
  2. Run "SAMtools SAM-to-BAM" on that dataset

Does the SAMTools tool still error the same way? It looks as if Picard tools are working if SortSam is. What happens when running other SAMTools' tools besides SAM-to-BAM? There is another sorting tool in that tool package that would be simple to test.

Test2

  1. Do step 1 above again, or just use the same SAM again
  2. Sort the SAM file with SAMTools "Sort BAM dataset"
  3. Optionally try other SAMTools' tools
  4. What if those tools are run against a BWA SAM output that hasn't had the metadata reassigned?

Do these other SAMTools' tools still error the same way?

If there was a usage data mismatch, the error would be different, so I am assuming that the same exact reference genome (hg19) was used to both map against with BWA and was input into the SAM-to-BAM tool.

modified 19 months ago • written 19 months ago by Jennifer Hillman Jackson

Hi Jen

First one is easily answered, I haven't adjusted any of the .loc files from a stock github install. All the index files have been built with the datamanager tools and show as: -

Data Manager: all_fasta
/galaxy/tool-data/hg19/seq/hg19.fa
/galaxy/tool-data/hg18/seq/hg18.fa
/galaxy/tool-data/hg17/seq/hg17.fa
/galaxy/tool-data/hg16/seq/hg16.fa
/galaxy/tool-data/hg15/seq/hg15.fa

Data Manager: bowtie2_indexes
/galaxy/tool-data/hg19/bowtie2_index/hg19/hg19
/galaxy/tool-data/hg18/bowtie2_index/hg18/hg18
/galaxy/tool-data/hg17/bowtie2_index/hg17/hg17
/galaxy/tool-data/hg16/bowtie2_index/hg16/hg16
/galaxy/tool-data/hg15/bowtie2_index/hg15/hg15

Data Manager: bwa_indexes
/galaxy/tool-data/hg19/bwa_index/hg19/hg19.fa
/galaxy/tool-data/hg18/bwa_index/hg18/hg18.fa
/galaxy/tool-data/hg17/bwa_index/hg17/hg17.fa
/galaxy/tool-data/hg16/bwa_index/hg16/hg16.fa
/galaxy/tool-data/hg15/bwa_index/hg15/hg15.fa

Data Manager: bwa_mem_indexes
/galaxy/tool-data/hg19/bwa_mem_index/hg19/hg19.fa
/galaxy/tool-data/hg18/bwa_mem_index/hg18/hg18.fa
/galaxy/tool-data/hg17/bwa_mem_index/hg17/hg17.fa
/galaxy/tool-data/hg16/bwa_mem_index/hg16/hg16.fa
/galaxy/tool-data/hg15/bwa_mem_index/hg15/hg15.fa

Data Manager: fasta_indexes
/galaxy/tool-data/hg19/sam_indexes/hg19/hg19.fa
/galaxy/tool-data/hg18/sam_indexes/hg18/hg18.fa
/galaxy/tool-data/hg17/sam_indexes/hg17/hg17.fa
/galaxy/tool-data/hg16/sam_indexes/hg16/hg16.fa
/galaxy/tool-data/hg15/sam_indexes/hg15/hg15.fa

Data Manager: picard_indexes
/galaxy/tool-data/hg19/picard_index/hg19/hg19.fa
/galaxy/tool-data/hg18/picard_index/hg18/hg18.fa
/galaxy/tool-data/hg17/picard_index/hg17/hg17.fa
/galaxy/tool-data/hg16/picard_index/hg16/hg16.fa
/galaxy/tool-data/hg15/picard_index/hg15/hg15.fa

Data Manager: tophat2_indexes
hg19    hg19    hg19    /galaxy/tool-data/hg19/bowtie2_index/hg19/hg19
hg18    hg18    hg18    /galaxy/tool-data/hg18/bowtie2_index/hg18/hg18
hg17    hg17    hg17    /galaxy/tool-data/hg17/bowtie2_index/hg17/hg17
hg16    hg16    hg16    /galaxy/tool-data/hg16/bowtie2_index/hg16/hg16
hg15    hg15    hg15    /galaxy/tool-data/hg15/bowtie2_index/hg15/hg15

I've done a quick locate hg19 and that came up with these entries: -

[root@chiti ~]# locate hg19
/galaxy/database/dependencies/_conda/envs/__bowtie2@2.3.0/bin/scripts/make_hg19.sh
/galaxy/database/dependencies/_conda/envs/__bowtie@1.2.0/bin/scripts/make_hg19.sh
/galaxy/database/dependencies/_conda/envs/mulled-v1-d95627189237fac550a9b53956b040c455a7f3682aea67e941022bbc115bc78a/bin/scripts/make_hg19.sh
/galaxy/database/dependencies/_conda/pkgs/bowtie-1.2.0-py35_0/bin/scripts/make_hg19.sh
/galaxy/database/dependencies/_conda/pkgs/bowtie2-2.3.0-py35_1/bin/scripts/make_hg19.sh
/galaxy/tool-data/hg19
/galaxy/tool-data/hg19/bowtie2_index
/galaxy/tool-data/hg19/bwa_index
/galaxy/tool-data/hg19/bwa_mem_index
/galaxy/tool-data/hg19/picard_index
/galaxy/tool-data/hg19/sam_indexes
/galaxy/tool-data/hg19/seq
/galaxy/tool-data/hg19/bowtie2_index/hg19
/galaxy/tool-data/hg19/bowtie2_index/hg19/hg19.1.bt2
/galaxy/tool-data/hg19/bowtie2_index/hg19/hg19.2.bt2
/galaxy/tool-data/hg19/bowtie2_index/hg19/hg19.3.bt2
/galaxy/tool-data/hg19/bowtie2_index/hg19/hg19.4.bt2
/galaxy/tool-data/hg19/bowtie2_index/hg19/hg19.fa
/galaxy/tool-data/hg19/bowtie2_index/hg19/hg19.rev.1.bt2
/galaxy/tool-data/hg19/bowtie2_index/hg19/hg19.rev.2.bt2
/galaxy/tool-data/hg19/bwa_index/hg19
/galaxy/tool-data/hg19/bwa_index/hg19/hg19.fa
/galaxy/tool-data/hg19/bwa_index/hg19/hg19.fa.amb
/galaxy/tool-data/hg19/bwa_index/hg19/hg19.fa.ann
/galaxy/tool-data/hg19/bwa_index/hg19/hg19.fa.bwt
/galaxy/tool-data/hg19/bwa_index/hg19/hg19.fa.pac
/galaxy/tool-data/hg19/bwa_index/hg19/hg19.fa.rbwt
/galaxy/tool-data/hg19/bwa_index/hg19/hg19.fa.rpac
/galaxy/tool-data/hg19/bwa_index/hg19/hg19.fa.rsa
/galaxy/tool-data/hg19/bwa_index/hg19/hg19.fa.sa
/galaxy/tool-data/hg19/bwa_mem_index/hg19
/galaxy/tool-data/hg19/bwa_mem_index/hg19/hg19.fa
/galaxy/tool-data/hg19/bwa_mem_index/hg19/hg19.fa.amb
/galaxy/tool-data/hg19/bwa_mem_index/hg19/hg19.fa.ann

...

The list goes on, but I've run out of characters for this post. From what you are saying these entries need to be in the /galaxy/tool-data/all_fasta.loc as per the commented out examples?

Thanks Bryan

modified 19 months ago • written 19 months ago by bryan.hepworth

Jen

I've spied the wiki documentation that talks about long-term Galaxy data setup. I'll have a read through and implement what it says. I think that will be where my problem lies. https://galaxyproject.org/admin/data-preparation/ If I run into any problems I'll ask again, but I will also let you know if this sorts it out and document it at the end for anyone else to follow.

Thanks for your help Bryan

written 19 months ago by bryan.hepworth

Hi Jen

Following on from my initial post, I thought I'd write and ask for a best-practice procedure for a local installation. While trying to follow examples I've been dotting around pages that aren't in the wiki any more, which suggest going elsewhere to read something that was originally there.

I've tried this several times with the same result, so if someone could say whether this is correct or not I'd be grateful.

Fresh installation of CentOS 7.3 x86_64, PostgreSQL, EPEL repo installed, yum update, galaxy user.

git clone -b release_17.01 https://github.com/galaxyproject/galaxy.git

cd into /galaxy and run sh run.sh once with everything as is, to get the required tools installed and happy, then Ctrl-C to stop that instance

become an admin user by changing the galaxy.ini.sample and putting your email details in and saving as galaxy.ini

sh run.sh

register as admin

At what point do I start adding data managers? Do I need to alter any of the example files at this point, or do I need to start altering the galaxy.ini file now to reflect how I'd like it to be, so that the data manager files will be in their permanent home? It's at this point that the information seems unclear and conflicting.

Thanks Bryan

written 19 months ago by bryan.hepworth

Hi All

After several tries at a plain vanilla installation, each failing at the same SAM-to-BAM step, I resorted to Google and a few other suggestions, including one made here too: is it in your $PATH?

When I'd tried a which samtools previously it had come back with an answer, so I'd thought that wasn't the issue. However, Googling the error message brought back some other people's thoughts, so I grabbed samtools from GitHub along with some other prerequisites, built it, placed it in /usr/local/bin, restarted Galaxy and re-ran the SAM-to-BAM step that was failing. Success: that step has completed, but not in an ideal way. I'd really like Galaxy to be a self-contained instance without resorting to building packages outside of it.

The clue that gave this away was someone saying samtools was hardwired in one of the steps, and that this had been a problem in the past.

I can list out all the packages I installed and in what order if that is any help for tracking down where it's going wrong within galaxy itself. I don't mind going through this at a deeper level either if it helps anyone else along the way too.

Bryan

written 19 months ago by bryan.hepworth
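
On why `which samtools` could succeed in one place and the job still fail: a program lookup depends on the PATH of the process doing the lookup. A shell in which the dependency's env.sh has been sourced, as in the earlier command-line test, can see samtools, while the Galaxy server process that runs the metadata step (the `set_meta` call in the traceback) may have a different PATH. That is an interpretation of this thread, not a confirmed diagnosis. The sketch below shows the PATH dependence itself; `resolve` is an illustrative name and the example directories are assumptions.

```python
import os
import shutil


def resolve(program, path_value):
    """Look up program the way a process with PATH=path_value would.

    Returns the full path of the first match, or None if nothing on
    path_value provides an executable with that name.
    """
    return shutil.which(program, path=path_value)


if __name__ == "__main__":
    # PATH of the current shell or process.
    print(resolve("samtools", os.environ.get("PATH", "")))
    # A hypothetical, more restricted PATH, such as a service might have.
    print(resolve("samtools", "/usr/bin:/bin"))
    # /usr/local/bin is where the hand-built samtools was placed above.
    print(resolve("samtools", "/usr/local/bin:/usr/bin:/bin"))
```

Comparing the output for the Galaxy user's login shell against the environment the server is actually started with is one way to see whether the two differ.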

Hi Jen

Thanks for the reply - one for me to try in the morning. I can show you the current workflow too if that would help.

No problem in letting you know, you'll probably hear the cheering from Tyneside :-)

Bryan

written 19 months ago by bryan.hepworth

Hi Devon

I'm just looking through the output from the sh run.sh to see if I can spot other things happening before the failure, then I'll post in the galaxy-dev

Thanks for taking the trouble to respond, much appreciated.

Bryan

written 19 months ago by bryan.hepworth