Question: GATK Best Practices With Local Installation Of Galaxy
7.3 years ago by
Camille Stephan30 wrote:
Hello guys,

I'm trying to run a pipeline following the best practices for SNP and indel discovery described by the people at Broad, and I'm running into trouble with the GATK tools in a local installation of Galaxy.

The main problem is that merging BAM files with the samtools merge tool doesn't keep the read group of each sample, which causes "Count Covariates" to crash. The pipeline works fine with a single BAM file, but I need to realign at least two files at a time.

Is there a way to set the read group of a merged BAM inside Galaxy? Are there plans to include the "merge" tool from Picard in Galaxy? Is there an easy way for me to do this locally? (I would like to run this in the cloud later on, once the workflow is ready.)

Thanks!
Camille

--
Camille Stephan-Otto Attolini, PhD
Senior Research Officer, Bioinformatics and Biostatistics unit
IRB Barcelona
Tel (+34) 93 402 0553
samtools bam • 1.5k views
7.3 years ago by
fubar1.1k
Australia
fubar1.1k wrote:
Camille, thanks for reporting this. I think you have found a bug: we definitely need to be able to preserve metadata when we merge BAMs.

Thanks for your suggestion of using Picard's MergeSamFiles. Yes, I think it might be a good fix for this problem, but it will take a little while to implement and won't reach the Main site until a few weeks after it's done. If you need it fast, it is possible to write your own wrapper locally.

Sorry for the inconvenience and thanks again.

--
Ross Lazarus MBBS MPH; Associate Professor, Harvard Medical School; Director of Bioinformatics, Channing Lab; Tel: +1 617 505 4850; Head, Medical Bioinformatics, BakerIDI; Tel: +61 385321444;
Hi Ross,

Thanks for your answer. I found a dirty fix for merging pairs of BAM files, although I had to change a couple of things in my local installation:

- Add read groups to each BAM file separately using Picard's Add or Replace Groups (http://localhost:8080/tool_runner?tool_id=picard_ARRG), with ID=s1 for one file and ID=s2 for the other.
- Create the "rg.txt" file containing something like this:
  @RG ID:s1 SM:s1 LB:s1 PL:Illumina
  @RG ID:s2 SM:s2 LB:s2 PL:Illumina
- Modify sam_merge.py to call: "samtools merge -rh path/to/rg.txt %s %s..."

It works. The problem is that all (pairs of) files will end up with the same IDs and labels unless the rg.txt file is changed every time.

Would it be very difficult to add to the Galaxy wrapper the option of creating rg.txt on the fly and adding the -h option to the samtools call? I'm not familiar with creating wrappers for Galaxy. Any suggestions as to where to start?

Thanks again,
Camille
written 7.3 years ago by Camille Stephan30
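The request in the reply above (creating rg.txt on the fly and passing it to samtools with -h) could be sketched roughly as follows. This is only an illustration, not the code in Galaxy's sam_merge.py: the function names, the temporary-file handling, and the choice to reuse one sample ID for the ID, SM and LB fields are assumptions that mirror the rg.txt example in the post. The header fields are joined with tabs, as the SAM format expects.

```python
import os
import tempfile


def make_rg_header(sample_ids, platform="Illumina"):
    """Return one @RG header line per sample, tab-separated.

    Hypothetical helper: reuses the sample ID for ID, SM and LB,
    as in the rg.txt example from the post.
    """
    lines = []
    for sid in sample_ids:
        fields = ["@RG", "ID:%s" % sid, "SM:%s" % sid,
                  "LB:%s" % sid, "PL:%s" % platform]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


def build_merge_command(output_bam, input_bams, sample_ids, header_dir=None):
    """Write a fresh header file and return (command, header_path).

    The command has the same shape as the call quoted in the post:
    samtools merge -rh rg.txt out.bam in1.bam in2.bam ...
    It is returned as a list and is not executed here.
    """
    if len(input_bams) != len(sample_ids):
        raise ValueError("need exactly one sample ID per input BAM")
    fd, header_path = tempfile.mkstemp(prefix="rg_", suffix=".txt",
                                       dir=header_dir)
    with os.fdopen(fd, "w") as handle:
        handle.write(make_rg_header(sample_ids))
    cmd = ["samtools", "merge", "-rh", header_path, output_bam]
    cmd.extend(input_bams)
    return cmd, header_path
```

Because the header file is regenerated for every merge, each pair of files gets its own IDs and labels instead of sharing a fixed rg.txt. The returned list can be handed to subprocess, and the temporary header file should be deleted once the merge has finished.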
Hi Camille,

I can see this really needs a 'proper' fix, preferably one that takes advantage of the automated header merge. Preserving the metadata from each BAM automatically is safer and less error-prone. In the meantime, once you have a correct header in SAM format in your history, you could use the existing "Replace sam/bam header" tool to do the surgery.

I'm currently testing changes which replace the current samtools merge code with a call to Picard MergeSamFiles. I'll add a switch to control whether all input headers are merged, in case there are situations where it's not wanted.

I'll let you know when you can try it out on our test instance, and which revision of the galaxy-central repository contains the changes, so you can get it working on your local installation.

-- Ross
written 7.3 years ago by fubar1.1k
Hi Camille,

If you can find some time to upload some of your BAM files, could you please test the revised BAM merge tool on http://test.g2.bx.psu.edu/ and let me know how it goes? This won't be on the main site until the next scheduled update in a few weeks.

If you need this locally, the changes are in galaxy-central, from where anyone can grab them. The key file you need to update is tools/samtools/sam_merge.xml, and you'll also need MergeSamFiles.jar from a recent Picard release to be available in your tool-data/shared/jars directory.

Hope this helps, and thanks for pointing out the bug.

-- Ross
written 7.3 years ago by fubar1.1k
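For anyone who wants to check the jar outside Galaxy first, a call to Picard MergeSamFiles could be assembled along these lines. This is a minimal sketch under stated assumptions, not the contents of the revised sam_merge.xml: the function name is hypothetical, the default jar directory is simply the tool-data/shared/jars location mentioned above, and only the INPUT= and OUTPUT= arguments of MergeSamFiles are used.

```python
import os


def build_picard_merge_command(output_bam, input_bams,
                               jar_dir="tool-data/shared/jars"):
    """Return a MergeSamFiles command line as a list (not executed here).

    Hypothetical helper: jar_dir is assumed to contain MergeSamFiles.jar
    from a Picard release of that era.
    """
    if not input_bams:
        raise ValueError("at least one input BAM is required")
    jar_path = os.path.join(jar_dir, "MergeSamFiles.jar")
    cmd = ["java", "-jar", jar_path]
    # MergeSamFiles takes one INPUT= argument per file to merge.
    cmd.extend("INPUT=%s" % bam for bam in input_bams)
    cmd.append("OUTPUT=%s" % output_bam)
    return cmd
```

Unlike the rg.txt workaround, this relies on the input BAMs already carrying their own read groups (for example, added with Picard's Add or Replace Groups), since the point of the change is to carry the existing headers into the merged file.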
7.2 years ago by
United States
Jennifer Hillman Jackson25k wrote:
Hello Aarti,

Larger data can be loaded onto the public Galaxy main instance at http://usegalaxy.org using FTP, as described:

1 - on the "Get Data -> Upload" tool form, lower section
2 - in this wiki example document: http://galaxyproject.org/wiki/Learn/Upload%20via%20FTP
3 - in the screencast example "Tool tutorials -> Using FTP": http://galaxyproject.org/wiki/Screencasts

More help can be obtained starting from: http://galaxyproject.org/wiki/Learn

You are correct, the mailing list is the best place to ask questions. I will forward this question and reply to the galaxy-user mailing list, as others may encounter similar issues and will benefit from the reply. In the future, using a mailing list to ask questions would be appreciated: http://galaxyproject.org/wiki/Support

Take care,
Jen
Galaxy team

--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/Support
Hi Aarti,

I also tried to upload big files (10 Gb) to Galaxy through FTP, following the instructions, but my internet connection is too slow and it didn't succeed (I left the computer on for days, but the transfer always crashed in the middle). Therefore, I want to install Galaxy on a single computer, and I need to know which computer to buy.

I didn't help you much, I know.

Lilach
written 7.2 years ago by Lilach F190