4.2 years ago by
Australia
Walls of text are fine, and here's one in response! In a sense, you asked the wrong question: bam/bai pairs are handled internally and transparently to the user. You don't need multiple outputs; what you need to know is how the bam datatype object transparently indexes itself, using a converter, as part of setting metadata.
Suggestion: you might save yourself a lot of time and effort by looking at existing tools that do the kinds of things you want and cloning that code. You do not need or want composite files to manage (e.g.) bam index files, because the bam object knows how to index itself. Unfortunately for programmers new to Galaxy internals, but fortunately for users, the relationship between bam and bai is managed under the hood.
For example, there are existing tools (sam to bam comes to mind) that write bam files, and Galaxy routinely indexes those for you as part of setting the new dataset's metadata. The same happens when you upload a bam: there's no point in uploading the bai, because new bams are autoindexed when they appear in a Galaxy history. I'm not sure you can stop that. It's designed to ensure that the bai matches Galaxy's reference genomes, which is partly why the whole process is hidden from users, and thus mysterious and not easily finagled by programmers while you're learning :)
The Galaxy-generated bai file is just another file as far as Galaxy is concerned, but its path is hidden: it is stored as part of the bam file's metadata. It can be recovered when it needs to be passed to a tool, but it cannot easily be found otherwise. Fubar's htseq tool illustrates how to locate and pass the bai file for a bam: e.g. if a user has selected $bamf, then passing $bamf.metadata.bam_index to your tool will let it access the index. Be warned, though: MOST tools require the Galaxy path (which ends in .dat) to be, ahem, adjusted so they see something ending in .bai. But that's another story!
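One common way to do that "adjustment" is to symlink both Galaxy .dat files into the job's working directory under names the downstream tool expects. Here is a minimal sketch, assuming the tool follows the usual htslib convention of looking for `<name>.bam.bai` next to `<name>.bam`. The function name `stage_bam_with_index` and the staged file name `input.bam` are hypothetical, chosen only for illustration; they are not part of Galaxy.

```python
import os


def stage_bam_with_index(bam_path, bai_path, workdir):
    """Symlink a Galaxy bam and its index (both ending in .dat) under names
    that index-aware tools expect. Hypothetical helper, not a Galaxy API."""
    # Assumed naming convention: the index sits next to the bam as <bam>.bai
    staged_bam = os.path.join(workdir, "input.bam")
    staged_bai = staged_bam + ".bai"
    # Symlinks avoid copying potentially large files
    os.symlink(os.path.abspath(bam_path), staged_bam)
    os.symlink(os.path.abspath(bai_path), staged_bai)
    return staged_bam, staged_bai
```

In a wrapper script, you would pass $bamf and $bamf.metadata.bam_index from the tool's command line as the two paths, then hand the returned `staged_bam` to the real tool. Symlinking rather than copying keeps the job fast and leaves Galaxy's own files untouched.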
Finally, and not recommended: you could write your own index to the path at $bamf.metadata.bam_index to replace the autogenerated bai if you really want. Be warned that this may not work well if the index was generated against a different version of the reference containing additional contigs not present in the Galaxy reference data.
modified 4.2 years ago
written 4.2 years ago by fubar ♦ 1.1k