Question: Dataflow Using Galaxy on Amazon EC2
Asked 7.7 years ago by Luqman Hodgkinson:
Dear Galaxy developers,

I have a collection of Java classes linked by a custom dataflow architecture. All classes are in a single project, but some of them call executables written in languages other than Java. I am investigating the possibility of transitioning to Galaxy. Essentially, I want to link these Java classes in a DAG representing the dataflow and to execute the dataflow on Amazon EC2. The data flowing along the edges are arbitrary custom Java classes. Additionally, it is important to cache intermediate results. The data is acquired from a few web services: iRefIndex, IntAct, UniProt, and Gene Ontology. There are complex software dependencies, so after setting up the dataflow I would like to save the entire system as an Amazon Machine Image (AMI).

How difficult would this transition be, and would it be worth the effort?

Sincerely, with best wishes,
Luqman Hodgkinson, Ph.D. student, UC-Berkeley
Answer by Enis Afgan, 7.7 years ago:
Hi Luqman,

Were you planning on using Galaxy CloudMan (usegalaxy.org/cloud) and integrating your tool (i.e., your Java classes) into the Galaxy instance it deploys, or simply starting a new EC2 instance and setting up a Galaxy instance from scratch? Either way, I would suggest trying the process out on your local system first. Adding new tools to Galaxy is pretty straightforward once you have the tool installed on the system; see https://bitbucket.org/galaxy/galaxy-central/wiki/AddToolTutorial. That will also let you test the overall functionality offered by Galaxy in the context of your own tool before trying to deploy the whole thing on the cloud.

Once you transition to the cloud, you would have to repeat the process of installing the tool on the created instance, just as you did on the local system, and then copy over the tool wrapper you created to integrate it with Galaxy. If you started with a clean instance (i.e., not Galaxy CloudMan), then after you have installed your tool and integrated it with Galaxy, you could simply use the AWS web console to create an AMI automatically. You would then start the newly created AMI, start Galaxy, and start processing your data. Note, though, that any data you upload to an instance will be lost once you terminate the instance, unless you associate an EBS volume with it and have Galaxy store analysis data there (this is easily configured in Galaxy's universe_wsgi.ini file).

Alternatively, you could use CloudMan and add your tool to the set of already existing tools as described here: https://bitbucket.org/galaxy/galaxy-central/wiki/Cloud/CustomizeGalaxyCloud. If you use CloudMan, all of the details regarding data persistence and Galaxy setup are managed for you automatically (excluding the addition of your own tool).

Hope this helps,
Enis
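A tool wrapper of the kind the AddToolTutorial describes is a small XML file. Here is a minimal sketch for a tool that invokes a Java main class; the tool id, jar path, and parameter names are all hypothetical, and the formats would need to match the actual data:

```xml
<tool id="dataflow_runner" name="Dataflow Runner">
  <description>Run the Java dataflow pipeline on one input dataset</description>
  <!-- Galaxy substitutes $input and $output with paths to files it manages -->
  <command>java -jar /usr/local/tools/dataflow.jar --in $input --out $output</command>
  <inputs>
    <param name="input" type="data" format="tabular" label="Input dataset"/>
  </inputs>
  <outputs>
    <data name="output" format="tabular"/>
  </outputs>
  <help>Invokes the pipeline's main class via a single command.</help>
</tool>
```

Once a wrapper like this is saved under Galaxy's tools/ directory and registered in tool_conf.xml, the command shows up as a regular tool in the Galaxy interface.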
Answer by Enis Afgan, 7.7 years ago:
Hi Luqman,

Take a look at my comments below.

1. If you can currently run your Java classes by simply executing a single command (i.e., invoking a single main class), the same can be achieved through Galaxy. You would create a Galaxy tool wrapper that allows Galaxy to invoke that same command, and the rest (i.e., executing the tool) is going to be the same (i.e., the same set of methods will get invoked and you should get the same output). Once Galaxy invokes a tool, any form of data flow within that tool is up to the tool itself, so if the classes share data between each other, the same will happen once they are invoked through Galaxy.

2. Galaxy keeps track of the inputs and outputs of each tool or a workflow, so, once the tool is integrated with Galaxy, keeping up with the details is trivial - that's a big part of why Galaxy exists to begin with.

3. Have you seen the new Conveyor paper?

If you have not yet, I would suggest you give Galaxy Main (usegalaxy.org) a shot and try to run some jobs and create a few workflows to get an idea of what can be done with tools once they are integrated with Galaxy, as well as the type of data and information Galaxy keeps. That should give you a good indication of whether the available functionality can be applied in your scenario as well.

Enis
Answer by Enis Afgan, 7.7 years ago:
Hi Luqman,

I was suggesting that you visit Galaxy Main to get an idea of what is possible within Galaxy and get familiar with how tools interact with each other and what kind of input/output they expect/provide, so that you could draw parallels and get a better idea of how to integrate your own tools in a local instance of Galaxy.

As I mentioned earlier, you will have to create a tool wrapper, as described on https://bitbucket.org/galaxy/galaxy-central/wiki/AddToolTutorial, that describes (in terms Galaxy can understand) how to run the tool. Then, if the tool handles multiple steps within itself (i.e., a single command results in multiple transformations of the data), all of the dataflow will be handled just as if the tool were invoked from the command line. If one command is all that is needed to run multiple Java classes (i.e., the tool is the equivalent of a workflow), you may not need to create a Galaxy workflow but can instead run your set of Java classes as a single Galaxy tool. In principle, if a Galaxy workflow is to be created, the data between individual steps within the workflow are passed as files that Galaxy manages.

Hope this helps,
Enis
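The files that Galaxy manages between workflow steps live under a configurable dataset directory, which is how the EBS setup mentioned in the first answer works: point that directory at the volume's mount point. A minimal universe_wsgi.ini sketch, where the /mnt/galaxyData mount point is an assumption for illustration:

```ini
[app:main]
# Store datasets (the files passed between workflow steps) on the EBS volume
file_path = /mnt/galaxyData/files
# Staging area for uploads and temporary files, on the same volume
new_file_path = /mnt/galaxyData/tmp
```

After attaching and mounting the EBS volume on the instance and restarting Galaxy, analysis data then survives instance termination as long as the volume does.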
Powered by Biostar version 16.09