Hello there! I have been trying to launch a Galaxy CloudMan cluster for the past several days, but I just cannot get it to work correctly. I have primarily been using the new interface, and it launches the instances without any problem. The problem is that it won't scale up the worker nodes.
Every time CloudMan launches a new node, it creates the instance (confirmed from the EC2 interface), but then doesn't seem to be able to communicate with it. It will then try to reboot the node several times over ~20 minutes, before finally giving up and killing the node. Worse, by then the main node has already stopped work and tried to offload some of the compute, and it doesn't pick that work back up properly. This becomes a permanent problem for the whole cluster.
I tried launching multiple times with various configurations, but just cannot figure out what I am doing wrong. I also tried launching from the old interface, but that doesn't seem to be communicating with my AWS account at all. I tried different key pairs with full admin access, creating a new subnet, running an older version of CloudMan, etc. Nothing works.
I finally just spun up one very large instance to try to churn through the data, but it is very slow going! I desperately need to be able to scale my nodes to process this data!