Question: Archived cloud - CloudMan

6 months ago by

c2c12 • 10

NYC

Dear all, I'd be grateful for help with the following issue. I recently ran a large RNA-seq analysis using CloudMan Galaxy. My plan was to shut down the nodes to avoid further processing costs while keeping access to the data, but I don't know if I did it properly. After terminating the nodes (without deleting the cluster), I got this popup: "The cluster is terminating; please wait for all services to stop and all of the nodes to be removed. Then, if not, terminate the master instance from the console."

I shut down the nodes, but the device was still active for a whole day, so I archived the instance. Now I'd like to access it just to download the data. How can I do this?

The log file was as follows:
20:05:25 - Initializing 'Galaxy' cluster type with storage type 'volume'. Please wait...
20:05:30 - Completed the initial cluster startup process. Configuring a predefined cluster of type Galaxy.
20:05:37 - Nginx service prerequisites OK; starting the service.
20:05:37 - Migration service prerequisites OK; starting the service.
20:05:37 - Supervisor service prerequisites OK; starting the service.
20:05:38 - Adding volume vol-05c9c0dabd91aa494 (galaxy FS)...
20:05:55 - Extracting archive url https://s3.amazonaws.com/cloudman-gvl-430/filesystems/gvl-galaxyfs-4.3.0.tar.gz to /mnt/galaxy. This could take a while...
20:08:45 - MD5 checksum for archive https://s3.amazonaws.com/cloudman-gvl-430/filesystems/gvl-galaxyfs-4.3.0.tar.gz is OK: 7c252fd983b52fcc358c62895b22eed4==7c252fd983b52fcc358c62895b22eed4
20:08:46 - Slurmctld service prerequisites OK; starting the service.
20:08:53 - NodeJSProxy service prerequisites OK; starting the service.
20:08:59 - Postgres service prerequisites OK; starting the service.
20:09:00 - ProFTPd service prerequisites OK; starting the service.
20:09:01 - Slurmd service prerequisites OK; starting the service.
20:09:03 - Galaxy service prerequisites OK; starting the service.
20:09:37 - Galaxy service state changed from 'Starting' to 'Running'
20:09:37 - GalaxyReports service prerequisites OK; starting the service.
20:09:52 - Found local directory /opt/gvl/scripts/triggers'; executing all scripts therein (note that this may take a while)
20:09:52 - Done running PSS scripts in /opt/gvl/scripts/triggers
20:09:52 - Found local directory /mnt/galaxy/gvl/poststart.d'; executing all scripts therein (note that this may take a while)
20:09:52 - Done running PSS scripts in /mnt/galaxy/gvl/poststart.d
20:09:53 - All cluster services started; the cluster is ready for use.
20:10:14 - Initiating galaxy FS file system expansion.
20:10:15 - Stopping NodeJS Proxy service
20:10:15 - Removing 'Postgres' service
20:10:15 - Shutting down ProFTPd service
20:10:15 - Removing 'GalaxyReports' service
20:10:15 - Shutting down Galaxy Reports...
20:10:16 - Removing 'Galaxy' service
20:10:16 - Shutting down Galaxy...
20:10:27 - Stopping PostgreSQL from /mnt/galaxy/db on port 5950...
20:10:56 - Created snapshot snap-05993268ab1fe78a2 from volume vol-05c9c0dabd91aa494 (galaxy FS). Check the snapshot for status.
20:22:38 - Adding volume vol-022ed48148db9076a (galaxy FS)...
20:22:54 - Mount point /mnt/galaxy already exists and is not empty!? (['tmp', 'home']) Will attempt to mount volume vol-022ed48148db9076a
20:22:55 - Successfully grew file system galaxy FS
20:23:01 - NodeJSProxy service prerequisites OK; starting the service.
20:23:07 - Postgres service prerequisites OK; starting the service.
20:23:12 - ProFTPd service prerequisites OK; starting the service.
20:23:12 - Galaxy service prerequisites OK; starting the service.
20:24:17 - Galaxy daemon not running.
20:24:17 - Galaxy service state changed from 'Starting' to 'Unstarted'
20:24:18 - Galaxy service prerequisites OK; starting the service.
20:24:33 - Galaxy service state changed from 'Starting' to 'Running'
20:24:34 - GalaxyReports service prerequisites OK; starting the service.
20:30:29 - The master instance is set to not execute jobs. To manually change this, use the CloudMan Admin panel.
20:30:29 - Adding 2 on-demand instance(s)
20:32:08 - Instance 'i-03a2adfbf62787d60; 52.23.250.210; w1' reported alive
20:32:08 - Instance 'i-05804fe774f2f452a; 18.232.96.54; w2' reported alive
20:32:35 - ---> PROBLEM, running command '/usr/bin/scontrol reconfigure' returned code '1', the following stderr: 'scontrol: error: slurm_receive_msg: Zero Bytes were transmitted or received slurm_reconfigure error: Zero Bytes were transmitted or received ' and stdout: ''
20:32:35 - Could not get a handle on job manager service to add node 'i-05804fe774f2f452a; 18.232.96.54; w2'
20:32:35 - Waiting on worker instance 'i-05804fe774f2f452a; 18.232.96.54; w2' to configure itself.
20:32:45 - ---> PROBLEM, running command '/usr/bin/scontrol reconfigure' returned code '1', the following stderr: 'slurm_reconfigure error: Unable to contact slurm controller (connect failure) ' and stdout: ''
20:32:45 - Could not get a handle on job manager service to add node 'i-03a2adfbf62787d60; 52.23.250.210; w1'
20:32:45 - Waiting on worker instance 'i-03a2adfbf62787d60; 52.23.250.210; w1' to configure itself.
20:32:45 - Slurm error: slurmctld not running; setting service state to Error
20:32:50 - Instance 'i-05804fe774f2f452a; 18.232.96.54; w2' ready
20:32:51 - Instance 'i-03a2adfbf62787d60; 52.23.250.210; w1' ready
16:17:16 - ---> PROBLEM, running command '/usr/bin/scontrol reconfigure' returned code '1', the following stderr: 'scontrol: error: slurm_receive_msg: Zero Bytes were transmitted or received slurm_reconfigure error: Zero Bytes were transmitted or received ' and stdout: ''
16:17:16 - Terminating instance i-03a2adfbf62787d60
16:17:16 - Initiated requested termination of instance. Terminating 'i-03a2adfbf62787d60'.
16:17:20 - Instance 'i-03a2adfbf62787d60' removed from the internal instance list.
16:17:23 - Slurm error: slurmctld not running; setting service state to Error
16:17:23 - ---> PROBLEM, running command '/usr/bin/scontrol update NodeName=w2 Reason="CloudMan-disabled" State=DOWN' returned code '1', the following stderr: 'slurm_update error: Invalid node name specified ' and stdout: ''
16:17:24 - Terminating instance i-05804fe774f2f452a
16:17:24 - Initiated requested termination of instance. Terminating 'i-05804fe774f2f452a'.
16:17:24 - Initiated requested termination of instances. Terminating '3' instances.
16:17:27 - Instance 'i-05804fe774f2f452a' removed from the internal instance list.
16:17:27 - The master instance is set to execute jobs. To manually change this, use the CloudMan Admin panel.
16:17:55 - Stopping all '0' worker instance(s)
16:17:55 - No idle instances found
16:17:55 - Did not terminate any instances.
16:17:55 - Stopping NodeJS Proxy service
16:17:55 - Removing 'Postgres' service
16:17:55 - Shutting down ProFTPd service
16:17:55 - Removing 'GalaxyReports' service
16:17:55 - Shutting down Galaxy Reports...
16:17:56 - Removing 'Galaxy' service
16:17:56 - Shutting down Galaxy...
16:18:00 - Removing 'Galaxy' service
16:18:00 - Shutting down Galaxy...
16:18:08 - Stopping PostgreSQL from /mnt/galaxy/db on port 5950...
16:18:09 - Removing Slurmd service
16:18:09 - Stopping Nginx service
16:18:09 - Stopping Supervisor service
16:18:09 - Removing Slurmctld service
16:18:12 - ---> PROBLEM, running command '/sbin/start-stop-daemon --retry TERM/5/KILL/10 --stop --exec /usr/sbin/slurmctld' returned code '1', the following stderr: '' and stdout: 'No /usr/sbin/slurmctld found running; none killed. '
16:18:13 - Initiating removal of 'galaxyIndices FS' data service with: volumes [], buckets [], transient storage [], nfs server None and gluster fs None
16:18:13 - Initiating removal of 'transient_nfs FS' data service with: volumes [], buckets [], transient storage [Transient storage @ /mnt/transient_nfs], nfs server None and gluster fs None
16:18:13 - Initiating removal of 'galaxy FS' data service with: volumes [vol-022ed48148db9076a (galaxy FS)], buckets [], transient storage [], nfs server None and gluster fs None
16:18:19 - Error removing unmounted path /mnt/galaxy: [Errno 39] Directory not empty: '/mnt/galaxy'
16:18:23 - Error unmounting file system '/mnt/galaxy', running command '/bin/umount /mnt/galaxy' returned code '32', the following stderr: 'umount: /mnt/galaxy: not mounted ' and stdout: ''
16:18:26 - Error unmounting file system '/mnt/galaxy', running command '/bin/umount /mnt/galaxy' returned code '32', the following stderr: 'umount: /mnt/galaxy: not mounted ' and stdout: ''
16:18:29 - Error unmounting file system '/mnt/galaxy', running command '/bin/umount /mnt/galaxy' returned code '32', the following stderr: 'umount: /mnt/galaxy: not mounted ' and stdout: ''
16:18:32 - Error unmounting file system '/mnt/galaxy', running command '/bin/umount /mnt/galaxy' returned code '32', the following stderr: 'umount: /mnt/galaxy: not mounted ' and stdout: ''
16:18:35 - Error unmounting file system '/mnt/galaxy', running command '/bin/umount /mnt/galaxy' returned code '32', the following stderr: 'umount: /mnt/galaxy: not mounted ' and stdout: ''
16:18:38 - Error unmounting file system '/mnt/galaxy', running command '/bin/umount /mnt/galaxy' returned code '32', the following stderr: 'umount: /mnt/galaxy: not mounted ' and stdout: ''
16:18:41 - Error unmounting file system '/mnt/galaxy', running command '/bin/umount /mnt/galaxy' returned code '32', the following stderr: 'umount: /mnt/galaxy: not mounted ' and stdout: ''
16:18:44 - Error unmounting file system '/mnt/galaxy', running command '/bin/umount /mnt/galaxy' returned code '32', the following stderr: 'umount: /mnt/galaxy: not mounted ' and stdout: ''
16:18:47 - Initiating removal of 'galaxy FS' data service with: volumes [vol-022ed48148db9076a (galaxy FS)], buckets [], transient storage [], nfs server None and gluster fs None
16:18:47 - Error unmounting file system '/mnt/galaxy', running command '/bin/umount /mnt/galaxy' returned code '32', the following stderr: 'umount: /mnt/galaxy: not mounted ' and stdout: ''
16:18:47 - Could not unmount file system at '/mnt/galaxy'
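
To work out which EBS volumes and snapshots might still hold the data, the resource IDs can be pulled out of a log like the one above. Below is a minimal Python sketch, assuming the log is available as plain text. The function name `extract_resource_ids` and the regular expressions are illustrative assumptions, not part of CloudMan.

```python
import re

# Patterns for the EBS IDs as they appear in the log text (assumed format:
# a "vol-" or "snap-" prefix followed by lowercase hexadecimal digits).
VOLUME_RE = re.compile(r"\bvol-[0-9a-f]+\b")
SNAPSHOT_RE = re.compile(r"\bsnap-[0-9a-f]+\b")


def extract_resource_ids(log_text):
    """Return the unique volume and snapshot IDs found in log_text.

    IDs are listed in the order they first appear in the log.
    """
    def unique(pattern):
        # dict.fromkeys drops duplicates while keeping first-seen order.
        return list(dict.fromkeys(pattern.findall(log_text)))

    return {
        "volumes": unique(VOLUME_RE),
        "snapshots": unique(SNAPSHOT_RE),
    }


if __name__ == "__main__":
    # Three lines copied from the log in this post.
    sample = (
        "20:05:38 - Adding volume vol-05c9c0dabd91aa494 (galaxy FS)...\n"
        "20:10:56 - Created snapshot snap-05993268ab1fe78a2 from volume "
        "vol-05c9c0dabd91aa494 (galaxy FS). Check the snapshot for status.\n"
        "20:22:38 - Adding volume vol-022ed48148db9076a (galaxy FS)...\n"
    )
    for kind, ids in extract_resource_ids(sample).items():
        print(kind, ids)
```

The resulting IDs can then be looked up in the EC2 section of the AWS console under volumes and snapshots. Whether those resources still exist depends on how the cluster was shut down.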

aws cloudman galaxy • 311 views

modified 6 months ago by Enis Afgan • 690 • written 6 months ago by c2c12 • 10