Question: Accessing data from an archived CloudMan cluster
6 months ago by c2c1210 (NYC)

c2c1210 wrote:

Dear all, I'd be grateful for help with the following issue. I recently ran a large RNA-seq analysis using CloudMan Galaxy. My plan was to shut down the nodes to stop incurring compute costs while keeping access to the data, but I'm not sure I did it properly. After terminating the nodes (without deleting the cluster) I got a popup: "The cluster is terminating. Please wait for all services to stop and all of the nodes to be removed. Then, if it does not shut down on its own, terminate the master instance from the console."

I shut down the nodes, but the master instance was still active a whole day later, so I archived it. Now I'd like to access it just to download the data. How can I do this?
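If you only want to confirm that the underlying storage still exists before relaunching anything, the snapshot and volume IDs that CloudMan reported in the log below can be checked directly. This is a sketch using standard `aws ec2` describe calls; it assumes the AWS CLI is installed and configured with the account's credentials:

```shell
# Sketch: verify that the snapshot and volume from the CloudMan log
# still exist in the account (IDs taken from the log below).
aws ec2 describe-snapshots --snapshot-ids snap-05993268ab1fe78a2
aws ec2 describe-volumes --volume-ids vol-022ed48148db9076a
```

If both calls return a description rather than an error, the data is still there and relaunching the cluster should reattach it.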

The log file was as follows:

20:05:25 - Initializing 'Galaxy' cluster type with storage type 'volume'. Please wait...
20:05:30 - Completed the initial cluster startup process. Configuring a predefined cluster of type Galaxy.
20:05:37 - Nginx service prerequisites OK; starting the service.
20:05:37 - Migration service prerequisites OK; starting the service.
20:05:37 - Supervisor service prerequisites OK; starting the service.
20:05:38 - Adding volume vol-05c9c0dabd91aa494 (galaxy FS)...
20:05:55 - Extracting archive url https://s3.amazonaws.com/cloudman-gvl-430/filesystems/gvl-galaxyfs-4.3.0.tar.gz to /mnt/galaxy. This could take a while...
20:08:45 - MD5 checksum for archive https://s3.amazonaws.com/cloudman-gvl-430/filesystems/gvl-galaxyfs-4.3.0.tar.gz is OK: 7c252fd983b52fcc358c62895b22eed4==7c252fd983b52fcc358c62895b22eed4
20:08:46 - Slurmctld service prerequisites OK; starting the service.
20:08:53 - NodeJSProxy service prerequisites OK; starting the service.
20:08:59 - Postgres service prerequisites OK; starting the service.
20:09:00 - ProFTPd service prerequisites OK; starting the service.
20:09:01 - Slurmd service prerequisites OK; starting the service.
20:09:03 - Galaxy service prerequisites OK; starting the service.
20:09:37 - Galaxy service state changed from 'Starting' to 'Running'
20:09:37 - GalaxyReports service prerequisites OK; starting the service.
20:09:52 - Found local directory '/opt/gvl/scripts/triggers'; executing all scripts therein (note that this may take a while)
20:09:52 - Done running PSS scripts in /opt/gvl/scripts/triggers
20:09:52 - Found local directory '/mnt/galaxy/gvl/poststart.d'; executing all scripts therein (note that this may take a while)
20:09:52 - Done running PSS scripts in /mnt/galaxy/gvl/poststart.d
20:09:53 - All cluster services started; the cluster is ready for use.
20:10:14 - Initiating galaxy FS file system expansion.
20:10:15 - Stopping NodeJS Proxy service
20:10:15 - Removing 'Postgres' service
20:10:15 - Shutting down ProFTPd service
20:10:15 - Removing 'GalaxyReports' service
20:10:15 - Shutting down Galaxy Reports...
20:10:16 - Removing 'Galaxy' service
20:10:16 - Shutting down Galaxy...
20:10:27 - Stopping PostgreSQL from /mnt/galaxy/db on port 5950...
20:10:56 - Created snapshot snap-05993268ab1fe78a2 from volume vol-05c9c0dabd91aa494 (galaxy FS). Check the snapshot for status.
20:22:38 - Adding volume vol-022ed48148db9076a (galaxy FS)...
20:22:54 - Mount point /mnt/galaxy already exists and is not empty!? (['tmp', 'home']) Will attempt to mount volume vol-022ed48148db9076a
20:22:55 - Successfully grew file system galaxy FS
20:23:01 - NodeJSProxy service prerequisites OK; starting the service.
20:23:07 - Postgres service prerequisites OK; starting the service.
20:23:12 - ProFTPd service prerequisites OK; starting the service.
20:23:12 - Galaxy service prerequisites OK; starting the service.
20:24:17 - Galaxy daemon not running.
20:24:17 - Galaxy service state changed from 'Starting' to 'Unstarted'
20:24:18 - Galaxy service prerequisites OK; starting the service.
20:24:33 - Galaxy service state changed from 'Starting' to 'Running'
20:24:34 - GalaxyReports service prerequisites OK; starting the service.
20:30:29 - The master instance is set to not execute jobs. To manually change this, use the CloudMan Admin panel.
20:30:29 - Adding 2 on-demand instance(s)
20:32:08 - Instance 'i-03a2adfbf62787d60; 52.23.250.210; w1' reported alive
20:32:08 - Instance 'i-05804fe774f2f452a; 18.232.96.54; w2' reported alive
20:32:35 - ---> PROBLEM, running command '/usr/bin/scontrol reconfigure' returned code '1', the following stderr: 'scontrol: error: slurm_receive_msg: Zero Bytes were transmitted or received slurm_reconfigure error: Zero Bytes were transmitted or received' and stdout: ''
20:32:35 - Could not get a handle on job manager service to add node 'i-05804fe774f2f452a; 18.232.96.54; w2'
20:32:35 - Waiting on worker instance 'i-05804fe774f2f452a; 18.232.96.54; w2' to configure itself.
20:32:45 - ---> PROBLEM, running command '/usr/bin/scontrol reconfigure' returned code '1', the following stderr: 'slurm_reconfigure error: Unable to contact slurm controller (connect failure)' and stdout: ''
20:32:45 - Could not get a handle on job manager service to add node 'i-03a2adfbf62787d60; 52.23.250.210; w1'
20:32:45 - Waiting on worker instance 'i-03a2adfbf62787d60; 52.23.250.210; w1' to configure itself.
20:32:45 - Slurm error: slurmctld not running; setting service state to Error
20:32:50 - Instance 'i-05804fe774f2f452a; 18.232.96.54; w2' ready
20:32:51 - Instance 'i-03a2adfbf62787d60; 52.23.250.210; w1' ready
16:17:16 - ---> PROBLEM, running command '/usr/bin/scontrol reconfigure' returned code '1', the following stderr: 'scontrol: error: slurm_receive_msg: Zero Bytes were transmitted or received slurm_reconfigure error: Zero Bytes were transmitted or received' and stdout: ''
16:17:16 - Terminating instance i-03a2adfbf62787d60
16:17:16 - Initiated requested termination of instance. Terminating 'i-03a2adfbf62787d60'.
16:17:20 - Instance 'i-03a2adfbf62787d60' removed from the internal instance list.
16:17:23 - Slurm error: slurmctld not running; setting service state to Error
16:17:23 - ---> PROBLEM, running command '/usr/bin/scontrol update NodeName=w2 Reason="CloudMan-disabled" State=DOWN' returned code '1', the following stderr: 'slurm_update error: Invalid node name specified' and stdout: ''
16:17:24 - Terminating instance i-05804fe774f2f452a
16:17:24 - Initiated requested termination of instance. Terminating 'i-05804fe774f2f452a'.
16:17:24 - Initiated requested termination of instances. Terminating '3' instances.
16:17:27 - Instance 'i-05804fe774f2f452a' removed from the internal instance list.
16:17:27 - The master instance is set to execute jobs. To manually change this, use the CloudMan Admin panel.
16:17:55 - Stopping all '0' worker instance(s)
16:17:55 - No idle instances found
16:17:55 - Did not terminate any instances.
16:17:55 - Stopping NodeJS Proxy service
16:17:55 - Removing 'Postgres' service
16:17:55 - Shutting down ProFTPd service
16:17:55 - Removing 'GalaxyReports' service
16:17:55 - Shutting down Galaxy Reports...
16:17:56 - Removing 'Galaxy' service
16:17:56 - Shutting down Galaxy...
16:18:00 - Removing 'Galaxy' service
16:18:00 - Shutting down Galaxy...
16:18:08 - Stopping PostgreSQL from /mnt/galaxy/db on port 5950...
16:18:09 - Removing Slurmd service
16:18:09 - Stopping Nginx service
16:18:09 - Stopping Supervisor service
16:18:09 - Removing Slurmctld service
16:18:12 - ---> PROBLEM, running command '/sbin/start-stop-daemon --retry TERM/5/KILL/10 --stop --exec /usr/sbin/slurmctld' returned code '1', the following stderr: '' and stdout: 'No /usr/sbin/slurmctld found running; none killed.'
16:18:13 - Initiating removal of 'galaxyIndices FS' data service with: volumes [], buckets [], transient storage [], nfs server None and gluster fs None
16:18:13 - Initiating removal of 'transient_nfs FS' data service with: volumes [], buckets [], transient storage [Transient storage @ /mnt/transient_nfs], nfs server None and gluster fs None
16:18:13 - Initiating removal of 'galaxy FS' data service with: volumes [vol-022ed48148db9076a (galaxy FS)], buckets [], transient storage [], nfs server None and gluster fs None
16:18:19 - Error removing unmounted path /mnt/galaxy: [Errno 39] Directory not empty: '/mnt/galaxy'
16:18:23 - Error unmounting file system '/mnt/galaxy', running command '/bin/umount /mnt/galaxy' returned code '32', the following stderr: 'umount: /mnt/galaxy: not mounted' and stdout: ''
(the identical unmount error repeated at 16:18:26, 16:18:29, 16:18:32, 16:18:35, 16:18:38, 16:18:41 and 16:18:44)
16:18:47 - Initiating removal of 'galaxy FS' data service with: volumes [vol-022ed48148db9076a (galaxy FS)], buckets [], transient storage [], nfs server None and gluster fs None
16:18:47 - Error unmounting file system '/mnt/galaxy', running command '/bin/umount /mnt/galaxy' returned code '32', the following stderr: 'umount: /mnt/galaxy: not mounted' and stdout: ''
16:18:47 - Could not unmount file system at '/mnt/galaxy'
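For anyone hitting the same unmount failures: they typically mean something still has files open under /mnt/galaxy. A minimal diagnostic sketch, assuming you can SSH to the master instance with sudo rights (the mount point comes from the log; the specific processes to stop will vary):

```shell
# Sketch: identify what is holding /mnt/galaxy open, then retry the unmount.
sudo fuser -vm /mnt/galaxy   # list processes with open files on the mount
sudo lsof +D /mnt/galaxy     # same information, with full file paths
# After stopping the offending processes (e.g. a stray Galaxy or Postgres):
sudo umount /mnt/galaxy      # 'sudo umount -l /mnt/galaxy' as a last resort
```

A lazy unmount (`-l`) detaches the file system immediately and cleans up once it is no longer busy, which is usually safe right before shutting the instance down.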

Tags: aws, cloudman, galaxy

modified 6 months ago by Enis Afgan • written 6 months ago by c2c1210
6 months ago by Enis Afgan (United States)

Enis Afgan wrote:

It looks like there was a process still using the /mnt/galaxy file system, so CloudMan was not able to unmount it to shut down the machine. Regardless, to get to your data, launching the same cluster again should put everything back in order. To do that, from launch.usegalaxy.org, click the Fetch saved clusters button under the _Advanced GVL options_ and choose the cluster name you created initially. Then just launch the cluster, and the same underlying file system should be attached to the new instance.
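Once the relaunched cluster is up, one way to pull the data down is over SSH. A sketch, where the key file, user name, and IP address are placeholders to replace with your own values (only /mnt/galaxy comes from the log above):

```shell
# Sketch: copy the Galaxy data directory from the relaunched master
# instance to your local machine. Substitute your own SSH key, user
# name, and the master instance's public IP address.
rsync -avz --progress \
  -e "ssh -i ~/.ssh/my-cloudman-key.pem" \
  ubuntu@203.0.113.10:/mnt/galaxy/ ./galaxy-data/
```

rsync can be re-run safely if the transfer is interrupted; it will resume by skipping files that already match on the local side.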

written 6 months ago by Enis Afgan

Powered by Biostar version 16.09