Question: Kraken run time
d.hand0 wrote, 12 months ago:

I have been running Kraken jobs of 0.2-2M 150 bp reads against the bacterial database on a private CloudMan cluster with one c3.large master node (set to run no jobs) and 0-10 r3.8xlarge worker nodes (32 cores, 244 GB memory each).

The problem is that the jobs against the bacterial database are taking 6-13+ hours to complete. Is this normal?

It also made me realize that I don't understand how Kraken works with Galaxy. Is the cluster downloading the large bacterial database from NCBI and rebuilding it for each job?


Hi, thanks for the response. Actually, I still have a job that has now been running for 24 hours! I have set the master to run no jobs, so the job is running on one r3.8xlarge; I'm not sure how I can get much larger than this!

written 12 months ago by d.hand0
Enis Afgan wrote, 12 months ago:

A couple of comments on the system setup. Kraken is not set up to run using multiple cores on CloudMan (the list of tools that are set up is available here: https://github.com/galaxyproject/ansible-cloudman-galaxy-setup/blob/master/files/galaxy/job_conf.xml.cloud), so any Kraken jobs will run using just a single process. That file can be modified on the live instance to include kraken, and Galaxy then restarted for the change to take effect.

Also, the first time a job that uses reference data is run, the data needs to be downloaded onto the node running the job from the globally available CVMFS data repository that the Galaxy project hosts and maintains. With this data being 180 GB, there may actually be an additional problem: the cache for this data is limited to ~30 GB, and I'm not sure whether CVMFS can deal with that limitation. Finally, although it doesn't really matter conceptually, CloudMan is set up to use Slurm.
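
For concreteness, here is a sketch of the kind of job_conf.xml change meant above. The plugin and destination ids below are illustrative, not necessarily the ones CloudMan ships with, so match them against what is already defined in your instance's job_conf.xml:

    <!-- Sketch only: ids here are illustrative; reuse your instance's
         existing slurm plugin/destination ids where they differ. -->
    <job_conf>
        <plugins>
            <plugin id="slurm" type="runner"
                    load="galaxy.jobs.runners.slurm:SlurmJobRunner"/>
        </plugins>
        <destinations default="slurm_default">
            <destination id="slurm_default" runner="slurm"/>
            <!-- Hypothetical multi-core destination: request 16 tasks from
                 Slurm; Galaxy exposes the allocation to the tool via
                 $GALAXY_SLOTS so the kraken wrapper can use the cores. -->
            <destination id="slurm_16core" runner="slurm">
                <param id="nativeSpecification">--ntasks=16</param>
            </destination>
        </destinations>
        <tools>
            <!-- Route kraken jobs to the multi-core destination -->
            <tool id="kraken" destination="slurm_16core"/>
        </tools>
    </job_conf>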

So, for a more specific recommendation, I'd suggest taking a look at the output of the squeue and sinfo commands, as well as logging into the node(s) running the job(s) to see what their load is. Then try updating the job_conf.xml file to make kraken run in parallel and see how far that goes.
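
For example, run something like the following from the master (the worker name w1 here matches the NODELIST shown by squeue below; yours may differ):

    # On the master: what is queued/running, and the state of each node
    squeue
    sinfo
    # On the worker running the job: check load and memory use
    ssh w1 uptime
    ssh w1 'top -b -n 1 | head -n 20'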


So I modified job_conf.xml, but I don't think this is the problem.

With "squeue" I get:

         JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            11      main g464_kra   galaxy  R      49:46      1 w1

And an r3.8xlarge worker node is started, but I think there is a bottleneck on the c3.large master (set to run no jobs), as when using "free -m" on the master I get:

                         total       used       free     shared    buffers     cached
    Mem:                  3764       3436        327         37         25       2496
    -/+ buffers/cache:     914       2849
    Swap:                    0          0          0

written 12 months ago by d.hand0