Question: Kraken run time
d.hand wrote, 9 weeks ago:

I have been running 0.2-2M-read, 150 bp Kraken jobs against the bacterial database on a private CloudMan cluster with one c3.large master node (set to not run jobs) and 0-10 r3.8xlarge worker nodes (32 cores, 244 GB memory each).

The problem is that the jobs are taking 6-13+ hours to complete against the bacterial database. Is this normal?

It also made me realize that I don't understand how Kraken works with Galaxy. Is the cluster downloading the large bacterial database from NCBI and rebuilding it for each job?

Tags: kraken

Hi, thanks for the response. Actually, I still have a job that has now been running for 24 hours! I have set the master to not run any jobs, so the job is running on a single r3.8xlarge. I'm not sure how I can get much larger than this!

written 9 weeks ago by d.hand
Enis Afgan wrote, 9 weeks ago:

A couple of comments on the system setup. Kraken is not set up to run with multiple cores on CloudMan (the list of tools that are configured for multiple cores is available here: https://github.com/galaxyproject/ansible-cloudman-galaxy-setup/blob/master/files/galaxy/job_conf.xml.cloud), so any Kraken jobs will run using just a single process. The file can be modified on the live instance to include kraken, and Galaxy then restarted for the change to take effect (see the sketch below). Also, the first time a job that uses reference data is run, that data needs to be downloaded onto the node running the job from the globally available CVMFS data repository that the Galaxy project hosts and maintains. With this data being 180 GB, there may be an additional problem, because the cache for this data is limited to ~30 GB, though I'm not sure how CVMFS handles that limitation. Finally, although it doesn't really matter conceptually, CloudMan is set up to use Slurm.
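
For illustration, here is a minimal, hypothetical job_conf.xml excerpt along those lines. The destination id, core count, and Slurm options are assumptions to adapt to your instance, not the exact contents of the linked file:

    <job_conf>
        <plugins>
            <plugin id="slurm" type="runner" load="galaxy.jobs.runners.slurm:SlurmJobRunner"/>
        </plugins>
        <destinations default="slurm_default">
            <destination id="slurm_default" runner="slurm"/>
            <!-- Hypothetical multi-core destination: request 16 tasks on one node -->
            <destination id="slurm_16core" runner="slurm">
                <param id="nativeSpecification">--nodes=1 --ntasks=16</param>
            </destination>
        </destinations>
        <tools>
            <!-- Route kraken jobs to the multi-core destination -->
            <tool id="kraken" destination="slurm_16core"/>
        </tools>
    </job_conf>

After editing, restart Galaxy (e.g., from the CloudMan admin console) so the new mapping takes effect.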

So for a more specific recommendation, I'd suggest taking a look at the output of the squeue and sinfo commands, as well as logging into the nodes running the job(s) to see what their load is. I'd also try updating the job_conf.xml file to make kraken run in parallel and seeing how far that gets you.
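
For reference, a few diagnostic commands along these lines (the worker hostname w1 and the config path are examples, not guaranteed to match your instance):

    # On the master: job queue and partition/node state
    squeue
    sinfo

    # On the worker running the job: CPU load and memory pressure
    ssh w1 uptime
    ssh w1 free -m

    # CVMFS cache usage on the worker, if the client tools are installed
    ssh w1 cvmfs_config stat

    # If the ~30 GB cache turns out to be the limiting factor, the quota can
    # usually be raised in /etc/cvmfs/default.local (value is in MB), e.g.
    #   CVMFS_QUOTA_LIMIT=200000
    # and then reloaded with:
    ssh w1 sudo cvmfs_config reload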


So I modified job_conf.xml, but I don't think this is the problem.

With "squeue" I get:

         JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            11      main g464_kra   galaxy  R      49:46      1 w1

And an r3.8xlarge worker node is started, but I think there is a bottleneck on the c3.large master (set to run no jobs), because running "free -m" on the master gives:

                      total       used       free     shared    buffers     cached
    Mem:               3764       3436        327         37         25       2496
    -/+ buffers/cache:             914       2849
    Swap:                 0          0          0

written 9 weeks ago by d.hand