Question: Kraken run time
d.hand wrote, 9 weeks ago:

I have been running 0.2-2M-read, 150 bp Kraken jobs against the bacterial database on a private CloudMan cluster with one c3.large master node (set to not run jobs) and 0-10 r3.8xlarge worker nodes (32 cores, 244 GB memory each).

The problem is that the jobs are taking 6-13+ hours to complete against the bacterial database. Is this normal?

It also made me realize that I don't understand how Kraken works with Galaxy. Is the cluster downloading the large bacterial database from NCBI and rebuilding it for each job?

Tags: kraken

Hi, thanks for the response. Actually, I still have a job that has now been running for 24 hours! I have set the master to not run any jobs, so the job is running on a single r3.8xlarge. I'm not sure how I can get much larger than this!

written 9 weeks ago by d.hand
Enis Afgan wrote, 9 weeks ago:

A couple of comments on the system setup. Kraken is not set up to run with multiple cores on CloudMan (the list of tools that are configured for multiple cores is available here: https://github.com/galaxyproject/ansible-cloudman-galaxy-setup/blob/master/files/galaxy/job_conf.xml.cloud), so any Kraken jobs will run using just a single process. The file can be modified on the live instance to include kraken, and Galaxy then restarted for the change to take effect (see the sketch below). Also, the first time a job that uses reference data is run, that data needs to be downloaded onto the node running the job from the globally available CVMFS data repository that the Galaxy project hosts and maintains. With this data being 180 GB, there may be an additional problem, because the cache for this data is limited to ~30 GB, though I'm not sure how CVMFS handles that limitation. Finally, although it doesn't really matter conceptually, CloudMan is set up to use Slurm.
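
For illustration, here is a minimal, hypothetical job_conf.xml excerpt along those lines. The destination id, core count, and Slurm options are assumptions to adapt to your instance, not the exact contents of the linked file:

    <job_conf>
        <plugins>
            <plugin id="slurm" type="runner" load="galaxy.jobs.runners.slurm:SlurmJobRunner"/>
        </plugins>
        <destinations default="slurm_default">
            <destination id="slurm_default" runner="slurm"/>
            <!-- Hypothetical multi-core destination: request 16 tasks on one node -->
            <destination id="slurm_16core" runner="slurm">
                <param id="nativeSpecification">--nodes=1 --ntasks=16</param>
            </destination>
        </destinations>
        <tools>
            <!-- Route kraken jobs to the multi-core destination -->
            <tool id="kraken" destination="slurm_16core"/>
        </tools>
    </job_conf>

After editing, restart Galaxy (e.g., from the CloudMan admin console) so the new mapping takes effect.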

So for a more specific recommendation, I'd suggest taking a look at the output of the squeue and sinfo commands, as well as logging into the nodes running the job(s) to see what their load is. I'd also try updating the job_conf.xml file to make kraken run in parallel and seeing how far that gets you.
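
For reference, a few diagnostic commands along these lines (the worker hostname w1 and the config path are examples, not guaranteed to match your instance):

    # On the master: job queue and partition/node state
    squeue
    sinfo

    # On the worker running the job: CPU load and memory pressure
    ssh w1 uptime
    ssh w1 free -m

    # CVMFS cache usage on the worker, if the client tools are installed
    ssh w1 cvmfs_config stat

    # If the ~30 GB cache turns out to be the limiting factor, the quota can
    # usually be raised in /etc/cvmfs/default.local (value is in MB), e.g.
    #   CVMFS_QUOTA_LIMIT=200000
    # and then reloaded with:
    ssh w1 sudo cvmfs_config reload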


So I modified job_conf.xml, but I don't think this is the problem.

With "squeue" I get:

         JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            11      main g464_kra   galaxy  R      49:46      1 w1

And an r3.8xlarge worker node is started, but I think there is a bottleneck on the c3.large master (set to run no jobs), because running "free -m" on the master gives:

                      total       used       free     shared    buffers     cached
    Mem:               3764       3436        327         37         25       2496
    -/+ buffers/cache:             914       2849
    Swap:                 0          0          0

written 9 weeks ago by d.hand