Question: Importing Very Large NGS DataSets
12 days ago by
mmoore20
United States
mmoore20 wrote:

I'm running my own Galaxy server, and I'm looking to import 20-40 GB NGS FASTQ files for processing. I know that's unusual. :)

Currently I'm manually uploading to the FTP directory, and importing using "Choose FTP". The upload script completes in a short period of time, and the upload job is successfully started in the galaxy queue.

However, the upload.py script is taking hours to complete for each file. Is there any way to speed up the process, for example by linking the files directly, by skipping some of the sanity checks in the script, or by some other means?

Regards,

Mat


Most of the time you spend waiting when importing large datasets into Galaxy from a local filesystem is most probably in the 'detecting metadata' step, when Galaxy inspects the data (counting sequences, etc.). This would go faster if you can get a faster machine to run the job.

Besides that, I do not think there is much you can do, since Galaxy needs metadata for every dataset.
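
To give a feel for why this step is slow on very large files: counting sequences means reading the whole file from start to end, typically on a single core. The sketch below is not Galaxy's upload.py or its metadata code. It is only a minimal illustration, assuming uncompressed or gzipped FASTQ with exactly four lines per record; the function name `count_fastq_records` and the chunk size are made up for this example.

```python
import gzip
import sys


def count_fastq_records(path, chunk_size=1 << 20):
    """Count records in a FASTQ file, assuming exactly 4 lines per record.

    Hypothetical helper for illustration only; it does not validate the file.
    """
    # Pick a gzip-aware opener based on the file extension (an assumption).
    opener = gzip.open if str(path).endswith(".gz") else open
    newlines = 0
    last_byte = b"\n"
    with opener(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            newlines += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # If the final line has no trailing newline, it still counts as a line.
    if last_byte != b"\n":
        newlines += 1
    return newlines // 4


if __name__ == "__main__" and len(sys.argv) > 1:
    print(count_fastq_records(sys.argv[1]))
```

Timing this on one of the 20-40 GB files would give a rough idea of how long a single read-through takes on that machine. The real metadata step may do more work than this.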

written 12 days ago by Martin Čech

Thanks Martin. And yes, I can pretty much confirm this is the case: one core, 100% utilized for 4+ hours on the upload.py script. Maybe it's time for me to have a closer look at that script :)

written 12 days ago by mmoore20
12 days ago by
United States
Jennifer Hillman Jackson wrote:

Hi,

This method would probably work out for your case: https://docs.galaxyproject.org/en/master/admin/useful_scripts.html?highlight=data%20libraries

Once the FASTQ files are in a data library, they can be copied into histories as datasets to run the analysis. Copied datasets do not use up more disk space; they just link back to the original file in the library.
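
One possible way to script the library import is through the Galaxy API with BioBlend, which is not mentioned in this thread, so treat the following as a sketch under assumptions rather than the recommended procedure. It assumes an admin API key, FASTQ files already present on the Galaxy server's filesystem, and path-based uploads enabled in the Galaxy config (`allow_path_paste` in recent releases and, as far as I know, `allow_library_path_paste` in older ones; check this against your version). The helper name `link_fastqs_into_library`, the URL, the key, and the paths are placeholders.

```python
def link_fastqs_into_library(gi, library_name, fastq_paths, file_type="auto"):
    """Create a data library and link server-side files into it without copying.

    `gi` is expected to be a bioblend.galaxy.GalaxyInstance (or an object with
    the same `libraries` interface). Hypothetical helper for illustration.
    """
    library = gi.libraries.create_library(
        library_name, description="Linked NGS FASTQ files"
    )
    # BioBlend expects the server-side paths as one string, one path per line.
    # link_data_only="link_to_files" asks Galaxy to reference the files in
    # place instead of copying them into its own file store.
    return gi.libraries.upload_from_galaxy_filesystem(
        library["id"],
        "\n".join(fastq_paths),
        file_type=file_type,
        link_data_only="link_to_files",
    )


if __name__ == "__main__":
    # Imported here so the helper above can be read without BioBlend installed.
    from bioblend.galaxy import GalaxyInstance

    # Placeholder URL, key, and paths: replace with your own values.
    gi = GalaxyInstance(url="http://localhost:8080", key="YOUR_ADMIN_API_KEY")
    datasets = link_fastqs_into_library(
        gi, "NGS runs", ["/data/ngs/sample1.fastq", "/data/ngs/sample2.fastq"]
    )
    print(datasets)
```

Linking avoids copying the file into Galaxy. Whether it also shortens the metadata detection described above is not something this thread confirms.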

I also asked the developers if they had better ideas or more to add. Please follow their responses in Gitter (or they might reply directly back here): https://gitter.im/galaxyproject/Lobby?at=5b92edb9e481f854a685c9d6

Thanks! Jen, Galaxy team
