Question: Importing Very Large NGS DataSets
mmoore (United States) wrote, 11 weeks ago:

I'm running my own Galaxy server, and I'm looking to import 20-40 GB NGS FASTQ files for processing. I know that's unusual :)

Currently I'm manually uploading to the FTP directory and importing using "Choose FTP". The upload script completes in a short period of time, and the upload job is successfully started in the Galaxy queue.

However, the upload.py script is taking hours to complete for each file. Is there any way to speed up the process, either by linking the files directly, by skipping some of the sanity checks contained within the script, or by some other means?

Regards,

Mat


The majority of the time you spend waiting when importing large datasets into Galaxy from a local filesystem is most probably spent in the 'detecting metadata' step, when Galaxy is trying to reason about the data (counting sequences, etc.). This would go faster if you can get a faster machine to run the job.

Besides that, I do not think there is much you can do, since Galaxy needs metadata for every dataset.
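
As a rough illustration of why this step scales with file size: any exact sequence count has to read the whole file once. The sketch below is not Galaxy's metadata code. `count_fastq_records` is a hypothetical helper, and it assumes plain FASTQ with exactly four lines per record.

```python
def count_fastq_records(handle):
    """Count records in a FASTQ stream, assuming four lines per record.

    `handle` is any iterable of lines (for example an open file).
    The whole stream is read once, so run time grows with file size.
    """
    line_count = 0
    for _ in handle:
        line_count += 1
    if line_count % 4 != 0:
        raise ValueError("line count is not a multiple of 4; not four-line FASTQ?")
    return line_count // 4
```

Calling it as `count_fastq_records(open(path, "rb"))` on a 20-40 GB file means streaming every byte through a single core, which is consistent with one CPU being busy for a long time.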

Written 11 weeks ago by Martin Čech

Thanks Martin. Yes, I can pretty much confirm this is the case: one core, 100% utilized for 4+ hours by the upload.py script. Maybe it's time for me to have a closer look at that script :)

Written 11 weeks ago by mmoore
Jennifer Hillman Jackson (United States) wrote, 11 weeks ago:

Hi,

This method would probably work out for your case: https://docs.galaxyproject.org/en/master/admin/useful_scripts.html?highlight=data%20libraries

Once the FASTQ files are in a data library, they can be copied into histories as datasets to run the analysis. Copied datasets do not use up more disk space; they just link back to the original file in the library.
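
One possible way to script this, offered as a sketch and not as the method described on the linked page, is the BioBlend Python client. `link_files_into_library` is a hypothetical helper; it assumes `gi` is a `bioblend.galaxy.GalaxyInstance` created with an admin API key, that the server is configured to allow library uploads from filesystem paths, and that the paths are readable by the Galaxy server. The URL, key, and path in the comments are placeholders.

```python
def link_files_into_library(gi, library_name, paths, file_type="auto"):
    """Create a data library and add server-side files by link instead of copy.

    `gi` is expected to behave like bioblend.galaxy.GalaxyInstance.
    `paths` are absolute file paths as seen by the Galaxy server.
    """
    library = gi.libraries.create_library(library_name)
    # BioBlend takes the paths as one string with one path per line.
    return gi.libraries.upload_from_galaxy_filesystem(
        library["id"],
        "\n".join(paths),
        file_type=file_type,
        link_data_only="link_to_files",  # reference the files in place
    )

# Example call (placeholder URL, key, and path):
#   from bioblend.galaxy import GalaxyInstance
#   gi = GalaxyInstance("https://galaxy.example.org", key="ADMIN_API_KEY")
#   link_files_into_library(gi, "NGS runs", ["/data/ngs/sample1.fastq"])
```

Linking avoids copying the file into Galaxy's storage, but Galaxy still needs to set metadata on the new library dataset, so this may not remove the whole wait. From there, BioBlend's `histories.upload_dataset_from_library` can copy a library dataset into a history.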

I also asked the developers if they had better ideas or more to add. Please follow their responses in Gitter (or they might reply directly back here): https://gitter.im/galaxyproject/Lobby?at=5b92edb9e481f854a685c9d6

Thanks! Jen, Galaxy team
