Question: Downloading multiple files
fkilpert (Germany) wrote, 3.6 years ago:

How do I download multiple files from Galaxy, preferably with the Unix shell?

How can I get a list of working download links?

I searched about this topic, but wasn't able to find anything that helped. There are some answers, like Problem With Downloading, describing how to get a download link by right-clicking on the download icon. Unfortunately, this no longer works, as those links are no longer real download addresses.

I also want to add that I need to download very many files (>>100 GB in total). Therefore, it would be helpful to generate a text file containing all the download addresses of a history.

Thank you!

Tags: shell, multiple files, download
Daniel Blankenberg (United States) wrote, 3.6 years ago:

If you want to download very many files from a history using command-line processes, I would suggest using the Galaxy API to interact with the history and its datasets. Alternatively, you can try the export history option to download an archive with the entire history's contents through your browser.
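
As a rough sketch of the API route (not a definitive recipe: the endpoint paths can differ between Galaxy releases, and the server URL, API key and history id below are placeholders), something along these lines lists a history's datasets and streams each one to disk using the requests library:

import requests

GALAXY = "http://galaxy.server.net"      # your Galaxy server (placeholder)
API_KEY = "XXXXXXXX"                     # your Galaxy API key; do not share
HISTORY_ID = "0123456789abcdef"          # id of the history to download (placeholder)

# list the datasets contained in the history
contents = requests.get(
    "{}/api/histories/{}/contents".format(GALAXY, HISTORY_ID),
    params={"key": API_KEY},
).json()

for item in contents:
    # skip deleted entries and anything that is not a plain dataset
    if item.get("deleted") or item.get("type") != "file":
        continue
    # stream the dataset content to a local file named after the dataset
    url = "{}/api/histories/{}/contents/{}/display".format(GALAXY, HISTORY_ID, item["id"])
    r = requests.get(url, params={"key": API_KEY}, stream=True)
    r.raise_for_status()
    with open(item["name"], "wb") as handle:
        for chunk in r.iter_content(chunk_size=1024 * 1024):
            handle.write(chunk)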

Jennifer Hillman Jackson (United States) wrote, 3.6 years ago:

Hello,

Curl or wget can be used to download large files. I just tested the "right-click on the download icon" URL for datasets, and these do point to the actual files. Copy the URLs for the datasets that you need and add them to a simple shell script to download them in batch. Please give this method a try. Instructions for the command-line utilities are here if you need them:
http://wiki.galaxyproject.org/Support#Downloading_data
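
If you would rather keep the whole batch in one script than loop over wget calls by hand, a Python equivalent of such a batch script might look roughly like this (a sketch using the requests library; "urls.txt" is just a placeholder name for a text file with one copied download URL per line):

import os
import requests

# read the copied download URLs, one per line
with open("urls.txt") as fh:
    urls = [line.strip() for line in fh if line.strip()]

for url in urls:
    # derive a local file name from the URL, with a fallback
    name = os.path.basename(url.split("?")[0]) or "dataset"
    r = requests.get(url, stream=True)
    r.raise_for_status()
    # stream to disk so very large datasets do not need to fit in memory
    with open(name, "wb") as out:
        for chunk in r.iter_content(chunk_size=1024 * 1024):
            out.write(chunk)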

Thanks, Jen, Galaxy team


Just a small follow-up: if a dataset also has metadata files (e.g. a BAM dataset has a BAI index metadata file), left-click the download link to bring up a popup with the download links for the BAM file and the BAI file, then right-click the file that you want and copy its URL.

(written 3.6 years ago by Daniel Blankenberg)
fkilpert (Germany) wrote, 3.6 years ago:

Hi, thank you for your suggestions. The problem is that the "right-click on the download icon" approach does not produce a URL that is usable with wget or curl, so I wonder how this works for you.

As a solution, I wrote a Python script (using bioblend) for batch download of all datasets within a specified history. It works reliably, although the data transfer is uncompressed.

fkilpert (Germany) wrote, 3.6 years ago:
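
This is the bioblend script mentioned above. Set galaxy_server, galaxy_web_api_key and history_name (and outdir, if needed) at the top before running it; it skips datasets whose gzipped output already exists and compresses fastq downloads with gzip.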

import bioblend
from bioblend.galaxy import GalaxyInstance
import os
import subprocess

galaxy_server = 'http://galaxy.server.net/'    # galaxy server url OR IP
galaxy_web_api_key = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'    # your galaxy web api key; DO NOT SHARE!
history_name = "my history"    # your history name
outdir = os.getcwd()

########################################################################

def show_attributes(obj):
    """
    Prints key-value pairs of every attribute.
    """
    for key in sorted(obj.keys()):
        print key, ":", obj[key]
    print
    
########################################################################

gi = GalaxyInstance(galaxy_server, key=galaxy_web_api_key)

## History
h = gi.histories.get_histories(name=history_name)[0]
print "# History", "#"*48
print h['name']
print h['id']
##show_attributes(h)
print

## Dataset
i = 0
datasets = gi.histories.show_matching_datasets(h['id'])
for d in datasets:
    i += 1
    print "## Dataset", i, "of", len(datasets), "#"*40
    print d['name']
    print d['id']
    print d['misc_blurb']
    #show_attributes(d)
    
    outfile = os.path.join(outdir,d['name']+".gz")
    if os.path.isfile(outfile):
        print "File already exists:", outfile
        continue
    
    ## Download
    print "downloading..."
    download = gi.histories.download_dataset(h['id'], d['id'], file_path=outdir)
    print download
    
    ## Make sure the download actually produced a file
    if not os.path.isfile(download):
        print "Error! File from download does NOT exist!"
        print download
        exit(1)

    ## gzip fastq files; other file types are kept as downloaded
    if download.endswith("fastq"):
        print "gzip..."
        subprocess.check_call("gzip -1 < {} > {}".format(download, outfile), shell=True)

        if os.path.isfile(outfile):
            print outfile
        try:
            os.remove(download)
        except OSError:
            pass
    
    print

 
