Question: Downloading multiple files
fkilpert (Germany) wrote, 3.6 years ago:

How do I download multiple files from Galaxy, preferably with the Unix shell?

How can I get a list of working download links?

I searched about this topic, but wasn't able to find anything that helped. There are some answers, like Problem With Downloading, describing how to get a download link by right-clicking on the download icon. Unfortunately, this no longer works, as those links are no longer real download addresses.

I also want to add that I need to download very many files (>>100 GB in total). Therefore, it would be helpful to generate a text file containing all the download addresses of a history.

Thank you!

Tags: shell, multiple files, download
Daniel Blankenberg (United States) wrote, 3.6 years ago:

If you want to download very many files from a history using command-line processes, I would suggest using the Galaxy API to interact with the history and its datasets. Alternatively, you can try the export history option to download an archive with the entire history's contents through your browser.
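
As a rough sketch of the API route (not a definitive recipe: the endpoint paths can differ between Galaxy releases, and the server URL, API key and history id below are placeholders), something along these lines lists a history's datasets and streams each one to disk using the requests library:

import requests

GALAXY = "http://galaxy.server.net"      # your Galaxy server (placeholder)
API_KEY = "XXXXXXXX"                     # your Galaxy API key; do not share
HISTORY_ID = "0123456789abcdef"          # id of the history to download (placeholder)

# list the datasets contained in the history
contents = requests.get(
    "{}/api/histories/{}/contents".format(GALAXY, HISTORY_ID),
    params={"key": API_KEY},
).json()

for item in contents:
    # skip deleted entries and anything that is not a plain dataset
    if item.get("deleted") or item.get("type") != "file":
        continue
    # stream the dataset content to a local file named after the dataset
    url = "{}/api/histories/{}/contents/{}/display".format(GALAXY, HISTORY_ID, item["id"])
    r = requests.get(url, params={"key": API_KEY}, stream=True)
    r.raise_for_status()
    with open(item["name"], "wb") as handle:
        for chunk in r.iter_content(chunk_size=1024 * 1024):
            handle.write(chunk)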

Jennifer Hillman Jackson (United States) wrote, 3.6 years ago:

Hello,

Curl or wget can be used to download large files. I just tested the "right-click on the download icon" URL for datasets, and these do point to the actual files. Copy the URLs for the datasets that you need and add them to a simple shell script to download them in batch. Please give this method a try. Instructions for the command-line utilities are here if you need them:
http://wiki.galaxyproject.org/Support#Downloading_data
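
If you would rather keep the whole batch in one script than loop over wget calls by hand, a Python equivalent of such a batch script might look roughly like this (a sketch using the requests library; "urls.txt" is just a placeholder name for a text file with one copied download URL per line):

import os
import requests

# read the copied download URLs, one per line
with open("urls.txt") as fh:
    urls = [line.strip() for line in fh if line.strip()]

for url in urls:
    # derive a local file name from the URL, with a fallback
    name = os.path.basename(url.split("?")[0]) or "dataset"
    r = requests.get(url, stream=True)
    r.raise_for_status()
    # stream to disk so very large datasets do not need to fit in memory
    with open(name, "wb") as out:
        for chunk in r.iter_content(chunk_size=1024 * 1024):
            out.write(chunk)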

Thanks, Jen, Galaxy team


Just a small follow-up: if a dataset also has metadata files (e.g. a BAM dataset has a BAI index metadata file), left-click the download link to bring up a popup with the download links for the BAM file and the BAI file, then right-click the file that you want and copy its URL.

(written 3.6 years ago by Daniel Blankenberg)
fkilpert (Germany) wrote, 3.6 years ago:

Hi, thank you for your suggestions. The problem is that the "right-click on the download icon" approach does not produce a URL that is usable with wget or curl, so I wonder how this works for you.

As a solution, I wrote a Python script (using bioblend) for batch download of all datasets within a specified history. It works reliably, although the data transfer is uncompressed.

fkilpert (Germany) wrote, 3.6 years ago:
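
This is the bioblend script mentioned above. Set galaxy_server, galaxy_web_api_key and history_name (and outdir, if needed) at the top before running it; it skips datasets whose gzipped output already exists and compresses fastq downloads with gzip.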

import bioblend
from bioblend.galaxy import GalaxyInstance
import os
import subprocess

galaxy_server = 'http://galaxy.server.net/'    # galaxy server url OR IP
galaxy_web_api_key = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'    # your galaxy web api key; DO NOT SHARE!
history_name = "my history"    # your history name
outdir = os.getcwd()

########################################################################

def show_attributes(obj):
    """
    Prints key-value pairs of every attribute.
    """
    for key in sorted(obj.keys()):
        print key, ":", obj[key]
    print
    
########################################################################

gi = GalaxyInstance(galaxy_server, key=galaxy_web_api_key)

## History
h = gi.histories.get_histories(name=history_name)[0]
print "# History", "#"*48
print h['name']
print h['id']
##show_attributes(h)
print

## Dataset
i = 0
datasets = gi.histories.show_matching_datasets(h['id'])
for d in datasets:
    i += 1
    print "## Dataset", i, "of", len(datasets), "#"*40
    print d['name']
    print d['id']
    print d['misc_blurb']
    #show_attributes(d)
    
    outfile = os.path.join(outdir,d['name']+".gz")
    if os.path.isfile(outfile):
        print "File already exists:", outfile
        continue
    
    ## Download
    print "downloading..."
    download = gi.histories.download_dataset(h['id'], d['id'], file_path=outdir)
    print download
    
    ## Make sure the download actually produced a file
    if not os.path.isfile(download):
        print "Error! File from download does NOT exist!"
        print download
        exit(1)

    ## gzip fastq files; other file types are kept as downloaded
    if download.endswith("fastq"):
        print "gzip..."
        subprocess.check_call("gzip -1 < {} > {}".format(download, outfile), shell=True)

        if os.path.isfile(outfile):
            print outfile
        try:
            os.remove(download)
        except OSError:
            pass
    
    print

 
