Question: Bioblend download of BED/interval file: HistoryDatasetAssociation not accessible
2.2 years ago, christophe.habib (France) wrote:

Hello everyone,

I need to download a large number of files from the histories of different users. The files all follow the same naming pattern, so I can easily get their IDs by calling histories.show_matching_datasets with a regular expression. The code looks like this:

hl = gi.histories.get_histories()
for h in hl:
    histID=h['id']
    todl = gi.histories.show_matching_datasets(histID,name_filter=".*_thing.vcf")[0]['id']
    gi.datasets.download_dataset(todl, file_path="./")

It works fine with VCF files, but it fails when I try the same thing with BED files. I will try to give you all the details.
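For a batch run over many histories, it can help to keep going when one download fails and to collect the failures for later inspection. The following is a minimal sketch in Python 3, not the code used in this post: the function name download_matching is hypothetical, and gi is assumed to be a bioblend GalaxyInstance exposing the same three calls as the snippet above.

```python
def download_matching(gi, name_pattern, dest_dir="./"):
    """Download every dataset whose name matches name_pattern, history by history.

    gi is assumed to behave like a bioblend GalaxyInstance (hypothetical helper,
    it only relies on the three calls already used in the original snippet).
    Returns (downloaded, failed): a list of dataset IDs and a list of
    (dataset ID, error message) pairs.
    """
    downloaded, failed = [], []
    for history in gi.histories.get_histories():
        matches = gi.histories.show_matching_datasets(
            history["id"], name_filter=name_pattern
        )
        # Loop over all matches: a history without a match yields an empty
        # list, where indexing with [0] would raise IndexError.
        for dataset in matches:
            try:
                gi.datasets.download_dataset(dataset["id"], file_path=dest_dir)
                downloaded.append(dataset["id"])
            except Exception as exc:  # such as the HTTPError reported in this post
                failed.append((dataset["id"], str(exc)))
    return downloaded, failed
```

Printing the failed list after the run shows which dataset IDs need a closer look, without losing the downloads that did succeed.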

When I try it on the BED file I want to download, I get this error in a Python terminal:

>>> gi.datasets.download_dataset(GAP, file_path="./")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/bioblend/galaxy/datasets/__init__.py", line 109, in download_dataset
r.raise_for_status()
File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 840, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://jacob.bct.aphp.fr/galaxydiagnostic/api/histories/1ec857d2695a147a/contents/d665ee5b3d1a2cec/display?to_ext=interval

If I copy and paste the link into my browser, the file downloads (and the file is OK).

In the uWSGI log I can see a HistoryDatasetAssociation error (not accessible):

10.163.193.36 - - [06/sept./2016:17:11:46 +0200] "GET /galaxydiagnostic/api/datasets/d665ee5b3d1a2cec?hda_ldda=hda&key=KEY HTTP/1.1" 200 - "-" "python-requests/2.9.1"
[pid: 62931|app: 0|req: 3087/5520] 10.143.10.11 () {44 vars in 733 bytes} [Tue Sep 6 17:11:46 2016] GET /galaxydiagnostic/api/datasets/d665ee5b3d1a2cec?hda_ldda=hda&key=5c6ff32e1cea0d1103b5791fd9dc611d => generated 5066 bytes in 113 msecs (HTTP/1.1 200) 3 headers in 124 bytes (1 switches on core 1)
galaxy.webapps.galaxy.api.datasets ERROR 2016-09-06 17:11:47,128 Error getting display data for dataset (d665ee5b3d1a2cec) from history (1ec857d2695a147a): HistoryDatasetAssociation is not accessible by user
Traceback (most recent call last):
File "lib/galaxy/webapps/galaxy/api/datasets.py", line 293, in display
hda = self.hda_manager.get_accessible( decoded_content_id, trans.user )
File "lib/galaxy/managers/secured.py", line 33, in get_accessible
return self.error_unless_accessible( item, user, **kwargs )
File "lib/galaxy/managers/secured.py", line 43, in error_unless_accessible
raise exceptions.ItemAccessibilityException( "%s is not accessible by user" % ( self.model_class.__name__ ) )
ItemAccessibilityException: HistoryDatasetAssociation is not accessible by user
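One detail worth checking in the output above: the URL in the HTTPError ends with display?to_ext=interval and carries no key parameter, whereas the /api/datasets request logged just before it does include key=KEY. If the display request really is sent without the key, Galaxy may be handling it as an anonymous request, which would fit the "not accessible by user" message; this is an assumption, not something the logs prove. A minimal Python 3 sketch for building the same display URL with the key attached, so the two variants can be compared by hand (display_url is a hypothetical helper, not part of bioblend):

```python
from urllib.parse import urlencode, urljoin


def display_url(base_url, history_id, dataset_id, api_key, to_ext=None):
    """Build the history-contents display URL with the API key as a query parameter."""
    path = "api/histories/{}/contents/{}/display".format(history_id, dataset_id)
    params = {"key": api_key}
    if to_ext:
        params["to_ext"] = to_ext
    # Make sure the base ends with "/" so urljoin keeps the /galaxydiagnostic prefix.
    return urljoin(base_url.rstrip("/") + "/", path) + "?" + urlencode(params)
```

Fetching the resulting URL with and without the key parameter should show whether authentication is what separates the browser download from the API call.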

If I download the file through the Galaxy interface, use "Get Data" to upload it into another history, and then try to download it with bioblend, I get exactly the same error.

Here is the file:

chrX 101092448 101092628 - NXF5
chr12 1017619 1017979 + WNK1
chr19 10244847 10245027 - DNMT1
chr19 10246379 10246559 - DNMT1
chr5 10250057 10250237 + CCT5
chr19 10250677 10251037 - DNMT1
chr19 10252679 10252919 - DNMT1
chr19 10273232 10273503 - DNMT1

Can you reproduce this error?

modified 2.2 years ago • written 2.2 years ago by christophe.habib

I just tested this on another instance, on another server, and it works. The instance where it does not work is located in a hospital where everything goes through a proxy. I have already configured the proxy in supervisor for uWSGI and the handler.

When I try to get the file with wget I get this error:

galaxy@jacob:~/script/BIOBLEND/workingdir$ wget http://jacob.bct.aphp.fr/galaxydiagnostic/api/histories/97ac7a8501276335/contents/f36c37305f2ff761/display?to_ext=interval
--2016-08-02 12:00:07-- http://jacob.bct.aphp.fr/galaxydiagnostic/api/histories/97ac7a8501276335/contents/f36c37305f2ff761/display?to_ext=interval
Resolving proxym-inter.aphp.fr (proxym-inter.aphp.fr)... 10.143.10.20
Connecting to proxym-inter.aphp.fr (proxym-inter.aphp.fr)|10.143.10.20|:8080... connected.
Proxy request sent, awaiting response... 500 Internal Server Error
2016-08-02 12:00:07 ERROR 500: Internal Server Error.

So I guess something is missing in my configuration... Maybe nginx also needs to be configured for the hospital proxy?
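The wget output shows the request being sent through proxym-inter.aphp.fr:8080. One way to rule the proxy in or out is to repeat the request with the Galaxy host listed in the no_proxy environment variable, which both wget and python-requests honour. A minimal Python 3 sketch under that assumption (add_no_proxy is a hypothetical helper name):

```python
import os


def add_no_proxy(host, env=None):
    """Return a copy of the environment with host appended to no_proxy/NO_PROXY."""
    env = dict(os.environ if env is None else env)
    current = env.get("no_proxy") or env.get("NO_PROXY", "")
    hosts = [h.strip() for h in current.split(",") if h.strip()]
    if host not in hosts:
        hosts.append(host)
    # Set both spellings, since tools differ in which one they read.
    env["no_proxy"] = env["NO_PROXY"] = ",".join(hosts)
    return env
```

The returned mapping can be passed as env to subprocess.run(["wget", url], env=...), or copied into os.environ before the bioblend script runs. If the 500 error persists with the proxy bypassed, the proxy is probably not the cause.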

written 2.2 years ago by christophe.habib

In my nginx config file I have added these lines, as suggested at https://wiki.galaxyproject.org/Admin/Config/nginxProxy, so that nginx sends the files:

location /_x_accel_redirect/ {
    internal;
    alias /;
}

Do I need to add something related to the hospital proxy? What I don't understand is why VCF files work without any problem, while only this type of file triggers the error.

PS: I tried removing these lines and restarting nginx; it does not change anything.

modified 2.2 years ago • written 2.2 years ago by christophe.habib
2.2 years ago, christophe.habib (France) wrote:

OK, the problem was solved by upgrading bioblend: pip install --upgrade bioblend

It's ridiculous, but I'll leave this post up in case someone runs into the same issue.
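To confirm which bioblend release is actually in use before and after the upgrade, the installed version can be read from the package metadata. A minimal Python 3.8+ sketch (installed_version is a hypothetical helper name; it returns None when the package is not installed in the current environment):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="bioblend"):
    """Return the installed version string of package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None
```

Running it with the same interpreter that executes the download script avoids upgrading one Python installation while the script uses another.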

written 2.2 years ago by christophe.habib
2.2 years ago, christophe.habib (France) wrote:

Well, actually it only works for a BED file that I imported with "Get Data", not for the existing one generated by my workflow. So the problem is not solved.

I tried to compare the two datasets and I see something strange: display_types is empty in the dataset that I can download:

Bad dataset
{u'accessible': True, u'metadata_endCol': 3, u'type_id': u'dataset-7cabe372bc15f85d', u'visible': True, u'resubmitted': False, u'create_time': u'2016-09-05T16:15:44.849522', u'creating_job': u'4bcf2d9357a3901f', u'metadata_chromCol': 1, u'file_size': 1851, u'file_ext': u'interval', u'id': u'7cabe372bc15f85d', u'misc_info': u"join (GNU coreutils) 8.13\nCopyright \xa9 2011 Free Software Foundation, Inc.\nLicense GPLv3+\xa0: GNU GPL version 3 ou ult\xe9rieure\n<http://gnu.org/licenses/gpl.html>\nCeci est logiciel libre, vous \xeates libre de le modifier et de le redistribuer.\nCe logiciel n'est", u'hda_ldda': u'hda', u'download_url': u'/galaxydiagnostic/api/histories/67eb836bb1e80a73/contents/7cabe372bc15f85d/display', u'state': u'ok', u'metadata_comment_lines': None, u'display_types': [{u'links': [{u'text': u'main', u'href': u'/galaxydiagnostic/datasets/87626/display_at/ucsc_main?redirect_url=http%3A%2F%2Fgenome.ucsc.edu%2Fcgi-bin%2FhgTracks%3Fdb%3Dhg19%26position%3Dchr19%3A10273232-50335453%26hgt.customText%3D%25s&display_url=http%3A%2F%2Fjacob.bct.aphp.fr%2Fgalaxydiagnostic%2Froot%2Fdisplay_as%3Fid%3D87626%26display_app%3Ducsc%26authz_method%3Ddisplay_at', u'target': u'_blank'}], u'label': u'display at UCSC'}], u'display_apps': [{u'links': [{u'text': u'Current', u'href': u'/galaxydiagnostic/display_application/7cabe372bc15f85d/ensembl_interval/ensembl_Current', u'target': u'_blank'}], u'label': u'display at Ensembl'}, {u'links': [{u'text': u'main', u'href': u'/galaxydiagnostic/display_application/7cabe372bc15f85d/rviewer_interval/lbl_main', u'target': u'_blank'}], u'label': u'display at RViewer'}, {u'links': [{u'text': u'local', u'href': u'/galaxydiagnostic/display_application/7cabe372bc15f85d/igv_interval_as_bed/local_default', u'target': u'_blank'}, {u'text': u'Human hg19', u'href': u'/galaxydiagnostic/display_application/7cabe372bc15f85d/igv_interval_as_bed/hg19', u'target': u'_blank'}], u'label': u'display with IGV'}], u'metadata_dbkey': u'hg19', u'type': 
u'file', u'metadata_column_types': [u'str', u'int', u'int', u'str', u'str'], u'misc_blurb': u'58 regions', u'peek': u'<table cellspacing="0" cellpadding="3"><tr><th>1.Chrom</th><th>2.Start</th><th>3.End</th><th>4</th><th>5</th></tr><tr><td>chr19</td><td>10273232</td><td>10273503</td><td>-</td><td>DNMT1</td></tr></table>', u'update_time': u'2016-09-05T19:45:15.613279', u'data_type': u'galaxy.datatypes.interval.Interval', u'tags': [], u'deleted': False, u'history_id': u'67eb836bb1e80a73', u'metadata_column_names': None, u'meta_files': [], u'genome_build': u'hg19', u'hid': 40, u'metadata_startCol': 2, u'visualizations': [{u'href': u'/galaxydiagnostic/visualization/show/charts?dataset_id=7cabe372bc15f85d', u'target': u'galaxy_main', u'html': u'Charts', u'embeddable': False}, {u'href': u'/galaxydiagnostic/visualization/show/graphviz?dataset_id=7cabe372bc15f85d', u'target': u'galaxy_main', u'html': u'Graph Visualization', u'embeddable': False}, {u'href': u'/galaxydiagnostic/visualization/show/scatterplot?dataset_id=7cabe372bc15f85d', u'target': u'galaxy_main', u'html': u'Scatterplot', u'embeddable': False}, {u'href': u'/galaxydiagnostic/visualization/trackster?dataset_id=7cabe372bc15f85d&dbkey=hg19&hda_ldda=hda', u'target': u'_top', u'html': u'Trackster', u'embeddable': False}], u'metadata_data_lines': 58, u'annotation': None, u'dataset_id': u'0fa11a97a24d1e27', u'history_content_type': u'dataset', u'uuid': u'4af1f6bc-42df-4722-a8a5-7039bc2c4ab4', u'name': u'4919-AP_S23_L001_R1_001.Fsickle_sort_MD_IR_BR_PR_count_sorted_F20_groupL20X.bed', u'extension': u'interval', u'metadata_columns': 5, u'url': u'/galaxydiagnostic/api/histories/67eb836bb1e80a73/contents/7cabe372bc15f85d', u'metadata_strandCol': None, u'metadata_nameCol': None, u'model_class': u'HistoryDatasetAssociation', u'rerunnable': True, u'metadata_delimiter': u'\t', u'purged': False, u'api_type': u'file'}
Good dataset
{u'accessible': True, u'metadata_endCol': 3, u'type_id': u'dataset-58f254a4e72911bb', u'visible': True, u'resubmitted': False, u'create_time': u'2016-09-07T16:44:26.465325', u'creating_job': u'9f59d31e7ecbe656', u'metadata_chromCol': 1, u'file_size': 1851, u'file_ext': u'interval', u'id': u'58f254a4e72911bb', u'misc_info': u'uploaded interval file', u'hda_ldda': u'hda', u'download_url': u'/galaxydiagnostic/api/histories/67eb836bb1e80a73/contents/58f254a4e72911bb/display', u'state': u'ok', u'metadata_comment_lines': None, u'display_types': [], u'display_apps': [{u'links': [{u'text': u'local', u'href': u'/galaxydiagnostic/display_application/58f254a4e72911bb/igv_interval_as_bed/local_default', u'target': u'_blank'}], u'label': u'display with IGV'}], u'metadata_dbkey': u'?', u'type': u'file', u'metadata_column_types': [u'str', u'int', u'int', u'str', u'str'], u'misc_blurb': u'58 regions', u'peek': u'<table cellspacing="0" cellpadding="3"><tr><th>1.Chrom</th><th>2.Start</th><th>3.End</th><th>4</th><th>5</th></tr><tr><td>chr19</td><td>10273232</td><td>10273503</td><td>-</td><td>DNMT1</td></tr></table>', u'update_time': u'2016-09-07T16:44:46.485710', u'data_type': u'galaxy.datatypes.interval.Interval', u'tags': [], u'deleted': False, u'history_id': u'67eb836bb1e80a73', u'metadata_column_names': None, u'meta_files': [], u'genome_build': u'?', u'hid': 45, u'metadata_startCol': 2, u'visualizations': [{u'href': u'/galaxydiagnostic/visualization/show/charts?dataset_id=58f254a4e72911bb', u'target': u'galaxy_main', u'html': u'Charts', u'embeddable': False}, {u'href': u'/galaxydiagnostic/visualization/show/graphviz?dataset_id=58f254a4e72911bb', u'target': u'galaxy_main', u'html': u'Graph Visualization', u'embeddable': False}, {u'href': u'/galaxydiagnostic/visualization/show/scatterplot?dataset_id=58f254a4e72911bb', u'target': u'galaxy_main', u'html': u'Scatterplot', u'embeddable': False}, {u'href': 
u'/galaxydiagnostic/visualization/trackster?dataset_id=58f254a4e72911bb&dbkey=%3F&hda_ldda=hda', u'target': u'_top', u'html': u'Trackster', u'embeddable': False}], u'metadata_data_lines': 58, u'annotation': None, u'dataset_id': u'53f56d4c992e4747', u'history_content_type': u'dataset', u'uuid': u'b21654c9-4975-4c85-adaa-9c8f715a54fd', u'name': u'Galaxy40-[4919-AP_S23_L001_R1_001.Fsickle_sort_MD_IR_BR_PR_count_sorted_F20_groupL20X.bed].interval', u'extension': u'interval', u'metadata_columns': 5, u'url': u'/galaxydiagnostic/api/histories/67eb836bb1e80a73/contents/58f254a4e72911bb', u'metadata_strandCol': None, u'metadata_nameCol': None, u'model_class': u'HistoryDatasetAssociation', u'rerunnable': False, u'metadata_delimiter': u'\t', u'purged': False, u'api_type': u'file'}
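Dictionaries of this size are hard to compare by eye. A minimal Python 3 sketch that lists only the fields whose values differ; diff_datasets and the default ignore list are assumptions made for illustration, not part of bioblend:

```python
def diff_datasets(a, b, ignore=("id", "type_id", "uuid", "dataset_id", "url",
                                "download_url", "create_time", "update_time",
                                "hid", "name", "creating_job")):
    """Return {field: (value_in_a, value_in_b)} for fields that differ.

    Fields named in ignore are skipped because they are expected to differ
    between any two datasets. A field present on one side only is reported
    with the placeholder string "<missing>" on the other side.
    """
    differences = {}
    for key in sorted(set(a) | set(b)):
        if key in ignore:
            continue
        left = a.get(key, "<missing>")
        right = b.get(key, "<missing>")
        if left != right:
            differences[key] = (left, right)
    return differences
```

With bioblend, the two dictionaries would typically come from gi.datasets.show_dataset(dataset_id). On the two dumps above, the result should include display_types, but also fields such as metadata_dbkey ('hg19' versus '?'), misc_info and rerunnable, so display_types is not the only difference between them.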

written 2.2 years ago by christophe.habib

So I wonder whether the module that downloads the file uses the display types in any way? I would like to try modifying the display types, but I can't manage to do it with bioblend. Any advice?

modified 2.2 years ago • written 2.2 years ago by christophe.habib

I just tried a minimal workflow, importing the two inputs that create the BED file. It creates a file whose display_types is not empty. So this lead is wrong: it is not the cause of this behaviour.

written 2.2 years ago by christophe.habib