How to get the name of the file (.dat) through the API?


4.6 years ago by

vaskin90 • 30 wrote:

I'm trying to use Galaxy for NGS analysis. I'm running it on a server, so I don't use or need the GUI. I need to merge multiple files (BAM files, specifically), so I need to run the workflow with multiple inputs, and I don't know their number in advance. OK, there are steps, but I cannot use them for the merging, since at each step the merging workflow sees only one dataset at a time, right? So I was wondering whether I can get the paths to the actual files (the .dat files) through the API, so that I can combine them into a comma-separated line and pass it to my merger.

ngs api • 1.3k views


modified 4.5 years ago by jmchilton ♦ 1.1k • written 4.6 years ago by vaskin90 • 30

4.5 years ago by

jmchilton ♦ 1.1k

United States

jmchilton ♦ 1.1k wrote:

I would recommend against passing explicit file paths to your tool like this: it breaks down Galaxy's abstractions for security, provenance, etc. Galaxy tools should consume datasets, not files. Alternative approaches include rewriting the workflow on the fly to accommodate your number of inputs; breaking the execution into a few workflows and using the tools API to merge the BAMs in between the mapping steps at the beginning and the steps after the BAM merging; or using the new support for such workflows available by creating dataset collections.

However, if you still want this information, it can certainly be obtained from the API: the result of the API call GET /api/histories/<encoded_history_id>/contents/<encoded_dataset_id> should contain an attribute named file_name. By default only admins can see this, but the configuration option expose_dataset_path in your universe_wsgi.ini can be set to True to expose this information to all users.
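As a rough illustration of the call described above, here is a minimal Python sketch that uses only the standard library. The helper names (dataset_details_url, fetch_dataset_details, file_name_of, comma_separated_paths) are made up for this example, and the base URL, encoded IDs and API key are placeholders you would supply yourself. It assumes the API key is passed as the key query parameter and that the response is a JSON object carrying file_name when the server exposes it.

```python
import json
import urllib.parse
import urllib.request


def dataset_details_url(base_url, history_id, dataset_id, api_key):
    """Build the URL for GET /api/histories/<history_id>/contents/<dataset_id>."""
    path = "/api/histories/{}/contents/{}".format(
        urllib.parse.quote(history_id, safe=""),
        urllib.parse.quote(dataset_id, safe=""),
    )
    # Assumption: the API key is sent as the "key" query parameter.
    query = urllib.parse.urlencode({"key": api_key})
    return base_url.rstrip("/") + path + "?" + query


def fetch_dataset_details(base_url, history_id, dataset_id, api_key):
    """Perform the GET request and decode the JSON response into a dict."""
    url = dataset_details_url(base_url, history_id, dataset_id, api_key)
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def file_name_of(details):
    """Return the on-disk path, or raise if the server did not expose it."""
    try:
        return details["file_name"]
    except KeyError:
        raise KeyError(
            "no 'file_name' in response; use an admin key "
            "or enable expose_dataset_path"
        )


def comma_separated_paths(details_list):
    """Join the file paths of several dataset detail dicts with commas."""
    return ",".join(file_name_of(details) for details in details_list)
```

With a running server, something like comma_separated_paths([fetch_dataset_details(url, history_id, d, key) for d in dataset_ids]) would give the comma-separated line asked about in the question. The caveat above still applies: a tool that consumes raw paths bypasses Galaxy's dataset abstractions.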

written 4.5 years ago by jmchilton ♦ 1.1k

Thank you, John.

Actually, I'd love to use datasets, and it looks like dataset collections can help me. But since they are still under development, I'll wait until they are well documented and stable.

Breaking the workflow won't solve the case, since the multiple datasets still have to be passed somehow.

Rewriting the workflow on the fly? That could work, but I didn't find a natural way of doing this. I'm using BioBlend, which does not have this option.

Using any of the solutions above (except dataset collections) would make my code complicated.

expose_dataset_path is what I needed. For now I'll stick to this solution, so it will be the only hack.

modified 4.5 years ago • written 4.5 years ago by vaskin90 • 30
