Question: Getting a Part of a File's Name
mccoykg (United States) wrote 3.2 years ago:

Hi all! I'm working with a large number of files, and each file has the sample's expansion factor in its name. One tool I use partway through my analysis requires that expansion factor to be entered as a parameter. I'd like to run these files through a pipeline in batch, with the expansion factor taken automatically from each file's name, but I can't find a way to do that. If I were working in Python I could use the split function, since everything is named uniformly and the expansion factor is the 4th segment when splitting by "_". However, I don't think a similar function in the XML file (if one exists for the command line) would work, because the actual name of the file on disk would be something like 3212.dat due to Galaxy's internal naming rules.

Basically, I'm looking for a way to use part of a file's name as a flag / parameter in my XML file, if there is one. Running the fitness calculation tool on each file by hand and entering the expansion factor myself each time takes a long time, and it's quite dull. If anyone has any ideas I'd appreciate them!

Thanks much,

Kay

Jennifer Hillman Jackson (United States) wrote 3.2 years ago:

Hello,

There is an open enhancement request for this exact function. Please follow it here: https://trello.com/c/xCNTwtcg

That is just one solution. The other is to have the wrapper for the downstream tool read the dataset name as displayed in the UI (as opposed to the "dataset" identifier tracked in the Galaxy file system). In this scenario, I could see the underlying tool capturing the input dataset name, with the XML adjusted to update dynamically based on the content selected on the form. Or perhaps the XML tool wrapper itself could perform this task (all of the inputs are available at the point of tool launch, from what I understand).
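
As a rough sketch of the wrapper-based approach, assuming the wrapper can hand the displayed dataset name to a script as a plain string (in Galaxy tool XML this is commonly written as something like `$input.name` in the command template, though the exact variable depends on how the input parameter is named in your wrapper), a small helper could pull out the fourth underscore-separated segment. The function name, script name, and example dataset name below are hypothetical and only illustrate the idea.

```python
import sys


def expansion_factor_from_name(display_name, sep="_", index=3):
    """Return one segment of a dataset's display name.

    Hypothetical helper: assumes uniformly named datasets where the
    expansion factor is the 4th segment (0-based index 3) when the
    name is split on "_", as described in the question.
    """
    segments = display_name.split(sep)
    if len(segments) <= index:
        # Fail loudly rather than pass a wrong parameter downstream.
        raise ValueError(
            "expected at least %d '%s'-separated segments in %r"
            % (index + 1, sep, display_name)
        )
    return segments[index]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical call from a wrapper's command line:
    #   python get_expansion_factor.py "strainA_rep1_day3_250_reads"
    print(expansion_factor_from_name(sys.argv[1]))
```

The returned value is a string, so the downstream tool (or the wrapper) would still need to convert it to a number if required. If the expansion factor is the last segment and the name carries an extension, the extension would have to be stripped separately.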

Even though I personally do not know of a tool that does this specific task (parse out a dataset name, then enter it as a tool form parameter setting), that does not mean one doesn't exist. Dataset names are used all the time to rename outputs, so studying that portion of a wrapper could help you understand which metadata value to capture and use. Many of the Picard tools access the dataset name metadata, as does the tool FastQC. Those tool wrappers could contain code block(s) that could be used as models and a starting place.

Utilizing input dataset names is a most useful function for high-throughput analysis of inputs with informative naming (or informative headers/content within the file, or other metadata content). If anyone from the community does know of such a tool that can be used as a more complete model for this sort of task, please share a link to it!

I hope this helps a bit. I am sure you have already reviewed the Admin section of the wiki, so I will not point you to the help there, as it is almost certainly ancillary to your main goal at this point. Example tool wrappers are likely the best resource from where you are in the tool wrapper development.

Thanks! Jen, Galaxy team


Thanks, I'll try looking into the metadata used to rename outputs. If I figure out a solution I'll post it here.

Reply written 3.2 years ago by mccoykg