Question: Can't import pandas with python script
4 months ago by
luciaaheitor20 wrote:

Good morning,

I have a python script that I've added to my Galaxy instance. When I try to run it I get the following error:

ImportError: No module named pandas

I've searched for a solution to this issue but found little information. I found two pandas packages in the toolshed (along with the other libraries I need), but after installing one I still get the same error. How can I fix this? Is there a way to manually install these python libraries into Galaxy with conda? (I couldn't work out how to do this from the documentation.)

The script also uses latex, since the output is a pdf file. I can run it directly on the server that hosts the Galaxy instance, but since the libraries I need aren't found inside Galaxy, I assume latex won't work there either. Is there a way to use this script from Galaxy?

Thank you for the attention!

Tags: new tool, python, importerror
4 months ago by
gb60 wrote:

Galaxy executes python scripts in a virtual environment, so it uses the packages installed there, not the ones on your machine. I think you need to add a requirement to the tool XML. You could also avoid using the venv, but someone else can probably explain that better.
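For the "add a requirement" route: in Galaxy tool wrappers, dependencies are usually declared in a `<requirements>` block in the tool XML and resolved via conda. A minimal sketch (the version number here is illustrative, not from the original post):

```xml
<requirements>
    <!-- Galaxy resolves this against conda if the conda resolver is enabled -->
    <requirement type="package" version="0.23.4">pandas</requirement>
</requirements>
```

With this in place, Galaxy installs the package into a conda environment and activates it before running the tool's command.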

Personally, I execute a bash script that calls the python command, because that fits my current set-up better. (I'm not sure whether this is recommended.)

So my tool XML (with `my_wrapper.sh` standing in for the actual script name):

<command interpreter="bash">my_wrapper.sh $input $output $param1 $param2 $param3</command>

Bash script (the `python` invocation appears to have been dropped from the original post; the script path is a placeholder):

#!/bin/bash
python /path/to/my_script.py -i $1 -o $2 -t $3 -f $4 -m $5

And my python script starts with:


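The code block that followed this sentence seems to have been lost from the original post. A typical argparse preamble matching the `-i/-o/-t/-f/-m` flags used in the bash wrapper might look like this (all option names and help strings are assumptions, not the original code):

```python
# Hypothetical reconstruction of the missing preamble: an argparse
# parser for the -i/-o/-t/-f/-m flags passed by the bash wrapper.
import argparse

parser = argparse.ArgumentParser(description="Galaxy tool script")
parser.add_argument("-i", "--input", required=True, help="input file")
parser.add_argument("-o", "--output", required=True, help="output file or folder")
parser.add_argument("-t", "--type", help="assumed: input file type")
parser.add_argument("-f", "--format", help="assumed: output format")
parser.add_argument("-m", "--mode", help="assumed: run mode")

# Demo argv; a real script would call parser.parse_args() with no
# arguments and read sys.argv instead.
args = parser.parse_args(["-i", "input.tab", "-o", "out.pdf"])
```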
Bash script if it is a python pipeline (again, the `python` line is reconstructed and the paths are placeholders):

#!/bin/bash
tempFolder=$(mktemp -d /your/location/XXXXXX)
python /path/to/my_pipeline.py -i $1 -o $tempFolder -t $3 -f $4 -m $5
mv "$tempFolder/outputfile" $2
rm -rf "$tempFolder"

If I have python scripts that use multiple files or produce multiple outputs, I do something like this: I make a temporary folder, and the script uses it to store all temp and output files. When the script finishes, I move (mv) the outputs to the Galaxy output location and remove the temp folder. If I executed the same pipeline from the command line instead, I would get a nicely organised output folder. When I build a pipeline in python, most of the time it contains a set-up step like this:

from subprocess import call

call(["mkdir", "-p", args.tempdir])
call(["mkdir", args.tempdir + "/temp_files"])
call(["mkdir", args.tempdir + "/output_files"])
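The same temp-folder pattern can be sketched with the standard library instead of shelling out to `mkdir`/`mv`/`rm`. This is not the original poster's code; the function name and file names are illustrative:

```python
# Sketch of the pattern described above: create a scratch folder,
# let the pipeline write into it, move the finished output to the
# path Galaxy expects, then clean up.
import os
import shutil
import tempfile

def run_pipeline(galaxy_output_path):
    temp_dir = tempfile.mkdtemp()  # like mktemp -d
    os.makedirs(os.path.join(temp_dir, "temp_files"))
    os.makedirs(os.path.join(temp_dir, "output_files"))
    try:
        # ... the pipeline would write its results here ...
        result = os.path.join(temp_dir, "output_files", "outputfile")
        with open(result, "w") as fh:
            fh.write("done\n")
        # move the finished file to where Galaxy expects it (like mv)
        shutil.move(result, galaxy_output_path)
    finally:
        shutil.rmtree(temp_dir)  # like rm -rf
    return galaxy_output_path
```

Using `try`/`finally` guarantees the scratch folder is removed even if the pipeline raises an error partway through.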

I think this is exactly what I needed, thank you so much! I've been trying to get this to work, and the problem I wanted to address here is now solved. I had added requirements to the tool's xml file, but I didn't see any change anywhere, and the available information wasn't very clear to me on how to proceed.

Once again, thank you for sharing your method!

written 4 months ago by luciaaheitor20

Gitter thread:

Glad they were able to help you to sort out this issue, too, combined with gb's advice! Jen

written 4 months ago by Jennifer Hillman Jackson25k

Also realize that you need to restart Galaxy every time you change an xml file. Your problem is solved now, but note that when you install a python package into Galaxy's venv, you should see the installation progress the first time you start Galaxy after adding it. So if you do not see the installation progress, something went wrong.

written 4 months ago by gb60


Powered by Biostar version 16.09