Question: Plugging R Into Galaxy
Anthony Ferrari wrote (9 November 2010):
Dear Galaxy users,

My team works in the NGS field. We have our own local cluster and we are looking for a workflow management system. We are currently trying to set up and test Galaxy. Our quality control analysis pipeline will use the R statistical software, so we want to be able to call R scripts from Galaxy.

I know this is possible, and xy_plot.xml is a really good example (conditional & when, repeat tags). That example uses a <configfile> tag, inside which you put your R code. You can also add lines beginning with a single '#', which are interpreted by the *Cheetah template engine*. This is helpful for generating your R code dynamically from the submitted parameters (an old screencast introduces this example).

My question is: is there a way to avoid the <configfile> section and keep all the R code outside the tool's XML config file? If so, how can we specify an output file in the XML config that will catch, for instance, a write.table() call in the R script?

Best regards,
Anthony
James Taylor wrote:
Absolutely. xy_plot is just an example of what is possible with config files. If you just want to call an R script that takes command-line arguments, you can do that as well: just use Rscript.

-- jt

James Taylor
Assistant Professor
Department of Biology
Department of Mathematics & Computer Science
Emory University
ray mcgovern wrote:
Hi,

This may not be the 'best' way, but I've found it to be a workable solution. I'm running a Perl script that takes the tool XML parameters, does some processing, and calls R to obtain a p-value. The results are parsed and included in a formatted HTML page. It's all code-based. You can likely do something similar in a Python script:

    # --- create the R command
    my $cmd = <<END;
    echo 'phyper($match, $listcount, $all, $qrycount, lower.tail=FALSE, log.p=FALSE)' > rcmd
    END
    system($cmd);

    # --- run R in command-line mode
    my $runR = <<END;
    R --slave -f rcmd > rpval
    END
    system($runR);

    # --- open and parse results
    open my $fh, "<", "rpval" or die "cannot open rpval: $!";
    my $line = <$fh>;
    close $fh;
    my ($pval) = $line =~ /\[1\] (.+)/;   # the p-value is now in $pval

Hope you find this of use.
-ray
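The same idea can be sketched in plain shell. This is a minimal sketch, not ray's code: it assumes R prints the single result in its default `[1] value` format, and `parse_r_value` and `hyper_pval` are hypothetical helper names. The p-value helper needs R on the PATH; the parsing step does not.

```shell
#!/bin/sh
# Pull the value out of R's default print format, e.g. "[1] 0.0421" -> "0.0421".
# Only the first "[1] ..." line is used; anything else R prints is ignored.
parse_r_value() {
    sed -n 's/^\[1\] //p' | head -n 1
}

# Hypothetical helper mirroring the Perl example: pipe a phyper() call into R
# (arguments in the same order as the Perl code) and parse the printed result.
# --vanilla implies --no-save, which R requires when reading commands from stdin.
hyper_pval() {
    echo "phyper($1, $2, $3, $4, lower.tail=FALSE, log.p=FALSE)" \
        | R --slave --vanilla \
        | parse_r_value
}
```

For example, `printf '[1] 0.0421\n' | parse_r_value` prints `0.0421`. Calling `cat()` on the result in R (for instance via `Rscript -e`) would print the bare value and make the parsing step unnecessary.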
Alex Bossers wrote:
Anthony,

We had some discussion on the galaxy-dev list about R; see for instance my posting of Oct 14th and several others (for reference, a snippet is included below). That example runs an R script kept outside the tool config (in this case a script that opens an input file and generates an output file, which is picked up by Galaxy).

Hope this helps,
Alex

    <command> R --slave --vanilla -f $in_r --args $in_data $out_data </command>
    <inputs>
      <param name="in_data" type="data" format="tabular" label="Test data file"/>
      <param name="in_r" type="data" format="text" label="R script to load and execute"/>
    </inputs>
    <outputs>
      <data name="out_data" type="data" format="tabular" label="R script output"/>
    </outputs>

The R script provided grabs the arguments from the command line, as you indicated earlier:

    # R script file to grab input and output filenames from cmdline and just copy
    args <- commandArgs()
    output <- read.table(args[6], header=T)
    write.table(output,sep="\t",file=args[7],row.names=F)
    #end script
Thanks to all for your comments. Alex, I have done something similar to what you suggest, except that I use the Rscript executable embedded in a shell script. There is something I don't understand in your example: how does Galaxy know that it has to store the output of your write.table() call in your $out_data variable? Didn't you forget to redirect your R output stream with ">"?

    <command> R --slave --vanilla -f $in_r --args $in_data > $out_data </command>

Anthony
(reply by Anthony Ferrari, 10 November 2010)
Hi Anthony,

Good to hear you have it working. The example showed how to load a user-provided R script (which might be a security issue, but is OK for testing). Normally we would specify the R script by its path in the tools directory. This has the advantage that you can change and test your script without having to reload the tools in your running Galaxy.

The example is correct. In this case the provided R script receives both $in_data AND $out_data as arguments, via args[6] and args[7]. So the output of the R script is written directly into the file Galaxy expects, without any mv statements in a bash shell or the like. The > would be used if the output went to STDOUT.

Freddy made a good comment on the WARNINGS! You have to capture them and either trash them to /dev/null (or &-) or append them to a log file (STDERR can usually be discarded with 2>&- or appended to a log with 2>>./somelogfile.log). If you don't take care of them, they will give you a red history box even when everything was fine and it was just a warning, so you have to deal with that. This applies to more tools than just R, by the way.

Cheers,
Alex
(reply by Alex Bossers)
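Alex's advice about warnings can be wrapped in a small shell function. This is a minimal sketch, assuming (as Alex describes) that output on stderr gives the job a red history box; `run_quiet` is a hypothetical name. Warnings are appended to a log file, and the log is replayed on stderr only when the command actually fails, so genuine errors are still reported.

```shell
#!/bin/sh
# Run a command with its stderr (e.g. R warnings) appended to a log file.
# If the command fails, copy the log to stderr so the error is still visible.
# Returns the command's own exit status. POSIX sh has no local variables,
# so "log" and "status" are global.
run_quiet() {
    log=$1
    shift
    status=0
    "$@" 2>>"$log" || status=$?
    if [ "$status" -ne 0 ]; then
        cat "$log" >&2
    fi
    return "$status"
}

# Example use inside a wrapper script (the log path is hypothetical):
#   run_quiet ./r_warnings.log R --vanilla --slave --file="$rcode" --args "$@"
```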
Freddy de Bree wrote:
$out_data is simply picked up by Galaxy, as long as you pass the variable that captures the output to your script and also define it in your tool.xml file. Make sure the output is defined as "data" in the tool.xml file. Initially I thought the same thing, but redirecting with '>' is not needed; worse, it doesn't work like that.

Freddy
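Why writing to the file path Galaxy passes in works better than redirecting stdout can be shown without R at all. In this sketch, `fake_r_script` is a hypothetical stand-in for an R script: it prints messages on stdout while writing its table to the path given as its second argument, the way Alex's script writes to `args[7]`. The temporary files play the roles of the input, Galaxy's output file, and whatever a `> $out_data` redirect would collect.

```shell
#!/bin/sh
# Hypothetical stand-in for an R script: it chatters on stdout and writes
# its real result, a tab-separated table, to the file named by its 2nd argument.
fake_r_script() {
    echo "Reading $1 ..."
    printf 'gene\tcount\nABC1\t42\n' > "$2"
    echo "Done."
}

infile=$(mktemp)
outfile=$(mktemp)     # plays the role of Galaxy's $out_data
captured=$(mktemp)    # what a "> $out_data" redirect would have collected

fake_r_script "$infile" "$outfile" > "$captured"

# The output file holds only the table; the redirect target caught the messages.
echo "--- written by the script:"; cat "$outfile"
echo "--- captured from stdout:";  cat "$captured"
```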
OK. I made some tests this morning to figure this all out. Indeed, the problem with ">" is that it catches every single message R sends to STDOUT, not only the object you are interested in, so it is definitely not a reliable solution. The solution suggested by Alex worked perfectly well.

I would also suggest using the 'trailingOnly=TRUE' option in the commandArgs() call in the R script. It returns only the arguments given after the '--args' option, so you no longer need to know how many options precede them on the command line (--vanilla, --slave, -f or others). The first parameter useful to your R script is then commandArgs(trailingOnly=T)[1], and so on.

There is just one more point I would like to clarify. At first my tests did not work at all, and I realized that I had to give the full path to the R script (in the command tag) to make it work:

    <command interpreter="bash"> r_script_wrapper.sh rcode.R --args $in_data $out_data </command>   ### FAILURE
    <command interpreter="bash"> r_script_wrapper.sh /full/path/to/rcode.R --args $in_data $out_data </command>   ### SUCCESS

However, r_script_wrapper.sh and rcode.R are in the same directory. To shed some light on this, I put an `echo $PWD` in r_script_wrapper.sh and it returned:

    /path/to/galaxy_dist/database/job_working_directory/Num

where Num is the number of the job submitted to the PBS queue. Is that a known feature?

Here is the content of r_script_wrapper.sh:

    #!/bin/sh
    # Function that writes a message to stderr and exits
    fail() {
        echo "$@" >&2
        exit 1
    }
    # Ensure R executable is found
    which R > /dev/null || fail "'R' is required by this tool but was not found on path"
    # Extract first argument (the R script)
    rcode=$1; shift
    # Ensure the file exists (quoted so paths containing spaces work)
    test -f "$rcode" || fail "R input file '$rcode' does not exist"
    # Invoke R; "$@" passes each remaining argument through intact
    R --vanilla --slave --file="$rcode" --args "$@"

Cheers,
Anthony
(reply by Anthony Ferrari)
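The reason `trailingOnly=TRUE` helps is that positional indices into the full argument list shift whenever an option is added before `--args`. The shell function below is only an analogy, not R's implementation: `after_args` (a hypothetical name) drops everything up to and including the first `--args`, which is roughly the selection `commandArgs(trailingOnly=TRUE)` makes inside R.

```shell
#!/bin/sh
# Print, one per line, every argument that follows the first "--args".
# If there is no "--args", print nothing.
after_args() {
    while [ "$#" -gt 0 ] && [ "$1" != "--args" ]; do
        shift
    done
    [ "$#" -gt 0 ] && shift          # drop the "--args" marker itself
    [ "$#" -gt 0 ] && printf '%s\n' "$@"
    return 0
}

# Adding --vanilla moves in.tab from position 6 to position 7 of this
# argument list, but the trailing arguments are the same in both calls.
after_args R --slave -f rcode.R --args in.tab out.tab
after_args R --slave --vanilla -f rcode.R --args in.tab out.tab
```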
That is a very handy tip, commandArgs(trailingOnly=T), thanks! What happens if you use <command> rather than <command interpreter="bash">?

Peter

P.S. Cross-posted to galaxy-dev; could we continue this there?
(reply by Peter)
This is worse. Here's what I got:

    An error occurred running this job: /var/spool/torque/mom_priv/jobs/77.node054.cluster.SC: line 11: ./r_script_wrapper.sh: No such file or directory

This time even the wrapper is not found, and I have to specify both paths (to r_script_wrapper.sh and rcode.R) to make it work.

Anthony

Peter: sure.
(reply by Anthony Ferrari)
Errata: there is no --args option in the <command> tag examples:

    <command interpreter="bash"> r_script_wrapper.sh rcode.R $in_data $out_data </command>   ### FAILURE
    <command interpreter="bash"> r_script_wrapper.sh /full/path/to/rcode.R $in_data $out_data </command>   ### SUCCESS
(reply by Anthony Ferrari)
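One way to avoid hard-coding `/full/path/to/rcode.R` is to let the wrapper resolve a relative R script name against its own directory instead of the job working directory reported by `echo $PWD`. This is a minimal sketch, assuming (as in Anthony's setup) that the wrapper and the R script sit in the same directory, and that `$0` holds the wrapper's full path, since the thread shows the wrapper itself is found when `interpreter="bash"` is set. `resolve_rcode` is a hypothetical helper.

```shell
#!/bin/sh
# Resolve an R script name: absolute paths ($2 starting with "/") are kept,
# relative ones are taken relative to directory $1 (normally the wrapper's own).
resolve_rcode() {
    case $2 in
        /*) printf '%s\n' "$2" ;;
        *)  printf '%s/%s\n' "$1" "$2" ;;
    esac
}

# Inside r_script_wrapper.sh this would be used roughly as:
#   wrapper_dir=$(cd "$(dirname "$0")" && pwd)
#   rcode=$(resolve_rcode "$wrapper_dir" "$1"); shift
#   test -f "$rcode" || fail "R input file '$rcode' does not exist"
#   R --vanilla --slave --file="$rcode" --args "$@"
```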