Question: Creating A Galaxy Tool In R - "You Must Not Use 8-Bit Bytestrings"
6.6 years ago, Dan Tenenbaum wrote:
Hello,

I'm a galaxy newbie and running into several issues trying to adapt an R script to be a galaxy tool. I'm looking at the XY plotting tool for guidance (tools/plot/xy_plot.xml), but I decided not to embed my script in XML and instead keep it in a separate script file. That way I can still run it from the command line and make sure it works as I make incremental changes. (So my script starts with args <- commandArgs(TRUE).) Also, if it doesn't work, this suggests to me that there is a problem with my galaxy configuration.

First, I tried using the script that comes with the XY plotting tool, but it threw away my arguments:

    An error occurred running this job:
    ARGUMENT '/Users/dtenenba/dev/galaxy-dist/database/files/000/dataset_4.dat' __ignored__
    ARGUMENT '/Users/dtenenba/dev/galaxy-dist/database/files/000/dataset_3.dat' __ignored__
    ARGUMENT 'Fly' __ignored__
    ARGUMENT 'Tagwise' __ignored__
    etc.

So then I tried just switching to Rscript:

    Rscript RNASeq.R $countsTsv $designTsv "$organism" $dispersion $minimumCountsPerMillion $minimumSamplesPerTranscript $out_file1 $out_file2

(My script produces as output a csv file and a pdf file. The final two arguments I'm passing are the names of those files.)

But then I get an error that Rscript can't be found. So I wrote a little wrapper script:

    #!/bin/sh
    Rscript $*

And called that:

    RNASeq.R $countsTsv $designTsv "$organism" $dispersion $minimumCountsPerMillion $minimumSamplesPerTranscript $out_file1 $out_file2

Then I got an error that RNASeq.R could not be found. So then I added the absolute path to my R script to the tag. This seemed to work (that is, it got me further, to the next error), but I'm not sure why I had to do this; in all the other tools I'm looking at, the directory of the script to run does not have to be specified. I assumed that the command would run in the appropriate directory.
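A side note on the wrapper: an unquoted $* re-splits every argument on whitespace, so a quoted value such as "$organism" would arrive as several arguments if it ever contained a space. The sketch below only illustrates that difference; count_args, forward_star and forward_at are hypothetical helper names, and "Fruit fly" is a made-up value. The value in this post, Fly, has no space, so this would not by itself explain the errors described here.

```shell
#!/bin/sh
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Forwards its arguments the way the wrapper above does (unquoted $*),
# which lets the shell split them again on whitespace.
forward_star() { count_args $*; }

# Forwards its arguments with "$@", which keeps each one intact.
forward_at() { count_args "$@"; }

forward_star "Fruit fly" Tagwise   # prints 3
forward_at "Fruit fly" Tagwise     # prints 2
```

Under that assumption, the second line of the wrapper would read Rscript "$@" rather than Rscript $*.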
So now I've specified the full path to my R script:

    /Users/dtenenba/dev/galaxy-dist/tools/bioc/RNASeq.R $countsTsv $designTsv "$organism" $dispersion $minimumCountsPerMillion $minimumSamplesPerTranscript $out_file1 $out_file2

And I get the following long error, which includes all of the output of my R script:

    Traceback (most recent call last):
      File "/Users/dtenenba/dev/galaxy-dist/lib/galaxy/jobs/runners/", line 133, in run_job
        job_wrapper.finish( stdout, stderr )
      File "/Users/dtenenba/dev/galaxy-dist/lib/galaxy/jobs/", line 725, in finish
        self.sa_session.flush()
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/", line 127, in do
        return getattr(self.registry(), name)(*args, **kwargs)
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/", line 1356, in flush
        self._flush(objects)
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/", line 1434, in _flush
        flush_context.execute()
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/", line 261, in execute
        UOWExecutor().execute(self, tasks)
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/", line 753, in execute
        self.execute_save_steps(trans, task)
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/", line 768, in execute_save_steps
        self.save_objects(trans, task)
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/", line 759, in save_objects
        task.mapper._save_obj(task.polymorphic_tosave_objects, trans)
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/", line 1413, in _save_obj
        c = connection.execute(statement.values(value_params), params)
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/engine/", line 824, in execute
        return Connection.executors[c](self, object, multiparams, params)
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/engine/", line 874, in _execute_clauseelement
        return self.__execute_context(context)
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/engine/", line 896, in __execute_context
        self._cursor_execute(context.cursor, context.statement, context.parameters[0], context=context)
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/engine/", line 950, in _cursor_execute
        self._handle_dbapi_exception(e, statement, parameters, cursor, context)
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/engine/", line 931, in _handle_dbapi_exception
        raise exc.DBAPIError.instance(statement, parameters, e, connection_invalidated=is_disconnect)
    ProgrammingError: (ProgrammingError) You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
    u'UPDATE job SET update_time=?, stdout=?, stderr=? WHERE = ?'
    ['2012-04-24 18:55:45.791417', '', 'BiocInstaller version 1.5.7, ?biocLite for help\nWarning message:\nNAs introduced by coercion \nLoading required package: methods\nLoading required package: limma\nLoading required package: BiasedUrn\nLoading required package: geneLenDataBase\nLoading required package:\nLoading required package: AnnotationDbi\nLoading required package: BiocGenerics\n\nAttaching package: \xe2\x80\x98BiocGenerics\xe2\x80\x99\n\nThe following object(s) are masked from \xe2\x80\x98package:stats\xe2\x80\x99:\n\n xtabs\n\nThe following object(s) are masked from \xe2\x80\x98package:base\xe2\x80\x99:\n\n anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find,\n get, intersect, lapply, Map, mapply, mget, order, paste, pmax,\n, pmin,, Position, rbind, Reduce,,\n rownames, sapply, setdiff, table, tapply, union, unique\n\nLoading required package: Biobase\nWelcome to Bioconductor\n\n Vignettes contain introductory material; view with\n \'browseVignettes()\'. To cite Bioconductor, see\n \'citation("Biobase")\', and for packages \'citation("pkgname")\'.\n\nLoading required package: DBI\n\nCalculating library sizes from column totals.\nError in matrix(u, nrow = nrows, byrow = TRUE) : \n negative extents to matrix\nCalls: plotMDS.DGEList ... equalizeLibSizes -> splitIntoGroups -> lapply -> FUN -> matrix\nExecution halted\n', 15]

Note that if I run my script from the command line:

    ./ RNASeq.R /Users/dtenenba/dev/galaxy-dist/database/files/000/dataset_4.dat /Users/dtenenba/dev/galaxy-dist/database/files/000/dataset_3.dat Fly 1 1 Tagwise MDSPlot.pdf outputs.csv

It works fine and does not produce a warning about "NAs introduced by coercion", nor does it fail with the "Error in matrix" above.

So, can anyone tell me what is going wrong here? Why does R behave differently in galaxy than it does on the command line? (I'm using the same instance of R, same machine, for my galaxy and command-line efforts.) Is this 8-bit bytestring error a red herring?
Can I filter it so that galaxy is happy?

Finally, one other curiosity. Every time I hit "Execute" in galaxy to run my tool, it is run twice: two jobs are created (which each fail in the same way). Why is this?

My R script:
My XML file:

I can share more data (such as sample input files) if necessary.

Thanks for your help.
Dan
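Two details in the output above may be worth checking; the following is a sketch under stated assumptions, not a confirmed diagnosis. First, the stderr text contains the bytes \xe2\x80\x98 and \xe2\x80\x99, which are the UTF-8 curly quotes R prints around package names, and these non-ASCII bytes are what the bytestring error is about. Second, the galaxy command passes $dispersion before the two numeric thresholds, while the working command-line run passes Fly 1 1 Tagwise; if the script reads its arguments by position, that difference in order could be where "NAs introduced by coercion" comes from. The helper names is_number and ascii_only below are hypothetical and could be used inside a wrapper script.

```shell
#!/bin/sh
# Hypothetical helpers for a wrapper script; the names are illustrative only.

# Succeeds only if the argument looks like a non-negative decimal number.
is_number() {
    case "$1" in
        ''|*[!0-9.]*|*.*.*) return 1 ;;   # empty, stray characters, or two dots
        *[0-9]*) return 0 ;;              # at least one digit remains
        *) return 1 ;;                    # e.g. a lone "."
    esac
}

# Copies stdin to stdout, keeping only tab, newline, carriage return and
# printable ASCII; every other byte (such as UTF-8 curly quotes) is dropped.
ascii_only() {
    LC_ALL=C tr -cd '\11\12\15\40-\176'
}

# Example: the curly quotes around the package name are removed.
printf 'Attaching package: \342\200\230BiocGenerics\342\200\231\n' | ascii_only

# Example: catching a non-numeric value in a numeric position.
is_number 1 && echo "1 is numeric"
is_number Tagwise || echo "Tagwise is not numeric"
```

In a wrapper, one possible use is to send the script's stderr to a temporary file and then run ascii_only on that file before writing it back to stderr, so that only ASCII reaches galaxy's job record. Whether that is enough to make the job succeed depends on the argument order question as well.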
galaxy • 1.7k views