Question: DESeq2 ascii codec can't encode character
2.8 years ago by
France
christophe.habib340 wrote:

Hello everyone,

I get this error when I try to use DESeq2 1.8.2 from the IUC team:

I guess there is a special character in one of the htseq-count outputs, because the GFF file contains one. I tried cleaning the GFF file and regenerating the htseq-count outputs, but I still have the same issue (although I may have missed a special character).
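One way to check whether a special character slipped through the cleanup is to scan the GFF file or the htseq-count outputs for any non-ASCII character. The following is a minimal sketch, not part of Galaxy or htseq-count; the function name `find_non_ascii_lines` is hypothetical, and it assumes the files are plain text that is mostly UTF-8.

```python
import io


def find_non_ascii_lines(path, encoding="utf-8"):
    """Return (line_number, line) pairs for lines containing non-ASCII characters."""
    hits = []
    # errors="replace" keeps the scan going even if some bytes are not valid
    # UTF-8; such bytes become U+FFFD, which is also reported as non-ASCII.
    with io.open(path, "r", encoding=encoding, errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if any(ord(ch) > 127 for ch in line):
                hits.append((number, line.rstrip("\n")))
    return hits
```

Running it on each input file lists the exact lines to inspect, which is easier than searching by eye.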

I would like to find the right place to fix this, for example by adding something to decode/encode the string that causes this error. First, I wonder whether the issue comes from Python, from the DB (PostgreSQL, which is in UTF8), or from the R script. The error suggests that it is in Python, but I'm not sure about that.
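For reference, this kind of message is what Python raises when text containing a non-ASCII character is encoded with the ASCII codec. The sketch below reproduces it with a made-up label (the string and the helper `try_encode` are hypothetical, not taken from Galaxy or the DESeq2 wrapper) and shows that encoding explicitly as UTF-8 succeeds.

```python
# Hypothetical feature name containing an accented character.
label = u"prot\u00e9ine MarC"


def try_encode(text, codec):
    """Return the encoded bytes, or the error message if the codec cannot represent the text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as err:
        return str(err)


ascii_result = try_encode(label, "ascii")  # error message mentioning the 'ascii' codec
utf8_result = try_encode(label, "utf-8")   # bytes, encoded without error

print(ascii_result)
print(utf8_result)
```

This only illustrates the mechanism; where the implicit ASCII encoding happens in a given Galaxy setup still has to be located.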

Any guidance would be much appreciated. Has anyone run into this error before?

Thank you !

Christ

ascii codec character deseq2 • 1.0k views
modified 2.7 years ago • written 2.8 years ago by christophe.habib340

It's more likely that the unicode character is included in the stderr or stdout. Can you check the locale settings of your Galaxy host and switch them to UTF-8 if possible?
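A quick way to see which encodings Python actually picks up from the locale is to print them from the same user account and environment that runs Galaxy. This is a minimal sketch using only the standard library; the helper name `report_encodings` is hypothetical.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python derives from the current environment."""
    return {
        "preferred": locale.getpreferredencoding(False),
        "stdout": getattr(sys.stdout, "encoding", None),
        "stderr": getattr(sys.stderr, "encoding", None),
        "filesystem": sys.getfilesystemencoding(),
    }


for name, value in sorted(report_encodings().items()):
    print("%s: %s" % (name, value))
```

If any of these report an ASCII-only encoding (for example `ANSI_X3.4-1968`) in the environment where Galaxy runs, the locale change has not reached that process.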

written 2.7 years ago by Bjoern Gruening5.1k

I just switched all locales to fr_FR.UTF-8 and restarted the Galaxy instance.

LANG=fr_FR.UTF-8
LANGUAGE=
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_PAPER="fr_FR.UTF-8"
LC_NAME="fr_FR.UTF-8"
LC_ADDRESS="fr_FR.UTF-8"
LC_TELEPHONE="fr_FR.UTF-8"
LC_MEASUREMENT="fr_FR.UTF-8"
LC_IDENTIFICATION="fr_FR.UTF-8"
LC_ALL=

But the issue is still there. Do I have to regenerate the htseq-count files before trying DESeq2 again?

modified 2.7 years ago • written 2.7 years ago by christophe.habib340

I regenerated all the htseq-count files, but it still doesn't work.

I just noticed that my DB was actually in ASCII:

 server_encoding
-----------------
 SQL_ASCII

So I guess this is where the issue is. How do I change this properly? I found this procedure:

  1. Dump your database.
  2. Drop your database.
  3. Create a new database with the different encoding.
  4. Reload your data.
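The four steps map onto the standard PostgreSQL client programs `pg_dump`, `dropdb`, `createdb` and `psql`. The sketch below only builds the command lines so they can be reviewed before anything is run; the function name `migration_commands` and the database and file names passed to it are placeholders, and the right connection options (user, host) depend on the installation.

```python
def migration_commands(dbname, dump_file, encoding="UTF8"):
    """Build, without running them, the commands for a dump / drop / create / reload cycle."""
    return [
        # 1. Dump the existing database to a plain SQL file.
        ["pg_dump", "-f", dump_file, dbname],
        # 2. Drop the old database (only after the dump has been verified).
        ["dropdb", dbname],
        # 3. Recreate it from template0 with the new encoding.
        ["createdb", "-E", encoding, "-T", "template0", dbname],
        # 4. Reload the data from the dump.
        ["psql", "-d", dbname, "-f", dump_file],
    ]


# Placeholder names, for illustration only.
for command in migration_commands("mydb", "mydb_dump.sql"):
    print(" ".join(command))
```

Creating the new database under a different name and keeping the old one until the reload has been checked avoids the drop step entirely.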

Is that OK? Is there anything else I need to do to the dumped database, such as handling the special characters, before reloading it?
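On the special characters: a SQL_ASCII database stores bytes as-is, without validating or converting them, so the dump can contain bytes that are not valid UTF-8 (for example Latin-1 accented letters), and a UTF8 database will reject those on reload. A minimal sketch for checking a dump file beforehand follows; the function name `first_invalid_utf8_line` is hypothetical.

```python
import io


def first_invalid_utf8_line(path):
    """Return the 1-based number of the first line that is not valid UTF-8, or None."""
    # Read raw bytes: splitting on newlines is safe because no multi-byte
    # UTF-8 sequence contains the newline byte.
    with io.open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                return number
    return None
```

If it returns a line number, the dump needs to be converted from its actual encoding to UTF-8 (or the offending values fixed) before reloading.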

 

modified 2.7 years ago • written 2.7 years ago by christophe.habib340
2.7 years ago by
France
christophe.habib340 wrote:

I tried to switch my ASCII PostgreSQL DB to UTF8. To do so, I changed the template encoding:

-- Unmark template1 as a template so that it can be renamed
UPDATE pg_database SET datistemplate = FALSE WHERE datname = 'template1';
ALTER DATABASE template1 RENAME to templatebackup;
-- Recreate template1 from template0 in UTF8 ('UNICODE' is an alias for UTF8)
CREATE DATABASE template1 WITH TEMPLATE = template0 ENCODING = 'UNICODE';
UPDATE pg_database SET datistemplate = TRUE WHERE datname = 'template1';
\c template1
VACUUM FREEZE;

then

create database formation2;
ALTER database formation2 owner to galaxy;
psql formation2 < formation.psql;

and I just modified the connection line in galaxy.ini to change the database from formation to formation2. The new database is in UTF8 as expected.

Now I get this error within DESeq2:

What do you think about that? Could it be related to the change of encoding of my database? Or is it related to DESeq2 itself?

modified 2.7 years ago • written 2.7 years ago by christophe.habib340

According to bag, it looks like this new error comes from the inputs rather than from any encoding issue. I guess the case is solved.

written 2.7 years ago by christophe.habib340

I found where this error comes from. htseq-count is case sensitive, so:

Multiple antibiotic resistance protein marC 

is different from

Multiple antibiotic resistance protein MarC 

But DESeq2 is not case sensitive, so it finds 2 rows with the same name, which leads to this error.
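Such collisions can be found before running DESeq2 by grouping the feature names of an htseq-count output case-insensitively. This is a minimal sketch, assuming the usual tab-separated layout with the feature name in the first column; the function name `case_insensitive_collisions` and the counts in the usage example are made up.

```python
from collections import defaultdict


def case_insensitive_collisions(lines):
    """Return groups of distinct feature names that differ only by letter case."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # The feature name is everything before the first tab.
        name = line.split("\t", 1)[0]
        groups[name.lower()].append(name)
    return sorted(
        sorted(set(names)) for names in groups.values() if len(set(names)) > 1
    )


# Hypothetical counts, for illustration only.
example = [
    "Multiple antibiotic resistance protein marC\t12\n",
    "Multiple antibiotic resistance protein MarC\t7\n",
    "geneB\t3\n",
]
for group in case_insensitive_collisions(example):
    print(group)
```

Any group it reports needs to be renamed or merged in the annotation before the counts are regenerated.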

So this confirms that what I did with the database worked fine.

modified 2.7 years ago • written 2.7 years ago by christophe.habib340

Oh, this is a really awful corner case. Sorry for all the inconvenience.

written 2.7 years ago by Bjoern Gruening5.1k