#1120 ✓resolved
Kengo Matsuyama

do_sqlite3 always returns 'ASCII-8BIT' strings in Ruby 1.9

Reported by Kengo Matsuyama | November 10th, 2009 @ 02:48 AM

In a Ruby 1.9 environment, do_sqlite3 always returns String objects with 'ASCII-8BIT' encoding. I think this should be 'UTF-8'.
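As a minimal sketch of the symptom described above, with no database involved: the byte values and variable names here are illustrative assumptions, not taken from the linked test script. It shows that a string tagged ASCII-8BIT does not compare equal to a UTF-8 string holding the same bytes until it is retagged.

```ruby
# Hypothetical illustration only; it does not call do_sqlite3.
# UTF-8 bytes for three Japanese characters, tagged as binary
# (Array#pack with "C*" produces an ASCII-8BIT string).
raw = [0xE6, 0x97, 0xA5, 0xE6, 0x9C, 0xAC, 0xE8, 0xAA, 0x9E].pack("C*")

# The same text as a UTF-8 string (\u escapes force UTF-8).
expected = "\u65E5\u672C\u8A9E"

same_bytes   = raw.bytes == expected.bytes   # identical byte content
equal_before = (raw == expected)             # encodings differ, so not equal

# Retagging changes only the encoding label, not the bytes.
retagged    = raw.dup.force_encoding(Encoding::UTF_8)
equal_after = (retagged == expected)
```

Here `same_bytes` and `equal_after` are true while `equal_before` is false, which is why the encoding tag on returned strings matters on 1.9.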

I'm using the 'next' branch on Ruby 1.9.1p243.

My test script is here:
http://gist.github.com/230744

And here is a patch to fix this problem:
http://gist.github.com/230745

diff --git a/do_sqlite3/ext/do_sqlite3_ext/do_sqlite3_ext.c b/do_sqlite3/ext/do_sqlite3_ext/do_sqlite3_ext.c
index 87923d5..f62f8e8 100755
--- a/do_sqlite3/ext/do_sqlite3_ext/do_sqlite3_ext.c
+++ b/do_sqlite3/ext/do_sqlite3_ext/do_sqlite3_ext.c
@@ -604,6 +604,9 @@ static VALUE cCommand_execute_reader(int argc, VALUE *argv, VALUE self) {
 
   rb_iv_set(reader, "@reader", Data_Wrap_Struct(rb_cObject, 0, 0, sqlite3_reader));
   rb_iv_set(reader, "@field_count", INT2NUM(field_count));
+#ifdef HAVE_RUBY_ENCODING_H
+  rb_iv_set(reader, "@connection", conn_obj);
+#endif
 
   field_names = rb_ary_new();
   field_types = rb_iv_get(self, "@field_types");

Comments and changes to this ticket

  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) November 10th, 2009 @ 11:28 AM

    • State changed from “new” to “unconfirmed”
    • Assigned user set to “Dirkjan Bussink”
  • Philipp Pirozhkov

    Philipp Pirozhkov November 11th, 2009 @ 02:14 AM

    Had same issues.
    Patch works fine for me.
    ruby 1.9.1p243, patch applied to trunk DO

  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) November 11th, 2009 @ 02:03 PM

    @Kengo: Would you be able to provide a failing spec that confirms this behavior? It would be great if this was in the data_objects shared specs (in data_objects/lib/data_objects/specs) because then we could use them to verify the other DO drivers are also handling encoding in the same way.

    I think the current shared specs for encoding just confirm that the connection object has the correct character set, and not that the actual database is accepting and returning UTF-8 encoded strings properly.
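A rough sketch of the kind of round-trip check such a spec could make. The helper name and the two lambdas are hypothetical stand-ins for "write a string to the database and read it back"; they are not real DO drivers or the actual shared specs.

```ruby
# Hypothetical helper: the block stands in for a write-then-read
# through a driver. It checks the encoding tag as well as the content.
def utf8_round_trip_ok?(value)
  result = yield(value)
  result.encoding == Encoding::UTF_8 &&
    result.valid_encoding? &&
    result == value
end

# Stand-in "drivers" (assumptions for illustration only).
binary_driver = lambda { |s| s.dup.force_encoding(Encoding::ASCII_8BIT) }
utf8_driver   = lambda { |s| s.dup }

sample = "caf\u00E9"
broken = utf8_round_trip_ok?(sample, &binary_driver)  # false: wrong tag
fixed  = utf8_round_trip_ok?(sample, &utf8_driver)    # true
```

Checking only the connection's character set would pass for both stand-ins; checking the returned string separates them.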

  • Kengo Matsuyama

    Kengo Matsuyama November 12th, 2009 @ 12:19 PM

    FYI.
    After some investigation, I noticed that do_sqlite3 has other problems supporting encoded String objects:

    * SQLite3 only supports UTF-8.
    * The encoding option in the URL is ignored.
    * Support for Extlib::ByteArray is incomplete.
    * Connection#quote_byte_array can't handle the \0 (NUL) character correctly.
    * The Connection#character_set method is missing.
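One way to sidestep the NUL problem mentioned above is to emit a SQLite hex blob literal (`X'...'`), which never contains raw bytes. The sketch below is a hypothetical helper written for illustration; it is not the driver's actual Connection#quote_byte_array implementation.

```ruby
# Hypothetical helper, not the do_sqlite3 implementation.
# Encodes every byte as two hex digits inside a SQLite blob literal,
# so NUL bytes and single quotes need no special handling.
def quote_byte_array(bytes)
  "X'" + bytes.unpack("H*").first + "'"
end

# NUL, 0x01, 0xFF and a single quote (0x27).
payload = [0x00, 0x01, 0xFF, 0x27].pack("C*")
literal = quote_byte_array(payload)   # => "X'0001ff27'"

# Decoding the hex digits gives back the original bytes.
decoded = [literal[2..-2]].pack("H*")
```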

    I tried to write a patch to fix these problems, but I couldn't finish it.
    http://gist.github.com/233113

    Three failures still remain with this patch.

    1) DataObjects::DataError in 'DataObjects::Sqlite3 with ByteArray writing a ByteArray should return the correct entry' Reader is not initialized

    I couldn't understand what this spec is intended to test.

    2) 'DataObjects::Sqlite3::Connection character_set sets the character set through the URI should == "ISO-8859-1"' FAILED
       expected: "ISO-8859-1",
            got: "UTF-8" (using ==)

    This spec always fails because SQLite3 and do_sqlite3 support UTF-8 only.

    3) DataObjects::ConnectionError in 'DataObjects::Sqlite3::Connection with encoded string support reading a ByteArray before(:all)' database table is locked

    What???

  • Dirkjan Bussink

    Dirkjan Bussink November 12th, 2009 @ 01:44 PM

    • State changed from “unconfirmed” to “resolved”

    The problem is that shared specs should be grouped by the features a system supports. I've reworked your patch to put these in a separate shared spec group; otherwise it's impossible to cherry-pick only the supported shared specs.

    The database table locking happens because the reader wasn't closed. I've reworked your patch in the following commit and applied it:

    http://github.com/datamapper/do/commit/18a24f87929c591ef11c5d3d5c48...
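A small sketch of the close-the-reader pattern behind the locking fix. `FakeReader` and `with_reader` are hypothetical names used only for illustration, assuming a reader object that exposes a `close` method; this is not the code from the commit.

```ruby
# Hypothetical stand-in for a reader; it only records whether
# close was called.
class FakeReader
  attr_reader :closed

  def initialize(rows)
    @rows = rows
    @closed = false
  end

  def each(&block)
    @rows.each(&block)
  end

  def close
    @closed = true
  end
end

# Yields the reader and always closes it, even if the block raises,
# so an open reader cannot keep the table locked.
def with_reader(reader)
  yield reader
ensure
  reader.close
end

reader = FakeReader.new([["a"], ["b"]])
rows = []
with_reader(reader) { |r| r.each { |row| rows << row } }
```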

    I've also changed the shared spec name here and made a different group for the default UTF-8 encodings. I've also added these specs for do_mysql and do_postgres, which were already passing.

    Thanks for the help with the specs; this is a good start toward having proper specs for 1.9 encodings. The 'different encodings' group should probably be extended so that it tests that when, for example, latin1 is specified, strings on 1.9 are actually in that encoding.
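A minimal sketch of what such an extended check might assert. The `CHARSET_TO_ENCODING` table and `tag_for_charset` helper are assumptions made for illustration; they are not part of data_objects.

```ruby
# Hypothetical mapping from database character set names to Ruby
# encodings; only two entries are shown.
CHARSET_TO_ENCODING = {
  "latin1" => Encoding::ISO_8859_1,
  "utf8"   => Encoding::UTF_8
}

# Tags raw bytes with the encoding that matches the given charset.
def tag_for_charset(raw, charset)
  raw.dup.force_encoding(CHARSET_TO_ENCODING.fetch(charset))
end

# "café" as ISO-8859-1 bytes (0xE9 is e-acute in latin1).
raw    = [0x63, 0x61, 0x66, 0xE9].pack("C*")
latin1 = tag_for_charset(raw, "latin1")

# Transcoding to UTF-8 yields the expected text.
as_utf8 = latin1.encode(Encoding::UTF_8)
```

A spec along these lines would check both `latin1.encoding` and the transcoded content, rather than the connection's reported character set alone.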
