#875 ✓resolved
Sindre Aarsaether

UTF-8 strings are not set to utf-8 when loaded from db

Reported by Sindre Aarsaether | June 2nd, 2009 @ 12:41 PM

The problem is that data_objects, do_mysql, dm-core, or whichever layer is responsible does not tell Ruby that strings coming from the database are UTF-8, even though the connection encoding is set to utf8. The bug is reproduced below. It crashes consistently on the latest do/dm-core with Ruby 1.9.1.

# encoding: utf-8

require 'rubygems'
require 'dm-core'

DataMapper.setup(:default, {
  :adapter => 'mysql',
  :host => 'localhost',
  :username => 'root',
  :database => 'dm_core_test',
  :encoding => 'utf8'
})

class User
  include DataMapper::Resource

  property :id, Serial
  property :name, String, :length => 255
  property :title, String
end

u = User.create(:name => "günther")

puts u.name.encoding if u.name.respond_to?(:encoding)
# => UTF-8

# Fetch the record back from the database; the string created above was never read from it
u = User.get(u.id)

puts u.name.encoding if u.name.respond_to?(:encoding)
# => ASCII-8BIT

puts u.name+"übercool"
# => utf8_bug.rb:33:in `<main>': incompatible character encodings: ASCII-8BIT and UTF-8 (Encoding::CompatibilityError)

# The name coming from db is already encoded in utf-8, so we only need to tell ruby that it is
newname = u.name.force_encoding('utf-8') # This does no conversion
puts newname+"übercool"
# => güntherübercool
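
The same failure can be reproduced without a database, assuming the driver hands back valid UTF-8 bytes that are merely tagged as binary. The following is a minimal sketch for Ruby 1.9 or later; `simulate_db_read` and `mark_as_utf8` are hypothetical helper names and are not part of DataMapper or DataObjects.

```ruby
# encoding: utf-8

# Hypothetical stand-in for the adapter: returns the same bytes,
# but tagged as binary (ASCII-8BIT), as in the report above.
def simulate_db_read(text)
  text.dup.force_encoding(Encoding::ASCII_8BIT)
end

# The workaround from the report: relabel the bytes as UTF-8.
# force_encoding changes only the tag, never the bytes.
def mark_as_utf8(raw)
  raw.dup.force_encoding(Encoding::UTF_8)
end

raw = simulate_db_read("günther")

begin
  raw + "übercool"
rescue Encoding::CompatibilityError => e
  puts "failed: #{e.class}"
end

fixed = mark_as_utf8(raw)
puts fixed.bytes.to_a == raw.bytes.to_a # the bytes are untouched
puts fixed + "übercool"
```

Relabeling is only safe when the bytes really are UTF-8, which is why the thread below argues for deriving the encoding from the connection settings instead of assuming it.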

Comments and changes to this ticket

  • Dirkjan Bussink

    Dirkjan Bussink June 2nd, 2009 @ 02:28 PM

    This is indeed a known issue, but just marking everything as utf-8 is not really the solution. What is needed is a mapping from the different encodings supported by databases such as mysql and postgresql to the appropriate ruby encoding. This way we can both support different encodings and ensure it always works properly.
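
A rough sketch of what such a mapping could look like on the Ruby side. The table is illustrative and deliberately incomplete, and the names `MYSQL_TO_RUBY_ENCODING`, `ruby_encoding_for`, and `tag_from_db` are hypothetical and not part of DataObjects. Each pairing is an assumption that should be checked against the database documentation before it is relied on.

```ruby
# Hypothetical, incomplete mapping from MySQL character set names
# to Ruby encoding names. Every entry here is an assumption to verify.
MYSQL_TO_RUBY_ENCODING = {
  'utf8'   => 'UTF-8',
  'ascii'  => 'US-ASCII',
  'binary' => 'ASCII-8BIT',
  'sjis'   => 'Shift_JIS',
  'ujis'   => 'EUC-JP',
  'koi8r'  => 'KOI8-R',
  # Ambiguous case: MySQL documents latin1 as cp1252, so
  # 'Windows-1252' may be the better match than 'ISO-8859-1'.
  'latin1' => 'ISO-8859-1'
}

# Look up the Ruby encoding for a connection charset.
# Unknown charsets fall back to binary, so nothing is mislabeled.
def ruby_encoding_for(mysql_charset)
  name = MYSQL_TO_RUBY_ENCODING[mysql_charset]
  name ? Encoding.find(name) : Encoding::ASCII_8BIT
end

# Tag a raw string from the driver with the connection's encoding.
# Returns a relabeled copy; no bytes are converted.
def tag_from_db(raw, mysql_charset)
  raw.dup.force_encoding(ruby_encoding_for(mysql_charset))
end

puts ruby_encoding_for('utf8')
puts ruby_encoding_for('ujis')
```

Falling back to binary for unmapped charsets keeps the current behaviour for anything the table does not cover, which seems safer than guessing UTF-8.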

  • Sindre Aarsaether

    Sindre Aarsaether June 2nd, 2009 @ 02:49 PM

    When do you think this will be working? I'm guessing almost all dm-users today are using utf-8, so I would hope that it is possible to support this before having a grand framework for all kinds of encoding (if that will take much longer). Right now, I'm guessing data-objects cannot be used with 1.9.1 for any international app.

    Is there anything I can help / participate with without any C++ knowledge? Ping me at #dm-hacking.

  • Dirkjan Bussink

    Dirkjan Bussink June 2nd, 2009 @ 03:36 PM

    When either I have time for it, or someone else steps up to the plate and builds it. The former can take some time, since I have a whole lot of things to do too, and for me personally it's not very high on the priority list.

    Just marking everything as utf-8 imho sets the wrong precedent: it looks like it's kinda working when in fact it's just an ugly hack. You can of course put in this ugly hack yourself ;). I know that Nokogiri has proper support for 1.9 encodings; they added some macros for string creation that wrap around the encoding stuff. That might be a good place to start looking at how to implement this for DataObjects.

  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) June 3rd, 2009 @ 11:50 PM

    Dirkjan, would a good place for Sindre to start be to document the mappings of all the MySQL and PostgreSQL encodings to the Ruby encodings? I assume we would need to work out all the mappings before programming anyway, and resolve any cases where the mapping is ambiguous, or there is no corresponding encoding in Ruby where one exists in MySQL or PostgreSQL.

    I don't believe this would require any C knowledge, which might mean it's a good place to start.
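
One way to start on that documentation without touching C is to let Ruby report which candidate names it actually recognises. The sketch below is an assumption about how such a check might be written; `ruby_knows_encoding?` and `unresolved_mappings` are hypothetical helpers, and the candidate list is a made-up sample, not a verified PostgreSQL mapping.

```ruby
# True if this Ruby build recognises the encoding name.
# Encoding.find raises ArgumentError for unknown names.
def ruby_knows_encoding?(name)
  Encoding.find(name)
  true
rescue ArgumentError
  false
end

# Return the candidate pairs whose Ruby side does not exist,
# i.e. the cases that still need a decision.
def unresolved_mappings(candidates)
  candidates.reject { |_db_name, ruby_name| ruby_knows_encoding?(ruby_name) }
end

# Sample input only; the last entry is deliberately made up
# to show what an unresolved case looks like.
candidates = {
  'UTF8'            => 'UTF-8',
  'LATIN1'          => 'ISO-8859-1',
  'EUC_JP'          => 'EUC-JP',
  'EXAMPLE_UNKNOWN' => 'NO-SUCH-ENCODING'
}

unresolved_mappings(candidates).each do |db_name, ruby_name|
  puts "#{db_name}: no Ruby encoding named #{ruby_name}"
end
```

This only confirms that a name exists in Ruby. Whether two encodings with matching names are byte-for-byte equivalent still has to be checked by hand.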

  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) July 22nd, 2009 @ 03:30 PM

    • Assigned user set to “Dirkjan Bussink”

    @Dirkjan: Is this ticket now resolved?

  • Dirkjan Bussink

    Dirkjan Bussink July 22nd, 2009 @ 03:37 PM

    • State changed from “unconfirmed” to “resolved”
