#1064 unconfirmed
Brock Whitten

UTF-8 or utf8 for encoding?

Reported by Brock Whitten | September 29th, 2009 @ 04:23 PM

automigrate only sets the table to utf8 if the DataMapper encoding is the all-uppercase "UTF-8". However, for DataMapper to treat/save the content as utf8, it wants the encoding set to "utf8". This could have something to do with do_mysql and dm-core wanting two different things. Or perhaps it's a dm-migrations thing.

tested with:

mysql 5.0.51b
dm-core 0.10.0
dm-migrations 0.10.0
data_objects 0.10.0
do_mysql 0.10.0
ruby 1.9.1p129

Comments and changes to this ticket

  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) October 7th, 2009 @ 12:59 PM

    • State changed from “new” to “hold”
    • Assigned user set to “Dan Kubb (dkubb)”

    @Brock: I'm not sure what you mean. DataMapper.auto_migrate! sets the table to the same encoding as the connection's character set, which you control when you configure MySQL. If, for some reason, there isn't one, it defaults to "utf8".

    Can you provide an example of what your specific problem is, and how I could reproduce it on my end?

    Marking this as "on hold" until Brock replies.

  • Brock Whitten

    Brock Whitten October 7th, 2009 @ 01:39 PM

    I have tested this further. I have found that the encoding in DM setup must be set to "UTF-8" rather than "utf8" to get the table set to "utf8" in MySQL. DM gives a warning if the encoding is set to "utf8", so I would say that everything is fine there.

    As for part two (saving content), I'm having trouble replicating this problem. I have been running into a lot of encoding issues, and this may be due to contaminated content that I have in my database. I will try to narrow this down to a minimal case.

  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) October 8th, 2009 @ 03:28 PM

    • State changed from “hold” to “unconfirmed”
    • Assigned user cleared.

    When you specify the encoding in DataMapper.setup, you need to use "UTF-8" (specifying nothing defaults to UTF-8, BTW). The reason is that "UTF-8" is the name Ruby itself uses to designate a String as UTF-8 encoded, not "utf8" or "utf-8", and we map Ruby's names onto whatever each DB uses, which can sometimes be quite different. We designated Ruby's POV as being authoritative, and have Hashes that map the Ruby encodings to whatever the DB supports.
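To illustrate the kind of Hash lookup described above, here is a minimal sketch. The constant `RUBY_TO_MYSQL_CHARSET`, the method `mysql_charset_for`, and the specific entries are hypothetical and are not taken from the DataMapper or DataObjects source. Returning nil for an unknown name is likewise an assumption, made only to show why a string such as "utf8" would not match a table keyed on Ruby's names.

```ruby
# Hypothetical lookup table: Ruby encoding name => MySQL character set name.
# The entries are illustrative; the real adapter tables may differ.
RUBY_TO_MYSQL_CHARSET = {
  "UTF-8"      => "utf8",
  "ISO-8859-1" => "latin1",
  "US-ASCII"   => "ascii"
}.freeze

# Returns the MySQL character set for a Ruby encoding name, or nil
# (after printing a warning) when the name is not a key in the table.
def mysql_charset_for(ruby_encoding)
  RUBY_TO_MYSQL_CHARSET.fetch(ruby_encoding) do
    warn "Unknown encoding name #{ruby_encoding.inspect}; expected a Ruby name such as \"UTF-8\""
    nil
  end
end
```

With a table like this, `mysql_charset_for("UTF-8")` finds "utf8", while `mysql_charset_for("utf8")` finds nothing because the keys are Ruby's names, not MySQL's.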

    If you have a lot of contaminated data, it might be worthwhile using something like Iconv to convert each "cell" into UTF-8 encoding, compare it to the original, and prompt you when they differ, allowing you to review each change before accepting or rejecting it. It can sometimes be time consuming, but it totally depends on the size and state of your dataset.
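The review pass suggested above could be sketched roughly as follows. This uses Ruby 1.9's built-in `String#encode` and `String#valid_encoding?` rather than Iconv, which is a substitution on my part. The names `propose_utf8` and `review_cells` are hypothetical, and the sketch assumes that any cell which is not valid UTF-8 was originally written as ISO-8859-1. That assumption will not hold for every dataset, which is why the result is a list of proposals to review rather than an automatic rewrite.

```ruby
# Assumption: cells that are not valid UTF-8 were written as ISO-8859-1.
ASSUMED_SOURCE = "ISO-8859-1"

# Returns nil when the cell is already valid UTF-8; otherwise returns a
# proposed UTF-8 string obtained by reinterpreting the bytes as `source`.
def propose_utf8(cell, source = ASSUMED_SOURCE)
  as_utf8 = cell.dup.force_encoding("UTF-8")
  return nil if as_utf8.valid_encoding?
  cell.dup.force_encoding(source).encode("UTF-8")
end

# Collects every cell that would change, so each one can be reviewed
# before the change is accepted or rejected.
def review_cells(cells)
  cells.each_with_index.map do |cell, index|
    proposal = propose_utf8(cell)
    proposal && { index: index, original: cell, proposed: proposal }
  end.compact
end
```

Each entry returned by `review_cells` holds the position, the original bytes, and the proposed replacement, which is enough to drive an accept/reject prompt.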

  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) May 22nd, 2010 @ 01:26 AM

    @Brock: Any luck reproducing this problem with the latest DM/DO? There were a number of encoding fixes in DO 0.10.2 (released this week) that may have solved some of the problems.
