
Add support for defining/changing default collation
Reported by Jordan Ritter | October 23rd, 2009 @ 01:07 PM
By default, DM creates tables in MySQL and SQLServer with a
default collation 'utf8_general_ci', making queries against
string/text fields case-insensitive.
This important fact is not visibly surfaced except during model creation and only when SQL logging is enabled. After that, even 'SHOW CREATE TABLE ...' (and mysqldump) won't emit the defined collation for the table. But it's there, and it's active.
GOOD: e.g. a User model, with first and last names, with a unique compound index. Different casing of the same name fails to insert (convenient). Searches (SELECT) don't have to do any field munging. Personally, I think this is such a common case that convenience trumping correctness may be justified.
BAD: referencing or replicating objects that rely on string keys instead of numerical keys. Salesforce is an excellent example: objects are keyed by alphanumeric, case-sensitive string IDs. Recently I heard an anecdote about a key collision on locally sync'd SF objects causing replication failures because the object IDs matched case insensitively in the local DB.
I'd argue that when this behavior manifests into a real-world
problem, the costs of discovering the cause plus fixing/working
around it are significant compared to the convenience
factor. In operational release/production environments, recreating
tables properly is expensive. Taking a DB down to change the
default collations is expensive. Even setting the
COLLATION_CONNECTION value in MySQL doesn't appear to override the
table's default (mine is set to 'latin1_swedish_ci', but my
'utf8_general_ci' and 'utf8_bin' tables still exhibited their
defined behavior).
AFAICT there doesn't appear to be an easy/standard way in DM to change this behavior. You can monkeypatch DataMapper::Migrations::MysqlAdapter#collation to return a different value, or manually set your COLLATION_CONNECTION in your local my.cnf (though I question whether the latter actually has an effect once the tables are created). But I submit that for the vast majority of common usage cases, wrt code and docs, there's little-to-no indication that this is an important consideration during model design, and I can't currently find any facility in DM to effect a (different) choice.
My suggestions for improvement:
- Make collation/charset specifiable via database.yml/::DM.setup (collation_connection)
- Make collation/charset specifiable per-model (applies to what's used when model is created)
- Make collation/charset specifiable per-field (applies to individual queries - at least possible in MySQL)
- [once DM has a mechanism to set this] Update website docs to acknowledge this as an important choice for RDBMS users when designing models
Comments and changes to this ticket
-
Dan Kubb (dkubb) November 10th, 2009 @ 07:20 PM
- State changed from new to unconfirmed
- Assigned user changed from Dan Kubb (dkubb) to Dirkjan Bussink
@Dirkjan: can you comment on this ticket? Do other DBs have the concept of collation? and can we expose that through the DO connection args?
-
Dirkjan Bussink November 21st, 2009 @ 06:25 AM
Collation doesn't really have a standardized approach. I know that with Postgres you have to define a system wide collation (8.3 and earlier) or one per database on database creation (with 8.4). So this is not an option that you can change on the fly on a connection.
I can imagine this being useful for MySQL though, but like already suggested, you probably also need control for collations at the table / column creation level. If you have that control, you also don't really need it that much on a connection, since you can just use it how the table specifies it.
Please Sign in or create a free account to add a new ticket.
With your very own profile, you can contribute to projects, track your activity, watch tickets, receive and update tickets through your email and much more.
Create your profile
Help contribute to this project by taking a few moments to create your personal profile. Create your profile »