
find option :order results in inconsistent collection
Reported by Corrin Meyer | May 22nd, 2008 @ 08:04 AM
It seems that performing a find and sorting by a DateTime in either ascending or descending order results in an identical array of objects. I have example code at http://pastie.caboo.se/pastes/20....
If line 14 is changed such that it is ordered as such...
:order => [:published_at.asc]
When doing a search I get back identical data sets. I don't know if this is in DataObjects or in DataMapper. I am using the SQLite3 adapter from DataObjects.
Comments and changes to this ticket
-
Corrin Meyer May 25th, 2008 @ 03:58 PM
- Title changed from find ordered by DateTime not working to find option :order results in inconsistent collection
After a little more experimentation, I have found what appears to be an inconsistency with the DataMapper::Collection returned by a find statement with an ":order" option.
Going with the above example, I have 2 articles in the database, both with a non-null "published_at" entry. I then execute the following in an interactive merb session...
a0 = Article.all(:order => [:published_at.desc])[0]
a = Article.all(:order => [:published_at.desc])
I would expect a0.id == a[0].id would be "true", however it is "false". Comparing all the other fields indicates that indeed a0 and a[0] are not the same article at all.
-
Dan Kubb (dkubb) May 25th, 2008 @ 06:22 PM
cjmeyer, I'm having trouble duplicating this on my end. Can you enable logging within your app and pastie what actual SQL queries are being executed? If you're using merb, try using the instructions in #297.
-
Corrin Meyer May 25th, 2008 @ 06:55 PM
I have pastied my calls to Article#all and the logged SQL statements.
This is actually in an application that I am working on and is on GitHub so if you want to grab the actual code causing this, it is at git://github.com/cjmeyer/ao-website.git. Just setup your database using SQLite3 and use interactive merb.
For my setup, once in interactive merb, execute the statements in
I have a spec in my project that can be pulled from Git, but it is passing, so it maybe that I am only seeing this in interactive merb.
Also, when in interactive merb, it seems to be printing out some weird statements. I pastied them at http://pastie.caboo.se/203218. The descending search seems to print out article id=2 twice but when accessing the actual collection both articles are indeed returned.
To add another wrinkle...I tried adding a spec to my project to show the failure and it passes.
-
Corrin Meyer May 25th, 2008 @ 06:56 PM
Oops...forgot the merb interactive and DataMapper log output that you requested...
-
Dan Kubb (dkubb) May 26th, 2008 @ 12:11 AM
Thanks for the log output. I can see now what the problem is now, and I think it points to a bug in how Collection and Resource work together. Actually there are two problems that this highlights:
1) When the collection is retrieved it looks like it pulls back a few fields, except for body an excerpt, which are lazy due to them both being Text fields. Something inside Collection or Resource is then accessing one or both of the lazy fields causing Collection to be reloaded, accounting for the second query. This second query should not happen unless the field is explicitly accessed by user-level code.
2) The second problem is that when a Collection is reloaded due to lazy loading, the system re-queries the data source and updates the Collection with the latest results. The new information is merged with the old, and the Collection is re-ordered to match the way the results came out of the data source. Since the second query doesn't appear to be using an ORDER BY clause, the order may not match the original Collection's.
As a temporary solution if we could figure out why the resources are being reloaded due to being too aggressive with the lazy loading, both problems would go away in these specific instances.
However, to truly fix this problem Collection is going to need to be modified to hold onto the query it was originally generated with, so that when asked to reload itself, it can ensure the same :order is used, among other things. Several other tickets in LH are in need of this modification too.
-
Corrin Meyer May 26th, 2008 @ 12:20 AM
That would completely explain why my spec passes. When using interactive merb and doing a search such Article.all() it prints out the article data, including the lazy fields which would be why on such a find there are 2 queries. Thanks for looking at this Dan.
-
Dan Kubb (dkubb) May 26th, 2008 @ 02:54 AM
I actually wonder if the too aggressive lazy-loading is happening due to the Resource#inspect method that is being triggered by ri to display the internal state of the resources in the interactive shell.
As a test, try adding an inspect method to your model that just returns a string like "Hello World" or something, and then repeat the same actions you did in ri. I'd bet the first SQL query with the ORDER BY would fire, but not the second one with the IN() clause.
If this experiment works, then we should probably update the Resource#inspect method to just display loaded attributes, not all attributes as it currently does.
Of course this doesn't solve the need for Collection to preserve its original order when reloaded, but would at least stop these "phantom" lazy loads from happening.
-
Corrin Meyer May 26th, 2008 @ 09:19 AM
So I added a Article#inspect method and I think it is doing what you were expecting. I have pastied some accesses to the database and the resulting SQL statements generated at http://pastie.caboo.se/203427.
One thing that I am noticing is that if I do "a0 = Article.all(:order => [:published_at.desc])" the body and excerpt fields (text) are not loaded; this I expect. However, when I do a "a0.body" it loads the body of the article but it issues a SQL statement that searches "WHERE id IN (2, 1)". I would think that it should only be searching "WHERE id = 2" for example since when I access "a1.body" it does the same search again.
-
Dan Kubb (dkubb) May 26th, 2008 @ 01:42 PM
Corrin, thanks for testing this. It does look like Resource#inspect is causing the extra loading. Of course this still doesn't solve the underlying problem, but its good to know.
What you're seeing is a solution to the "n+1 select problem" (google this if you've not heard of it). This is one of the important parts of DM's design.
-
Corrin Meyer May 26th, 2008 @ 04:10 PM
If I do a query like "a0 = Article.all(:order => [:published_at.desc])[0]" (Note that I am retrieving only the first entry in the result array), I can then see that "a0.id", for example, equals "1". Now, if I do "a0.body", which is a text field, I know that it issues another SQL statement for to load the "body" field because it is lazy.
What form of SQL statement should I expect to be issued with loading the lazy loaded field "body"?
Since "a0" should represent a single Article, calling "a0.body", I would think, would issue a SQL statement similar to...
"SELECT body FROM articles WHERE id = 1"
However, I see...
"SELECT body FROM articles WHERE id = (1, 2)"
Is this what you are refering to as part of the solution to the "n+1 select problem"? Since "a0" should be a single Article instance, it would seem that this is more work on the database though I understand this may just be a undesirable side effect.
Now, if I had issued "a = Article.all(:order => [:published_at.desc])" and then access "a[0].body" I can see why issuing the second SQL statement above would make sense since if we are accessing one body of an article in the result Array then it is likely we are going to be accessing the body field of more articles in the result array.
Sorry if I am missing the point of your previous post.
-
Dan Kubb (dkubb) May 26th, 2008 @ 08:37 PM
- State changed from new to open
The SQL statement you should expect when doing a0.body would be a query to retrieve the body (and other lazy properties within the same context) for a0 and every other entry in the collection it belongs to.
The best way to retrieve the first entry in a result Array would be to go "a3 = Article.first(:order => [:published_at.desc])" not to load up the entire collection and then just pluck out the first entry. If you use first(), the collection would only contain a3, and thus when you access it's lazy properties it would reload the collection it belongs to, which includes only itself.
In the case where you're using all() DM currently has no idea you mean to retrieve just the first entry of the entire collection, so it has to assume when you access any single entry's lazy attributes that you mean to access all of the entries' attributes. If DM loaded each entry's attributes one at a time with the first query you have in your last comment, you'd have the n+1 select problem.
Most of the time when you're using a collection you mean to iterate over it, so it's safe for DM to act in this manner. Very rarely would you pull back a collection just to pluck out one or two entries.. in those cases you're much better off using first() to pull out exactly what you want.
You really should read the article I linked to in my previous comment though. It explains n+1 select problem and how DM lazy loads stuff to get around it. It probably isn't the best place to explain it here, and it's such a common problem that you can find dozens of articles on it in Google.. however, if you read the article and do a google search, but it's still not clear, hop onto IRC and I'd be happy to explain it there.
-
Corrin Meyer May 26th, 2008 @ 09:01 PM
Dan, thanks for the info. I think what was sticky for me was that the collection returned by "Article.all(:order => [:published_at.desc])[0]" has no idea that it is not an "array" collection.
-
Dan Kubb (dkubb) May 29th, 2008 @ 02:56 AM
- State changed from open to resolved
- Assigned user changed from Sam Smoot to Dan Kubb (dkubb)
This was fixed by the commit Fixed Collection to remember the query order it was generated with
Corrin, please test this out and add a comment and/or re-open the ticket if this is still a problem for you.
-
Please Sign in or create a free account to add a new ticket.
With your very own profile, you can contribute to projects, track your activity, watch tickets, receive and update tickets through your email and much more.
Create your profile
Help contribute to this project by taking a few moments to create your personal profile. Create your profile »
People watching this ticket
Referenced by
-
299 sorting by DateTime not working This is a duplicate of ticket #301 which I posted. I di...
-
299 sorting by DateTime not working Closing because this is a duplicate of #301