#238 ✓resolved
Sergei Yakovlev

[Suggestion] Lazy query slicing

Reported by Sergei Yakovlev | May 3rd, 2008 @ 04:33 PM

Calling slice() on a query should modify offset / limit parameters, instead of executing the query. For example, the following statements

Goblin.all(:level => 4)[40]
Goblin.all(:level => 4)[40, 10]
Goblin.all(:level => 4)[40...60]

should be equivalent to

Goblin.first(:level => 4, :offset => 40)
Goblin.all(:level => 4, :limit => 10, :offset => 40)
Goblin.all(:level => 4, :limit => 20, :offset => 40)

Comments and changes to this ticket

  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) May 5th, 2008 @ 05:38 AM

    This is something I would like to work towards supporting with the new DataMapper::Collection object.

    The Collection object was designed to mimic an Array in alot of ways, except it lazy-loads the data from the backend only when a kicker method is called (to_a, each, etc).

    Before the Collection object will be able to perform something like this it's going to have to store the Query object that was passed into Goblin.all(). Using that Query object as a base it can construct a new query with the additional limit and/or offset parameter to restrict the results.

    This prep work is something I will be working towards over the next week or so. Once that is complete, updating the [] method to work the way you outlined should be rather simple.

  • Adam French

    Adam French May 6th, 2008 @ 09:08 PM

    this is a great suggestion. ++

    dkubb, glad your thinking ahead for something like this.

  • Sam Smoot

    Sam Smoot May 9th, 2008 @ 12:16 AM

    • Milestone cleared.
  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) May 14th, 2008 @ 10:48 AM

    • Assigned user set to “Dan Kubb (dkubb)”
  • Sindre Aarsaether

    Sindre Aarsaether May 18th, 2008 @ 08:05 AM

    It should be possible to circumvent this. There are many times when I want to know the length of the actual result-set, and only loop through a certain # of them. This will be possible in some way right?

  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) May 18th, 2008 @ 01:45 PM

    Somebee, there are a couple of ways to approach the above logic, but the most efficient one with the DO adapters is to use Model.count (available in dm-aggregates) to get the total number of rows, and then slice the ones you want from that.

    I doubt dm-core will support something like this natively, although it would be pretty simple to make a dm-pagination plugin and use the Paginator gem (http://paginator.rubyforge.org/) to use the primitive methods to slice out the records you want.

    will_paginate, probably one of the best AR paginator plugins uses Paginator under the hood like this and it works great.

    In many cases it would be too inefficient to pull back all the rows and then slice out from there.. imagine a case where you want to display 20 records, but there are 10,000 results in total. You wouldn't want to pull back all 10,000 rows, just to throw away all but 20 of them.

    Related to slicing, is the concept of pulling back results in batches for processing, rather than pulling them all back in a single query, like if you were iterating over the rows using each. Collection currently doesn't support this, but there are a few gotchas with it the way most people implement it. One thing is that when the offset becomes large, the database still needs to skip over all the rows to find the ones you want, which can get progressively slower the further in you get.

    Server-side cursors might get rid of this, but I don't know how supported they are in all DB's. If Sam is interested in DM supporting this, I'd love to add support into Collection.

  • Sindre Aarsaether

    Sindre Aarsaether May 18th, 2008 @ 02:25 PM

    In many cases it would be too inefficient to pull back all the rows and then slice out from there.. imagine a case where you want to display 20 records, but there are 10,000 results in total. You wouldn't want to pull back all 10,000 rows, just to throw away all but 20 of them.

    Well, that is what :limit is for right? I'm only saying that making #[] behave the way outlined might be a little problematic in some cases.

    Sometimes you might want to display @articles[0,5] somewhere, and [5,15] somewhere else in the same view. Should this execute two queries? Even if you have already limited articles with @articles = Article.all(:limit => 20, ...)?

    But yeah, I see the pros for this to.

  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) May 19th, 2008 @ 02:07 AM

    I think Collection should act like how I outlined above: i.e. it should be lazy by default. There's no way Collection could be smart enough to automatically switch from lazy to a non-lazy mode depending on the context. Better to be explicit and go:

    collection = Article.all(:limit => 20) # no query

    @articles = collection.to_a # query

    @articles[0,5] # no query

    @articles[5,15] #no query

  • Sindre Aarsaether

    Sindre Aarsaether May 19th, 2008 @ 03:16 AM

    +1

    After thinking it through, I agree that making #[] lazy is actually the most consistent behavior, and a very sexy way to define limit, and offset.

    My realworld scenarios has usually been from complex sqls that I need to do manually anyways, and your suggestion outlined above is clearly good enough for the simple stuff.

    But, just to make sure it behaves as one would expect in ruby. What happens here?

    1. @articles = collection[10,20]

    2. @articles[5,10]

    1 should return a modified collection-object with limit and offset right? Then what does 2 return? I know this is far-fetched, but imo this should (intuitively) return a collection with limit 10, and offset 15 (just as it would if this was an actual array). Not trying to be problematic here :-)

  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) May 19th, 2008 @ 11:44 AM

    Hmm, that is an interesting example. I suppose you could handle it one of two ways:

    1. make #[] a kicker method and @articles would just be an Array and it would work as normal ruby

    2. add logic to Collection so if you take a slice of a slice, it knows to set the offset to 15, and the limit to 10.

    In the second case it would also have to be smart enough that if the first slice was 10,10, that we're looking at records 10...20, and 5,10 would only return records 15..20 since its a subset of a subset.

    What I would probably do is start with option #1 first, spec all the behavior using this approach, and then behind the scenes implement #2 and make sure it passes those specs. That way I get behavior that I know is identical to how ruby Array's work, which I think is important for any object that is supposed to act like an Array (hence the large number of specs for LazyArray and Collection).

    I still think #2 is likely to be most efficient and consistent overall, and if people want behavior #1, they can just use to_a explicitly to work with a plain Array.

  • Sam Smoot

    Sam Smoot June 2nd, 2008 @ 04:04 PM

    • Assigned user changed from “Dan Kubb (dkubb)” to “Adam French”

    Ok, well, supporting this at this point would be really simple. Just need to decide on how to do it. And it needs to be well documented from the outset of course. I'm OK with #[] slicing with the API defined in the opening comment.

    Adam, want to give it a go while you're waiting for your plane?

  • Adam French

    Adam French June 5th, 2008 @ 05:36 PM

    • Assigned user changed from “Adam French” to “Dan Kubb (dkubb)”
  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) June 8th, 2008 @ 06:26 AM

    Lazy slicing has been added to Collection in the following commit:

    http://github.com/sam/dm-core/co...

    This adds Collection#slice and Collection#[], which take the same arguments, similar to Array and the example above.

    Please give it a shot and let me know if it works out for you.

  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) June 8th, 2008 @ 10:09 AM

    • State changed from “new” to “resolved”

Please Sign in or create a free account to add a new ticket.

With your very own profile, you can contribute to projects, track your activity, watch tickets, receive and update tickets through your email and much more.

New-ticket Create new ticket

Create your profile

Help contribute to this project by taking a few moments to create your personal profile. Create your profile »

People watching this ticket

Pages