#374 ✓resolved
Dan Sully

dm-aggregates: 'group by' and 'distinct' support

Reported by Dan Sully | June 12th, 2008 @ 12:22 PM

The dm-works branch of 0.3 had a :group option to .all() that implemented "group by".

Comments and changes to this ticket

  • Adam French
  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) June 12th, 2008 @ 03:17 PM

    • State changed from “new” to “open”

    Dan, are you talking about a group_by like the following or something else?

    def group_by(&block)
      groups = []
      each do |entry|
        value = yield(entry)
        if(last_group = groups.last) && last_group.first == value
          last_group.last << entry
        else
          groups << [ value, [ entry ] ]
        end
      end
      groups
    end
    

    Something like this could work it's way into Collection (or rather LazyArray), and be built when the :group_by (or whatever) was passed into a Query.

  • Adam French

    Adam French June 12th, 2008 @ 03:33 PM

    I have a sneaky suspicion he's asking for the SQL equivalent of `GROUP BY`, which would belong to the realm of dm-aggregates in dm-more since it doesn't return a mappable object.

  • Dan Sully

    Dan Sully June 12th, 2008 @ 10:59 PM

    Yes - I meant "group by"

    Here's the commit for cardmagic's 0.3 dm-works branch:

    http://github.com/cardmagic/dm-w...

  • Dan Sully

    Dan Sully June 12th, 2008 @ 10:59 PM

    And DISTINCT, etc would also be nice.

  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) June 13th, 2008 @ 03:49 AM

    • State changed from “open” to “invalid”

    Based on conversations with Sam, GROUP BY support won't be making it into dm-core because it doesn't correspond with dm-core's goals of being an ORM, not a reporting tool.

    The dm-aggregates plugin was created for reporting, and DISTINCT and GROUP BY would be welcome additions as patches to that plugin.

  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) June 13th, 2008 @ 08:33 AM

    • State changed from “invalid” to “open”

    I mistakenly marked this as invalid, when it should remain open to remind us to add a :group and :distinct operator to Query within the dm-aggregates plugin.

    While patches are welcome, if you don't get to it, or are unfamiliar with DataMapper's internals enough to generate a patch we still need this ticket to ensure this is completed by someone.

  • Dan Sully

    Dan Sully June 13th, 2008 @ 11:42 AM

    In looking at dm-aggregates, it's geared towards returning a single value. "GROUP BY" on the other hand often returns an array of values, and I see it as simply a modifier on a SELECT, much like "ORDER BY".

    I also fail to see the distinction between ORM and reporting tool.

    Every ORM I've ever used (Class::DBI, DBIx::Class & EOF) have supported this functionality.

    This sounds like a semantics push back to me.

  • Adam French

    Adam French June 13th, 2008 @ 11:59 AM

    'GROUP BY' clauses must be used in conjunction with an aggregate operation of some-kind (sum(), count(), avg()...). If you don't, most (all?) SQL engines will throw an error and tell you that you need an aggregate function in the select

    http://www.w3schools.com/sql/sql...

    That's my reasoning behind my thinking it should go into dm-aggregates rather than into core.

  • Dan Sully

    Dan Sully June 13th, 2008 @ 12:19 PM

    Not to put too fine a point on it, but you're on crack. :)

    (Keep in mind that distinct() and "group by" are very similar:

    select * from queries where appname = "foo" group by hostname;

    or:

    select distinct(hostname) from queries where appname = "foo";

    Will give the same results. The "group by" is better for turning into DM objects though.

  • Sam Smoot

    Sam Smoot June 13th, 2008 @ 12:33 PM

    Dan,

    It's not semantics. ;-)

    I'm not against building in robust reporting functionality. It's just as important to a good application as an O/RM IMO. But they are absolutely not the same thing. Objects are generally lousy for reporting, and the path that AR takes leads to madness. How are you going to Unique (read:IdentityMap) objects if their keys bind them to aggregate data and not actual representations of full materialized objects?

    Adam's right for every database I've ever used. "SELECT * FROM queries WHERE appname = 'foo' GROUP BY hostname" will raise an error telling you that every column not grouped-by must be an aggregate function.

    I'm going to assume that was just a typo though, since your second example does something completely different. :-)

    Anyways, it's not push-back for it's own sake. GROUP BY and DISTINCT support will be added at some point. But not in dm-core because for the most part it only applies to an RDBMS.

    It also, similar to DM 0.3.x, should return Structs, and not Resource instances.

    So it's a work in process. :-) But confusing the two issues is just going to lead to a messier DM, with more untested edge-cases. A jack of all trades and a master of none.

    I would actually prefer to see a collaboration with the Sequel guys to handle the one-two punch of O/RM (DataMapper) + Reporting (Sequel), but I don't know how likely that is...

  • Carl Lerche

    Carl Lerche June 13th, 2008 @ 12:42 PM

    select * from queries where appname = "foo" group by hostname; will blow up on PostgreSQL and probably.

    However, I do want to protest as well that GROUP BY is only used for reporting purposes.

    Take tagging for example. "I want to find all items that were tagged with tag1, tag2, and tag2". Ok, this COULD be considered reporting, but still.... Also, it could be handled by sub-queries, but I'm not sure that sub-queries are the fastest solution.

  • Dan Sully

    Dan Sully June 13th, 2008 @ 12:44 PM

    I guess I don't see it as reporting because I'm not using it as such. I'm just needing it to effectively whittle down a result set in SQL instead of Ruby.

    And wrt the "group by" query above, it works fine in MySQL (5.x)

    ...

    286 rows in set (0.17 sec)

  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) June 13th, 2008 @ 01:07 PM

    Being a MySQL guy myself I'll be the first to admit that MySQL isn't really compliant with the spec nearly as much as other DB's like PostgreSQL.

    It doesn't actually make any sense to have a mismatch between the fields in the SELECT part and the GROUP BY part. Essentially you're pulling back data that will be "folded" into the result in a non-deterministic manner. That's bad.

  • Adam French

    Adam French June 16th, 2008 @ 10:03 AM

    • Title changed from “Need group by” to “dm-aggregates: 'group by' and 'distinct' support”
  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) June 17th, 2008 @ 01:22 PM

    There is actually one place that GROUP BY might be necessary: hm:t and habtm.

    If you look at RelationshipChain#get_children you'll see there's a spot where the code performs a query joining multiple tables together, and then uses Array#uniq to make sure the return value only contains uniq resources.

    Obviously doing uniq here isn't optimal, since the result set could be huge. It also creates problems because the return value is a plain Array, so you lose some of the laziness that you get when using a Collection.

    A possibly solution to this would be to make Collection a Set underneath instead of an Array, but that might be alot slower, and it is sort of just sweeping the problem under the rug anyway.

    The downside to adding a GROUP BY to the Query system is it shift the burden to adapter authors who if their storage systems don't natively support it have to manually

    If there's an alternative approach to this issue than using a "GROUP BY" then I'm open for suggestions.

    If we do end up implementing this option I'd suggest the Query object get a :unique => true flag rather than passing in a :group_by option. We'd tell people that the results are grouped by whatever your Query :fields looks like. That way people can't make the common mistake of there being a mis-match between the SELECT fields and the GROUP BY fields.

  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) June 23rd, 2008 @ 07:38 PM

    • Assigned user changed from “Sam Smoot” to “Dan Kubb (dkubb)”
  • Dan Sully

    Dan Sully June 26th, 2008 @ 05:35 PM

    • Tag set to dm-more, feature

    Any progress here?

  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) June 26th, 2008 @ 07:10 PM

    Nothing yet on my end. I'm a bit tied up with work right now. I do plan on it, but if you're interested in giving it a try I'd encourage it.

    If you fork dm-core I can review it and help you along. My only request is that the interface to Query work as outline in my preceding comment.

    I think it would actually be safe to make it so that :unique defaults to true when Query#links contains one or more links OR you're using an aggregate function AND at least one of the Query#fields is a non-aggregate function. This should work fine in every case I can think of. There's no reason you'd want duplicate rows returned inside dm-core, which as of right now can only happen in joins. With aggregate functions if there's not a non-aggregate field in the SELECT, then it doesn't make sense to use GROUP BY -- not to mention some DB's won't even allow you to use GROUP BY fields that aren't in the SELECT section.

  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) July 14th, 2008 @ 12:29 AM

    FYI dbussink is working on this right now. You'll notice he committed some code to add GROUP BY SQL, and is currently working on updating dm-aggregates to work with it.

  • Dan Sully

    Dan Sully July 14th, 2008 @ 11:28 AM

    Where is the repo that contains these changes?

    Not seeing anything related under dbussink's github user.

    Thanks

  • Dan Kubb (dkubb)

    Dan Kubb (dkubb) July 15th, 2008 @ 02:52 PM

    • State changed from “open” to “resolved”

    The following commit adds Model#aggregate and Collection#aggregate which allow for multiple aggregate functions to be used in a single query, with automatic grouping when :fields is specified:

    http://github.com/sam/dm-more/co...

    Suggestions (and better yet patches) are welcome.

Please Sign in or create a free account to add a new ticket.

With your very own profile, you can contribute to projects, track your activity, watch tickets, receive and update tickets through your email and much more.

New-ticket Create new ticket

Create your profile

Help contribute to this project by taking a few moments to create your personal profile. Create your profile »

People watching this ticket

Pages