
Pooling fail?
Reported by Carl | November 5th, 2008 @ 07:56 PM
We just moved our app to Merb 1.0 RC5 and DM 0.9.6 (from earlier betas). We're now seeing a very weird behavior where our Mongrels eventually stop responding to requests to actions using the database. The requests just "hang" and never complete.
The only way to restore normal functionality is to kill the Mongrel and restart it. It happens after a few hours so it's not easy to reproduce, but it's happening on ALL our Mongrels and all the time, so it's not an isolated case.
I used GDB to get the backtrace of a stuck Mongrel and it seems like the problem is somewhere in DataMapper's connection pool. Here's what I got:
["/usr/lib/ruby/gems/1.8/gems/extlib-0.9.8/lib/extlib/pooling.rb:177:in `new'",
"/usr/lib/ruby/gems/1.8/gems/extlib-0.9.8/lib/extlib/pooling.rb:156:in `synchronize'",
"/usr/lib/ruby/gems/1.8/gems/extlib-0.9.8/lib/extlib/pooling.rb:156:in `new'",
"/usr/lib/ruby/gems/1.8/gems/extlib-0.9.8/lib/extlib/pooling.rb:107:in `new'",
"/usr/lib/ruby/gems/1.8/gems/data_objects-0.9.6/lib/data_objects/connection.rb:20:in `new'",
"/usr/lib/ruby/gems/1.8/gems/dm-core-0.9.6/lib/dm-core/adapters/data_objects_adapter.rb:147:in `create_connection'",
"/usr/lib/ruby/gems/1.8/gems/dm-core-0.9.6/lib/dm-core/adapters/data_objects_adapter.rb:170:in `with_connection'",
"/usr/lib/ruby/gems/1.8/gems/dm-core-0.9.6/lib/dm-core/adapters/data_objects_adapter.rb:61:in `read_one'",
"/usr/lib/ruby/gems/1.8/gems/dm-core-0.9.6/lib/dm-core/repository.rb:72:in `read_one'",
"/usr/lib/ruby/gems/1.8/gems/dm-core-0.9.6/lib/dm-core/model.rb:263:in `first'",
"/var/www/current/app/controllers/application_status.rb:12:in `verify_if_up'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/controller/abstract_controller.rb:311:in `send'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/controller/abstract_controller.rb:311:in `_call_action'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/controller/abstract_controller.rb:285:in `_dispatch'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/controller/merb_controller.rb:190:in `_original_dispatch'",
"./lib/ext/merb/merb_controller.rb:6:in `_dispatch'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/dispatch/dispatcher.rb:101:in `dispatch_action'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/dispatch/dispatcher.rb:73:in `handle'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/dispatch/dispatcher.rb:35:in `handle'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/rack/application.rb:17:in `call'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/rack/handler/mongrel.rb:75:in `process'",
"/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:159:in `process_client'",
"/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:158:in `each'",
"/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:158:in `process_client'",
"/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:285:in `run'",
"/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:285:in `initialize'",
"/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:285:in `new'",
"/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:285:in `run'",
"/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:268:in `initialize'",
"/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:268:in `new'",
"/usr/lib/ruby/gems/1.8/gems/mongrel-1.1.5/lib/mongrel.rb:268:in `run'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/rack/adapter/mongrel.rb:35:in `start_server'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/rack/adapter/abstract.rb:230:in `start_at_port'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/rack/adapter/abstract.rb:52:in `spawn_worker'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/rack/adapter/abstract.rb:96:in `start'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/rack/adapter/abstract.rb:95:in `upto'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/rack/adapter/abstract.rb:95:in `start'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/rack/adapter/abstract.rb:94:in `catch'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/rack/adapter/abstract.rb:94:in `start'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/server.rb:173:in `bootup'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/server.rb:158:in `daemonize'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/server.rb:142:in `fork'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/server.rb:142:in `daemonize'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core/server.rb:35:in `start'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/lib/merb-core.rb:134:in `start'",
"/usr/lib/ruby/gems/1.8/gems/merb-core-0.9.13/bin/merb:11",
"/usr/bin/merb:19:in `load'",
"/usr/bin/merb:19"]
For the record, we're using MySQL.
Comments and changes to this ticket
-
Carl November 12th, 2008 @ 05:16 PM
- Tag changed from bug, database, dataobjects, db, dm-core, do_mysql, extlib, merb_datamapper, mysql, pool, pooling to bug, critical, database, dataobjects, db, dm-core, do_mysql, extlib, merb_datamapper, mysql, pool, pooling
Any update on this? Or maybe a workaround? It makes Datamapper completely unusable for us, and probably others.
Thanks!
-
Dan Kubb (dkubb) November 12th, 2008 @ 09:08 PM
- State changed from new to open
Carl, sorry to hear about this problem.
Unfortunately I think Sam is the only DM contributor who understands the pooling code, so we're going to have to wait until he has some spare time to look into this.
I'd help, but I've never programmed using threads in Ruby so I don't know any of it's idiosyncracies.
Sam: do you know if you'll have an spare time to look into this?
Carl: in the meantime, do you think you could put together a reduction that reproduces your problem with more consistency? Even a stand-alone program that exercises the pooling code directly that can be left alone to run for a while and causes the same lock-ups would help greatly. While I don't know Ruby Threads, I know from personal experience these classes of problems are the worst to try to figure out what's going on because it's next to impossible to reproduce in any consistent manner.
Also, I'd imagine it would be helpful to know what version of Ruby and DB you're running this with.
-
Carl November 13th, 2008 @ 10:06 AM
Dan,
It's a hard problem to reproduce since it only happens in production with some decent load. Also, it takes a few hours before the pooling starts acting up. It's very unpredictable.
However, what I can tell you is that it was working fine with a prior beta of DM. I "believe" we were on 0.9.3. So something went wrong between then and now.
My guess is that just any app will have the same problem with some load. We're running with the Merb mutex off, which I thought could cause the problem, so I also tried with mutex on, but it happened again.
We're using MySQL 5 and Ruby 1.8.6.
I'll be glad to give you more details.
-
Sam Smoot November 13th, 2008 @ 10:22 AM
Carl,
I'm sorry I haven't got back to you sooner. I'll do my best to see if I or Bernerd can dig into this this afternoon.
-Sam
-
Carl November 13th, 2008 @ 02:34 PM
Sam,
thanks for that! I appreciate. It seems like an important enough bug to fix asap too. I doubt we're the only ones affected by that.
-
Sam Smoot November 13th, 2008 @ 05:10 PM
Carl,
Could you add the following to your debugging when this happens?
DataObjects::Mysql::Connection.__pools.inspect
The problem is, we don't have this issue, and we run the latest DM on sites that use both Mysql and Postgres, so I'm having a hard time reproducing it.
The above line won't really tell me the answer, but it'll help since I'll be able to spot if your pools are maxed out, which would imply code somewhere that isn't releasing connections properly (most likely in DM unless you're using DO connections directly anywhere), or if the pools look healthy, then... well I'm not sure, but I'm sure some inspiration will strike.
-
Sam Smoot November 13th, 2008 @ 05:22 PM
Carl,
Actually, pooling.rb#177 does tell me one thing at least. My stupid naive check for threads that I should have taken out awhile ago is getting checked.
I'd definitely lean towards this being a pool starvation problem. Which shouldn't be incredibly difficult to track down. trace when a Connection is allocated, and pop a stack when it's released. The traces should tell us what's aquiring a Connection and not releasing it thereby starving the pool and possibly showing false-positives on other stack-traces when we actually hit the wall.
-
Sam Smoot November 13th, 2008 @ 05:32 PM
Not a solution, but explains why you'd get a failure on that particular line: http://github.com/sam/extlib/com...
Which will hopefully make tracking down the issue easier.
-
Carl November 13th, 2008 @ 09:28 PM
Sam,
Thanks for looking into this. Definitely appreciated.
Where should I put DataObjects::Mysql::Connection.__pools.inspect exactly? The code hangs and doesn't throw an exception, so I'm not quite sure where the best place to inspect it is.
Maybe I could put a timeout around the wait.wait(lock), rescue it and inspect the pools there?
What do you suggest?
thanks!
-
Carl November 14th, 2008 @ 09:22 AM
Here's my quick'n'dirty patch to find out what goes wrong:
begin require 'timeout' Timeout::timeout(5) { wait.wait(lock) } rescue x = DataObjects::Mysql::Connection.__pools.inspect File.open("/tmp/datamapper.log","a") { |f| f.puts x } raise end
I'll update here when the problem happens.
-
Carl November 20th, 2008 @ 10:57 PM
Good news!
We can't reproduce this anymore with DM 0.9.7. I'm not sure if it's 0.9.7 or something we did on our end that did the trick. All I know is that it hasn't happened in 2 days!
Don't you like when tickets close themselves just like that?
-
Dan Kubb (dkubb) November 23rd, 2008 @ 08:13 PM
- State changed from open to resolved
Carl, I'll close this down for now. If you notice it come back please let me know.
Alternatively you could try the experimental extlib and dm-core that we're working on right now:
http://github.com/dkubb/extlib/t... http://github.com/dkubb/dm-core/...
There's a few pooling fixes and changes there, and of course large changes to dm-core (which isn't quite stable yet, but will be soon).
Please Sign in or create a free account to add a new ticket.
With your very own profile, you can contribute to projects, track your activity, watch tickets, receive and update tickets through your email and much more.
Create your profile
Help contribute to this project by taking a few moments to create your personal profile. Create your profile »