Posted by Seamus on Tuesday, March 15, 2011.

What you should know about mysql2 memory usage

If you’re using mysql2, you should be aware of a memory usage issue:

# mysql2 gem - no way to avoid using a lot of memory if you're streaming a lot of rows
client = Mysql2::Client.new(:host => "localhost", :username => "root")
results = client.query("SELECT * FROM users WHERE group='githubbers'")

# mysql gem - keep memory usage flat if you're streaming a lot of rows
dbh = Mysql.init
dbh.connect "localhost", "root"
dbh.query_with_result = false
dbh.query("SELECT * FROM users WHERE group='githubbers'")
results = dbh.use_result

While working on our reference data web service, I ran these benchmarks:

The problem

The author of the gem in question, mysql2, knows about the issue. In a nutshell, the gem’s use of mysql_store_result (as opposed to mysql_use_result) leads the underlying libmysql library to always load entire resultsets into memory… even if :cache_rows => false is passed as a runtime option. A :streaming => true option would make perfect sense!

The solution

  • I had to modify my gem, mysql2xxxx (which provides mysql2csv, mysql2json, and mysql2xml) to use the “old” mysql gem.
  • If you’re using mysql2 (it’s the default on Rails 3!) then be aware that processing huge resultsets has a different memory impact than it did with mysql.
  • Vote up the issue on github… it would be nice to optionally use mysql_use_result!

I believe mysql2 has a promising future and this memory problem will probably be gone soon. Thanks brianmario!

What blog is this?

Safety in Numbers is Brighter Planet's blog about climate science, Ruby, Rails, data, transparency, and, well, us.

Who's behind this?

We're Brighter Planet, the world's leading computational sustainability platform.

Who's blogging here?

  1. Patti Prairie CEO