Entry

MySQL 5.7 on OS X: Server Has Gone Away

by: Derek Jones on: 5/31/2016

Exception Caught

SQLSTATE[HY000] [2006] MySQL server has gone away
/system/ee/legacy/database/drivers/mysqli/mysqli_connection.php:82

Have you seen this error show up randomly while working in your local environment? I started getting this error randomly after updating to MySQL 5.7, and it’s really bugged me. I finally tracked it down so I thought I’d share my frustration and make this issue a bit more discoverable. MySQL and Unix internals are not my wheelhouse, so if I get some details wrong, please let me know so I can correct them.

tl;dr: MySQL 5.7 will break on OS X after too many connections are left open within eight hours. Restarting the server will fix it until the next time this occurs. You might be able to prevent this by dramatically reducing MySQL’s interactive_timeout and wait_timeout.

The details

MySQL uses your OS’s system-level functions to track the availability of input/output channels. The two most common on Unix-based systems are poll() and select().

poll() is typically preferred due to some limitations of select(), which will become clear in a moment. But not all flavors of Unix implement poll() identically. In fact, its behavior is not even consistent within a single implementation.

In the case of OS X—and it’s unclear if this is true on OS X > 10.4.4—poll() didn’t handle the POLLHUP event to “hang up” the connection. So PHP’s connection would close, but MySQL server would think it was still open, and wait out the default 28800 second timeout before closing it. On MySQL 5.6.8 and lower, you would have had some kill issues, but probably never noticed as PHP would make a new connection and be on its way. Oracle noticed this bug with kill on OS X using poll() and switched to select().

select() has a default limitation of handling 1024 file descriptors. MySQL can potentially spin up hundreds for the files it uses for your databases, pushing over that limit. Most Unix versions have a default max of 1024 for FD_SETSIZE, and you cannot configure it at runtime. Darwin—OS X’s Unix core—is one of the Unix versions where recompiling is not an option, as you can’t compile your own version of OS X. So as with many other Unix systems, select() is stuck with the limitation of 1024 file descriptors.

This switch to using select() resulted in a new bug that caused stack corruption on OS X when the file descriptors exceeded 1024. Not good. Oracle fixed this recently by rejecting the connection request outright. The result is the “MySQL server has gone away” error above. And if you check your MySQL error log, you’ll see the more detailed message like:1

[Warning] File Descriptor 1043 exceedeed FD_SETSIZE=1024

A potential workaround?

I suspect but have not tested that switching back to poll() and recompiling MySQL would be fine for modern versions of OS X. All of my research reveals that Darwin’s poll() handles all of the events necessary to use it instead of select(). That’s a bit of an onerous fix to apply yourself though, even if true. I’ve reached out to Oracle, and while I didn’t get a clear response on the use of poll(), there is hope. Originally they were not going to look at alternatives, and insisted that Apple should solve it by increasing the FD_SETSIZE constant. However, they are now working on an alternative that uses kevent, so I’m hopeful a fix will be coming soon.

What I’m trying in the meantime is changing MySQL’s interactive_timeout and wait_timeout settings from their default of 8 hours to a much lower value—say, five minutes. In my.cnf:

interactive_timeout = 300
wait_timeout = 300

My theory is that this will cause the server to release connections, and thus file descriptors, at a fast enough rate that the select() imposed limit of 1024 won’t be an issue. For local development, five minutes per connection shouldn’t be a problem. It’s been working for me so far, but it’s hard to say conclusively if my theory is right. MySQL isn’t the only process creating file descriptors, so I can’t test in isolation. If nothing else, the mystery is solved on why the server randomly goes away, and it looks like Oracle has changed their position and intends to address it.

.(JavaScript must be enabled to view this email address) or share your feedback on this entry with @ellislab on Twitter.

ExpressionEngine News

#eecms, #events, #releases