SQLSTATE[HY000]  MySQL server has gone away
Have you seen this error show up randomly while working in your local environment? I started getting this error randomly after updating to MySQL 5.7, and it’s really bugged me. I finally tracked it down so I thought I’d share my frustration and make this issue a bit more discoverable. MySQL and Unix internals are not my wheelhouse, so if I get some details wrong, please let me know so I can correct them.
tl;dr: MySQL 5.7 will break on OS X after too many connections are left open within eight hours. Restarting the server will fix it until the next time this occurs. You might be able to prevent this by dramatically reducing MySQL’s
MySQL uses your OS’s system-level functions to track the availability of input/output channels. The two most common on Unix-based systems are
poll() is typically preferred due to some limitations of
select(), which will become clear in a moment. But not all flavors of Unix implement
poll() identically. In fact, its behavior is not even consistent within a single implementation.
In the case of OS X—and it’s unclear if this is true on OS X > 10.4.4—
poll() didn’t handle the
POLLHUP event to “hang up” the connection. So PHP’s connection would close, but MySQL server would think it was still open, and wait out the default 28800 second timeout before closing it. On MySQL 5.6.8 and lower, you would have had some kill issues, but probably never noticed as PHP would make a new connection and be on its way. Oracle noticed this bug with kill on OS X using
poll() and switched to
select() has a default limitation of handling 1024 file descriptors. MySQL can potentially spin up hundreds for the files it uses for your databases, pushing over that limit. Most Unix versions have a default max of 1024 for
FD_SETSIZE, and you cannot configure it at runtime. Darwin—OS X’s Unix core—is one of the Unix versions where recompiling is not an option, as you can’t compile your own version of OS X. So as with many other Unix systems,
select() is stuck with the limitation of 1024 file descriptors.
This switch to using
select() resulted in a new bug that caused stack corruption on OS X when the file descriptors exceeded 1024. Not good. Oracle fixed this recently by rejecting the connection request outright. The result is the “MySQL server has gone away” error above. And if you check your MySQL error log, you’ll see the more detailed message like:1
[Warning] File Descriptor 1043 exceedeed FD_SETSIZE=1024
I suspect but have not tested that switching back to
poll() and recompiling MySQL would be fine for modern versions of OS X. All of my research reveals that Darwin’s
poll() handles all of the events necessary to use it instead of
select(). That’s a bit of an onerous fix to apply yourself though, even if true. I’ve reached out to Oracle, and while I didn’t get a clear response on the use of
poll(), there is hope. Originally they were not going to look at alternatives, and insisted that Apple should solve it by increasing the
FD_SETSIZE constant. However, they are now working on an alternative that uses
kevent, so I’m hopeful a fix will be coming soon.
What I’m trying in the meantime is changing MySQL’s
wait_timeout settings from their default of 8 hours to a much lower value—say, five minutes. In
interactive_timeout = 300 wait_timeout = 300
My theory is that this will cause the server to release connections, and thus file descriptors, at a fast enough rate that the
select() imposed limit of 1024 won’t be an issue. For local development, five minutes per connection shouldn’t be a problem. It’s been working for me so far, but it’s hard to say conclusively if my theory is right. MySQL isn’t the only process creating file descriptors, so I can’t test in isolation. If nothing else, the mystery is solved on why the server randomly goes away, and it looks like Oracle has changed their position and intends to address it.
Yes, that’s “exceedeed”: https://github.com/mysql/mysql-server/commit/de3d16186#diff-b7559927391564ed82ae7d4b821a7cf4R927 ↩︎