Infrastructure

When MySQL runs out of connections because everything is sleeping

A MariaDB instance hit max_connections during normal traffic. The processlist was full of Sleep entries — connections doing nothing, but holding slots. The fix wasn't raising max_connections. It was understanding what 'Sleep' actually means.

Note

**TL;DR — MySQL Sleep connections are real and they cost.** A connection in the Sleep state isn't running a query but is still using server resources — a thread, ~256 KB-2 MB of memory, a slot against `max_connections`. If you have hundreds of them, your application is probably opening connections, doing one query, and forgetting to close them. Lower `wait_timeout` to recycle them faster, fix the application's connection lifecycle, and consider connection pooling (ProxySQL or app-level) for the long term.

The error in the application logs was the classic one: `SQLSTATE[HY000] [1040] Too many connections`. Site was returning 500s. I jumped on the database server, expecting to find a stuck query or a runaway batch job pinning everything. Ran `SHOW PROCESSLIST` and got a wall of text scrolling past my terminal. Hundreds of rows. All in the same state: Sleep.

If you've never seen this before it looks completely contradictory. The database is "out of connections" but every connection is doing nothing. They're not running queries. They're not waiting on locks. They're just sitting there, sleeping. Idle. And yet they're consuming the entire connection pool.

What does Sleep actually mean?

A MySQL connection in the Sleep state is one where the client has finished its previous query but hasn't yet closed the connection or sent another. From MySQL's perspective the connection is alive — there's a thread allocated to it, memory reserved, a slot held against `max_connections` — but the client isn't doing anything with it.

On most stacks, a connection in Sleep is a connection your application opened, used for one query, and then forgot to close. It'll stay there until either the application explicitly closes it, the application crashes, or MySQL kicks it out for being idle longer than `wait_timeout`. Default `wait_timeout` is 28,800 seconds. Eight hours.
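You can check both halves of that equation directly from SQL — what timeout the server is actually configured with, and how long each sleeper has been idle (for a Sleep row, the `TIME` column counts seconds since its last statement finished):

```sql
-- Current idle-timeout settings, in seconds
SHOW VARIABLES LIKE '%wait_timeout%';

-- How long each sleeping connection has been idle, oldest first
SELECT ID, USER, HOST, TIME AS idle_seconds
FROM information_schema.PROCESSLIST
WHERE COMMAND = 'Sleep'
ORDER BY TIME DESC;
```

If the oldest sleepers are hours old, you're looking at leaked connections rather than a momentary burst.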

Now you can see how you get into trouble. If your application opens ten connections per second and never closes any of them, and `wait_timeout` is eight hours, then in the worst case you accumulate 288,000 sleeping connections before the timeout starts kicking them out. In practice you'll hit `max_connections` (default 151) about fifteen seconds in.

Counting the sleeps

The fastest way to confirm what's happening:

```sql
SELECT COMMAND, COUNT(*) FROM information_schema.PROCESSLIST GROUP BY COMMAND;
```

On the affected server it returned:

```text
+---------+----------+
| COMMAND | COUNT(*) |
+---------+----------+
| Query   |        3 |
| Sleep   |      147 |
+---------+----------+
```

150 connections total, 147 of them in Sleep, 3 actually running queries. `max_connections` was 151, so the database had effectively run out of room because of the sleeping connections.
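A quick way to see how close the server is to that ceiling, and the high-water mark since it last started:

```sql
-- Configured ceiling
SELECT @@max_connections;

-- Connections open right now
SHOW GLOBAL STATUS LIKE 'Threads_connected';

-- Peak concurrent connections since the server started
SHOW GLOBAL STATUS LIKE 'Max_used_connections';
```

If `Max_used_connections` equals `max_connections`, the server has hit the wall at least once since its last restart.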

The next question is *who* is holding all those sleeping connections. Group by user and host:

```sql
SELECT USER, HOST, COUNT(*) FROM information_schema.PROCESSLIST
WHERE COMMAND = 'Sleep'
GROUP BY USER, HOST
ORDER BY 3 DESC;
```

All 147 sleeps were owned by the same WordPress user, all from the same web server's IP. So this wasn't a leaked connection from some forgotten migration script or cron job. It was the application itself, in normal operation, opening connections faster than it was closing them.
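If the server is hard up against the limit and you need slots back immediately, you can generate `KILL` statements for the sleepers. Review the output before running it — this drops live sessions, and any client that tries to reuse a killed connection will see `MySQL server has gone away`:

```sql
-- Generate one KILL statement per connection sleeping longer than 60 seconds
SELECT CONCAT('KILL ', ID, ';') AS kill_stmt
FROM information_schema.PROCESSLIST
WHERE COMMAND = 'Sleep' AND TIME > 60;
```

Paste the generated statements back into the client (or pipe them through the `mysql` CLI) to reclaim the slots.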

The short-term fix: lower wait_timeout

If you need to stop the bleeding right now and you can't ship application code, the quickest move is telling MySQL to recycle idle connections faster. In `/etc/mysql/mariadb.conf.d/50-server.cnf`:

```ini
[mysqld]
wait_timeout = 60
interactive_timeout = 60
```

Restart MariaDB and any connection that sits idle for more than 60 seconds gets dropped. The pool of sleeping connections becomes self-cleaning. This is a band-aid — your application is still leaking — but it'll keep `max_connections` from filling up while you go fix the actual bug.
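If you can't restart right away, the same variables can be set at runtime. Note that a `SET GLOBAL` change only applies to sessions opened after the change — existing connections keep the timeout they started with — and it doesn't survive a restart, so keep the config-file change as well:

```sql
-- Applies to new connections only; pair with the my.cnf change above
SET GLOBAL wait_timeout = 60;
SET GLOBAL interactive_timeout = 60;
```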

Warning

Don't go below 30 seconds for `wait_timeout`. Some legitimate workflows (long-running admin operations, slow data exports) will look idle for tens of seconds between queries on the same connection. If you set `wait_timeout` too aggressively you'll start dropping legitimate sessions and getting `MySQL server has gone away` errors mid-page-load.

The actual fix: figure out what's leaking

The application was a WordPress site with several plugins doing background work. The leak turned out to be a custom plugin that called `mysqli_connect` to open its own connection for each WP-Cron event and never called `mysqli_close`. Every cron event that fired left an extra connection behind, which then sat in Sleep until eight hours later, when MySQL would finally kick it out.

Fixing the plugin (using WordPress's `$wpdb` instead of opening its own connection at all) was a 4-line change. After deploying it, the sleeping count dropped from 147 to a steady state of about 8-12 — which is normal background noise from the WordPress core's own connection handling.

Why connection pools exist

If you find yourself dealing with this on a bigger application — say, a horizontally scaled API where every web server can independently saturate the database — the right long-term answer isn't tighter timeouts. It's a connection pool. ProxySQL, PgBouncer (for Postgres), or built-in pooling from your application framework lets you have, say, 1,000 concurrent application requests sharing 50 actual database connections. The pool reuses connections aggressively and the database never sees the burst.

WordPress traditionally doesn't use connection pooling — it opens a fresh connection per page load through `$wpdb` and closes it cleanly when PHP-FPM finishes the request. That works fine until something in the stack starts holding connections open beyond the request lifecycle, which is exactly the bug I was tracking down.

A useful one-liner for next time

Whenever I'm investigating sleep accumulation on someone else's database, the first command I run is this:

```sql
SELECT USER, HOST, COMMAND, COUNT(*) AS c, AVG(TIME) AS avg_idle
FROM information_schema.PROCESSLIST
GROUP BY USER, HOST, COMMAND
ORDER BY c DESC;
```

It tells you who is opening connections, what state they're in, how many of each, and how long they've been idle on average. From that one query you usually know within ten seconds whether the problem is one process leaking, one service exhausting the pool, or genuine traffic that needs more headroom. Save it to your shell history and you'll thank yourself the next time `Too many connections` shows up in production.

Topics
MySQL, MariaDB, max_connections, Sleep state, wait_timeout, connection pooling, SHOW PROCESSLIST, Linux server administration
Zunaid Amin

Manages Linux infrastructure at Rocket.net. WordPress Core Contributor since 6.3 and Hosting Team Representative for WordPress.org. Based in Dhaka, Bangladesh.

zunaid321@gmail.com