| 1 | +--- |
| 2 | +title: 'Postgres Transaction ID (XID) Wraparound' |
| 3 | +author: Adela |
| 4 | +updated_at: 2025/10/24 12:00 |
| 5 | +feature_image: /content/blog/postgres-transaction-id-wraparound/banner.webp |
| 6 | +tags: Explanation |
| 7 | +description: 'What is Postgres Transaction ID (XID) Wraparound and how to monitor and prevent it.' |
| 8 | +--- |
| 9 | + |
| 10 | +PostgreSQL’s transaction system is built around a simple but powerful idea: every transaction that modifies data gets a **Transaction ID (XID)** - a 32-bit integer that drives its **MVCC (Multi-Version Concurrency Control)** engine. |
| 11 | + |
| 12 | +MVCC lets multiple users read and write data at the same time without blocking each other. |
| 13 | +But those transaction IDs are finite, and eventually they **wrap around**. |
| 14 | + |
| 15 | +If not handled properly, wraparound can make data invisible or force the entire database to stop accepting writes until it has been vacuumed. |
| 16 | +Let’s understand why. |
| 17 | + |
| 18 | +### How Postgres Differs from Other Databases |
| 19 | + |
| 20 | +PostgreSQL uses a **tuple-based** MVCC model - every data change creates a new version (tuple) of a row stored directly in the table. |
| 21 | + |
| 22 | +Each tuple has two hidden fields: `xmin` and `xmax`. `xmin` is the transaction ID that created the tuple, and `xmax` is the transaction ID that deleted or replaced it. |
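
A quick way to see these hidden columns is to select them explicitly. The sketch below uses a throwaway temporary table (`xid_demo` is a made-up name) and assumes each statement runs in its own autocommit transaction:

```sql
-- Hypothetical scratch table, dropped automatically at session end.
CREATE TEMP TABLE xid_demo (id int PRIMARY KEY, note text);

INSERT INTO xid_demo VALUES (1, 'first version');

-- xmin is the XID of the INSERT; xmax is 0 because nothing has
-- deleted or replaced this tuple yet.
SELECT xmin, xmax, ctid, * FROM xid_demo;

UPDATE xid_demo SET note = 'second version' WHERE id = 1;

-- The visible row is now a new tuple: its xmin is the XID of the
-- UPDATE and its ctid (physical location) has changed. The old
-- tuple stays in the table until VACUUM removes it.
SELECT xmin, xmax, ctid, * FROM xid_demo;
```

The exact XID values depend on the server's current counter, so they will differ from one run to the next.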
| 23 | + |
| 24 | +Other databases handle versioning differently: |
| 25 | + |
| 26 | +| Database | MVCC type | Where old versions are stored | Wraparound risk | |
| 27 | +| ------------------ | ------------------- | ---------------------------------- | --------------- | |
| 28 | +| **PostgreSQL** | Tuple-based | In the main table (needs `VACUUM`) | ✅ Yes | |
| 29 | +| **MySQL (InnoDB)** | Undo-log-based | Undo logs in system tablespace | ❌ No | |
| 30 | +| **SQL Server** | Version-store-based | Tempdb version store | ❌ No | |
| 31 | +| **Oracle** | Undo-log-based | Undo segments | ❌ No | |
| 32 | + |
| 33 | +In short: |
| 34 | +PostgreSQL keeps old rows **inline** with the table, and each one carries its own transaction ID. |
| 35 | +That design enables powerful visibility control but also means transaction IDs must eventually be **reused** - leading to the wraparound problem. |
| 36 | +Other databases store old versions separately, so their transaction identifiers can grow freely without ever wrapping around. |
| 37 | + |
| 38 | +### What Is Transaction Wraparound |
| 39 | + |
| 40 | +PostgreSQL’s transaction IDs are **32-bit integers** (`0` to `4,294,967,295`). |
| 41 | +After the counter hits its limit, it **wraps around** and starts again at 3, because XIDs 0, 1, and 2 are reserved for special purposes. |
| 42 | + |
| 43 | +You can picture XIDs as points on a **circle**, not a straight line. |
| 44 | + |
| 45 | + |
| 46 | + |
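| 47 | +Because comparisons are circular, every XID treats the roughly 2 billion XIDs behind it as "the past" and the roughly 2 billion ahead of it as "the future". If a tuple’s XID falls more than about 2 billion transactions behind the current counter, it suddenly appears to come from the future and becomes **invisible** to all new transactions. |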
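
One way to build intuition for the circular comparison is to reproduce it with plain integer arithmetic. The query below is only an illustration with made-up XID values, not a function Postgres exposes: it treats `a` as older than `b` when the difference `a - b`, taken modulo 2^32, lands in the upper half of the 32-bit range (which is how a signed 32-bit difference turns negative).

```sql
-- Illustrative only: the XID values are invented and special XIDs (0-2) are ignored.
WITH pairs(a, b) AS (
  VALUES (100::bigint,        200::bigint),        -- ordinary case
         (4294967000::bigint, 50::bigint),         -- counter wrapped between a and b
         (100::bigint,        2147483800::bigint)  -- a is more than 2^31 behind b
)
SELECT a, b,
       -- Normalize (a - b) into 0 .. 2^32 - 1, then test the upper half.
       ((a - b) % 4294967296 + 4294967296) % 4294967296 >= 2147483648
         AS a_precedes_b
FROM pairs;
-- Row 1: true  (100 is older than 200)
-- Row 2: true  (an XID from just before the wrap is still older than 50)
-- Row 3: false (100 now looks newer than 2147483800: the wraparound hazard)
```

The third row is the failure mode: a tuple created by XID 100 that was never frozen would look like it came from the future once the counter passes roughly 2.1 billion.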
| 48 | + |
| 49 | + |
| 50 | + |
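| 51 | +To prevent this, Postgres periodically **freezes** old tuples - marking them as "committed long ago, always visible." Older versions did this by overwriting `xmin` with the special `FrozenTransactionId` (XID 2); since PostgreSQL 9.4, a flag bit in the tuple header is set instead and the original `xmin` is preserved. |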
| 52 | + |
| 53 | + |
| 54 | + |
| 55 | +### Consequences of Wraparound |
| 56 | + |
| 57 | +If the database fails to freeze tuples in time, it risks data corruption. |
| 58 | +To avoid that, Postgres first logs warnings as `datfrozenxid` ages, and once only a few million XIDs remain it refuses to assign new ones. The server keeps running, but any command that needs a new XID fails with an error like: |
| 59 | + |
| 60 | +``` |
| 61 | +ERROR:  database is not accepting commands to avoid wraparound data loss in database "mydb" |
| 62 | +``` |
| 63 | + |
| 64 | +#### Real-world cases |
| 65 | + |
| 66 | +- **Sentry (2015):** |
| 67 | + Sentry’s Postgres database stopped accepting writes after autovacuum couldn’t keep up with freezing old transaction IDs. The system hit the wraparound limit, forcing emergency manual vacuuming and downtime. |
| 68 | + [https://blog.sentry.io/transaction-id-wraparound-in-postgres/](https://blog.sentry.io/transaction-id-wraparound-in-postgres/) |
| 69 | + |
| 70 | +- **Mailchimp/Mandrill (2019):** |
| 71 | + A busy shard’s autovacuum fell behind, triggering wraparound protection and halting writes. Recovery required truncations and manual vacuums, leading to roughly 40 hours of outage. |
| 72 | + [https://mailchimp.com/what-we-learned-from-the-recent-mandrill-outage/](https://mailchimp.com/what-we-learned-from-the-recent-mandrill-outage/) |
| 73 | + |
| 74 | +These cases show that wraparound isn’t theoretical - it’s one of the few PostgreSQL maintenance failures that can completely stop production systems. |
| 75 | + |
| 76 | +### How to Monitor and Prevent Wraparound |
| 77 | + |
| 78 | +#### 1. Autovacuum is the First Line of Defense |
| 79 | + |
| 80 | +Autovacuum automatically scans tables and freezes tuples before they age out. |
| 81 | +Key parameters: |
| 82 | + |
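| 83 | +- `autovacuum_freeze_max_age` – maximum `relfrozenxid` age (default 200 million) before autovacuum is forced to run on a table to prevent wraparound, even if autovacuum is otherwise disabled |
| 84 | +- `vacuum_freeze_table_age` – `relfrozenxid` age (default 150 million) at which `VACUUM` switches to an aggressive scan that visits every page that might hold unfrozen tuples |
| 85 | +- `vacuum_freeze_min_age` – minimum XID age (default 50 million) a tuple must reach before `VACUUM` freezes it |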
| 86 | + |
| 87 | +If autovacuum is off or slow, wraparound danger grows silently. |
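
To check what a given server is actually using, the settings can be read from the `pg_settings` view. This is a minimal sketch; per-table storage parameters, if any were set, can override these global values and are not shown here.

```sql
-- Current server-wide values for autovacuum and the freeze thresholds.
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('autovacuum',
               'autovacuum_freeze_max_age',
               'vacuum_freeze_table_age',
               'vacuum_freeze_min_age')
ORDER BY name;
```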
| 88 | + |
| 89 | +#### 2. Check XID Age with SQL |
| 90 | + |
| 91 | +To see how close your database is to wraparound: |
| 92 | + |
| 93 | +```sql |
| 94 | +SELECT datname, age(datfrozenxid) AS xid_age |
| 95 | +FROM pg_database |
| 96 | +ORDER BY xid_age DESC; |
| 97 | +``` |
| 98 | + |
| 99 | +For table-level detail: |
| 100 | + |
| 101 | +```sql |
| 102 | +SELECT relname, age(relfrozenxid) AS xid_age |
| 103 | +FROM pg_class |
| 104 | +WHERE relkind = 'r' |
| 105 | +ORDER BY xid_age DESC; |
| 106 | +``` |
| 107 | + |
| 108 | +⚠️ **Guidelines** |
| 109 | + |
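| 110 | +- Above **1.5 billion** → warning zone; vacuum the oldest tables before the server starts logging wraparound warnings |
| 111 | +- Above **2 billion** → close to the hard limit of 2^31 (about 2.1 billion), where Postgres stops assigning new XIDs and writes fail |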
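
For dashboards and alerts, it can help to express the age as a percentage of the 2^31 limit and to show it next to the autovacuum threshold. The query below is a sketch along those lines; the column aliases are arbitrary and the alerting thresholds are left to you.

```sql
SELECT datname,
       age(datfrozenxid) AS xid_age,
       -- Share of the 2^31 XID horizon already consumed.
       round(100.0 * age(datfrozenxid) / 2147483648, 2) AS pct_toward_wraparound,
       -- Age at which autovacuum forces an anti-wraparound vacuum.
       current_setting('autovacuum_freeze_max_age')::bigint AS freeze_max_age
FROM pg_database
ORDER BY xid_age DESC;
```

On a healthy system, `xid_age` should stay in the neighborhood of `freeze_max_age` or below; a value that keeps climbing well past it suggests autovacuum is not finishing its freeze work.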
| 112 | + |
| 113 | +#### 3. Cloud Provider Recommendations |
| 114 | + |
| 115 | +**AWS RDS / Aurora** |
| 116 | +Use `postgres_get_av_diag` to monitor autovacuum health and aging tables - |
| 117 | +[https://aws.amazon.com/blogs/database/prevent-transaction-id-wraparound-by-using-postgres_get_av_diag-for-monitoring-autovacuum/](https://aws.amazon.com/blogs/database/prevent-transaction-id-wraparound-by-using-postgres_get_av_diag-for-monitoring-autovacuum/) |
| 118 | + |
| 119 | +**Google Cloud SQL** |
| 120 | +Cloud SQL provides a Recommender for High Transaction ID Utilization - |
| 121 | +[https://cloud.google.com/sql/docs/postgres/recommender-high-transactionid-utilization](https://cloud.google.com/sql/docs/postgres/recommender-high-transactionid-utilization) |
| 122 | + |
| 123 | +General advice: |
| 124 | + |
| 125 | +- Never disable autovacuum. |
| 126 | +- Schedule manual `VACUUM FREEZE` during off-peak hours. |
| 127 | +- Avoid long-running idle transactions that block freezing. |
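
To find the sessions behind the last point, one option is to look at `pg_stat_activity` for backends that are holding back the oldest visible XID. This is a sketch, not a complete diagnosis; other things can also hold back freezing and are not covered here.

```sql
-- Sessions ordered by how far back they pin the XID horizon.
SELECT pid,
       usename,
       state,
       xact_start,
       now() - xact_start  AS xact_duration,
       age(backend_xmin)   AS xmin_age,
       left(query, 60)     AS query_snippet
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;
```

A session in state `idle in transaction` with a large `xmin_age` is the usual suspect; committing, rolling back, or terminating it lets vacuum advance again.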
| 128 | + |
| 129 | +### The Challenge of Moving to 64-bit Transaction IDs |
| 130 | + |
| 131 | +At first glance, the easiest fix would be to make transaction IDs **64-bit** instead of 32-bit. |
| 132 | +That would raise the ceiling from 4 billion transactions to roughly **18 quintillion** - effectively eliminating wraparound forever. |
| 133 | + |
| 134 | +This idea has been discussed for years, with real prototypes already attempted: |
| 135 | + |
| 136 | +- **Early discussions (2018–2019):** |
| 137 | + [https://www.postgresql.org/message-id/flat/DA1E65A4-7C5A-461D-B211-2AD5F9A6F2FD%40gmail.com](https://www.postgresql.org/message-id/flat/DA1E65A4-7C5A-461D-B211-2AD5F9A6F2FD%40gmail.com) |
| 138 | +  Developers debated whether to store full 64-bit IDs or use a **hybrid scheme** (32-bit epoch + 32-bit XID) to retain compatibility. |
| 139 | + |
| 140 | +- **Experimental patch for Postgres 15 (2021):** |
| 141 | + [https://www.postgresql.org/message-id/flat/CACG=ezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe=pyyjVWA@mail.gmail.com](https://www.postgresql.org/message-id/flat/CACG=ezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe=pyyjVWA@mail.gmail.com) |
| 142 | + It proved feasible but caused major ripple effects: |
| 143 | + |
| 144 | + - Every tuple grows by 8 bytes (`xmin` + `xmax`). |
| 145 | + - Index and WAL formats must be redesigned. |
| 146 | + - Replication and visibility logic rely on 32-bit arithmetic. |
| 147 | + |
| 148 | +- **Community view:** |
| 149 | + [https://news.ycombinator.com/item?id=19083745](https://news.ycombinator.com/item?id=19083745) |
| 150 | + Developers agreed the change would *solve wraparound permanently* but **break on-disk compatibility**, forcing every database to migrate storage format. |
| 151 | + |
| 152 | +For now, the community focuses on improving **autovacuum efficiency** and **wraparound monitoring**, accepting that 32-bit XIDs remain part of the architecture - at least until a cleaner migration path emerges. |
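
Although the on-disk format is 32-bit, the SQL-level function `txid_current()` already reports a 64-bit value that combines a wraparound counter (the epoch) with the 32-bit XID. The sketch below splits that value apart to show the idea; note that calling `txid_current()` assigns an XID to the current transaction if it does not have one yet.

```sql
SELECT t.xid64,
       t.xid64 >> 32         AS epoch,  -- number of times the 32-bit counter has wrapped
       t.xid64 & 4294967295  AS xid32   -- the low 32 bits, as stored in tuple headers
FROM (SELECT txid_current() AS xid64) AS t;
```

This 64-bit view does not wrap in practice, but it only exists in memory and at the SQL interface; tuples on disk still carry the 32-bit value, which is why freezing remains necessary.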
| 153 | + |
| 154 | +### Best Practices |
| 155 | + |
| 156 | +- ✅ Keep autovacuum **enabled and tuned** |
| 157 | +- ✅ Monitor XID age regularly |
| 158 | +- ✅ Vacuum frequently on high-write tables |
| 159 | +- ✅ Avoid long-running transactions |
| 160 | +- ✅ Run `VACUUM FREEZE` during maintenance windows |
| 161 | +- ✅ Partition or archive old data to reduce bloat |
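
As a concrete example of the manual freeze step, the commands below vacuum one table and then confirm that its age dropped. `public.orders` is a placeholder table name; substitute a table that your monitoring shows to be old.

```sql
-- Freeze all eligible tuples in one table and print progress details.
VACUUM (FREEZE, VERBOSE) public.orders;

-- Verify: the age should now be far lower than before the vacuum.
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE oid = 'public.orders'::regclass;
```

`VACUUM` cannot run inside a transaction block, so issue it as a standalone statement. On large tables it generates significant I/O, which is why a maintenance window is recommended.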