Commit f0de1c1

docs: add pg xid wraparound blog (#919)
* add pg xid wraparound blog * Update content/blog/postgres-transaction-id-wraparound.md Co-authored-by: Copilot <[email protected]> --------- Co-authored-by: Copilot <[email protected]>
1 parent 6c28f1a commit f0de1c1

5 files changed

+161
-0
lines changed

@@ -0,0 +1,161 @@
1+
---
2+
title: 'Postgres Transaction ID (XID) Wraparound'
3+
author: Adela
4+
updated_at: 2025/10/24 12:00
5+
feature_image: /content/blog/postgres-transaction-id-wraparound/banner.webp
6+
tags: Explanation
7+
description: 'What is Postgres Transaction ID (XID) Wraparound and how to monitor and prevent it.'
8+
---
9+
10+
PostgreSQL’s transaction system is built around a simple but powerful idea: every transaction that modifies data is assigned a **Transaction ID (XID)**, a 32-bit integer that drives its **MVCC (Multi-Version Concurrency Control)** engine.
11+
12+
MVCC lets multiple users read and write data at the same time without blocking each other.
13+
But those transaction IDs are finite, and eventually they **wrap around**.
14+
15+
If not handled properly, wraparound can make data invisible or force the database to stop accepting writes until it has been vacuumed.
16+
Let’s understand why.
17+
18+
### How Postgres Differs from Other Databases
19+
20+
PostgreSQL uses a **tuple-based** MVCC model - every data change creates a new version (tuple) of a row stored directly in the table.
21+
22+
Each tuple has two hidden fields: `xmin` and `xmax`. `xmin` is the transaction ID that created the tuple, and `xmax` is the transaction ID that deleted or replaced it.
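
To see these hidden columns in practice, the following sketch creates a throwaway table (the name `xid_demo` is made up for illustration) and selects `xmin` and `xmax` alongside the row data. It assumes autocommit, so each statement runs in its own transaction.

```sql
-- Hypothetical demo table; any table exposes the same system columns.
CREATE TEMP TABLE xid_demo (id int, note text);

INSERT INTO xid_demo VALUES (1, 'first version');

-- xmin holds the XID of the inserting transaction; xmax is 0 while the
-- tuple has not been deleted or updated.
SELECT xmin, xmax, ctid, * FROM xid_demo;

UPDATE xid_demo SET note = 'second version' WHERE id = 1;

-- The UPDATE wrote a new tuple: expect a newer xmin and a different ctid.
SELECT xmin, xmax, ctid, * FROM xid_demo;
```

The superseded tuple stays physically in the table, with its `xmax` set to the updating transaction's XID, until `VACUUM` removes it. It is simply no longer visible to new snapshots.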
23+
24+
Other databases handle versioning differently:
25+
26+
| Database           | MVCC type           | Where old versions are stored                                       | Wraparound risk |
| ------------------ | ------------------- | ------------------------------------------------------------------- | --------------- |
| **PostgreSQL**     | Tuple-based         | In the main table (needs `VACUUM`)                                  | ✅ Yes          |
| **MySQL (InnoDB)** | Undo-log-based      | Undo logs in undo tablespaces (system tablespace in older versions) | ❌ No           |
| **SQL Server**     | Version-store-based | Tempdb version store                                                | ❌ No           |
| **Oracle**         | Undo-log-based      | Undo segments                                                       | ❌ No           |
32+
33+
In short:
34+
PostgreSQL keeps old rows **inline** with the table, and each one carries its own transaction ID.
35+
That design enables powerful visibility control, but because every tuple header stores compact 32-bit XIDs, those IDs must eventually be **reused**, which leads to the wraparound problem.
36+
Other databases store old versions separately, so their transaction identifiers can grow freely without ever wrapping around.
37+
38+
### What Is Transaction Wraparound
39+
40+
PostgreSQL’s transaction IDs are **32-bit integers** (`0` to `4,294,967,295`).
41+
After the counter hits its limit, it **wraps around** and restarts at 3, because XIDs 0, 1, and 2 are reserved for special purposes (invalid, bootstrap, and frozen).
42+
43+
You can picture XIDs as points on a **circle**, not a straight line.
44+
45+
![Transaction ID (XID) Wraparound](/content/blog/postgres-transaction-id-wraparound/pg-xid-cycle-circle.webp)
46+
47+
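XIDs are compared modulo 2^32: relative to any given XID, roughly 2 billion others count as "older" and roughly 2 billion as "newer". Once a tuple's XID falls more than about 2 billion transactions behind the counter, it suddenly appears to be in the "future" and becomes **invisible** to all new transactions.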
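
The circular comparison can be sketched in plain SQL. This is an illustration of the arithmetic only, not PostgreSQL's internal code: XID `a` is treated as older than `b` when the difference `a - b`, interpreted as a signed 32-bit number, is negative. The sample values are made up, and the reserved XIDs 0 to 2 are ignored.

```sql
-- signed_diff = ((a - b + 2^31) mod 2^32) - 2^31
-- 6442450944 = 2^31 + 2^32, added so the modulo operand is never negative.
WITH pairs(label, a, b) AS (
  VALUES ('same cycle',              100::bigint,        200::bigint),
         ('b assigned after a wrap', 4294967000::bigint, 500::bigint),
         ('more than 2^31 apart',    100::bigint,        2147483900::bigint)
)
SELECT label, a, b,
       ((a - b + 6442450944) % 4294967296) - 2147483648     AS signed_diff,
       ((a - b + 6442450944) % 4294967296) - 2147483648 < 0 AS a_is_older
FROM pairs;
```

The first two rows report `a` as older (differences of -100 and -796), even though in the second row `a` is numerically far larger than `b`. In the third row the XIDs are more than 2^31 apart, so `a` is no longer considered older: this is the situation in which an old, unfrozen tuple would look like it came from the future.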
48+
49+
![Transaction ID (XID) Wraparound](/content/blog/postgres-transaction-id-wraparound/pg-xid-cycle.webp)
50+
51+
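To prevent this, Postgres periodically **freezes** old tuples, marking them as "committed long ago, always visible." Older releases did this by overwriting `xmin` with the special `FrozenTransactionId`; since PostgreSQL 9.4, a flag bit in the tuple header is set instead, so the original `xmin` is preserved.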
52+
53+
![Transaction ID (XID) Wraparound](/content/blog/postgres-transaction-id-wraparound/pg-xid-freeze.webp)
54+
55+
### Consequences of Wraparound
56+
57+
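If the database fails to freeze tuples in time, it risks silent data loss: old committed rows would appear to come from the future and vanish from query results.

When `datfrozenxid` becomes dangerously old, Postgres first logs warnings and then refuses any command that would assign a new XID, with errors like:

```
ERROR: database is not accepting commands to avoid wraparound data loss
```

The server keeps running, but writes fail until a `VACUUM` has advanced `datfrozenxid`.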
63+
64+
#### Real-world cases
65+
66+
- **Sentry (2015):**
67+
Sentry’s Postgres database stopped accepting writes after autovacuum couldn’t keep up with freezing old transaction IDs. The system hit the wraparound limit, forcing emergency manual vacuuming and downtime.
68+
[https://blog.sentry.io/transaction-id-wraparound-in-postgres/](https://blog.sentry.io/transaction-id-wraparound-in-postgres/)
69+
70+
- **Mailchimp/Mandrill (2019):**
71+
A busy shard’s autovacuum fell behind, triggering wraparound protection and halting writes. Recovery required truncations and manual vacuums, leading to roughly 40 hours of outage.
72+
[https://mailchimp.com/what-we-learned-from-the-recent-mandrill-outage/](https://mailchimp.com/what-we-learned-from-the-recent-mandrill-outage/)
73+
74+
These cases show that wraparound isn’t theoretical - it’s one of the few PostgreSQL maintenance failures that can completely stop production systems.
75+
76+
### How to Monitor and Prevent Wraparound
77+
78+
#### 1. Autovacuum is the First Line of Defense
79+
80+
Autovacuum automatically scans tables and freezes tuples before they age out.
81+
Key parameters:
82+
83+
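- `autovacuum_freeze_max_age` – maximum `relfrozenxid` age (default 200 million) before autovacuum is forced to run on a table to prevent wraparound
- `vacuum_freeze_table_age` – `relfrozenxid` age (default 150 million) at which a regular `VACUUM` becomes aggressive and scans every page that may hold unfrozen tuples
- `vacuum_freeze_min_age` – minimum XID age (default 50 million) before a tuple is eligible for freezing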
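
To check what these parameters are currently set to on a given server, a query against the standard `pg_settings` view along these lines should work:

```sql
-- Current values of the freeze-related settings on this server.
SELECT name, setting, unit, short_desc
FROM pg_settings
WHERE name IN ('autovacuum',
               'autovacuum_freeze_max_age',
               'vacuum_freeze_table_age',
               'vacuum_freeze_min_age')
ORDER BY name;
```

Individual tables can override some of these values through storage parameters, so the server-wide setting is not always the one that applies.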
86+
87+
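If autovacuum is slow or constantly interrupted, wraparound danger grows silently. Even with `autovacuum = off`, Postgres still launches anti-wraparound autovacuum once a table's age exceeds `autovacuum_freeze_max_age`, but relying on that last resort leaves little headroom.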
88+
89+
#### 2. Check XID Age with SQL
90+
91+
To see how close your database is to wraparound:
92+
93+
```sql
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```
98+
99+
For table-level detail:
100+
101+
```sql
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY xid_age DESC;
```
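
A possible refinement, shown here as a sketch rather than a recommended standard query, relates each table's age to the server-wide `autovacuum_freeze_max_age` and also covers materialized views and TOAST tables, which carry their own `relfrozenxid`:

```sql
-- Tables closest to (or past) the forced anti-wraparound vacuum threshold.
-- Uses the server-wide setting; per-table overrides are not considered.
SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age,
       round(100.0 * age(c.relfrozenxid)
             / current_setting('autovacuum_freeze_max_age')::bigint, 1)
           AS pct_of_freeze_max_age
FROM pg_class c
WHERE c.relkind IN ('r', 'm', 't')   -- tables, materialized views, TOAST
ORDER BY age(c.relfrozenxid) DESC
LIMIT 20;
```

Values well above 100% suggest that autovacuum is not completing its freeze passes on those tables.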
107+
108+
⚠️ **Guidelines**
109+
110+
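- Well above `autovacuum_freeze_max_age` (200 million by default) → autovacuum is falling behind
- Above **1.5 billion** → warning zone
- Approaching **2^31** (about 2.1 billion) → Postgres stops assigning new XIDs and writes fail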
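
These thresholds can be turned into a headroom figure. The sketch below treats 2^31 (2,147,483,648) as the hard ceiling for XID age; Postgres actually stops a few million XIDs earlier, so the result is an approximation.

```sql
-- Approximate distance of each database from the wraparound limit.
SELECT datname,
       age(datfrozenxid)                                AS xid_age,
       round(100.0 * age(datfrozenxid) / 2147483648, 2) AS pct_towards_wraparound,
       2147483648 - age(datfrozenxid)                   AS xids_remaining
FROM pg_database
ORDER BY xid_age DESC;
```

Alerting on `pct_towards_wraparound` rather than on a raw count makes it easier to share one threshold across systems with different write rates.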
112+
113+
#### 3. Cloud Provider Recommendations
114+
115+
**AWS RDS / Aurora**
116+
Use the `postgres_get_av_diag()` function to monitor autovacuum health and aging tables:
117+
[https://aws.amazon.com/blogs/database/prevent-transaction-id-wraparound-by-using-postgres_get_av_diag-for-monitoring-autovacuum/](https://aws.amazon.com/blogs/database/prevent-transaction-id-wraparound-by-using-postgres_get_av_diag-for-monitoring-autovacuum/)
118+
119+
**Google Cloud SQL**
120+
Cloud SQL provides a Recommender for High Transaction ID Utilization:
121+
[https://cloud.google.com/sql/docs/postgres/recommender-high-transactionid-utilization](https://cloud.google.com/sql/docs/postgres/recommender-high-transactionid-utilization)
122+
123+
General advice:
124+
125+
- Never disable autovacuum.
126+
- Schedule manual `VACUUM FREEZE` during off-peak hours.
127+
- Avoid long-running or idle-in-transaction sessions; they hold back the oldest XID that `VACUUM` is allowed to freeze.
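
To find what is holding freezing back, the standard statistics views can be queried roughly as follows. Prepared transactions and replication slots are included as an assumption about other common culprits; the text above only names long-running transactions.

```sql
-- Sessions whose snapshot pins the oldest XID that VACUUM may freeze.
SELECT pid, usename, state, xact_start,
       now() - xact_start AS xact_duration,
       age(backend_xmin)  AS xmin_age,
       left(query, 60)    AS query_snippet
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

-- Forgotten prepared transactions pin the horizon as well.
SELECT gid, prepared, owner, database, age(transaction) AS xid_age
FROM pg_prepared_xacts
ORDER BY age(transaction) DESC;

-- So can replication slots that are no longer being consumed.
SELECT slot_name, active,
       age(xmin)         AS xmin_age,
       age(catalog_xmin) AS catalog_xmin_age
FROM pg_replication_slots;
```

If one of these shows a very large age, vacuuming will not lower `relfrozenxid` until that session, prepared transaction, or slot is dealt with.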
128+
129+
### The Challenge of Moving to 64-bit Transaction IDs
130+
131+
At first glance, the easiest fix would be to make transaction IDs **64-bit** instead of 32-bit.
132+
That would raise the ceiling from 4 billion transactions to roughly **18 quintillion** - effectively eliminating wraparound forever.
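
A back-of-the-envelope calculation shows the difference in scale. The write rates below (1,000 and 1,000,000 XID-consuming transactions per second) are arbitrary assumptions chosen for illustration.

```sql
SELECT power(2::numeric, 32) AS xid_space_32bit,
       power(2::numeric, 64) AS xid_space_64bit,
       -- Days until a tuple written now would be 2^31 XIDs old at 1,000 tx/s.
       round(power(2::numeric, 31) / 1000 / 86400, 1) AS days_to_limit_32bit,
       -- Years to exhaust 2^64 XIDs even at 1,000,000 tx/s.
       round(power(2::numeric, 64) / 1000000 / 86400 / 365.25) AS years_to_exhaust_64bit;
```

Under these assumptions the 32-bit limit is reachable in about 25 days without freezing, while a 64-bit counter would last on the order of several hundred thousand years.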
133+
134+
This idea has been discussed for years, with real prototypes already attempted:
135+
136+
- **Early discussions (2018–2019):**
137+
[https://www.postgresql.org/message-id/flat/DA1E65A4-7C5A-461D-B211-2AD5F9A6F2FD%40gmail.com](https://www.postgresql.org/message-id/flat/DA1E65A4-7C5A-461D-B211-2AD5F9A6F2FD%40gmail.com)
138+
Developers debated whether to store full 64-bit IDs or use a **hybrid scheme** (16-bit epoch + 48-bit XID) to retain compatibility.
139+
140+
- **Experimental patch for Postgres 15 (2021):**
141+
[https://www.postgresql.org/message-id/flat/CACG=ezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe=pyyjVWA@mail.gmail.com](https://www.postgresql.org/message-id/flat/CACG=ezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe=pyyjVWA@mail.gmail.com)
142+
It proved feasible but caused major ripple effects:
143+
144+
- Every tuple grows by 8 bytes (`xmin` + `xmax`).
145+
- Index and WAL formats must be redesigned.
146+
- Replication and visibility logic rely on 32-bit arithmetic.
147+
148+
- **Community view:**
149+
[https://news.ycombinator.com/item?id=19083745](https://news.ycombinator.com/item?id=19083745)
150+
Developers agreed the change would *solve wraparound permanently* but **break on-disk compatibility**, forcing every database to migrate storage format.
151+
152+
For now, the community focuses on improving **autovacuum efficiency** and **wraparound monitoring**, accepting that 32-bit XIDs remain part of the architecture - at least until a cleaner migration path emerges.
153+
154+
### Best Practices
155+
156+
- ✅ Keep autovacuum **enabled and tuned**
157+
- ✅ Monitor XID age regularly
158+
- ✅ Vacuum frequently on high-write tables
159+
- ✅ Avoid long-running transactions
160+
- ✅ Run `VACUUM FREEZE` during maintenance windows
161+
- ✅ Partition or archive old data to reduce bloat
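
As a rough illustration of the manual freeze and per-table tuning items above, the statements below use a hypothetical table named `orders`; the numeric values are examples, not recommendations.

```sql
-- `orders` is a hypothetical high-write table.
-- Freeze all eligible tuples during a maintenance window and report progress.
-- VACUUM cannot run inside a transaction block.
VACUUM (FREEZE, VERBOSE) orders;

-- Let autovacuum visit this table more eagerly than the global defaults.
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.01,   -- vacuum after ~1% of rows change
    autovacuum_freeze_max_age      = 100000000  -- force a freeze pass at 100M XIDs
);
```

Per-table storage parameters like these keep the tuning local to hot tables instead of making autovacuum more aggressive for the whole cluster.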