Commit 9706bb6

blog: CREATE INDEX CONCURRENTLY
1 parent b388434 commit 9706bb6

3 files changed

+219
-0
lines changed

Lines changed: 219 additions & 0 deletions
@@ -0,0 +1,219 @@
1+
---
2+
title: How to Use Postgres CREATE INDEX CONCURRENTLY
3+
author: Tianzhou
4+
updated_at: 2025/08/27 16:00:00
5+
feature_image: /content/blog/postgres-create-index-concurrently/banner.webp
6+
tags: Explanation
7+
featured: false
8+
description: Learn how to use CREATE INDEX CONCURRENTLY in Postgres to build indexes without blocking writes. Pitfalls, safety checks, and performance tips.
9+
---
10+
11+
## The Problem with Regular CREATE INDEX
12+
13+
When you run a standard `CREATE INDEX` command in PostgreSQL, it acquires a **SHARE lock (ShareLock)** on the table, which blocks every write to that table until the build finishes.
14+
15+
```sql
16+
-- This will block all writes to the users table
17+
CREATE INDEX idx_users_email ON users(email);
18+
```
19+
20+
### Technical Lock Details
21+
22+
The `SHARE` lock acquired by `CREATE INDEX` conflicts with several other lock modes:
23+
24+
- **RowExclusiveLock** (used by INSERT, UPDATE, DELETE)
25+
- **ShareUpdateExclusiveLock** (used by VACUUM, ANALYZE, REINDEX CONCURRENTLY)
26+
- **ShareRowExclusiveLock** (used by CREATE TRIGGER and some ALTER TABLE variants)
27+
- **ExclusiveLock** (used by REFRESH MATERIALIZED VIEW CONCURRENTLY)
28+
- **AccessExclusiveLock** (used by DROP TABLE, TRUNCATE, REINDEX, CLUSTER)
29+
30+
This lock compatibility matrix explains why regular index creation is so disruptive. Per the [PostgreSQL locking documentation](https://www.postgresql.org/docs/current/explicit-locking.html), a SHARE lock allows concurrent SELECT operations but blocks all write operations.
31+
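To see the lock in practice, a second session can inspect `pg_locks` while an index build is running. This is a minimal sketch that assumes the `users` table from the example above; the join to `pg_stat_activity` is only there to show which statement holds or waits for each lock.

```sql
-- Run from a second session while the index build is in progress.
-- Assumes the users table from the example above.
SELECT
  l.pid,
  l.mode,        -- e.g. ShareLock for a plain CREATE INDEX
  l.granted,     -- false means this session is waiting for the lock
  a.state,
  a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.locktype = 'relation'
  AND l.relation = 'users'::regclass
ORDER BY l.granted DESC, l.pid;
```

During a plain `CREATE INDEX`, the building session should hold `ShareLock` on the table, and blocked writers should appear with `RowExclusiveLock` and `granted = false`.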
32+
## CREATE INDEX CONCURRENTLY: The Solution
33+
34+
PostgreSQL's `CREATE INDEX CONCURRENTLY` builds an index without blocking writes by taking a **SHARE UPDATE EXCLUSIVE lock (ShareUpdateExclusiveLock)** instead of a SHARE lock:
35+
36+
```sql
37+
-- This allows writes to continue during index creation
38+
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
39+
```
40+
41+
### Lock Mode Comparison
42+
43+
The key difference lies in the lock mode used:
44+
45+
| Operation | Lock Mode | ❌ Conflicts With | ✅ Allows |
46+
| --------------------------- | ---------------------------- | ------------------------------------------------------ | ------------------------------ |
47+
| `CREATE INDEX` | **ShareLock** | INSERT, UPDATE, DELETE, VACUUM, other DDL | SELECT only |
48+
| `CREATE INDEX CONCURRENTLY` | **ShareUpdateExclusiveLock** | VACUUM, other DDL operations, other concurrent index builds on the same table | SELECT, INSERT, UPDATE, DELETE |
49+
50+
![lock-mode](/content/blog/postgres-create-index-concurrently/lock-table.webp)
51+
52+
The `ShareUpdateExclusiveLock` is specifically designed to allow concurrent data modifications while preventing conflicting DDL operations.
53+
54+
### How It Works
55+
56+
`CREATE INDEX CONCURRENTLY` uses a multi-phase approach:
57+
58+
1. **Initial Catalog Entry**: Creates index metadata with `indisvalid = false`
59+
2. **First Table Scan**: Builds initial index structure while allowing writes
60+
3. **Second Table Scan**: Catches up with changes that occurred during first scan
61+
4. **Validation**: Marks index as valid (`indisvalid = true`)
62+
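The catalog flags behind these phases can be observed from another session. The query below is a sketch that assumes the `users` table from the earlier examples. Besides `indisvalid`, it also selects `indisready` and `indislive`, two further `pg_index` flags that this post does not otherwise cover.

```sql
-- Run from a second session while CREATE INDEX CONCURRENTLY is in progress.
-- Assumes the users table from the earlier examples.
SELECT
  indexrelid::regclass AS index_name,
  indisvalid,   -- true once the index may be used by queries
  indisready,   -- true once writes must maintain the index
  indislive     -- false only while an index is being dropped
FROM pg_index
WHERE indrelid = 'users'::regclass
ORDER BY index_name;
```

An index that stays at `indisvalid = false` after the build session has ended is a leftover from a failed build.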
63+
During this process:
64+
65+
- **Writes continue normally** - INSERT, UPDATE, DELETE work without interruption
66+
- **Takes longer than regular indexing** - Typically 2-3x slower due to multiple scans
67+
- **Uses more resources** - Higher CPU and I/O load from tracking concurrent changes
68+
69+
### Key Limitations
70+
71+
While powerful, `CREATE INDEX CONCURRENTLY` has important restrictions:
72+
73+
#### Cannot Run Inside Transactions
74+
75+
```sql
76+
BEGIN;
77+
CREATE INDEX CONCURRENTLY idx_users_email ON users(email); -- ERROR!
78+
COMMIT;
79+
-- ERROR: CREATE INDEX CONCURRENTLY cannot run inside a transaction block
80+
```
81+
82+
This limitation exists because the operation needs to commit multiple internal transactions during its phases.
83+
84+
#### Other Limitations
85+
86+
- **Only one concurrent index per table** - Multiple concurrent index builds on the same table will serialize
87+
- **Failure leaves invalid index** - A failed build leaves an invalid index behind; it is ignored by queries but still adds write overhead until you drop it manually
88+
- **Unique index violations** - Creating a unique index concurrently fails if duplicate values already exist or are inserted during creation, and the failed build leaves an invalid index behind
89+
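For the unique index case, one way to reduce the risk is to look for existing duplicates before starting the build and to have a recovery path ready. The statements below are a sketch under assumptions: the `users(email)` column comes from the earlier examples, and the index name `idx_users_email_unique` is hypothetical.

```sql
-- 1. Look for duplicates that would make a unique build fail.
SELECT email, count(*) AS occurrences
FROM users
GROUP BY email
HAVING count(*) > 1;

-- 2. Build the unique index without blocking writes.
--    (idx_users_email_unique is a hypothetical name.)
CREATE UNIQUE INDEX CONCURRENTLY idx_users_email_unique ON users(email);

-- 3. If the build failed, drop the invalid leftover and retry.
--    DROP INDEX CONCURRENTLY also cannot run inside a transaction block.
DROP INDEX CONCURRENTLY IF EXISTS idx_users_email_unique;
CREATE UNIQUE INDEX CONCURRENTLY idx_users_email_unique ON users(email);
```

The duplicate check only covers rows that exist when it runs, so duplicates inserted during the build can still cause a failure. On PostgreSQL 12 and later, `REINDEX INDEX CONCURRENTLY` is an alternative to the drop-and-recreate step.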
90+
## Tracking Index Creation Progress
91+
92+
### Using pg_stat_progress_create_index
93+
94+
On PostgreSQL 12 and later, the [pg_stat_progress_create_index](https://www.postgresql.org/docs/current/progress-reporting.html#CREATE-INDEX-PROGRESS-REPORTING) view tracks index creation:
95+
96+
```sql
97+
-- Monitor index creation progress
98+
SELECT
99+
pid,
100+
datname,
101+
relid::regclass AS table_name,
102+
index_relid::regclass AS index_name,
103+
phase,
104+
lockers_total,
105+
lockers_done,
106+
current_locker_pid,
107+
blocks_total,
108+
blocks_done,
109+
tuples_total,
110+
tuples_done,
111+
partitions_total,
112+
partitions_done
113+
FROM pg_stat_progress_create_index;
114+
```
115+
116+
The `phase` column shows the current operation stage:
117+
118+
- `initializing`: Starting up
119+
- `waiting for writers before build`: Waiting for concurrent writes to finish
120+
- `building index`: Main index creation phase
121+
- `waiting for writers before validation`: Preparing for validation
122+
- `index validation: scanning index`: Validating index entries
123+
- `index validation: scanning table`: Final validation
124+
- `waiting for old snapshots`: Waiting for transactions to complete
125+
- `waiting for readers before marking dead`: Reported by `REINDEX CONCURRENTLY` only, while it waits for readers before marking the old index dead
126+
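The raw counters are easier to read as percentages, and the waiting phases are easier to debug when you can see which session is being waited on. The two queries below are a sketch built only on columns of `pg_stat_progress_create_index` and `pg_stat_activity`; the column aliases are illustrative.

```sql
-- Progress as percentages; NULLIF avoids division by zero
-- in phases where the totals are not yet known.
SELECT
  pid,
  phase,
  round(100.0 * blocks_done / NULLIF(blocks_total, 0), 1) AS blocks_pct,
  round(100.0 * tuples_done / NULLIF(tuples_total, 0), 1) AS tuples_pct
FROM pg_stat_progress_create_index;

-- During the "waiting for ..." phases, show the session being waited on.
SELECT
  p.pid          AS index_build_pid,
  p.phase,
  a.pid          AS blocking_pid,
  a.state,
  a.xact_start,
  a.query
FROM pg_stat_progress_create_index p
JOIN pg_stat_activity a ON a.pid = p.current_locker_pid;
```

A long-running or idle-in-transaction session in the second result is a common reason for a concurrent build that appears stuck.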
127+
### Quick Validity Check with pg_index
128+
129+
The fastest way to check if an index is ready for use:
130+
131+
```sql
132+
-- List indexes that are not valid (still being built, or left behind by a failed build)
133+
SELECT
134+
schemaname,
135+
tablename,
136+
indexname,
137+
indexdef
138+
FROM pg_indexes i
139+
JOIN pg_namespace n ON n.nspname = i.schemaname
JOIN pg_class c ON c.relname = i.indexname AND c.relnamespace = n.oid
140+
JOIN pg_index idx ON idx.indexrelid = c.oid
141+
WHERE NOT idx.indisvalid
142+
AND schemaname NOT IN ('pg_catalog', 'information_schema');
143+
```
144+
145+
The `indisvalid` column:
146+
147+
- `true`: Index is complete and can be used by the query planner
148+
- `false`: Index is either being built or failed during concurrent creation
149+
150+
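_Postgres doesn't have a built-in INVISIBLE INDEX clause. Some people approximate it by setting `pg_index.indisvalid` to `false` directly, but that is an unsupported system catalog modification that requires superuser privileges, and the index is still maintained on every write_.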
151+
152+
## Best Practices
153+
154+
### Automatic Review
155+
156+
To prevent developers from running `CREATE INDEX` and accidentally locking the database, implement automatic SQL linting during the review process.
157+
158+
<HintBlock type="info">
159+
160+
[Bytebase SQL Review](https://docs.bytebase.com/sql-review/review-rules#create-index-concurrently) provides automated enforcement of the `CREATE INDEX CONCURRENTLY` rule and can be integrated with [CI/CD pipelines](https://docs.bytebase.com/vcs-integration/overview#github-actions) to catch violations before deployment.
161+
162+
</HintBlock>
163+
164+
### Always Verify Index Validity After Creation
165+
166+
```sql
167+
-- After CREATE INDEX CONCURRENTLY completes
168+
SELECT indisvalid
169+
FROM pg_index
170+
WHERE indexrelid = 'idx_users_email'::regclass;
171+
172+
-- If false, the index creation failed and needs cleanup
173+
DROP INDEX CONCURRENTLY IF EXISTS idx_users_email;
174+
```
175+
176+
### Clean Up Failed Indexes
177+
178+
```sql
179+
-- Find and drop all invalid indexes (first confirm that no concurrent build is still running)
180+
DO $$
181+
DECLARE
182+
r RECORD;
183+
BEGIN
184+
FOR r IN
185+
SELECT schemaname, indexname
186+
FROM pg_indexes i
187+
JOIN pg_namespace n ON n.nspname = i.schemaname
JOIN pg_class c ON c.relname = i.indexname AND c.relnamespace = n.oid
188+
JOIN pg_index idx ON idx.indexrelid = c.oid
189+
WHERE NOT idx.indisvalid
190+
AND schemaname NOT IN ('pg_catalog', 'information_schema')
191+
LOOP
192+
EXECUTE format('DROP INDEX %I.%I', r.schemaname, r.indexname);
193+
RAISE NOTICE 'Dropped invalid index %.%', r.schemaname, r.indexname;
194+
END LOOP;
195+
END $$;
196+
```
197+
198+
### Performance Consideration
199+
200+
| Aspect | CREATE INDEX | CREATE INDEX CONCURRENTLY |
201+
| ------------------- | --------------------- | -------------------------------------- |
202+
| **Lock Level** | SHARE (blocks writes) | SHARE UPDATE EXCLUSIVE (allows writes) |
203+
| **Duration** | Baseline (1x) | 2-3x longer |
204+
| **CPU Usage** | High burst | Sustained moderate |
205+
| **I/O Impact** | Single intensive scan | Multiple moderate scans |
206+
| **Memory Usage** | maintenance_work_mem | Similar, held longer |
207+
| **Transaction Log** | Minimal | Higher due to concurrent changes |
208+
209+
Even though `CREATE INDEX CONCURRENTLY` doesn't block writes, it still impacts performance:
210+
211+
- Schedule during low-traffic periods when possible
212+
- Monitor CPU and I/O metrics during creation
213+
- Consider increasing `maintenance_work_mem` temporarily for faster indexing
214+
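The `maintenance_work_mem` suggestion can be applied to a single session so that the server-wide setting stays untouched. This is a sketch: the `1GB` value is purely illustrative and has to be sized against the memory actually available on your server.

```sql
-- Raise the limit for this session only (1GB is an illustrative value).
SET maintenance_work_mem = '1GB';

-- Must run outside a transaction block.
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

-- Return to the configured default for the rest of the session.
RESET maintenance_work_mem;
```

Because `SET` without `LOCAL` lasts for the whole session, the `RESET` matters when the connection is reused, for example through a connection pool.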
215+
## References
216+
217+
1. [Postgres locking mode](https://www.postgresql.org/docs/current/explicit-locking.html)
218+
1. [Index creation progress table](https://www.postgresql.org/docs/current/progress-reporting.html#CREATE-INDEX-PROGRESS-REPORTING)
219+
1. [Source code for CREATE INDEX CONCURRENTLY](https://github.com/postgres/postgres/blob/ef5b87b970dc28adeeb88191fbf66c9d6298b112/src/backend/commands/indexcmds.c#L542)