Commit 2a75ddc

Merge pull request #585 from bytebase/o-branch-18
docs: add mysql howto create index large table reference
2 parents febccc1 + ba5e1a5 commit 2a75ddc

2 files changed: +252 -0 lines changed

content/reference/mysql/how-to/_layout.md

Lines changed: 2 additions & 0 deletions
@@ -19,6 +19,8 @@

## [How to CREATE TABLE](/reference/mysql/how-to/how-to-create-table-mysql)

+## [How to CREATE INDEX on large table](/reference/mysql/how-to/how-to-create-index-on-large-table-mysql)
+
## [How to ALTER TABLE](/reference/mysql/how-to/how-to-alter-table-mysql)

## [How to ALTER COLUMN TYPE](/reference/mysql/how-to/how-to-alter-column-type-mysql)
Lines changed: 250 additions & 0 deletions
@@ -0,0 +1,250 @@
---
title: How to CREATE INDEX on large table in MySQL
updated_at: 2025/04/09 12:00:00
---

_Official documentation: [CREATE INDEX](https://dev.mysql.com/doc/refman/8.0/en/create-index.html)_

## Performance Considerations

<HintBlock type="info">

Creating indexes on large tables can be resource-intensive and potentially disruptive. Without proper planning, these operations can cause extended downtime, lock tables, or consume excessive server resources.

Many organizations require approval for index operations on large tables. You can enforce [approval processes](/docs/administration/custom-approval/) or [automated schema reviews](/docs/sql-review/review-rules/#index) via Bytebase.

</HintBlock>

1. **Table Locking**: Index creation with `ALGORITHM=COPY` (the only option before MySQL 5.6) blocks writes to the table until it completes. InnoDB online DDL avoids this for most secondary indexes but still takes brief metadata locks at the start and end of the operation.

2. **Resource Consumption**: Building indexes on large tables requires significant CPU, memory, and disk I/O.

3. **Temporary File and Log Growth**: Index creation writes temporary sort files and, during online DDL, a log of concurrent changes. Either can fill storage.

4. **Replication Impact**: In replicated environments, index operations must propagate to all replicas. Each replica repeats the build, which can cause lag.
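
The replication impact can be observed directly on each replica while the index build is replaying. A minimal sketch, assuming MySQL 8.0.22 or later (earlier versions use `SHOW SLAVE STATUS` and the `Seconds_Behind_Master` column instead):

```sql
-- Run on a replica; the Seconds_Behind_Source column in the output reports the lag
SHOW REPLICA STATUS;
```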

## Understanding Large Table Challenges

What counts as "large" varies by environment, but tables with millions of rows or several GB in size typically require special consideration for index operations. The main challenges are:

- Operation duration (potentially hours)
- Table locking causing application downtime
- Server resource consumption affecting other workloads
- Replication lag in replicated environments
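
To judge whether a table falls into this category, its approximate size can be read from `information_schema`. This is a rough sketch: `your_database` is a placeholder schema name, and for InnoDB the `table_rows` value is only an estimate.

```sql
-- List the ten largest tables in a schema by data plus index size
SELECT table_name,
       table_rows,
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb
FROM information_schema.tables
WHERE table_schema = 'your_database'
ORDER BY (data_length + index_length) DESC
LIMIT 10;
```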

## Methods for Creating Indexes on Large Tables

### Using Online DDL

MySQL 5.6+ supports online DDL operations that allow concurrent DML while creating indexes:

```sql
-- Create index with ALGORITHM=INPLACE and LOCK=NONE.
-- If the requested algorithm or lock level is not supported, the statement fails with an error.
ALTER TABLE large_table
  ADD INDEX idx_column (column_name),
  ALGORITHM=INPLACE, LOCK=NONE;
```

This approach:

- Minimizes table locking
- Allows concurrent reads and writes
- Works for most secondary index types in InnoDB

Limitations:

- Not all index operations support `ALGORITHM=INPLACE` with `LOCK=NONE`; for example, adding a FULLTEXT or SPATIAL index does not permit concurrent writes
- Still consumes significant resources
- May fail on very large, write-heavy tables if concurrent changes outgrow the online log (`innodb_online_alter_log_max_size`) or a metadata lock wait times out
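
One setting that bounds how much concurrent write activity an online build can absorb is `innodb_online_alter_log_max_size`. The sketch below shows how it might be inspected and raised before a long build; the 1 GB value is purely illustrative, not a recommendation.

```sql
-- Current limit for the online DDL change log, in bytes
SELECT @@global.innodb_online_alter_log_max_size;

-- Raise the limit before a long build on a write-heavy table (illustrative value: 1 GB)
SET GLOBAL innodb_online_alter_log_max_size = 1073741824;
```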

### Using pt-online-schema-change

For extremely large tables, Percona Toolkit's `pt-online-schema-change` provides a reliable solution:

```bash
pt-online-schema-change --alter="ADD INDEX idx_column (column_name)" \
  --host=localhost --user=username --password=password \
  D=database,t=large_table --execute
```

This approach:

- Creates a new table with the desired structure
- Copies data in small batches
- Maintains triggers to capture ongoing changes
- Performs an atomic table swap when complete

Limitations:

- Requires double the disk space temporarily
- Adds overhead from triggers
- Doesn't work well with foreign keys unless handled carefully
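
Before choosing a trigger-based tool, it can help to check whether the table is referenced by foreign keys or already has triggers of its own. A minimal sketch, reusing the placeholder names `database` and `large_table` from the command above:

```sql
-- Foreign keys in other tables that reference large_table
SELECT table_name, constraint_name
FROM information_schema.key_column_usage
WHERE referenced_table_schema = 'database'
  AND referenced_table_name = 'large_table';

-- Triggers already defined on large_table
SELECT trigger_name, action_timing, event_manipulation
FROM information_schema.triggers
WHERE event_object_schema = 'database'
  AND event_object_table = 'large_table';
```

If either query returns rows, review the tool's documentation on foreign keys and existing triggers before running it with `--execute`.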

### Using GitHub's gh-ost

gh-ost is another tool specifically designed for online schema changes:

```bash
gh-ost --user="user" --password="password" --host=hostname \
  --database="db" --table="large_table" \
  --alter="ADD INDEX idx_column (column_name)" \
  --execute
```

This approach:

- Uses binary log streaming instead of triggers
- Requires minimal locking
- Provides detailed progress reporting
- Is generally safer for very large tables
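
Because gh-ost reads changes from the binary log, the server it connects to needs binary logging configured suitably. The check below is a sketch that assumes gh-ost expects row-based logging with a full row image; confirm the exact requirements against the gh-ost documentation for your version.

```sql
-- Expect log_bin = 1, binlog_format = ROW, binlog_row_image = FULL
SELECT @@global.log_bin, @@global.binlog_format, @@global.binlog_row_image;
```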

### Using a Temporary Table for Temporary Relief

If you can't immediately create an index but need query performance, consider using a temporary table. The copy is a point-in-time snapshot visible only to the current session, so it suits ad hoc analysis rather than application queries:

```sql
-- Create a temporary table with a subset of the data
CREATE TEMPORARY TABLE tmp_table AS
SELECT * FROM large_table WHERE some_condition
LIMIT 1000000;

-- Index the much smaller copy
CREATE INDEX idx_column ON tmp_table (column_name);

-- Query the temporary table
SELECT * FROM tmp_table WHERE column_name = 'value';
```

## Monitoring Index Creation Progress

### For Online DDL Operations

```sql
-- Check progress in performance_schema
SELECT EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED,
       ROUND(WORK_COMPLETED/WORK_ESTIMATED*100, 2) AS "% Complete"
FROM performance_schema.events_stages_current
WHERE EVENT_NAME LIKE 'stage/innodb/alter%';
```
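
The progress query returns rows only when the relevant stage instruments and consumers are enabled, which is not the default on every server. A minimal sketch of enabling them before starting the `ALTER TABLE`:

```sql
-- Enable the InnoDB ALTER TABLE stage instruments
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME LIKE 'stage/innodb/alter%';

-- Enable the stage event consumers so the events are recorded
UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES'
WHERE NAME LIKE '%stages%';
```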

### For MySQL 8.0+ (Performance Schema)

```sql
-- Get detailed progress information
SELECT * FROM performance_schema.events_stages_current
WHERE EVENT_NAME LIKE 'stage/innodb/alter%';
```

### General Progress Monitoring

```sql
-- Monitor through process list
-- (performance_schema.processlist, available from MySQL 8.0.22,
-- is the recommended replacement for information_schema.processlist)
SELECT * FROM information_schema.processlist
WHERE info LIKE 'ALTER TABLE%' OR info LIKE 'CREATE INDEX%';

-- Check for row lock waits
SELECT * FROM sys.innodb_lock_waits;
```
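
An index build that appears stuck is often waiting on a metadata lock held by a long-running transaction. The sketch below looks for both; `sys.schema_table_lock_waits` depends on the metadata lock instrument being enabled in the Performance Schema, which is assumed here.

```sql
-- Oldest open InnoDB transactions, which may be holding metadata locks
SELECT trx_id, trx_mysql_thread_id, trx_started, trx_query
FROM information_schema.innodb_trx
ORDER BY trx_started
LIMIT 10;

-- Sessions waiting on table metadata locks and the sessions blocking them
SELECT * FROM sys.schema_table_lock_waits;
```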

## Common Errors and Solutions

See [MySQL Error Reference](/reference/mysql/error/overview/) for errors you may encounter. Here are some of the most common ones:

### Error 1206: The total number of locks exceeds the lock table size

```sql
-- Increase innodb_buffer_pool_size. It can be resized online in MySQL 5.7.5+;
-- on older versions, set it in my.cnf and restart.
SET GLOBAL innodb_buffer_pool_size = 8589934592; -- 8GB

-- Or perform the operation in smaller batches using external tools
```
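
### Error 1114: The table is full

```sql
-- Check available disk space from the shell, not the SQL client:
--   df -h

-- An in-place index build needs temporary sort space but does not copy the table,
-- so it typically uses less disk than pt-online-schema-change or gh-ost.
-- If the temporary directory is the volume that fills up, point innodb_tmpdir at a larger one.

-- Increase tablespace (NDB only; InnoDB does not support ADD DATAFILE):
ALTER TABLESPACE ts_name ADD DATAFILE 'file_name.ibd';
```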

### Error 1205: Lock wait timeout exceeded

```sql
-- Increase the row lock wait timeout for this session
SET SESSION innodb_lock_wait_timeout = 3600; -- 1 hour

-- Metadata lock waits for DDL are governed by lock_wait_timeout instead;
-- check whether it has been lowered from its default
SELECT @@session.lock_wait_timeout;

-- Use a less restrictive locking approach
ALTER TABLE large_table
  ADD INDEX idx_column (column_name),
  ALGORITHM=INPLACE, LOCK=NONE;
```

### Error 1030: Got error 28 from storage engine

```sql
-- OS error 28 means "No space left on device"
-- Free up disk space (including the temporary directory used for index sort files)
-- or move the data to larger storage
```

## Best Practices

1. **Schedule During Low Traffic Periods**: Perform index creation during off-peak hours.

2. **Monitor Server Resources**: Watch CPU, memory, disk I/O, and disk space during the operation.

3. **Test in Staging**: Practice the operation in a similar environment with production-like data.

4. **Backup First**: Always take a backup before major schema changes.

5. **Consider Alternative Approaches**:

   - Create the index on a replica first
   - Use an intermediate temporary table
   - Add indexes when initially loading data

6. **Use Prefix or Functional Indexes** when appropriate. MySQL has no filtered (partial) indexes; these are the closest equivalents:

   ```sql
   -- Prefix index: index only the first 20 characters of a string column
   CREATE INDEX idx_large_text ON large_table (large_text_column(20));

   -- Functional index (MySQL 8.0.13+): rows that are not active are indexed as NULL
   CREATE INDEX idx_partial ON large_table ((CASE WHEN status='active' THEN id END));
   ```

7. **Monitor Replication**: If using replication, monitor lag on replicas during and after index creation.

8. **Have a Rollback Plan**: Document steps to remove the index if problems occur.
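
A rollback plan for a new index can be as short as the two statements below. This is a sketch that reuses the `idx_column` name from the earlier examples; the `INVISIBLE` step assumes MySQL 8.0 or later.

```sql
-- MySQL 8.0+: hide the index from the optimizer first to see how queries behave without it
ALTER TABLE large_table ALTER INDEX idx_column INVISIBLE;

-- Remove the index; dropping a secondary index in InnoDB is an in-place, metadata-only change
ALTER TABLE large_table DROP INDEX idx_column, ALGORITHM=INPLACE, LOCK=NONE;
```

Making the index invisible first is cheap to reverse (`ALTER INDEX idx_column VISIBLE`), whereas re-creating a dropped index means repeating the full build.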
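
## Advanced Techniques

### Staged Index Creation for Extremely Large Tables

For tables with hundreds of millions of rows, consider a staged approach. MySQL does not accept a `WHERE` clause in `CREATE INDEX`, so an index cannot be limited to recent or active rows. The rollout can instead be staged across servers:

1. First build the index on a single replica, keeping the statement out of that server's binary log:

   ```sql
   -- Run on the replica only
   SET SESSION sql_log_bin = 0;
   ALTER TABLE large_table ADD INDEX idx_column (column_name), ALGORITHM=INPLACE, LOCK=NONE;
   SET SESSION sql_log_bin = 1;
   ```

2. Repeat on the remaining replicas in stages, during separate maintenance windows, then promote an indexed replica and build the index on the former primary.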

### Using Covering Indexes for Large Tables

When querying large tables, covering indexes can dramatically improve performance:

```sql
-- Create index that includes all columns used in the query
CREATE INDEX idx_covering ON large_table (search_column, col1, col2, col3);

-- Query can now be served entirely from the index
SELECT col1, col2, col3 FROM large_table WHERE search_column = 'value';
```

This is especially beneficial for large tables as it reduces the need to access the main table data.
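
Whether a query is actually covered can be checked with `EXPLAIN`. A minimal sketch using the same placeholder table and columns as above:

```sql
EXPLAIN SELECT col1, col2, col3 FROM large_table WHERE search_column = 'value';
-- "Using index" in the Extra column indicates the query is served from the index alone
```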
