Commit b27bab1

Update async inserts article (#141)
1 parent dccebb2 commit b27bab1

1 file changed: +67 -16 lines changed

content/en/altinity-kb-queries-and-syntax/async-inserts.md

Lines changed: 67 additions & 16 deletions
@@ -2,27 +2,78 @@
 title: "Async INSERTs"
 linkTitle: "Async INSERTs"
 description: >
-Async INSERTs
+Comprehensive guide to ClickHouse Async INSERTs - configuration, best practices, and monitoring
 ---
 
-Async INSERTs is a ClickHouse® feature tha enables batching data automatically and transparently on the server-side. We recommend to batch at app/ingestor level because you will have more control and you decouple this responsibility from ClickHouse, but there are use cases where this is not possible and Async inserts come in handy if you have hundreds or thousands of clients doing small inserts.
+## Overview
 
-You can check how they work here: [Async inserts](https://clickhouse.com/docs/en/optimize/asynchronous-inserts)
+Async INSERTs is a ClickHouse® feature that enables automatic server-side batching of data. While we generally recommend batching at the application/ingestor level for better control and decoupling, async inserts are valuable when you have hundreds or thousands of clients performing small inserts and client-side batching is not feasible.
 
-Some insights about Async inserts you should now:
+**Key Documentation:** [Official Async Inserts Documentation](https://clickhouse.com/docs/en/optimize/asynchronous-inserts)
 
-* Async inserts give acknowledgment immediately after the data got inserted into the buffer (wait_for_async_insert = 0) or by default, after the data got written to a part after flushing from buffer (wait_for_async_insert = 1).
-* `INSERT .. SELECT` is NOT async insert. (You can use matView + Null table OR ephemeral columns instead of INPUT function so Async inserts will work)
-* Async inserts will do (idempotent) retries.
-* Async inserts can do batching, so multiple inserts can be squashed as a single insert (but in that case, retries are not idempotent anymore).
-* Important to use `wait_for_async_insert = 1` because with any error you will loose data without knowing it. For example your table is read only -> losing data, out of disk space -> losing data, too many parts -> losing data.
-* If `wait_for_async_insert = 0`:
-* Async inserts can loose your data in case of sudden restart (no fsyncs by default).
-* Async inserted data becomes available for selects not immediately after acknowledgment.
-* Async insert is fast sending ACK to clients unblocking them, because they have to wait until ACK is received. If your use case can handle data loss, you can use `wait_for_async_insert = 0` it will increase the throughput.
-* Async inserts generally have more `moving parts` there are some background threads monitoring new data to be sent and pushing it out.
-* Async inserts require extra monitoring from different system.tables (see `system.part_log`, `system.query_log`, `system.asynchronous_inserts` and `system_asynchronous_insert_log`).
-* The new `async_insert_use_adaptive_busy_timeout` setting enables adaptive async inserts starting in 24.3. It is turned on by default, and ClickHouse ignores manual settings like `async_insert_busy_timeout_ms`, which can be confusing. Turn off adaptive async inserts if you want deterministing behavior. (`async_insert_use_adaptive_busy_timeout = 0`)
+## How Async Inserts Work
+
+When `async_insert=1` is enabled, ClickHouse buffers incoming inserts and flushes them to disk when one of these conditions is met:
+1. Buffer reaches specified size (`async_insert_max_data_size`)
+2. Time threshold elapses (`async_insert_busy_timeout_ms`)
+3. Maximum number of queries accumulate (`async_insert_max_query_number`)
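
The buffering described above can be observed with a small experiment. This is a minimal sketch, not part of the commit: the `events` table and its columns are hypothetical, and the settings are applied per statement through the `SETTINGS` clause of `INSERT`.

```sql
-- Hypothetical target table, used only for illustration.
CREATE TABLE IF NOT EXISTS events
(
    ts      DateTime,
    user_id UInt64,
    action  String
)
ENGINE = MergeTree
ORDER BY (user_id, ts);

-- Enable async insert for this statement only.
-- With wait_for_async_insert = 1 the client is acknowledged after the flush.
INSERT INTO events SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (now(), 42, 'login');

-- Run from another session while clients are inserting:
-- each row is a buffer that has not been flushed yet.
SELECT database, table, format, first_update, total_bytes
FROM system.asynchronous_inserts;
```

An empty result from the second query simply means that no buffer is pending at that moment, which is the normal state on an idle server.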
+
+## Critical Configuration Settings
+
+### Core Settings
+
+```sql
+-- Enable async inserts (0=disabled, 1=enabled)
+SET async_insert = 1;
+
+-- Wait behavior (STRONGLY RECOMMENDED: use 1)
+-- 0 = fire-and-forget mode (risky - no error feedback)
+-- 1 = wait for data to be written to storage
+SET wait_for_async_insert = 1;
+
+-- Buffer flush conditions
+SET async_insert_max_data_size = 1000000; -- 1MB default
+SET async_insert_busy_timeout_ms = 1000; -- 1 second
+SET async_insert_max_query_number = 100; -- max queries before flush
+```
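
The `SET` statements above affect only the current session. When many independent clients insert, it can be more practical to attach the settings to a user or a settings profile. The following is a sketch under assumptions: `ingest_user` and `async_ingest` are hypothetical names, and the user is assumed to have been created through SQL (users defined in `users.xml` cannot be changed with `ALTER USER`).

```sql
-- Persist the settings for one SQL-managed user (hypothetical name).
ALTER USER ingest_user SETTINGS async_insert = 1, wait_for_async_insert = 1;

-- Alternatively, keep the settings in a profile and assign it to the user.
CREATE SETTINGS PROFILE IF NOT EXISTS async_ingest
SETTINGS async_insert = 1, wait_for_async_insert = 1
TO ingest_user;
```

New connections from that user then get async inserts without any change to the client code.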
+
+### Adaptive Timeout (Since 24.3)
+
+```sql
+-- Adaptive timeout automatically adjusts flush timing based on server load
+-- Default: 1 (enabled) - OVERRIDES manual timeout settings
+-- Set to 0 for deterministic behavior with manual settings
+SET async_insert_use_adaptive_busy_timeout = 0;
+```
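
Defaults for these settings may differ between ClickHouse releases, so it is worth checking the effective values on the server you actually run. A minimal sketch using the standard `system.settings` table:

```sql
-- Server version, to compare against release notes.
SELECT version();

-- Effective async insert settings for the current session.
-- changed = 1 means the value differs from the built-in default.
SELECT name, value, changed
FROM system.settings
WHERE name LIKE 'async_insert%' OR name = 'wait_for_async_insert'
ORDER BY name;
```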
+
+## Important Behavioral Notes
+
+### What Works and What Doesn't
+
+**Works with Async Inserts:**
+- Direct INSERT with VALUES
+- INSERT with FORMAT (JSONEachRow, CSV, etc.)
+- Native protocol inserts (since 22.x)
+
+**Does NOT Work:**
+- `INSERT .. SELECT` statements - Other strategies are needed for managing performance and load. Do not use `async_insert`.
+
+### Data Safety Considerations
+
+**ALWAYS use `wait_for_async_insert = 1` in production!**
+
+Risks with `wait_for_async_insert = 0`:
+- **Silent data loss** on errors (read-only table, disk full, too many parts)
+- Data loss on sudden restart (no fsync by default)
+- Data not immediately queryable after acknowledgment
+- No error feedback to client
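
Because a client using `wait_for_async_insert = 0` never sees flush errors, the server-side log is the place to look for them. The query below is a sketch that assumes `system.asynchronous_insert_log` is enabled in the server configuration; the one-day window and the row limit are arbitrary choices.

```sql
-- Recent async inserts that failed to parse or flush.
SELECT event_time, database, table, status, exception, bytes
FROM system.asynchronous_insert_log
WHERE event_date >= today() - 1
  AND status != 'Ok'
ORDER BY event_time DESC
LIMIT 20;
```

Rows returned here correspond to data that was acknowledged to a fire-and-forget client but did not reach the table.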
+
+### Deduplication Behavior
+
+- **Sync inserts:** Automatic deduplication enabled by default
+- **Async inserts:** Deduplication disabled by default
+- Enable with `async_insert_deduplicate = 1` (since 22.x)
+- **Warning:** Don't use with `deduplicate_blocks_in_dependent_materialized_views = 1`
 
 # features / improvements
 