
Commit 87140d6

Add Clickhouse TTL job to TTL and Retention Section (#936)
1 parent 27c28da commit 87140d6

1 file changed: +108 -0 lines changed

docs/self_hosting/configuration/ttl.mdx

Lines changed: 108 additions & 0 deletions
@@ -37,3 +37,111 @@ TRACE_TIER_TTL_DURATION_SEC_MAP='{"longlived": 34560000, "shortlived": 1209600}'
),
]}
/>

## ClickHouse TTL Cleanup Job

As of version **0.11**, a cron job runs on weekends to delete expired data that ClickHouse's built-in TTL mechanism may not have cleaned up.

:::warning Performance Considerations
This job uses potentially long-running **mutations** (`ALTER TABLE ... DELETE`), which are expensive operations that can degrade ClickHouse performance. We recommend running them only during off-peak hours (nights and weekends). In testing with the default of **1 concurrent active mutation**, we did not observe significant increases in CPU usage, memory usage, or latency.
:::

### Default Schedule

By default, the cleanup job runs:

- **Saturday**: 8pm and 10pm UTC
- **Sunday**: 12am, 2am, and 4am UTC

### Disabling the Job

To disable the cleanup job entirely:

```yaml
queue:
  extraEnv:
    - name: "ENABLE_CLICKHOUSE_TTL_CLEANUP_CRON"
      value: "false"
```

### Configuring the Schedule

You can customize when the cleanup job runs by modifying the cron expressions:

```yaml
queue:
  extraEnv:
    # UTC: Sunday 12am/2am/4am
    - name: "CLICKHOUSE_TTL_CLEANUP_CRON_WEEKEND_MORNING"
      value: "0 0,2,4 * * 0"
    # UTC: Saturday 8pm/10pm
    - name: "CLICKHOUSE_TTL_CLEANUP_CRON_WEEKEND_EVENING"
      value: "0 20,22 * * 6"
```

:::tip Single Schedule
To run the job on a single cron schedule, set both `CLICKHOUSE_TTL_CLEANUP_CRON_WEEKEND_EVENING` and `CLICKHOUSE_TTL_CLEANUP_CRON_WEEKEND_MORNING` to the same value. Job locking prevents overlapping executions.
:::
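
Before deploying a custom schedule, it can help to check which weekly run times a cron expression actually produces. The following is a minimal sketch, not part of the cleanup job itself: `expand_field` and `describe_schedule` are hypothetical helpers, and the parser only understands `*`, single numbers, and comma-separated lists (the forms used by the default expressions above), with day-of-week restricted to 0 (Sunday) through 6 (Saturday).

```python
# Minimal sketch: expand a simple five-field cron expression into the
# weekly run times it describes. Only "*", single numbers, and
# comma-separated lists are supported; ranges and steps are not.
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]


def expand_field(field: str, allowed: range) -> list[int]:
    """Expand '*', 'n', or 'a,b,c' into a sorted list of integers."""
    if field == "*":
        return list(allowed)
    values = sorted({int(part) for part in field.split(",")})
    for value in values:
        if value not in allowed:
            raise ValueError(
                f"{value} is outside {allowed.start}-{allowed.stop - 1}"
            )
    return values


def describe_schedule(expression: str) -> list[str]:
    """Return one 'Day HH:MM UTC' string per scheduled run in a week."""
    minute, hour, day_of_month, month, day_of_week = expression.split()
    if day_of_month != "*" or month != "*":
        raise ValueError("this sketch only supports weekly schedules")
    minutes = expand_field(minute, range(0, 60))
    hours = expand_field(hour, range(0, 24))
    days = expand_field(day_of_week, range(0, 7))  # 0 = Sunday, 6 = Saturday
    return [
        f"{DAY_NAMES[d]} {h:02d}:{m:02d} UTC"
        for d in days
        for h in hours
        for m in minutes
    ]


if __name__ == "__main__":
    for expr in ("0 20,22 * * 6", "0 0,2,4 * * 0"):
        print(expr, "->", describe_schedule(expr))
```

Running the two default expressions through `describe_schedule` is a quick way to confirm they match the Saturday evening and Sunday morning windows listed under Default Schedule.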

### Configuring Minimum Expired Rows Per Part

The job processes one table at a time, scanning its parts and deleting expired data only from parts that contain at least a minimum number of expired rows. This threshold trades efficiency against thoroughness:

- **Too low**: The job scans entire parts to clear very little data (inefficient)
- **Too high**: The job skips parts that hold a significant amount of expired data

```yaml
queue:
  extraEnv:
    - name: "CLICKHOUSE_TTL_CRON_MIN_EXPIRED_ROWS_PER_PART"
      value: "100000" # 100k expired rows
```

#### Checking Expired Rows

Use this query to count the expired rows in each part of a table, then adjust your minimum value accordingly:

```sql
-- Query for the runs table. For other tables, replace 'ttl_seconds' with 'trace_ttl_seconds'
SELECT
    _part,
    count() AS expired_rows
FROM runs
WHERE trace_first_received_at IS NOT NULL
    AND ttl_seconds IS NOT NULL
    AND toDateTime(assumeNotNull(trace_first_received_at) + toIntervalSecond(assumeNotNull(ttl_seconds))) < now()
GROUP BY _part
ORDER BY expired_rows DESC
```
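
To compare candidate thresholds, the per-part counts returned by the query above can be summarized offline. The sketch below is illustrative only: `coverage_at_threshold` is a hypothetical helper, the part names and counts are made-up sample data, and it assumes the job selects a part when its expired row count is greater than or equal to the threshold.

```python
def coverage_at_threshold(expired_rows_by_part: dict[str, int],
                          threshold: int) -> dict[str, float]:
    """Summarize which parts meet the threshold and how much they cover."""
    total_rows = sum(expired_rows_by_part.values())
    # Assumption: a part qualifies when expired_rows >= threshold.
    selected = {
        part: rows
        for part, rows in expired_rows_by_part.items()
        if rows >= threshold
    }
    selected_rows = sum(selected.values())
    return {
        "parts_selected": len(selected),
        "expired_rows_covered": selected_rows,
        "share_of_expired_rows": selected_rows / total_rows if total_rows else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical query output: part name -> expired row count.
    sample = {
        "all_1_1_0": 2_500_000,
        "all_2_2_0": 120_000,
        "all_3_3_0": 40_000,
        "all_4_4_0": 900,
    }
    for threshold in (10_000, 100_000, 1_000_000):
        print(threshold, coverage_at_threshold(sample, threshold))
```

A threshold that selects few parts while still covering most of the expired rows is usually a reasonable starting point, since each selected part is processed in full.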

### Configuring Maximum Active Mutations

Delete operations can be time-consuming (~50 minutes for a 100GB part). To speed up the process, you can raise the maximum number of concurrent mutations:

```yaml
queue:
  extraEnv:
    - name: "CLICKHOUSE_TTL_CRON_MAX_ACTIVE_MUTATIONS"
      value: "1" # default
```

:::danger Concurrent Mutations
Increasing concurrent DELETE operations can severely impact system performance. Monitor your system carefully and only increase this value if you can tolerate potentially higher insert and read latencies.
:::
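
For a rough sense of how long a cleanup pass might take, the ~50 minutes per 100GB figure can be turned into a back-of-the-envelope lower bound. This is only a sketch under strong assumptions that the source does not confirm: deletion time scales linearly with part size, each part is handled by its own mutation, and concurrent mutations do not slow each other down. `estimate_mutation_minutes` is a hypothetical helper, not part of the job.

```python
MINUTES_PER_100_GB = 50  # rough figure quoted in this section


def estimate_mutation_minutes(part_sizes_gb: list[float],
                              max_active_mutations: int = 1) -> float:
    """Lower-bound estimate of wall-clock minutes to process the given parts.

    Assumes linear scaling with part size, one mutation per part, and no
    slowdown from running mutations concurrently.
    """
    if max_active_mutations < 1:
        raise ValueError("max_active_mutations must be at least 1")
    if not part_sizes_gb:
        return 0.0
    per_part = [size * MINUTES_PER_100_GB / 100 for size in part_sizes_gb]
    # The pass cannot finish faster than its largest part, nor faster than
    # the total work divided evenly across the concurrent slots.
    return max(max(per_part), sum(per_part) / max_active_mutations)


if __name__ == "__main__":
    parts = [100, 50, 50]  # hypothetical part sizes in GB
    for slots in (1, 2):
        print(slots, estimate_mutation_minutes(parts, slots))
```

Real durations depend on hardware and load, so treat the result as a planning aid for sizing the off-peak window rather than a prediction.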

### Emergency: Stopping Running Mutations

If you experience latency spikes and need to terminate a running mutation:

1. **Find active mutations**:

   ```sql
   SELECT * FROM system.mutations WHERE is_done = 0;
   ```

   Look for the `mutation_id` where the `command` column contains a `DELETE` statement.

2. **Kill the mutation**:

   ```sql
   KILL MUTATION WHERE mutation_id = '<mutation_id>';
   ```
