Name and Version
bitnamilegacy/clickhouse:25.7.5-debian-12-r0
What steps will reproduce the bug?
We are running ClickHouse via the Bitnami Helm chart to store OpenTelemetry telemetry (traces/logs/metrics). Under production traffic (100k–200k spans/sec), inserts create parts faster than they can be merged, leading to:
- frequent Too many parts ... merges slower than inserts errors
- very slow merge backlog
- expired partitions not fully dropped for 2–3 days, even with TTL enabled
- very high memory usage due to background merges
We need guidance on:
- How to speed up merges, especially for expired partitions
- Memory optimisation / stability tuning under high insert rates
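For reference, per-partition active part counts can be checked with a query along these lines (a minimal sketch against system.parts; otel.traces is the local table named in the error further down):

-- Sketch: active part counts and sizes per partition of the local traces table.
SELECT
    partition,
    count()                                AS active_parts,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE database = 'otel' AND table = 'traces' AND active
GROUP BY partition
ORDER BY partition DESC;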
Environment
- Deployment: Bitnami Helm ClickHouse
- ClickHouse image: bitnamilegacy/clickhouse:25.7.5-debian-12-r0
- Cluster topology: 3 shards × 2 replicas (ReplicatedMergeTree)
- Node resources per pod:
  - CPU: 32 vCPU
  - Memory: 268 GiB
- Workload: OpenTelemetry traces/logs/metrics
- Ingestion:
  - Dev traffic: ~5k spans/sec (works fine)
  - Prod traffic: 100k–200k spans/sec
- Hashing / sharding: we use sipHash64(TraceId) and the distribution across shards is even (see the quick check below)
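As a quick check of the shard distribution, a per-shard row count through the distributed table looks roughly like this (sketch; traces_dist is the distributed table described below, and hostName() reports the replica answering for each shard):

-- Sketch: rows per shard, queried through the distributed table.
SELECT hostName() AS host, count() AS row_count
FROM otel.traces_dist
GROUP BY host
ORDER BY host;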
Tables / Schema:
We have local replicated tables plus distributed tables for traces, metrics, and logs [see attached file].
Table design (example: traces; a rough DDL sketch follows below):
- Engine: ReplicatedMergeTree
- Partitioning: PARTITION BY toDate(IngestedAt)
- TTL: toDateTime(IngestedAt) + toIntervalDay(1)
- ttl_only_drop_parts = 1 enabled
- The exporter writes through the distributed table traces_dist
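For illustration, the layout above corresponds roughly to DDL like the following; the column list, ORDER BY, ZooKeeper path, and cluster name are placeholders/assumptions, and the real DDL is in the attached file:

-- Rough sketch only: columns, ORDER BY, ZooKeeper path and cluster name are assumptions.
CREATE TABLE otel.traces
(
    Timestamp  DateTime64(9),
    IngestedAt DateTime,
    TraceId    String,
    SpanId     String,
    SpanName   LowCardinality(String)
    -- ... remaining OTel columns omitted
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/otel/traces', '{replica}')
PARTITION BY toDate(IngestedAt)
ORDER BY (SpanName, toUnixTimestamp(Timestamp), TraceId)
TTL toDateTime(IngestedAt) + toIntervalDay(1)
SETTINGS ttl_only_drop_parts = 1;

-- Distributed table the exporter writes to, sharded by sipHash64(TraceId)
-- ('otel_cluster' is a placeholder for the actual cluster name):
CREATE TABLE otel.traces_dist AS otel.traces
ENGINE = Distributed('otel_cluster', 'otel', 'traces', sipHash64(TraceId));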
Problem
In production, insert concurrency is very high and merges cannot keep up. This causes:
Too many parts errors
We hit this regularly:
code: 252, message: Too many parts (3001 with average size of 84.45 MiB) in table 'otel.traces (...UUID...)'. Merges are processing significantly slower than inserts
After this happens:
- exporter requests fail and retry
- the otel collector exporter queue grows
- the ingestion backlog accumulates
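If system.part_log is enabled (it is in the default configuration of recent images), the imbalance between part creation and merging can be seen with a sketch like this:

-- Sketch: new parts created vs. merges completed per minute for otel.traces.
SELECT
    toStartOfMinute(event_time)        AS minute,
    countIf(event_type = 'NewPart')    AS parts_created,
    countIf(event_type = 'MergeParts') AS merges_completed
FROM system.part_log
WHERE database = 'otel'
  AND table = 'traces'
  AND event_time > now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute;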
TTL-expired partitions not dropping quickly
TTL does work and partitions gradually shrink; however, for very large partitions (for example a ~3 TB/day partition), fully dropping them can take 2–3 days because of the merge backlog.
We see:
- expired partitions still present (with active parts)
- partition size gradually decreasing over time instead of being dropped soon after expiry
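The expired-but-still-present partitions can be listed like this (sketch; max_date in system.parts is populated here because the table is partitioned by a date expression):

-- Sketch: partitions fully past the 1-day TTL that still have active parts.
SELECT
    partition,
    count()                                AS active_parts,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE database = 'otel'
  AND table = 'traces'
  AND active
  AND max_date < today() - 1
GROUP BY partition
ORDER BY partition;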
Memory pressure
A large share of pod memory is consumed by merges and background tasks, and overall memory stays heavily utilized while the merge backlog persists.
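The memory held by in-flight merges is visible in system.merges, for example with a sketch like:

-- Sketch: memory currently held by running merges, largest first.
SELECT
    table,
    round(elapsed)                   AS elapsed_s,
    round(progress, 2)               AS progress,
    formatReadableSize(memory_usage) AS merge_memory
FROM system.merges
ORDER BY memory_usage DESC;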
Mitigation/Tuning applied so far
We increased background merge capacity:
background_pool_size: 32
background_merges_mutations_concurrency_ratio: 2
This helped but did not resolve the issue completely under 100k–200k spans/sec.
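To confirm the effective values and how busy the merge pool is, queries along these lines can be used (setting and metric names as documented for current ClickHouse; system.server_settings is available in this version):

-- Sketch: effective server-side merge pool settings.
SELECT name, value
FROM system.server_settings
WHERE name IN ('background_pool_size', 'background_merges_mutations_concurrency_ratio');

-- Sketch: current merge/mutation pool utilization.
SELECT metric, value
FROM system.metrics
WHERE metric LIKE 'BackgroundMergesAndMutationsPool%';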
OTel ClickHouse exporter config:
clickhouse:
  endpoint: tcp://clickhouse.svc.cluster.local:9000
  username: default
  password: mysecurepassword
  create_schema: false
  database: otel
  traces_table_name: traces_dist
  ttl: 24h
  sending_queue:
    enabled: true
    num_consumers: 25
    queue_size: 20000
  retry_on_failure:
    enabled: true
    initial_interval: 5s
    max_interval: 30s
    max_elapsed_time: 300s
  timeout: 30s
batch/clickhouse:
  send_batch_max_size: 12000
  send_batch_size: 10000
  timeout: 5s
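Since the distributed table splits each exporter batch across the 3 shards by sipHash64(TraceId), each local insert is smaller than the configured batch size; the rows that actually land per part can be checked with a sketch like this (again assuming system.part_log is enabled):

-- Sketch: average rows per newly created part on the local traces table.
SELECT
    toStartOfTenMinutes(event_time) AS t,
    count()                         AS new_parts,
    round(avg(rows))                AS avg_rows_per_part
FROM system.part_log
WHERE event_type = 'NewPart'
  AND database = 'otel'
  AND table = 'traces'
  AND event_time > now() - INTERVAL 1 HOUR
GROUP BY t
ORDER BY t;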
Are you using any custom parameters or values?
No response
What is the expected behavior?
No response
What do you see instead?
Merges are slow and pod memory is almost fully utilized.
Additional information
No response