Commit 3360b74

feat(clickhouse): add s3_tier storage policy with automatic tiering

Add a new s3_tier storage policy that provides automatic data movement
from local disks to S3 based on disk space availability. This combines
local performance for hot data with S3's unlimited capacity for cold data.

Key features:
- Starts with local NVMe storage for best write performance
- Automatically moves oldest data to S3 when disk free space falls below
  the configured threshold (default: 20%)
- Configurable via --s3-tier-move-factor (range: 0.0-1.0)
- Data on S3 remains queryable with cache-assisted reads

Implementation:
- Add s3_tier policy with local and s3 disk volumes to storage.xml
- Add --s3-tier-move-factor option to clickhouse init command
- Include input validation to reject values outside [0.0, 1.0]
- Update ClickHouse config generation to use renamed disk identifiers
  (local, s3) for consistency across all policies
- Add comprehensive unit tests for boundary validation
- Add E2E test verifying data movement to S3

Documentation:
- Update docs/user-guide/clickhouse.md with s3_tier policy guide
- Add policy comparison table including all three policies
- Document when to use tiered storage and how it works

1 parent 98d2ccd commit 3360b74

File tree

15 files changed: +383 −19

bin/end-to-end-test

Lines changed: 49 additions & 0 deletions

@@ -892,6 +892,54 @@ EOF
     clickhouse-query "INSERT INTO test (id) VALUES (1)"
 }

+step_clickhouse_s3_tier_test() {
+    if [ "$ENABLE_CLICKHOUSE" != true ]; then
+        echo "=== Skipping ClickHouse S3 tier test (use --clickhouse to enable) ==="
+        return 0
+    fi
+    echo "=== Testing ClickHouse S3 tier storage policy ==="
+
+    local data_bucket
+    data_bucket=$(jq -r '.dataBucket // empty' state.json 2>/dev/null)
+    if [ -z "$data_bucket" ]; then
+        echo "ERROR: dataBucket not found in state.json"
+        return 1
+    fi
+    echo "Data bucket: $data_bucket"
+
+    echo "--- Creating table with s3_tier storage policy ---"
+    clickhouse-query <<'EOF'
+CREATE OR REPLACE TABLE test_s3_tier (
+    id UInt64,
+    ts DateTime DEFAULT now()
+) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/default/test_s3_tier', '{replica}')
+ORDER BY id
+SETTINGS storage_policy = 's3_tier'
+EOF
+
+    echo "--- Inserting 10000 rows ---"
+    clickhouse-query "INSERT INTO test_s3_tier (id) SELECT number FROM numbers(10000)"
+
+    echo "--- Forcing data move to S3 disk ---"
+    clickhouse-query "ALTER TABLE test_s3_tier MOVE PARTITION tuple() TO DISK 's3'"
+
+    echo "--- Verifying data exists in S3 bucket ---"
+    local s3_object_count
+    s3_object_count=$(aws s3 ls "s3://${data_bucket}/clickhouse/" --recursive 2>/dev/null | wc -l)
+    echo "S3 object count under clickhouse/: $s3_object_count"
+    if [ "$s3_object_count" -gt 0 ]; then
+        echo "S3 tier test PASSED: $s3_object_count objects found in s3://${data_bucket}/clickhouse/"
+    else
+        echo "ERROR: No objects found in s3://${data_bucket}/clickhouse/ after moving data to S3 tier"
+        return 1
+    fi
+
+    echo "--- Cleaning up test table ---"
+    clickhouse-query "DROP TABLE IF EXISTS test_s3_tier"
+
+    echo "=== ClickHouse S3 tier test completed ==="
+}
+
 step_clickhouse_stop() {
     if [ "$ENABLE_CLICKHOUSE" != true ]; then
         echo "=== Skipping ClickHouse stop (use --clickhouse to enable) ==="

@@ -1245,6 +1293,7 @@ STEPS_WORKDIR=(
     "step_cassandra_start_stop:Cassandra start/stop cycle"
     "step_clickhouse_start:Start ClickHouse"
     "step_clickhouse_test:Test ClickHouse"
+    "step_clickhouse_s3_tier_test:Test ClickHouse S3 tier policy"
     "step_clickhouse_stop:Stop ClickHouse"
     "step_opensearch_start:Start OpenSearch"
     "step_opensearch_test:Test OpenSearch"

docs/user-guide/clickhouse.md

Lines changed: 54 additions & 8 deletions

@@ -34,6 +34,8 @@ easy-db-lab clickhouse init --s3-cache-on-write false
 |--------|-------------|---------|
 | `--s3-cache` | Size of the local S3 cache | 10Gi |
 | `--s3-cache-on-write` | Cache data during write operations | true |
+| `--s3-tier-move-factor` | Move data to S3 tier when local disk free space falls below this fraction (0.0-1.0) | 0.2 |
+| `--replicas-per-shard` | Number of replicas per shard | 3 |

 Configuration is saved to the cluster state and applied when you run `clickhouse start`.

@@ -189,14 +191,14 @@ ClickHouse is configured with two storage policies. You select the policy when c

 ### Policy Comparison

-| Aspect | `local` | `s3_main` |
-|--------|---------|-----------|
-| **Storage Location** | Local NVMe disks | S3 bucket with configurable local cache |
-| **Performance** | Best latency, highest throughput | Higher latency, cache-dependent |
-| **Capacity** | Limited by disk size | Virtually unlimited |
-| **Cost** | Included in instance cost | S3 storage + request costs |
-| **Data Persistence** | Lost when cluster is destroyed | Persists independently |
-| **Best For** | Benchmarks, low-latency queries | Large datasets, cost-sensitive workloads |
+| Aspect | `local` | `s3_main` | `s3_tier` |
+|--------|---------|-----------|-----------|
+| **Storage Location** | Local NVMe disks | S3 bucket with configurable local cache | Hybrid: starts local, moves to S3 when disk fills |
+| **Performance** | Best latency, highest throughput | Higher latency, cache-dependent | Good initially, degrades as data moves to S3 |
+| **Capacity** | Limited by disk size | Virtually unlimited | Virtually unlimited |
+| **Cost** | Included in instance cost | S3 storage + request costs | S3 storage + request costs |
+| **Data Persistence** | Lost when cluster is destroyed | Persists independently | Persists independently |
+| **Best For** | Benchmarks, low-latency queries | Large datasets, cost-sensitive workloads | Mixed hot/cold workloads with automatic tiering |

 ### Local Storage (`local`)
@@ -251,6 +253,50 @@ SETTINGS storage_policy = 's3_main';
 - Cache is automatically managed by ClickHouse
 - First query on cold data will be slower; subsequent queries use cache

+### S3 Tiered Storage (`s3_tier`)
+
+The S3 tiered policy automatically moves data from local disks to S3 based on disk space availability. Tables start on local storage, and data moves to S3 when local disk space runs low, combining fast local performance for hot data with unlimited S3 capacity for cold data.
+
+**Prerequisite**: Your cluster must be initialized with an S3 bucket. Set this during `init`:
+
+```bash
+easy-db-lab init my-cluster --s3-bucket my-clickhouse-data
+```
+
+Configure the tiering behavior before starting ClickHouse:
+
+```bash
+# Move data to S3 when local disk free space falls below 20% (default)
+easy-db-lab clickhouse init --s3-tier-move-factor 0.2
+
+# More aggressive tiering - move when free space < 50%
+easy-db-lab clickhouse init --s3-tier-move-factor 0.5
+```
+
+Then create tables with S3 tiered storage:
+
+```sql
+CREATE TABLE my_table (...)
+ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/default/my_table', '{replica}')
+ORDER BY id
+SETTINGS storage_policy = 's3_tier';
+```
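After creating a tiered table, you can check where its data currently lives. This query is an editorial sketch, not part of the commit: it assumes the renamed `s3` disk identifier from this change, while `system.parts` and its `disk_name`/`bytes_on_disk`/`active` columns are standard ClickHouse system tables.

```sql
-- Count active parts per disk for the example table, with on-disk size
SELECT
    disk_name,
    count() AS parts,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE table = 'my_table' AND active
GROUP BY disk_name;
```

Parts listed under `disk_name = 's3'` have already been tiered to the cold volume; parts on the local disk are still hot.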
+**When to use S3 tiered storage:**
+
+- Workloads with mixed hot/cold data access patterns
+- Growing datasets that may outgrow local disk capacity
+- Workloads that benefit from automatic cost optimization without manual intervention
+- Recent data that needs local performance, with S3 capacity for historical data
+
+**How automatic tiering works:**
+
+- New data is written to local disks first (fast writes)
+- When local disk free space falls below the configured threshold (default: 20%), ClickHouse automatically moves the oldest data to S3
+- Data on S3 is still queryable, but with higher latency
+- The local cache (configured with `--s3-cache`) helps performance for frequently accessed S3 data
+- Manual moves are also possible: `ALTER TABLE my_table MOVE PARTITION tuple() TO DISK 's3'`
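To make the threshold concrete, here is an illustrative back-of-the-envelope check (the move decision itself is made inside ClickHouse; this just restates the arithmetic): with the default `--s3-tier-move-factor 0.2`, parts begin moving once the local disk is roughly 80% full.

```bash
# move_factor is expressed as a fraction of FREE space; equivalently,
# parts start moving to S3 once the disk is (1 - move_factor) full.
move_factor=0.2
fill_pct=$(awk -v f="$move_factor" 'BEGIN { printf "%.0f", (1 - f) * 100 }')
echo "parts begin moving to S3 at ~${fill_pct}% disk usage"
```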
 ## Stopping ClickHouse

 To remove the ClickHouse cluster:
Lines changed: 63 additions & 0 deletions

@@ -0,0 +1,63 @@
# Design: ClickHouse S3 Tier Storage Policy

## Data Flow

```
clickhouse init --s3-tier-move-factor 0.1
  → ClickHouseConfig.s3TierMoveFactor = 0.1
  → saved to state.json

clickhouse start
  → ClickHouseManifestBuilder.buildClusterConfigMap(s3TierMoveFactor = 0.1)
    → ConfigMap key: "s3-tier-move-factor" = "0.1"
  → ClickHouseManifestBuilder.buildServerContainer()
    → env var: CLICKHOUSE_S3_TIER_MOVE_FACTOR from ConfigMap
  → Pod reads env var → config.xml reads <move_factor from_env="CLICKHOUSE_S3_TIER_MOVE_FACTOR"/>
```

## config.xml Changes

### Disk renames
- `default` → `local` (explicit `type=local`, path `/mnt/db1/clickhouse/`)
- `s3_cache` → `s3` (cache layer over `s3_disk`, path `/mnt/db1/clickhouse/disks/s3/`)

### New policy
```xml
<s3_tier>
    <volumes>
        <hot><disk>local</disk></hot>
        <cold><disk>s3</disk></cold>
    </volumes>
    <move_factor from_env="CLICKHOUSE_S3_TIER_MOVE_FACTOR"/>
</s3_tier>
```

## Kotlin Changes

### Constants.ClickHouse
```kotlin
const val DEFAULT_S3_TIER_MOVE_FACTOR = 0.2
```

### ClickHouseConfig (ClusterState.kt)
```kotlin
data class ClickHouseConfig(
    ...
    val s3TierMoveFactor: Double = Constants.ClickHouse.DEFAULT_S3_TIER_MOVE_FACTOR,
)
```

### ClickHouseInit
New option: `--s3-tier-move-factor`

### ClickHouseManifestBuilder
- `buildClusterConfigMap`: adds `"s3-tier-move-factor"` key
- `buildServerContainer`: adds `CLICKHOUSE_S3_TIER_MOVE_FACTOR` env var from ConfigMap
- `buildAllResources` + `buildClusterConfigMap`: accept `s3TierMoveFactor` param
- `buildServerInitDataDirContainer`: creates `/mnt/db1/clickhouse/disks/s3` (renamed from `s3_cache`)

### ClickHouseStart
Passes `clickHouseConfig.s3TierMoveFactor` to `buildAllResources`.

### Event.ClickHouse.ConfigSaved
New field: `s3TierMoveFactor: Double`
Lines changed: 41 additions & 0 deletions

@@ -0,0 +1,41 @@
# Proposal: ClickHouse S3 Tier Storage Policy

## Problem

ClickHouse supports two storage modes today:
- `local`: NVMe only
- `s3_main`: S3 with local cache (data lives primarily in S3)

There is no policy for a tiered approach where data starts on fast local NVMe and migrates to S3 only when the local disk fills up. This is the classic "hot/cold" tiering pattern that maximises write speed while keeping storage costs low.

## Proposed Solution

Add a new `s3_tier` storage policy to ClickHouse with:
- **Hot volume**: local NVMe disk (`local` disk)
- **Cold volume**: S3 with local cache (`s3` disk)
- **Automatic migration**: controlled by `move_factor`; when the local disk is `(1 - move_factor)` full, ClickHouse moves the oldest parts to the cold volume

The move factor is configurable via `--s3-tier-move-factor` on `clickhouse init` (default: `0.2`, meaning data moves when the local disk reaches 80% capacity).
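Values outside `[0.0, 1.0]` are rejected at the CLI. A minimal sketch of that boundary check, written here in shell for illustration (the real validation is a Kotlin `require` in `ClickHouseInit`):

```bash
# Accept only move factors in [0.0, 1.0], mirroring the CLI validation.
validate_move_factor() {
    awk -v f="$1" 'BEGIN { exit !(f >= 0.0 && f <= 1.0) }'
}

validate_move_factor 0.2 && echo "0.2: ok"
validate_move_factor 1.5 || echo "1.5: rejected"
```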
## Disk Rename

As part of this change, disk names are made more descriptive:
- `default` → `local` (explicit local disk definition)
- `s3_cache` → `s3` (the cache-over-S3 disk)

This makes storage policies self-documenting: the `local` policy uses the `local` disk, `s3_main` uses the `s3` disk, and `s3_tier` uses both.

## User Experience

```bash
# Use default move factor (0.2)
easy-db-lab clickhouse init

# Use custom move factor
easy-db-lab clickhouse init --s3-tier-move-factor 0.1

easy-db-lab clickhouse start

# Create a table with tiered storage
clickhouse-query "CREATE TABLE events ... SETTINGS storage_policy = 's3_tier'"
```
Lines changed: 26 additions & 0 deletions

@@ -0,0 +1,26 @@
# Tasks: ClickHouse S3 Tier Storage Policy

## Implementation

- [x] Add `DEFAULT_S3_TIER_MOVE_FACTOR = 0.2` to `Constants.ClickHouse`
- [x] Add `s3TierMoveFactor: Double` field to `ClickHouseConfig`
- [x] Add `--s3-tier-move-factor` option to `ClickHouseInit`
- [x] Update `Event.ClickHouse.ConfigSaved` to include `s3TierMoveFactor`
- [x] Rename disks in `config.xml`: `default` → `local`, `s3_cache` → `s3`
- [x] Add `s3_tier` policy to `config.xml`
- [x] Update `ClickHouseManifestBuilder.buildClusterConfigMap` to store `s3-tier-move-factor`
- [x] Update `ClickHouseManifestBuilder.buildServerContainer` to inject `CLICKHOUSE_S3_TIER_MOVE_FACTOR` env var
- [x] Update `ClickHouseManifestBuilder.buildAllResources` signature
- [x] Update `buildServerInitDataDirContainer` to create `/mnt/db1/clickhouse/disks/s3`
- [x] Wire `ClickHouseStart` to pass `s3TierMoveFactor` from config to builder

## Tests

- [x] Update `ClickHouseManifestBuilderTest` to pass `s3TierMoveFactor`
- [x] Add assertion for `s3-tier-move-factor` in cluster config ConfigMap test
- [x] Add test asserting `s3_tier` policy appears in `config.xml`
- [x] Add `step_clickhouse_s3_tier_test` to `bin/end-to-end-test`:
  - Creates table with `s3_tier` policy
  - Inserts 10,000 rows
  - Forces move to S3 disk via `ALTER TABLE ... MOVE PARTITION tuple() TO DISK 's3'`
  - Verifies S3 bucket contains objects under `clickhouse/` prefix

src/main/kotlin/com/rustyrazorblade/easydblab/Constants.kt

Lines changed: 1 addition & 0 deletions

@@ -237,6 +237,7 @@ object Constants {
         const val DEFAULT_S3_CACHE_SIZE = "10Gi"
         const val DEFAULT_S3_CACHE_ON_WRITE = "true"
         const val DEFAULT_REPLICAS_PER_SHARD = 3
+        const val DEFAULT_S3_TIER_MOVE_FACTOR = 0.2
     }

     // YACE (Yet Another CloudWatch Exporter) configuration

src/main/kotlin/com/rustyrazorblade/easydblab/commands/clickhouse/ClickHouseInit.kt

Lines changed: 13 additions & 0 deletions

@@ -40,13 +40,25 @@ class ClickHouseInit : PicoBaseCommand() {
     )
     var replicasPerShard: Int = Constants.ClickHouse.DEFAULT_REPLICAS_PER_SHARD

+    @Option(
+        names = ["--s3-tier-move-factor"],
+        description = ["Move data to S3 tier when local disk free space falls below this fraction (0.0-1.0) (default: \${DEFAULT-VALUE})"],
+    )
+    var s3TierMoveFactor: Double = Constants.ClickHouse.DEFAULT_S3_TIER_MOVE_FACTOR
+
     override fun execute() {
+        // Validate s3TierMoveFactor range [0.0, 1.0] per ClickHouse requirements
+        require(s3TierMoveFactor in 0.0..1.0) {
+            "s3TierMoveFactor must be in range [0.0, 1.0], got: $s3TierMoveFactor"
+        }
+
         val state = clusterStateManager.load()
         val config =
             ClickHouseConfig(
                 s3CacheSize = s3CacheSize,
                 s3CacheOnWrite = s3CacheOnWrite,
                 replicasPerShard = replicasPerShard,
+                s3TierMoveFactor = s3TierMoveFactor,
             )
         state.updateClickHouseConfig(config)
         clusterStateManager.save(state)

@@ -56,6 +68,7 @@
                 replicasPerShard = replicasPerShard,
                 s3CacheSize = s3CacheSize,
                 s3CacheOnWrite = s3CacheOnWrite,
+                s3TierMoveFactor = s3TierMoveFactor,
             ),
         )
     }

src/main/kotlin/com/rustyrazorblade/easydblab/commands/clickhouse/ClickHouseStart.kt

Lines changed: 2 additions & 0 deletions

@@ -26,6 +26,7 @@ import picocli.CommandLine.Option
  * Storage policies available for tables:
  * - 'local': Local disk storage (default)
  * - 's3_main': S3 storage with local cache
+ * - 's3_tier': Local NVMe as hot tier, S3 as cold tier (automatic data migration)
  *
  * Example creating a distributed replicated table:
  * ```sql

@@ -197,6 +198,7 @@
                 replicasPerShard = replicasPerShard,
                 s3CacheSize = clickHouseConfig.s3CacheSize,
                 s3CacheOnWrite = clickHouseConfig.s3CacheOnWrite,
+                s3TierMoveFactor = clickHouseConfig.s3TierMoveFactor,
             )

         for (resource in resources) {

src/main/kotlin/com/rustyrazorblade/easydblab/configuration/ClusterState.kt

Lines changed: 1 addition & 0 deletions

@@ -156,6 +156,7 @@ data class ClickHouseConfig(
     val s3CacheSize: String = Constants.ClickHouse.DEFAULT_S3_CACHE_SIZE,
     val s3CacheOnWrite: String = Constants.ClickHouse.DEFAULT_S3_CACHE_ON_WRITE,
     val replicasPerShard: Int = Constants.ClickHouse.DEFAULT_REPLICAS_PER_SHARD,
+    val s3TierMoveFactor: Double = Constants.ClickHouse.DEFAULT_S3_TIER_MOVE_FACTOR,
 )

 /**
