
add s3_tier storage policy with renamed disks #589

Merged
rustyrazorblade merged 8 commits into main from claude/issue-582-20260318-1757
Mar 19, 2026

Conversation

@rustyrazorblade (Owner)

Adds S3 cold tier support for ClickHouse:

  • New s3_tier storage policy: local NVMe as hot tier, S3 as cold tier (data moves automatically when local disk fills up)
  • Disk renames for clarity: default → local, s3_cache → s3
  • --s3-tier-move-factor option on clickhouse init (default: 0.2 — moves data when local disk is 80% full)
  • End-to-end test verifies data appears in S3 after forced move

Closes #582

@claude

claude bot commented Mar 18, 2026

PR Review posted via separate message - see next comment for full review

@claude

claude bot commented Mar 18, 2026

PR Review: Add s3_tier storage policy with renamed disks

Overall this is a clean, well-structured implementation. The architecture is solid -- constant -> config -> event -> manifest builder -> K8s env var -> XML -- and the disk renames (default->local, s3_cache->s3) improve self-documentation. A few things worth addressing.

@claude

claude bot commented Mar 18, 2026

Bug / Correctness

Missing input validation for s3TierMoveFactor (ClickHouseInit.kt)

The valid range for ClickHouse move_factor is 0.0 to 1.0. Nothing prevents a user from passing a value outside this range (e.g. 1.5 or a negative), which would produce a broken ClickHouse config at runtime. Recommend adding a guard in execute():

require(s3TierMoveFactor in 0.0..1.0) {
    "--s3-tier-move-factor must be in [0.0, 1.0], got $s3TierMoveFactor"
}

@claude

claude bot commented Mar 18, 2026

Clarity

Option description is slightly backwards (ClickHouseInit.kt line ~43)

Current: "Fraction of local disk free space that triggers data move to S3 tier"

This reads as though the amount of free space triggers the move, but ClickHouse moves data when free space falls below this fraction. At 0.2, data moves when less than 20% of local disk is free (80% full). Suggested wording:

"Move data to S3 tier when local disk free space falls below this fraction (0.0-1.0, default: DEFAULT-VALUE)"

@claude

claude bot commented Mar 18, 2026

Minor

No table cleanup in E2E test (bin/end-to-end-test)

test_s3_tier is created but never dropped. Not a functional problem, but adding a DROP TABLE IF EXISTS test_s3_tier at the end keeps the cluster state tidy between step reruns.


Everything else looks correct: Fabric8 for K8s manifests, typed events, constants in Constants.ClickHouse, meaningful tests (not mock-echo), OpenSpec docs all present.

@rustyrazorblade
Owner Author

@claude assess the failing tests and propose fixes

@claude

claude bot commented Mar 18, 2026

Claude finished @rustyrazorblade's task in 2m 2s


Failing Tests Fixed

  • Get CI status and identify failing tests
  • Download and analyze failure logs
  • Fix failing tests
  • Push fixes

Root cause: buildAllResources gained a 5th parameter (s3TierMoveFactor) in this PR, but ClickHouseStartTest still used only 4 any() matchers, causing Mockito to throw InvalidUseOfMatchersException: 5 matchers expected, 4 recorded in every test.

Fix (commit e0ff13f): Two changes in ClickHouseStartTest.kt:

  1. whenever(mockClickHouseManifestBuilder.buildAllResources(any(), any(), any(), any())) → added 5th any()
  2. verify(...).buildAllResources(...) → added s3TierMoveFactor = any()

@rustyrazorblade force-pushed the claude/issue-582-20260318-1757 branch from e0ff13f to 4fd5060 (March 18, 2026 19:20)
@claude

claude bot commented Mar 18, 2026

PR Review: add s3_tier storage policy with renamed disks

The overall structure is solid and follows established patterns well. The data flow (constant to config data class to event to manifest builder to ConfigMap key to env var to XML from_env attribute) is consistent with how existing ClickHouse parameters like s3CacheSize and s3CacheOnWrite are handled. Fabric8 is used correctly, constants live in Constants.ClickHouse, and tests exercise real logic.

A few issues to address before merging:

BUG: No input validation on --s3-tier-move-factor

ClickHouse requires move_factor to be in [0.0, 1.0]. Nothing currently prevents passing -1.0 or 2.0, which would produce a broken runtime config with no CLI-level error. Add a guard in ClickHouseInit.execute():
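A minimal runnable sketch of that guard. The standalone function wrapper and the message text are illustrative; in the PR the check would sit inside ClickHouseInit.execute():

```kotlin
// Hypothetical standalone wrapper around the suggested require() guard;
// in the real code this check would live inside ClickHouseInit.execute().
fun validateMoveFactor(s3TierMoveFactor: Double) {
    require(s3TierMoveFactor in 0.0..1.0) {
        "--s3-tier-move-factor must be in [0.0, 1.0], got $s3TierMoveFactor"
    }
}
```

Since require() throws IllegalArgumentException, the paired unit test can simply assert that calling with 1.5 or -1.0 throws while 0.0, 0.2, and 1.0 pass.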

A unit test covering the invalid range case should accompany this.

UX: Option description is semantically backwards

This implies more free space triggers a move, but ClickHouse move_factor semantics are the opposite: 0.2 means move data when less than 20 percent of local disk is free (80% full). Suggested replacement: Move data to S3 tier when local disk free space falls below this fraction (0.0-1.0)

DOCS: docs/user-guide/clickhouse.md not updated

The storage policies section documents only local and s3_main. The new s3_tier policy and --s3-tier-move-factor option should be added to the policies comparison table and the clickhouse init options table. Per project convention, user-facing feature changes require doc updates.

MINOR: E2E test table not cleaned up

test_s3_tier is created in step_clickhouse_s3_tier_test but never dropped. Adding DROP TABLE IF EXISTS test_s3_tier at the end keeps reruns clean and avoids leftover state between runs.

TESTS: Coverage is good but missing validation test

The added tests are meaningful:

  • ConfigMap key assertion for s3-tier-move-factor: good
  • config.xml contains s3_tier and the env var reference: good
  • ClickHouseStartTest mock updated for the new 5th parameter: good

What is missing: a test verifying that an invalid move_factor (e.g. 1.5) is rejected at the CLI level. This pairs with the validation fix above.

The disk rename (s3_cache to s3) and the new tiered policy are architecturally sound. Addressing the validation gap and docs are the main blockers.

@rustyrazorblade force-pushed the claude/issue-582-20260318-1757 branch from 4fd5060 to dcd1ca2 (March 18, 2026 23:59)
@claude

claude bot commented Mar 19, 2026

Review Summary

A previous review pass has already identified the main issues. Confirming and prioritizing:

Must fix before merge:

  1. Input validation missing (ClickHouseInit.kt) - s3TierMoveFactor accepts any Double, but ClickHouse requires move_factor in [0.0, 1.0]. Add require(s3TierMoveFactor in 0.0..1.0) in execute() with a unit test covering the invalid range.

  2. Docs not updated (docs/user-guide/clickhouse.md) - The storage policies comparison table documents only local and s3_main. s3_tier and --s3-tier-move-factor need to be added, including a description of the hot/cold tiering semantics.

Should fix:

  1. Option description is backwards (ClickHouseInit.kt) - "Fraction of local disk free space that triggers data move" implies more free space triggers the move. The actual semantics: 0.2 means move data when local disk is 80%+ full (free space falls below 20%). Suggested wording: "Move data to S3 tier when local disk free space falls below this fraction (0.0-1.0)"

Minor:

  1. E2E test leaves table behind - test_s3_tier is never dropped in step_clickhouse_s3_tier_test. Add DROP TABLE IF EXISTS test_s3_tier ON CLUSTER easy_db_lab SYNC at the end of the step to keep reruns clean.

The Kotlin implementation is correct - the constant to config to event to ConfigMap to env var to XML from_env chain follows existing patterns, Fabric8 is used throughout, and the two new manifest builder tests exercise real logic.

Add a new s3_tier storage policy that provides automatic data movement
from local disks to S3 based on disk space availability. This combines
local performance for hot data with S3's unlimited capacity for cold data.

Key features:
- Starts with local NVMe storage for best write performance
- Automatically moves oldest data to S3 when disk free space falls below
  configured threshold (default: 20%)
- Configurable via --s3-tier-move-factor (range: 0.0-1.0)
- Data on S3 remains queryable with cache-assisted reads

Implementation:
- Add s3_tier policy with local and s3 disk volumes to storage.xml
- Add --s3-tier-move-factor option to clickhouse init command
- Include input validation to reject values outside [0.0, 1.0]
- Update ClickHouse config generation to use renamed disk identifiers
  (local_nvme, s3) for consistency across all policies
- Add comprehensive unit tests for boundary validation
- Add E2E test verifying data movement to S3

Documentation:
- Update docs/user-guide/clickhouse.md with s3_tier policy guide
- Add policy comparison table including all three policies
- Document when to use tiered storage and how it works
@rustyrazorblade force-pushed the claude/issue-582-20260318-1757 branch from 11479f3 to 3360b74 (March 19, 2026 13:56)
@claude

claude bot commented Mar 19, 2026

PR Review: Add s3_tier storage policy with renamed disks

Good implementation overall -- clean layering, proper use of the event bus, constants in the right place, and solid test coverage for the validation logic. A few items worth discussing:

Potential Breaking Change: Disk Rename

The rename of default to local and s3_cache to s3 in config.xml is a breaking change for any existing cluster with data on disk. ClickHouse part metadata stores the disk name, so parts written before the rename will reference the old names and may not be readable after upgrade until the cluster is re-initialized. This should be called out in the PR description and/or docs (a migration note or a warning in the CLI output when clickhouse start detects an existing cluster).

End-to-End Test: Cleanup on Failure

In step_clickhouse_s3_tier_test, the DROP TABLE IF EXISTS test_s3_tier cleanup only runs if all preceding steps succeed. If the MOVE PARTITION command fails, the test table is left behind and could affect subsequent test steps. Consider adding cleanup before each return 1 in the failure branches.
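One way to sketch that is a bash RETURN trap, so cleanup runs on every exit path. The step body and the clickhouse-query helper are stubbed here; in bin/end-to-end-test the helper would run real SQL against the cluster:

```shell
#!/usr/bin/env bash
# Sketch only: clickhouse-query is stubbed so the pattern is runnable.
clickhouse-query() { echo "QUERY: $*"; }

step_clickhouse_s3_tier_test() {
  cleanup() {
    trap - RETURN  # clear first so cleanup's own return doesn't re-trigger it
    clickhouse-query "DROP TABLE IF EXISTS test_s3_tier ON CLUSTER easy_db_lab SYNC"
  }
  trap cleanup RETURN  # fires on every return path, success or failure

  clickhouse-query "CREATE TABLE test_s3_tier ..." || return 1
  false || return 1  # stand-in for a failing MOVE PARTITION step
}

step_clickhouse_s3_tier_test || true
```

Even though the simulated MOVE step fails, the DROP still runs via the trap, so no failure branch needs its own cleanup call.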

Borderline Mock-Echo Test

init uses default s3TierMoveFactor when not specified (ClickHouseInitTest.kt line 616) asserts s3TierMoveFactor == DEFAULT_S3_TIER_MOVE_FACTOR. Per the project guidelines, asserting a default field value is discouraged because the compiler already guarantees it. The test verifies the execute() flow runs without error, but the final assertion is weak. Consider strengthening it or relying on the boundary tests (0.0, 1.0) to cover the config-saving path.

Minor: buildAllResources Default Parameter

s3TierMoveFactor has a default value in both buildAllResources and buildClusterConfigMap. Since ClickHouseStart always passes it explicitly, the default is never exercised in production. Removing the default would cause a compile error at any call site that forgets it rather than silently using 0.2 -- worth considering if you want to enforce the contract.

Nit: --replicas-per-shard Doc Addition

The docs table now includes --replicas-per-shard (clickhouse.md), which was previously missing. Good catch -- just worth noting it is an unrelated addition bundled into this PR.


The core feature (config.xml policy, env var injection, ConfigMap wiring, Kotlin config class, validation) looks correct and follows existing patterns well. The design doc is a useful addition.

@claude

claude bot commented Mar 19, 2026

Code Review: The data flow from CLI option to state to ConfigMap to env var to config.xml is clean. A few things worth addressing: (1) Disk renames are a breaking change for existing clusters - worth noting in the PR so users know to recreate rather than restart. (2) Test cleanup in step_clickhouse_s3_tier_test only runs on success - consider a trap. (3) Validation via require() throws a stack trace; prefer eventBus.emit() per project conventions. (4) The test asserting default s3TierMoveFactor is saved is a banned mock-echo pattern per CLAUDE.md. (5) Minor: default parameter value on s3TierMoveFactor in buildAllResources/buildClusterConfigMap lets callers silently omit it.

@rustyrazorblade
Owner Author

@claude do a thorough review of the branch. Clickhouse pod crashing on startup during end to end test. Look for configuration related errors. Can the same disk be configured as a cold tier and the main disk in different policies? Also look at the logs and double check the mount paths

@claude

claude bot commented Mar 19, 2026

Claude finished @rustyrazorblade's task in 4m 49s


Thorough Review: ClickHouse Startup Crash Investigation

  • Read config.xml and understand disk/policy configuration
  • Check mount paths in K8s manifests
  • Investigate whether same disk can appear in multiple policies
  • Review end-to-end test for configuration issues
  • Post findings

Root Cause of the Startup Crash

local disk path conflicts with ClickHouse's implicit default disk.

In config.xml lines 24-28, the local disk is defined as:

<local>
    <type>local</type>
    <path>/mnt/db1/clickhouse/</path>
    <keep_free_space_bytes>1073741824</keep_free_space_bytes>
</local>

ClickHouse always creates an implicit default disk from the server-level <path> setting (line 15: <path>/mnt/db1/clickhouse/</path>). ClickHouse enforces unique paths for all local disks. The local disk tries to claim /mnt/db1/clickhouse/ at startup, but that path is already owned by the implicit default disk. Result: ClickHouse fails with something like:

DB::Exception: Disk 'local' is trying to use path '/mnt/db1/clickhouse/' which is already used by disk 'default'

Why the old code worked: The original <default> entry in <disks> is special — using the name default configures properties of the implicit disk (like keep_free_space_bytes) rather than creating a new disk object. The rename to local combined with an explicit <path> creates a second, conflicting disk at the same path.

Fix

Revert the disk definition to use default as the disk name (the policy names like local and s3_tier can stay). Remove the redundant <type> and <path> lines — those are implicit for the default disk:

Before (the conflicting definition):

<local>
    <type>local</type>
    <path>/mnt/db1/clickhouse/</path>
    <keep_free_space_bytes>1073741824</keep_free_space_bytes>
</local>

After (property-only entry for the implicit default disk):

<default>
    <keep_free_space_bytes>1073741824</keep_free_space_bytes>
</default>

Then update the two policy volume references from <disk>local</disk> to <disk>default</disk> in both the local policy and the s3_tier hot volume. The policy names (local, s3_tier) are fine as-is; only the disk reference inside needs to change.
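Pieced together from this thread, the corrected policy section could look roughly like the sketch below. The volume names (main, hot, cold) and the env var name are assumptions, not taken from the actual config.xml:

```xml
<!-- Illustrative sketch only; volume names and the env var name are guesses -->
<policies>
    <local>
        <volumes>
            <main><disk>default</disk></main>
        </volumes>
    </local>
    <s3_tier>
        <volumes>
            <hot><disk>default</disk></hot>
            <cold><disk>s3</disk></cold>
        </volumes>
        <!-- value injected from the ConfigMap-backed env var -->
        <move_factor from_env="CLICKHOUSE_S3_TIER_MOVE_FACTOR"/>
    </s3_tier>
</policies>
```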



Can the Same Disk Be in Multiple Policies?

Yes — this is valid ClickHouse behavior. The s3 cache disk appearing in both s3_main (as main disk) and s3_tier (as cold tier) is fully supported. Policies are independent routing rules; the same disk object can be referenced in multiple policies. This is not contributing to the crash.


Mount Path Audit — No Issues Found

All paths in config.xml are under /mnt/db1/clickhouse/, which is the PVC mount point in buildServerStatefulSet() (line 644 of ClickHouseManifestBuilder.kt). The init container (buildServerInitDataDirContainer(), lines 512-531) creates all required subdirectories:

    config.xml path                        init container command
    /mnt/db1/clickhouse/                   mkdir -p /mnt/db1/clickhouse
    /mnt/db1/clickhouse/tmp/               mkdir -p /mnt/db1/clickhouse/tmp
    /mnt/db1/clickhouse/disks/s3_disk/     mkdir -p /mnt/db1/clickhouse/disks/s3_disk
    /mnt/db1/clickhouse/disks/s3/          mkdir -p /mnt/db1/clickhouse/disks/s3

Mount paths are correct.


Secondary Issues

E2E test cleanup is now present but only on success. step_clickhouse_s3_tier_test does DROP TABLE IF EXISTS test_s3_tier at line 938, but it only runs if the preceding steps succeed. If the MOVE PARTITION command at line 924 fails, the table is left behind. Consider adding clickhouse-query "DROP TABLE IF EXISTS test_s3_tier ON CLUSTER easy_db_lab SYNC" before return 1 in the failure branches (or via a bash trap).

The MOVE PARTITION ... TO DISK 's3' at line 924 moves data to the s3 cache disk. This is correct — s3 is the disk name in config.xml (the renamed s3_cache). This will work once the startup crash is fixed.


Summary: The crash is caused by the local disk rename introducing a path conflict with ClickHouse's implicit default disk. The fix is to revert the disk name to default (keeping it as a property-only entry without explicit <type> or <path>), and update the two <disk>local</disk> references in the policy definitions. Everything else — the s3_tier policy design, disk sharing across policies, mount paths, env var wiring, and test logic — looks correct.

@rustyrazorblade
Owner Author

@claude do the fix in this branch

@claude

claude bot commented Mar 19, 2026

Claude finished @rustyrazorblade's task in 1m 31s


Fix: ClickHouse startup crash (disk path conflict)

  • Analyze the fix needed
  • Read config.xml and related files
  • Fix disk name conflict in config.xml
  • Update policy volume disk references
  • Commit and push

Root cause: The <local> disk entry in config.xml explicitly set <path>/mnt/db1/clickhouse/</path>, which conflicts with ClickHouse's implicit default disk (automatically created from the server-level <path> setting). ClickHouse enforces unique paths for all local disks and crashes on startup with a path conflict error.

Fix (commit 49439ef): In config.xml:

  1. Renamed <local> disk entry → <default>, removing the redundant <type> and <path> fields (these are implicit for the default disk)
  2. Updated both <disk>local</disk> policy references → <disk>default</disk> (in the local policy and the s3_tier hot volume)

The policy names (local, s3_tier) are unchanged — only the disk name referenced inside the volume tags was corrected.

github-actions bot and others added 3 commits March 19, 2026 16:03
…flict

ClickHouse enforces unique paths for all local disks. The renamed 'local'
disk was defining path '/mnt/db1/clickhouse/' which conflicts with the
implicit 'default' disk (created from the server-level <path> setting).
This caused ClickHouse to crash on startup with a path conflict error.

Fix: rename the disk entry back to 'default' (removing the redundant
<type> and <path> which are implicit for the default disk) and update
the two <disk>local</disk> policy references to <disk>default</disk>.
The policy names ('local', 's3_tier') are unchanged.

Co-authored-by: Jon Haddad <rustyrazorblade@users.noreply.github.com>
Adds five new skills to enhance development workflow and knowledge access:

Execution Skills:
- debug-environment: Active cluster debugging with auto-diagnosis
  - Validates environment files and connectivity
  - Analyzes pod/service failures with structured reports
  - Provides 50+ common issue troubleshooting scenarios
- e2e-test: Intelligent test runner with auto-detection
  - Auto-detects test scope from code changes
  - Runs non-interactively with --no-teardown support
  - Automatically debugs failures using debug-environment

Expert Skills (Q&A agents):
- easy-db-lab-expert: General architecture and feature knowledge
  - Commands, configuration, observability, development patterns
- k8-expert: Kubernetes and Fabric8 manifest builder specialist
  - K3s architecture, StatefulSets, pod troubleshooting
- e2e-test-expert: Test suite specialist
  - Test scope selection, failure interpretation, cost estimates

Script Enhancement:
- bin/end-to-end-test: Add --no-teardown flag for automation
  - Enables non-interactive test execution
  - Exit code indicates pass (0) or fail (1)
  - Leaves cluster running for debugging

Each skill includes comprehensive documentation:
- SKILL.md: Agent instructions with dynamic context
- README.md: User documentation with examples
- reference/: Detailed troubleshooting and command guides
- templates/: Structured output formats

Integration:
- e2e-test automatically invokes debug-environment on failures
- Skills prevent duplicate cluster creation
- Experts reference each other for specialized questions
- Supports full automated test-diagnose-fix workflow
The end-to-end-test script handles building automatically and uses
--clean flag to remove old files. No need for the skill to check
builds or suggest cleaning - the script does everything.
@claude

claude bot commented Mar 19, 2026

Code Review - PR 589 (s3_tier storage policy)

Summary of findings:

  1. BUG - PR description inaccuracy: The description claims "Disk renames for clarity: default to local, s3_cache to s3" but only s3_cache was renamed to s3 in config.xml. The default disk retains its name and is correctly referenced in the s3_tier policy. Update the description to remove the default-to-local claim, or implement the rename.

  2. MOCK-ECHO TEST - ClickHouseStartTest: The verify() call uses s3TierMoveFactor = any() which violates the project no-mock-echo rule. It proves the mock was called but not that the correct value flows through. Should use eq(Constants.ClickHouse.DEFAULT_S3_TIER_MOVE_FACTOR) to catch regressions.

  3. MINOR - s3TierMoveFactor = 0.0 silently disables tiering: A value of 0.0 means ClickHouse never automatically moves data. This should be documented in the CLI option description or user docs.

  4. NIT - Redundant default on buildClusterConfigMap: Both buildAllResources and buildClusterConfigMap declare the same default for s3TierMoveFactor. Since buildClusterConfigMap is only called from buildAllResources with an explicit value, its default is dead code.

  5. POSITIVE - The --no-teardown flag is clean and useful for CI pipelines.

The skill now actively monitors test execution and reports:
- Step transitions as they occur
- Step outcomes (pass/fail)
- Progress updates every few steps
- Immediate error detection
- Milestone completions

Provides full transparency during 15-60 minute test runs instead of
silent execution.
CRITICAL SAFETY RULE: The e2e-test skill is now explicitly prohibited
from using rm commands for any reason.

Why:
- Test script handles cleanup via --clean flag
- Files are never stale (clusters are fresh)
- Manual deletion can break active clusters
- User may need files for debugging

Added prominent warnings in both SKILL.md and README.md with no
exceptions clause.
When tests fail, the e2e-test skill now uses agent teams (if available)
for immediate parallel investigation:

Main agent:
- Processes test results
- Summarizes failed steps
- Prepares preliminary report

Team member agent (in parallel):
- Checks K8s pods and services
- Reviews logs and events
- SSHs to nodes to check systemd services
- Inspects cluster resources
- Identifies failure patterns

Team member is read-only (can inspect but not modify).

Combined findings from both agents provide faster root cause diagnosis.

Fallback: If agent teams unavailable, uses sequential debugging with
debug-environment skill.
Changed from single-agent execution to coordinated multi-agent pattern:

Agent Roles:
1. Main Agent (Coordinator):
   - Determines test scope from code changes
   - Delegates test execution to team member
   - Relays progress to user
   - Coordinates investigation on failures
   - Synthesizes findings from multiple agents
   - Provides final recommendations

2. Test Runner Team Member:
   - Executes bin/end-to-end-test
   - Monitors output in real-time
   - Reports step transitions and outcomes
   - Provides test-level failure context

3. Investigation Team Member (on failure):
   - Investigates live cluster state (read-only)
   - Checks pods, services, logs, events
   - SSHs to nodes for systemd status
   - Reports cluster-level findings

Benefits:
- Parallel investigation starts immediately
- Multiple perspectives (test + cluster state)
- Faster time to diagnosis
- Coordinated single analysis from multiple sources

Fallback to sequential debugging if agent teams unavailable.
@claude

claude bot commented Mar 19, 2026

PR Review: ClickHouse S3 Tier Storage Policy. Overall clean, well-structured implementation consistent with existing patterns. Issues: (1) Inaccurate PR description - only s3_cache->s3 rename happened, not default->local. (2) Test using .first should use firstOrNull for clearer failures. (3) E2E test should verify row count after S3 move. (4) move_factor=1.0 semantics should be documented. Positives: require() validation is idiomatic, boundary tests are thorough, s3TierMoveFactor has correct default for backward compat, from_env pattern is consistent, no-teardown flag is useful for CI, docs are comprehensive.

@rustyrazorblade merged commit 9b208c8 into main, Mar 19, 2026
3 checks passed

Development

Successfully merging this pull request may close these issues.

Support s3 cold tier in clickhouse

1 participant