feat: OpenSpec proposal for bcache support (#562) #563
rustyrazorblade wants to merge 5 commits into main
Add openspec change for #562: --bcache flag on init that uses local NVMe as write-back cache in front of EBS via Linux bcache.

Includes proposal, design, tasks, and specs for:
- bcache capability (new spec)
- instance-storage-validation (updated with bcache scenarios)

Co-authored-by: Jon Haddad <rustyrazorblade@users.noreply.github.com>
|
PR Review: bcache Support OpenSpec Proposal

This is a well-structured proposal. The design is thorough, the non-goals are appropriately scoped, and the architectural decisions align with the codebase conventions (TemplateService injection, validation in InstanceSpecFactory, fail-fast). A few things worth addressing before implementation starts:

Issues

1. Missing template variable for EBS device name (tasks.md §3.1–3.2)
The design acknowledges that

2. cset_uuid extraction is underspecified (tasks.md §3.3)
The task lists: but doesn't specify how to obtain

3. Idempotency / already-registered devices (tasks.md §3.3)

4. Validation errors should use the event bus, not
The tasks specify throwing

Suggestions

5. Consider injecting
The spec locks the cache mode to

6. The Packer install script should install

7. Spec gap:
The spec says the system SHALL load the bcache module but doesn't specify what happens if

What's Good

Overall this is solid planning work. The issues above are worth addressing in the tasks before implementation begins, especially items 1 (EBS device template variable) and 2 (cset_uuid extraction), which could cause the implementation to diverge from the spec.
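One way the cset_uuid from issue 2 might be obtained is by parsing the output of bcache-super-show (part of bcache-tools). The sketch below assumes that output contains a `cset.uuid` line with the UUID as the second field. The helper name `extract_cset_uuid` and the device paths in the comments are hypothetical and do not come from the proposal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `bcache-super-show <device>` output on stdin
# and print the cache set UUID (the value on the "cset.uuid" line).
extract_cset_uuid() {
  awk '$1 == "cset.uuid" { print $2; exit }'
}

# Assumed wiring (device names are illustrative only):
#   cset_uuid="$(bcache-super-show /dev/nvme1n1 | extract_cset_uuid)"
#   [ -n "$cset_uuid" ] || { echo "no cset.uuid found" >&2; exit 1; }
#   echo "$cset_uuid" > /sys/block/bcache0/bcache/attach
```

Failing when the UUID comes back empty keeps the script fail-fast, in line with the validation approach the review praises.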
|
Needs to support both write-back and write-through modes. Add a --bcache.mode flag and default to write-through, since it's the safer mode and the one I'll almost certainly use. @claude
|
Claude finished @rustyrazorblade's task in 2m 26s
Adding
|
Add --bcache.mode flag to the bcache OpenSpec proposal, defaulting to writethrough (the safer option). Update design, tasks, and spec to reflect both modes, including validation for invalid mode values and template variable injection of __BCACHE_MODE__ into setup_instance.sh.

Co-authored-by: Jon Haddad <rustyrazorblade@users.noreply.github.com>
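As a rough illustration of how the injected mode could be guarded inside the setup script, the sketch below accepts only the two modes named in the commit message. The function name `validate_bcache_mode` and the `BCACHE_MODE` variable are assumptions. The commit only says that invalid mode values are validated and does not say where, so a shell-side check like this would be a defensive addition rather than the proposal's own design.

```shell
#!/usr/bin/env bash
# Hypothetical guard: accept only the two cache modes named in the proposal.
validate_bcache_mode() {
  case "$1" in
    writethrough|writeback)
      return 0 ;;
    *)
      echo "invalid bcache mode: '$1' (expected writethrough or writeback)" >&2
      return 1 ;;
  esac
}

# BCACHE_MODE is assumed to hold the value substituted for __BCACHE_MODE__.
# Assumed usage once /dev/bcache0 exists:
#   validate_bcache_mode "$BCACHE_MODE" \
#     && echo "$BCACHE_MODE" > /sys/block/bcache0/bcache/cache_mode
```

Writing the mode name to the `cache_mode` sysfs attribute is the standard way to switch a bcache device between modes at runtime.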
|
@claude apply |
|
Claude finished @rustyrazorblade's task in 15m 30s
bcache Support Implementation Complete

What Was Implemented
13 files changed, 419 insertions, 31 deletions across:
CLI (Init.kt)
Validation (InstanceSpecFactory.kt)
Script template (setup_instance.sh)
Packer provisioning
State (ClusterState.kt)
Tests
Docs
OpenSpec
|
- Add --bcache boolean flag to init command (default: false)
- Add --bcache.mode flag with writethrough/writeback options (default: writethrough)
- Add bcache validation in DefaultInstanceSpecFactory (requires instance store + EBS)
- Add bcache setup block in setup_instance.sh template with __BCACHE_ENABLED__, __BCACHE_MODE__, __EBS_DEVICE__ variables
- Add packer provisioning script for bcache-tools installation
- Add tests for CLI flags, validation, and bcache configuration
- Update docs with bcache usage guide

Co-authored-by: Jon Haddad <rustyrazorblade@users.noreply.github.com>
PR Review: bcache Support

What This PR Does
Adds Linux bcache caching support so local NVMe instance store can transparently accelerate EBS volumes. Includes two new

Strengths
Issues

1. Tests don't verify script content (Missing)
The
Suggest adding:

@Test
fun `bcache flags are substituted into generated setup script`() {
    val command = Init()
    command.bcache = true
    command.bcacheMode = "writeback"
    command.execute()

    // Read the generated script and check that the placeholders were substituted.
    val scriptContent = File("setup_instance.sh").readText()
    assertThat(scriptContent)
        .contains("BCACHE_ENABLED=true")
        .contains("BCACHE_MODE=writeback")
        .doesNotContain("__BCACHE_") // no unreplaced placeholders
}

2. Hardcoded
|
|
@claude Address the failures and the feedback |
|
Claude finished @rustyrazorblade's task in 3m 35s
Addressing PR Review Feedback

Changes Made
1. Fixed hardcoded
2. Added script content verification tests (Issue 1)
All tests use the real
3. Added packer test for
- Fix hardcoded /dev/xvdf by introducing Constants.EBS.DEFAULT_DEVICE_NAME
- Add script content verification tests that check TemplateService substitution produces correct output in setup_instance.sh for bcache on/off scenarios
- Add packer test service for install_bcache.sh in docker-compose.yml

Co-authored-by: Jon Haddad <rustyrazorblade@users.noreply.github.com>
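The "no unreplaced placeholders" check behind these script content tests can also be run by hand from a shell. The sketch below is a stand-in that uses sed, not the project's TemplateService. The function names `render_bcache_template` and `has_no_unreplaced_placeholders` are hypothetical, and only the two placeholder names come from the commit messages.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for template substitution: replace the two bcache
# placeholders in template text read from stdin.
render_bcache_template() {
  local enabled="$1" mode="$2"
  sed -e "s/__BCACHE_ENABLED__/${enabled}/g" \
      -e "s/__BCACHE_MODE__/${mode}/g"
}

# Succeeds only if no __BCACHE_* placeholder is left in the text on stdin.
has_no_unreplaced_placeholders() {
  ! grep -q '__BCACHE_'
}

# Assumed usage against a rendered script:
#   has_no_unreplaced_placeholders < setup_instance.sh || echo "placeholder left behind" >&2
```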
Project Organization

Architecture Overview
The project follows a layered architecture:

Project Modules
The Gradle project has multiple modules:

Layer Responsibilities
Commands (
See
Services (
See

Server & REPL
Two commands run as long-lived processes instead of the typical run-and-exit pattern:

Dependency Injection
Use Koin for dependency injection throughout the codebase:

Design Principles
Follow SOLID principles:

Testing Approach
Practice reasonable TDD:
No mock-echo tests. Every test must verify real logic. These patterns are banned:
A good test exercises a code path where the system under test makes a decision, transforms data, or could fail in a meaningful way. See
Never mock

Quality tools workflow:
# Find test coverage gaps
./gradlew koverHtmlReport
# Report at build/reports/kover/html/index.html
# Format code before committing
./gradlew ktlintFormat
# Find potential code issues
./gradlew detekt

Development Rules
Code Style
Architecture
Testing
Workflow & Planning
Development Setup

Java Version Management (SDKMAN)
The devcontainer uses SDKMAN to manage Java versions:
SDKMAN is pre-configured in the devcontainer with both versions installed and Java 21 as the default.
Why two versions? The

Common commands:
# Check current Java version
java -version
# List installed versions
sdk list java
# Temporarily use a different version (current shell only)
sdk use java 11.0.25-tem
# Switch default version permanently
sdk default java 21.0.5-tem

Building Cassandra Analytics Dependencies
The
# Build cassandra-analytics (auto-skips if already built)
bin/dev build-analytics
# Force rebuild
bin/dev build-analytics --force
This clones the cassandra-analytics repo, builds with JDK 11, and publishes artifacts to the local Maven repository (

Pre-commit Hook Installation
Install the ktlint pre-commit hook to automatically check code style before commits:
./gradlew addKtlintCheckGitPreCommitHook
Important: Pre-commit hooks are stored in
The hook automatically runs

Configuration Cache
The project uses Gradle configuration cache for faster builds, enabled via
When to clear the cache:
# Clear configuration cache
rm -rf .gradle/configuration-cache
# Or clean everything
./gradlew clean
Why this matters: If you modify

Local Validation
Before pushing code, verify it passes all checks:
# Run all checks (matches CI)
./gradlew check
# Run only ktlint check (verify style compliance)
./gradlew ktlintCheck
# Auto-fix ktlint violations (when possible)
./gradlew ktlintFormat
Note:

Packer Script Testing
Test packer provisioning scripts locally using Docker (no AWS required):
# Test base provisioning scripts
./gradlew testPackerBase
# Test Cassandra provisioning scripts
./gradlew testPackerCassandra
# Run all packer tests
./gradlew testPacker
# Test a specific script
./gradlew testPackerScript -Pscript=cassandra/install/install_cassandra_easy_stress.sh
For more details, see packer/README.md and packer/TESTING.md.

Documentation & Specifications
User documentation is in
This project relies on OpenSpec to maintain product specifications. Specs live in
After running
If I refer to Kubernetes configs or k8 configs, I am referring to these:

Observability
The cluster runs a full observability stack on the control node. When modifying any part of this stack, keep the related K8s manifests, Kotlin services, and user docs in sync.
All observability K8s resources are built programmatically using Fabric8 manifest builders in
Collectors (run on cluster nodes): OTel Collector, Fluent Bit (journald), Grafana Alloy (eBPF profiling), Beyla (L7 RED metrics), ebpf_exporter (TCP/block I/O/VFS), YACE (CloudWatch), MAAC agent (Cassandra metrics)
Storage backends (control node): VictoriaMetrics (metrics, port 8428), VictoriaLogs (logs, port 9428), Tempo (traces, port 3200), Pyroscope (profiles, port 4040)
Grafana (port 3000): Dashboards built via
OTel Instrumentation in Kotlin: The
CLI commands:
All builder paths are relative to

Subdirectory Documentation
Detailed patterns live in package-level CLAUDE.md files:
|
User Documentation (
|
|
@claude i don't think the disks are known ahead of time right now. Is there a way we can know this deterministically? Either by providing the devices or by inspection? |
|
Claude finished @rustyrazorblade's task in 1m 19s
Disk Device Discovery Analysis
You're right that device paths aren't known ahead of time in a reliable way. Here's the breakdown of both devices involved:
EBS backing device (
|
|
What about multiple disks? |
|
@claude what about multiple disks? |
|
@claude Option a, but still need to address the case where there are 2 ebs volumes or 2 local disks. |
|
Claude finished @rustyrazorblade's task in 6m 37s
Implementing multi-disk bcache support with nvme id-ctrl detection

What changed
Device discovery (
Replaced the fragile "first unpartitioned NVMe" heuristic with

Multiple instance store disks
All discovered instance store NVMe devices are passed to

Multiple EBS volumes
The first EBS NVMe device is used as the bcache backing device. If more are found, a log message identifies which one was selected and lists the others.

Non-bcache disk selection
Uses the same discovery: prefers instance store NVMe, falls back to EBS NVMe, then
The EBS device is now discovered at runtime via
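The "first EBS volume wins, the rest are logged" rule described above could look roughly like the sketch below. The helper name `select_backing_device` and the log wording are hypothetical, and the device paths in the usage comment are illustrative. The PR's actual script may structure this differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given EBS device paths as arguments, print the first
# one (the bcache backing device) on stdout and log the choice on stderr.
select_backing_device() {
  if [ "$#" -eq 0 ]; then
    echo "no EBS NVMe device found" >&2
    return 1
  fi
  local chosen="$1"
  shift
  echo "using $chosen as bcache backing device" >&2
  if [ "$#" -gt 0 ]; then
    echo "other EBS devices left untouched: $*" >&2
  fi
  echo "$chosen"
}

# Assumed usage:
#   backing="$(select_backing_device /dev/nvme1n1 /dev/nvme2n1)" || exit 1
```

Keeping log lines on stderr leaves stdout free for the selected path, so the result can be captured with command substitution.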
|
…BS disks

Replace heuristic "first unpartitioned NVMe" detection with nvme id-ctrl model string inspection to correctly identify device types on Nitro instances:
- "Amazon EC2 NVMe Instance Storage" -> instance store (cache device)
- "Amazon Elastic Block Store" -> EBS (backing device)

Multiple instance store NVMe disks are passed to make-bcache -C together, forming a single bcache cache set (native bcache multi-device support).
Multiple EBS volumes: first is used as bcache backing, others are logged.

Non-bcache path now uses the same discovery: prefers instance store NVMe, falls back to EBS NVMe, then xvdb/xvdc for non-Nitro instances.

The __EBS_DEVICE__ template injection is removed; device is now discovered at runtime.
Adds nvme-cli to install_bcache.sh for nvme id-ctrl availability.

Co-authored-by: Jon Haddad <rustyrazorblade@users.noreply.github.com>
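The model string mapping in this commit message can be sketched as two small shell functions. This is an illustration, not the code from setup_instance.sh. The function names are hypothetical, and the sketch assumes the human-readable `nvme id-ctrl` output carries the model number on a line of the form `mn : <model>`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the model-number ("mn") field out of
# `nvme id-ctrl <device>` text output read on stdin, trimming padding.
extract_nvme_model() {
  awk -F: '$1 ~ /^mn[ \t]*$/ {
    sub(/^[ \t]+/, "", $2); sub(/[ \t]+$/, "", $2); print $2; exit
  }'
}

# Map a model string to the device role named in the commit message.
classify_nvme_model() {
  case "$1" in
    *"Amazon EC2 NVMe Instance Storage"*) echo "instance-store" ;;
    *"Amazon Elastic Block Store"*)       echo "ebs" ;;
    *)                                    echo "unknown" ;;
  esac
}

# Assumed usage (device path is illustrative):
#   model="$(nvme id-ctrl /dev/nvme1n1 | extract_nvme_model)"
#   role="$(classify_nvme_model "$model")"
```

Returning an explicit "unknown" for unrecognized models lets the caller decide whether to skip the device or fail fast.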
PR Review: bcache Support
Overall this is a solid, well-structured feature addition. The design is thoughtful (TemplateService injection, fail-fast validation, NVMe device discovery via
Blocking: CI
|
Add OpenSpec change for #562: --bcache flag on init that configures local NVMe as a write-back cache in front of EBS via Linux bcache.
Includes proposal, design, tasks, and specs for the bcache capability and updated instance-storage-validation spec.
Closes #562
Generated with Claude Code