Add RocksDB Preset System and WAL Directory Support for HDD Archive Nodes (and benefiting regular nodes) #771

Zorglub4242 · 2025-11-30T21:14:33Z

Summary

This PR implements a comprehensive solution for running Kaspa archive nodes on HDD storage, addressing Issue #681.

The implementation adds two main features:

RocksDB Preset System - Pre-configured database settings optimized for different storage types (SSD vs HDD)
WAL Directory Support - Ability to place Write-Ahead Logs on separate high-speed storage for hybrid setups

These features enable efficient archive nodes on HDDs while maintaining the option for hybrid NVMe+HDD configurations.

Features

1. RocksDB Preset System (`--rocksdb-preset`)

Two configuration presets for different deployment scenarios:

Default Preset (SSD/NVMe):

64MB write buffer
Standard compression
Optimized for fast storage
Default behavior (no flag needed)

Archive Preset (HDD):

256MB write buffer (4x larger for better batching)
Aggressive compression (LZ4 + ZSTD with 64KB dictionaries)
BlobDB enabled for large values (>512 bytes)
256MB SST files (reduces file count: 500K → 16K for 4TB)
Rate limiting (12 MB/s) to prevent I/O spikes
Based on production testing by @Callidon

Usage:

# Default (SSD/NVMe) - no flag needed
kaspad --archival

# Archive preset for HDD
kaspad --archival --rocksdb-preset=archive

2. WAL Directory Support (`--rocksdb-wal-dir`)

Enables hybrid storage configurations by placing Write-Ahead Logs on fast storage (SSD/NVME or memory based like tmpfs) while keeping database files on HDDs.
Enable faster synchronization process on archival nodes.
On regular nodes, using tmpfs (or lmDisk on windows) allow “small” performance improvements but also reduce wear / tear of nvme / SSD storage devices.
Using tmpfs or memory based storage could lead to database corruption on restart ! Use with caution... (A wal recovery process was tested but would require more extensive work / review so, if needed, we could have it implemented on a separate issue)

Features:

Custom WAL directory location
Auto-generated unique subdirectories per database (consensus, meta, utxoindex) : When using fast wal storage (eg tmpfs), this avoid some race conditions experienced during testing.
Works with both presets

Usage:

# Place WAL on NVMe, data on HDD
kaspad --archival \
       --rocksdb-preset=archive \
       --rocksdb-wal-dir=/mnt/nvme/kaspa-wal

# Hybrid setup for maximum performance
kaspad --archival \
       --rocksdb-preset=archive \
       --rocksdb-wal-dir=/mnt/nvme/kaspa-wal \
       --appdir=/mnt/hdd/kaspa-data

Benefits:

Fast write bursts to NVMe WAL
Bulk data on cheaper HDD storage
Optimal I/O distribution
Cost-effective for large archives

Implementation Details

Files Modified

Database Layer:

database/src/db.rs - Export RocksDbPreset
database/src/db/conn_builder.rs - Add preset and wal_dir support
database/src/db/rocksdb_preset.rs - NEW - Preset configurations
database/src/lib.rs - Module exports

Application Layer:

kaspad/src/args.rs - CLI arguments for --rocksdb-preset and --rocksdb-wal-dir
kaspad/src/daemon.rs - Parse and apply configuration
consensus/src/consensus/factory.rs - Pass settings to consensus databases

Testing:

testing/integration/src/consensus_integration_tests.rs - Updated test parameters

Archive Preset Configuration Details

Based on extensive testing and community feedback (Issue #681):

Memory & Write Buffers:

write_buffer_size: 256MB (4x default)
Re-applied after optimize_level_style_compaction() to prevent override

LSM Tree Structure:

target_file_size_base: 256MB (reduces file count dramatically)
target_file_size_multiplier: 1 (consistent size across levels)
max_bytes_for_level_base: 1GB
level_compaction_dynamic_level_bytes: true (minimizes space amplification)

Compaction:

level_zero_file_num_compaction_trigger: 1 (minimize write amplification)
compaction_pri: OldestSmallestSeqFirst
compaction_readahead_size: 4MB (optimized for sequential HDD reads)

Compression Strategy:

Default: LZ4 (fast)
Bottommost level: ZSTD level 22 with 64KB dictionaries
zstd_max_train_bytes: 8MB (125x dictionary size)

Block Cache:

2GB LRU cache for frequently accessed blocks
Partitioned Bloom filters (18 bits per key)
Two-level index search for large databases
256KB block size (better for sequential HDD reads)

BlobDB:

Enabled for values >512 bytes
256MB blob files
ZSTD compression
Garbage collection at 90% age cutoff

Rate Limiting:

12 MB/s for background writes (prevents HDD saturation)

Testing

Unit Tests

✅ Preset parsing (default, archive, invalid)
✅ Preset display formatting
✅ Configuration application to RocksDB options

Integration Tests

✅ Consensus integration tests updated with new parameters
✅ Verified backward compatibility (no preset = default behavior)
✅ Passed cargo fmt, cargo check & cargo clippy

Production Testing

Archive preset based on real-world deployment:

Tested by @Callidon
Successfully running on HDD storage
Proven effective for large archives
Tested on a local hdd to confirm it stable (several day run)

Backward Compatibility

✅ Fully backward compatible:

No flags = default preset (current behavior)
Existing deployments unaffected
WAL directory optional (defaults to database directory)

Performance Impact

Archive Preset Benefits (HDD):

30-50% better compression (ZSTD on bottommost level)
Reduced write amplification (larger buffers, aggressive compaction)
96% fewer files (256MB SST files vs default)
Smoother I/O (rate limiting prevents spikes)
Better caching (2GB block cache)

Hybrid Setup Benefits (NVMe + HDD):

Fast write bursts (WAL on NVMe)
Cost-effective bulk storage (data on HDD)
Minimal latency for write-heavy workloads

Documentation

User-facing documentation has been kept separate from code and will be added to the wiki/docs repository as appropriate.

Migration Notes

Existing Archive Nodes:
Compression settings cannot be changed retroactively. For optimal results with the archive preset:

Fresh deployments: Use --rocksdb-preset=archive from the start
Existing nodes: Continue with current settings, or start fresh if storage savings are critical

Note: Switching presets on an existing database will apply new settings to new data only. For full benefits, a fresh sync is recommended.

Related Issues

Closes [RocksDB] Preset & RAM‑backed Disk Virtualization #681 - HDD-optimized archive node support
Implements community-tested configuration from @Callidon

Checklist

Code follows project style guidelines
Unit tests added/updated
Integration tests updated
Backward compatibility maintained
CLI arguments documented (--help text)
Performance tested in production
No breaking changes

michaelsutton · 2025-12-02T09:16:52Z

Exciting to see this!

This commit introduces a comprehensive solution for running Kaspa archive nodes on HDD storage, addressing performance challenges through two key features: 1. RocksDB Preset System - Default preset: Optimized for SSD/NVMe (existing behavior) - Archive preset: Optimized for HDD with: * 256MB write buffer (reduced write amplification) * BlobDB for large values (efficient UTXO storage) * Aggressive compression (2.5x space savings) * 256MB SST files (reduced file count from 500K to 16K) * Rate limiting (100 MB/s to prevent I/O saturation) 2. WAL Directory Support - Allows placing Write-Ahead Logs on separate fast storage - Recommended: NVMe for WAL + HDD for data - Provides near-SSD performance for writes while using HDD for bulk storage Configuration: - --rocksdb-preset=archive Enable HDD optimizations - --rocksdb-wal-dir=/path Place WAL on fast storage This enables archive nodes to run efficiently on HDD, reducing storage costs from ~$400 (4TB NVMe) to ~$80 (8TB HDD) while maintaining acceptable performance.

Zorglub4242 · 2025-12-03T09:09:17Z

Exciting to see this!

thanks, credits to callidon to for his settings. did a lot of tweaking / testing but his settings where already top.
It may be good also for nvme users btw (reduce impacts / prolong lifespan).
It may be interesting to open another issue to have a WAL corruption recovery process implemented / tested (would allow to safely use tmpfs even on production nodes)

Got a clippy issue on the automated tests so resubmitted the commit.

Zorglub4242 force-pushed the feature/hdd-archive-optimization branch 2 times, most recently from a446af4 to 2e3ee80 Compare December 1, 2025 18:08

Zorglub4242 force-pushed the feature/hdd-archive-optimization branch from 2e3ee80 to 7c7fb2b Compare December 3, 2025 08:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add RocksDB Preset System and WAL Directory Support for HDD Archive Nodes (and benefiting regular nodes) #771

Add RocksDB Preset System and WAL Directory Support for HDD Archive Nodes (and benefiting regular nodes) #771

Zorglub4242 commented Nov 30, 2025

Uh oh!

michaelsutton commented Dec 2, 2025

Uh oh!

Zorglub4242 commented Dec 3, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Add RocksDB Preset System and WAL Directory Support for HDD Archive Nodes (and benefiting regular nodes) #771

Are you sure you want to change the base?

Add RocksDB Preset System and WAL Directory Support for HDD Archive Nodes (and benefiting regular nodes) #771

Conversation

Zorglub4242 commented Nov 30, 2025

Summary

Features

1. RocksDB Preset System (--rocksdb-preset)

2. WAL Directory Support (--rocksdb-wal-dir)

Implementation Details

Files Modified

Archive Preset Configuration Details

Testing

Unit Tests

Integration Tests

Production Testing

Backward Compatibility

Performance Impact

Documentation

Migration Notes

Related Issues

Checklist

Uh oh!

michaelsutton commented Dec 2, 2025

Uh oh!

Zorglub4242 commented Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

1. RocksDB Preset System (`--rocksdb-preset`)

2. WAL Directory Support (`--rocksdb-wal-dir`)

Zorglub4242 commented Dec 3, 2025 •

edited

Loading