19x faster than kernel zram on AVX-512 systems
- 19x faster decompression than kernel zram via AVX-512 ZSTD
- Adaptive compression with entropy-based algorithm selection (LZ4/ZSTD)
- Userspace block device via Linux ublk for zero-copy I/O
- SIMD-accelerated compression on x86_64 (AVX2/AVX-512) and AArch64 (NEON)
Kernel zram uses scalar LZ4 limited to 1.3 GB/s. trueno-zram uses AVX-512 ZSTD achieving 25 GB/s decompression. Same workload, 19x faster.
┌─────────────────────────────────────────────────────────────┐
│ Decompression Throughput (same-fill pages) │
├─────────────────────────────────────────────────────────────┤
│ trueno-zram ZSTD ████████████████████████████ 25.3 GB/s │
│ Kernel zram LZ4 █░ 1.3 GB/s │
└─────────────────────────────────────────────────────────────┘
# Install
cargo install trueno-ublk
# Create 8GB compressed swap (requires root)
sudo trueno-ublk create --size 8G --algorithm zstd1 --high-perf
# Enable as swap
sudo mkswap /dev/ublkb0 && sudo swapon -p 150 /dev/ublkb0
# Verify
swapon --show| Problem | Without trueno-zram | With trueno-zram |
|---|---|---|
| ML Training Memory Spikes | OOM killer terminates training | 8GB+ compressed swap absorbs spikes at 25 GB/s |
| Parallel Compilation | cargo build with 48 threads → OOM | Overflow handled, builds complete |
| Unused AVX-512 | Kernel ignores your CPU's SIMD | Full AVX-512 utilization → 19x speedup |
| GPU Memory Pressure | Slow disk swap stalls CUDA | Fast RAM swap keeps GPU fed |
| Multi-Project Development | System freezes, requires reboot | Smooth operation under memory pressure |
| Problem | Without | With trueno-zram |
|---|---|---|
| Cloud VMs (4-8GB RAM) | Pay 2x for larger instance | Effective 2x RAM via compression |
| Laptop SSD Lifespan | 10-100GB swap writes/day | 90% stays in RAM, SSD lasts 5x longer |
| Raspberry Pi / Edge | SD card swap at 10 MB/s | Compressed RAM at GB/s speeds |
| Database Latency | Query timeouts when swapping | <1ms swap latency, queries complete |
| CI/CD Build Servers | OOM kills parallel builds | Builds succeed without RAM upgrades |
Benchmarked on AMD Threadripper 7960X with AVX-512:
| Metric | trueno-zram | Kernel zram | Speedup |
|---|---|---|---|
| ZSTD Decompress | 25.3 GB/s | 1.3 GB/s | 19x |
| ZSTD Compress | 15.5 GB/s | 1.3 GB/s | 12x |
| Same-fill Read | 7.9 GB/s | 171 GB/s* | - |
| Random 4K IOPS | 666K | ~1.5M | 0.4x |
*Kernel same-fill uses direct memset, not compression.
# Algorithm comparison benchmark
cargo run --release --example zstd_vs_lz4
# Tiered storage architecture demo
cargo run --release --example tiered_storage
# Visualization integration demo
cargo run --release --example visualization_demoW0-ZEROS (same-fill pages)
Algorithm Compress Decompress Ratio
--------------------------------------------------------
LZ4 5.78 GiB/s 0.78 GiB/s 157.5x
ZSTD-1 3.29 GiB/s 25.28 GiB/s 372.4x
ZSTD-1 speedup: 32x faster decompression
use trueno_zram_core::{CompressorBuilder, Algorithm, PageCompressor, PAGE_SIZE};
// Create compressor with ZSTD level 1 (fastest)
let compressor = CompressorBuilder::new()
.algorithm(Algorithm::Zstd1)
.build()?;
// Compress a 4KB page
let page = [0u8; PAGE_SIZE];
let compressed = compressor.compress(&page)?;
let decompressed = compressor.decompress(&compressed)?;
assert_eq!(page, decompressed);trueno-ublk create # Create compressed RAM device
trueno-ublk list # List devices
trueno-ublk stat # Show statistics
trueno-ublk top # Interactive TUI dashboard
trueno-ublk benchmark # Run benchmarks with JSON/HTML output
trueno-ublk entropy # Analyze file entropy# High-perf: polling mode, larger batches
sudo trueno-ublk create --size 8G --algorithm zstd1 --high-perf
# Max-perf: highest throughput (high CPU usage)
sudo trueno-ublk create --size 8G --algorithm zstd1 --max-perf --queues 4
# Tiered: entropy-based routing to kernel zram
sudo trueno-ublk create --size 8G --backend tiered --entropy-routing┌─────────────────────────────────────────────┐
│ trueno-ublk CLI │
│ (create, list, stat, top, benchmark) │
├─────────────────────────────────────────────┤
│ Userspace Block Device │
│ (io_uring, ublk protocol, queue mgmt) │
├──────────┬──────────────────────────────────┤
│ Adaptive │ trueno-zram-core │
│ (entropy │ (SIMD compression engine) │
│ routing)│ │
├──────────┴──────────┬───────────────────────┤
│ ZSTD (AVX-512) │ LZ4 (SIMD) │
│ 25 GB/s decompress │ 5.8 GB/s compress │
└─────────────────────┴───────────────────────┘
- Core: SIMD-accelerated page compression/decompression (4KB pages)
- Adaptive: Entropy-based algorithm selection routes pages to optimal compressor
- Ublk: Linux userspace block device using io_uring for zero-copy I/O
- Tiered: Entropy routing to kernel zram for incompressible data
cargo test -p trueno-zram-core # Core compression tests
cargo test # All workspace tests
cargo bench -p trueno-zram-core # Compression benchmarks
make quality-gate # Full quality gate: lint + test + coverageProperty-based tests verify compression roundtrip invariants across page patterns (zeros, random, mixed).
| Crate | Description |
|---|---|
trueno-zram-core |
SIMD compression library (LZ4, ZSTD, AVX-512) |
trueno-zram-adaptive |
Entropy-based algorithm selection |
trueno-ublk |
Userspace block device daemon |
- Linux Kernel >= 6.0 (ublk support)
- x86_64 with AVX2/AVX-512 or AArch64 with NEON
- Rust >= 1.82.0
# Compression library benchmark
cargo bench -p trueno-zram-core
# Device I/O benchmark (requires root)
sudo fio --name=test --filename=/dev/ublkb0 --rw=randread \
--bs=4k --numjobs=4 --iodepth=32 --runtime=10
# Generate HTML benchmark report
trueno-ublk benchmark --size 4G --format html -o report.html- Book - Comprehensive guide
- API Docs - Library documentation
- Visualization Guide - TUI and metrics
Contributions are welcome! Please see the CONTRIBUTING.md guide for details.
Areas of interest:
- Performance optimizations (SIMD, compression algorithms)
- New compression algorithm support
- Documentation improvements
- Bug fixes
Minimum Supported Rust Version: 1.82
MIT OR Apache-2.0