Add configurable concurrency control to fix lock contention #87

Merged
delandtj merged 1 commit into master from fix-concurrency-contention
Nov 22, 2025

@delandtj

Summary

Fixes severe performance degradation on high-core-count machines (32+ threads) caused by unbounded parallel chunk processing creating massive mutex contention.

Root Cause

  • Unbounded concurrency in store_object() spawns all chunks in parallel
  • On 32-thread machines: 100+ tasks compete for single partition_cache Mutex
  • Small files "hang" waiting for large file chunks to complete
  • Performance degrades with more cores (paradoxically worse on faster hardware)

Solution

Implemented three-layer defense:

  1. Configurable Concurrency Limit (primary fix)

    • New CLI arg: --max-concurrent-block-writes (default: 5)
    • Limits concurrent chunk processing using buffer_unordered(N)
    • Prevents 100+ tasks serializing on mutex
  2. Partition Cache Optimization (eliminates residual contention)

    • Changed Mutex<HashMap> → RwLock<HashMap>
    • Concurrent reads after warmup (zero contention on cache hits)
    • Double-checked locking pattern for thread-safe lazy init
  3. Partition Cache Pre-warming (guarantees fast path)

    • Pre-loads _BLOCKS, _PATHS, _BUCKETS, _MULTIPART_PARTS at startup
    • Ensures 100% read-only access for system partitions

Changes

  • src/cas/fs.rs - CasFS struct + buffer_unordered() in store_object()
  • src/main.rs - CLI parameter + call sites
  • src/auth/router.rs - UserRouter multi-user support
  • src/metastore/stores/fjall.rs - RwLock + pre-warming
  • src/metastore/stores/fjall_notx.rs - RwLock + pre-warming
  • Tests/benchmarks updated

Impact

Before:

  • 32-thread machine: massive lock contention, small files hang
  • Mutex hold queue: 20-25+ tasks
  • Performance degrades with more cores

After:

  • Maximum 5 tasks compete for locks (instead of 100+)
  • Partition cache hits use shared read locks (no contention)
  • Small files process predictably
  • Consistent performance across all hardware
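
The effect of the cap can be illustrated with a small standard-library sketch. Note that this is a thread-based analogue, not the PR's async code: the actual change uses buffer_unordered(N) on a stream of chunk futures, whereas the hypothetical `process_bounded` below uses a fixed set of scoped worker threads pulling from a shared queue, and the "write" is replaced by a byte count. It records the peak number of chunks in flight to show that it never exceeds the configured limit.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Processes `chunks` with at most `max_concurrent` chunks in flight.
/// Returns (total bytes processed, peak number of chunks in flight).
/// Hypothetical helper for illustration only.
pub fn process_bounded(chunks: Vec<Vec<u8>>, max_concurrent: usize) -> (usize, usize) {
    let queue = Mutex::new(chunks.into_iter());
    let in_flight = AtomicUsize::new(0);
    let peak = AtomicUsize::new(0);
    let bytes = AtomicUsize::new(0);

    thread::scope(|s| {
        // Exactly `max_concurrent` workers exist (at least one), so no more
        // than that many chunks can be processed at the same time.
        for _ in 0..max_concurrent.max(1) {
            s.spawn(|| loop {
                // The queue lock is held only while fetching the next chunk.
                let next = queue.lock().unwrap().next();
                let Some(chunk) = next else { break };

                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);

                // Stand-in for the real chunk write.
                bytes.fetch_add(chunk.len(), Ordering::SeqCst);

                in_flight.fetch_sub(1, Ordering::SeqCst);
            });
        }
    });

    (bytes.load(Ordering::SeqCst), peak.load(Ordering::SeqCst))
}
```

With 100 chunks and a limit of 5, the reported peak stays between 1 and 5 however many cores the machine has, which is the property the default of --max-concurrent-block-writes relies on.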

Expected performance gains: 5-20x improvement on 32-thread machines with slow storage

Test Plan

  • Code compiles (lib, bins, tests)
  • Test on 32-thread machine with concurrent uploads
  • Verify small files no longer hang during large uploads
  • Benchmark with different --max-concurrent-block-writes values (3, 5, 8, 12)
  • Confirm no regression on low-core machines

Fixes severe performance degradation on high-core-count machines (32+ threads)
caused by unbounded parallel chunk processing creating massive mutex contention.

Changes:
- Add --max-concurrent-block-writes CLI parameter (default: 5)
- Limit concurrent chunk writes using buffer_unordered(N) in store_object()
- Replace Mutex with RwLock for partition_cache in fjall.rs and fjall_notx.rs
- Implement double-checked locking pattern for cache access
- Pre-warm partition cache with common partitions (_BLOCKS, _PATHS, etc.)

Impact:
- Reduces mutex contention from 100+ tasks to 5 on large uploads
- Enables concurrent partition cache reads (zero contention after warmup)
- Eliminates small file 'hanging' behavior during concurrent uploads
- Consistent performance across different hardware configurations

Performance gains expected: 5-20x improvement on 32-thread machines with
slow storage, predictable behavior regardless of CPU core count.
delandtj merged commit 642a601 into master on Nov 22, 2025
1 check failed
delandtj deleted the fix-concurrency-contention branch on November 22, 2025 13:30