Add configurable concurrency control to fix lock contention#87
Merged
Fixes severe performance degradation on high-core-count machines (32+ threads) caused by unbounded parallel chunk processing creating massive mutex contention.

Changes:
- Add --max-concurrent-block-writes CLI parameter (default: 5)
- Limit concurrent chunk writes using buffer_unordered(N) in store_object()
- Replace Mutex with RwLock for partition_cache in fjall.rs and fjall_notx.rs
- Implement double-checked locking pattern for cache access
- Pre-warm partition cache with common partitions (_BLOCKS, _PATHS, etc.)

Impact:
- Reduces mutex contention from 100+ tasks to 5 on large uploads
- Enables concurrent partition cache reads (zero contention after warmup)
- Eliminates small file 'hanging' behavior during concurrent uploads
- Consistent performance across different hardware configurations

Performance gains expected: 5-20x improvement on 32-thread machines with slow storage, predictable behavior regardless of CPU core count.
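For reviewers unfamiliar with the pattern, here is a minimal sketch of double-checked locking over an RwLock<HashMap>, plus pre-warming. It is not the code from fjall.rs: the Partition and PartitionCache types, the get_or_open and prewarm names, and the open counter are all hypothetical stand-ins, and only the RwLock<HashMap> shape and the partition names come from this PR.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a partition handle.
pub struct Partition {
    pub name: String,
}

/// Hypothetical cache type; only the RwLock<HashMap> shape mirrors the PR.
pub struct PartitionCache {
    map: RwLock<HashMap<String, Arc<Partition>>>,
    /// Counts slow-path opens so the effect of caching is observable.
    opens: AtomicUsize,
}

impl PartitionCache {
    pub fn new() -> Self {
        PartitionCache {
            map: RwLock::new(HashMap::new()),
            opens: AtomicUsize::new(0),
        }
    }

    /// Pre-warm: open the well-known partitions once at startup.
    pub fn prewarm(&self, names: &[&str]) {
        for name in names {
            self.get_or_open(name);
        }
    }

    pub fn get_or_open(&self, name: &str) -> Arc<Partition> {
        // First check under a shared read lock; many readers can hold it at once.
        {
            let map = self.map.read().unwrap();
            if let Some(p) = map.get(name) {
                return Arc::clone(p);
            }
        } // read guard is dropped here, before the write lock is requested

        // Second check under the exclusive write lock: another thread may have
        // inserted the entry between the two lock acquisitions.
        let mut map = self.map.write().unwrap();
        if let Some(p) = map.get(name) {
            return Arc::clone(p);
        }
        self.opens.fetch_add(1, Ordering::Relaxed);
        let p = Arc::new(Partition { name: name.to_string() });
        map.insert(name.to_string(), Arc::clone(&p));
        p
    }

    pub fn open_count(&self) -> usize {
        self.opens.load(Ordering::Relaxed)
    }
}
```

After prewarm() has run, lookups for the common partitions take only the read lock, which is why contention on the cache should disappear once it is warm.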
Summary
Fixes severe performance degradation on high-core-count machines (32+ threads) caused by unbounded parallel chunk processing creating massive mutex contention.
Root Cause
store_object() spawns all chunks in parallel
Solution
Implemented three-layer defense:
- Configurable Concurrency Limit (primary fix)
  - --max-concurrent-block-writes (default: 5)
  - buffer_unordered(N)
- Partition Cache Optimization (eliminates residual contention)
  - Mutex<HashMap> → RwLock<HashMap>
- Partition Cache Pre-warming (guarantees fast path)
  - _BLOCKS, _PATHS, _BUCKETS, _MULTIPART_PARTS at startup
Changes
- src/cas/fs.rs - CasFS struct + buffer_unordered() in store_object()
- src/main.rs - CLI parameter + call sites
- src/auth/router.rs - UserRouter multi-user support
- src/metastore/stores/fjall.rs - RwLock + pre-warming
- src/metastore/stores/fjall_notx.rs - RwLock + pre-warming
Impact
Before:
After:
Expected performance gains: 5-20x improvement on 32-thread machines with slow storage
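To illustrate what the concurrency limit buys, here is a rough std-only analogue of bounded chunk processing. It does not use buffer_unordered(N) or any async runtime; instead, `limit` scoped threads claim chunk indices from a shared counter, which gives the same "at most N in flight" property. The function name write_chunks_bounded and the sleep standing in for a block write are assumptions made for this sketch.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::Duration;

/// Processes `chunks` items with at most `limit` in flight at once.
/// Returns the highest number of simultaneously active writes observed.
pub fn write_chunks_bounded(chunks: usize, limit: usize) -> usize {
    assert!(limit > 0, "limit must be at least 1");
    let next = AtomicUsize::new(0); // index of the next chunk to claim
    let in_flight = AtomicUsize::new(0); // writes currently running
    let peak = AtomicUsize::new(0); // maximum of in_flight seen so far

    // Scoped threads may borrow the counters above; all are joined on exit.
    thread::scope(|s| {
        for _ in 0..limit {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::SeqCst);
                if i >= chunks {
                    break;
                }
                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                // Stand-in for the real block write (hypothetical workload).
                thread::sleep(Duration::from_millis(1));
                in_flight.fetch_sub(1, Ordering::SeqCst);
            });
        }
    });

    peak.load(Ordering::SeqCst)
}
```

Because only `limit` workers exist, the observed peak can never exceed the limit no matter how many chunks an upload has, which is the behavior the PR relies on to keep lock contention bounded on large uploads.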
Test Plan
- Test with --max-concurrent-block-writes values (3, 5, 8, 12)
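When sweeping the values above, it helps to know how the flag falls back to its default. The sketch below parses the flag with the standard library only; it is not the parsing code from src/main.rs, and the helper name, the error messages, and the decision to reject 0 are assumptions of this example. Only the flag name and the default of 5 come from this PR.

```rust
/// Default taken from the PR description.
pub const DEFAULT_MAX_CONCURRENT_BLOCK_WRITES: usize = 5;

/// Hypothetical helper: extracts `--max-concurrent-block-writes <N>` from an
/// argument list, falling back to the default when the flag is absent.
pub fn parse_max_concurrent_block_writes<I, S>(args: I) -> Result<usize, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        if arg.as_ref() == "--max-concurrent-block-writes" {
            let raw = args
                .next()
                .ok_or_else(|| "missing value for --max-concurrent-block-writes".to_string())?;
            let n: usize = raw
                .as_ref()
                .parse()
                .map_err(|e| format!("invalid value {:?}: {}", raw.as_ref(), e))?;
            // Assumption of this sketch: a limit of 0 is treated as a usage error.
            if n == 0 {
                return Err("--max-concurrent-block-writes must be at least 1".to_string());
            }
            return Ok(n);
        }
    }
    Ok(DEFAULT_MAX_CONCURRENT_BLOCK_WRITES)
}
```

For example, passing ["--max-concurrent-block-writes", "8"] yields Ok(8), and an argument list without the flag yields Ok(5).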