feat: Comprehensive concurrent testing for AT-102 #16
Summary
This PR implements comprehensive concurrent and multi-threaded testing infrastructure for llama.cpp to detect race conditions and validate thread safety across critical components, specifically targeting KV cache operations and context management for JIRA ticket AT-102.
Changes
New Test Files
tests/test-concurrent-stress.cpp
- Sustained concurrent load testing with 3 test suites
tests/test-kv-cache-concurrent.cpp
- Dedicated KV cache race condition testing with 4 test suites
Enhanced Existing Tests
tests/test-thread-safety.cpp
- Added rapid context recreation stress test with random timing delays to increase race condition exposure probability
tools/server/tests/unit/test_completion.py
- Added 3 high-volume concurrent server tests
Build System Updates
tests/CMakeLists.txt
- Added new test targets using established llama_build_and_test patterns, labeled as 'concurrent' for filtering
Key Design Decisions
- Separate llama_context per thread rather than sharing contexts (proper llama.cpp threading model)
Local Testing Results
- ✅ test-concurrent-stress: All 3 stress tests passed (80 contexts created/destroyed, 0 errors)
- ✅ test-kv-cache-concurrent: All 4 KV cache tests passed (28 contexts in mixed operations)
- ✅ test-thread-safety: Enhanced test with new stress patterns passed
- ✅ Regression tests: 36/36 existing tests still pass
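For reviewers who want a picture of the pattern these tests exercise, here is a minimal sketch of per-thread context creation and destruction with random timing delays. It is not the code in this PR: FakeContext is a hypothetical stand-in for llama_context, and run_context_stress, StressResult, and all parameter values are illustrative assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <random>
#include <thread>
#include <vector>

// Hypothetical stand-in for llama_context: every instance owns private state,
// so no instance is ever touched by more than one thread.
struct FakeContext {
    std::vector<int> kv_cells;  // placeholder for per-context KV cache state
    explicit FakeContext(int n_cells) : kv_cells(n_cells, 0) {}
    bool decode_step() {
        for (auto & c : kv_cells) c += 1;
        return true;
    }
};

// Illustrative result type (not part of llama.cpp).
struct StressResult {
    int contexts_created;
    int errors;
};

// Each worker repeatedly creates its own context, sleeps for a random
// interval to vary thread interleavings, runs one step, and destroys it.
StressResult run_context_stress(int n_threads, int iterations_per_thread) {
    std::atomic<int> created{0};
    std::atomic<int> errors{0};
    std::vector<std::thread> workers;
    workers.reserve(n_threads);

    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&, t] {
            std::mt19937 rng(1234u + t);  // per-thread RNG, never shared
            std::uniform_int_distribution<int> delay_us(0, 200);
            for (int i = 0; i < iterations_per_thread; ++i) {
                auto ctx = std::make_unique<FakeContext>(64);
                created.fetch_add(1, std::memory_order_relaxed);
                std::this_thread::sleep_for(std::chrono::microseconds(delay_us(rng)));
                if (!ctx->decode_step() || ctx->kv_cells.front() != 1) {
                    errors.fetch_add(1, std::memory_order_relaxed);
                }
                // ctx goes out of scope here and is destroyed by its owning thread
            }
        });
    }
    for (auto & w : workers) w.join();

    StressResult result;
    result.contexts_created = created.load();
    result.errors = errors.load();
    return result;
}
```

For example, run_context_stress(4, 5) creates and destroys 20 short-lived contexts and should report zero errors. Only the two atomic counters are shared between threads, which keeps the sketch itself free of data races.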
ThreadSanitizer Integration
The existing LLAMA_SANITIZE_THREAD CMake option can be used for automated race detection.
Human Review Checklist
🔴 High Priority:
🟡 Medium Priority:
Notes
wget module) but should work in CI with proper dependencies
Link to Devin run: https://app.devin.ai/sessions/d1f6bdf15aa141e3aceec6c8e65e5750
Requested by: @alexpeng-cognition
Related: JIRA ticket AT-102