devin-ai-integration[bot]
Make sure to read the contributing guidelines before submitting a PR

Summary

This PR adds concurrent, multi-threaded tests for llama.cpp to detect race conditions and validate thread safety. The tests target KV cache operations and context management (JIRA ticket AT-102).

Changes

New Test Files

  • tests/test-concurrent-stress.cpp - Sustained concurrent load testing with 3 test suites:

    • Rapid context creation/destruction cycles (20 iterations per thread)
    • Parallel context operations with batch processing
    • Backend resource allocation stress with varying context parameters
  • tests/test-kv-cache-concurrent.cpp - Dedicated KV cache race condition testing with 4 test suites:

    • Concurrent KV cache prepare operations
    • Concurrent KV cache update operations with varying context sizes
    • Concurrent sequence operations (copy, remove)
    • Mixed concurrent operations combining all patterns
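
As a rough illustration of the sequence-operation pattern listed above, the sketch below has each thread own a private cache, copy a sequence, verify the copy, and remove it again, with failures counted in a shared atomic. It does not use the llama.cpp API: `toy_kv_cache`, `seq_cp`, `seq_rm`, and `run_seq_ops` are hypothetical stand-ins invented for this example, and the real tests in this PR may be structured differently.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <thread>
#include <vector>

// Hypothetical toy cache (not a llama.cpp type): sequence id -> cached token ids.
struct toy_kv_cache {
    std::map<int, std::vector<int>> seqs;

    // Copy sequence src to dst; does nothing if src is missing.
    void seq_cp(int src, int dst) {
        auto it = seqs.find(src);
        if (it != seqs.end()) {
            std::vector<int> copy = it->second;
            seqs[dst] = std::move(copy);
        }
    }

    // Remove a sequence; returns false if it did not exist.
    bool seq_rm(int seq) { return seqs.erase(seq) > 0; }
};

// Runs copy/remove cycles on n_threads threads and returns the error count.
static int run_seq_ops(int n_threads, int n_iters) {
    assert(n_threads > 0 && n_iters > 0);
    std::atomic<int> errors{0};

    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&errors, n_iters, t]() {
            toy_kv_cache cache;                 // owned by this thread only
            cache.seqs[0] = {t, t + 1, t + 2};  // seed sequence 0
            for (int i = 1; i <= n_iters; ++i) {
                cache.seq_cp(0, i);
                if (cache.seqs.at(i) != cache.seqs.at(0)) errors++;
                if (!cache.seq_rm(i)) errors++;
            }
            // Only the seed sequence should remain.
            if (cache.seqs.size() != 1) errors++;
        });
    }
    for (auto & w : workers) w.join();

    return errors.load();
}
```

Because nothing except the error counter is shared, a non-zero return value in a sketch like this would point at a bug in the cache logic itself rather than at a data race between threads.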

Enhanced Existing Tests

  • tests/test-thread-safety.cpp - Added a rapid context recreation stress test with random timing delays to raise the chance of exposing race conditions

  • tools/server/tests/unit/test_completion.py - Added 3 high-volume concurrent server tests:

    • High-volume concurrent requests (8-50 requests across 4-8 slots)
    • Parallel decoding scenarios with multiple streams
    • Cache consistency validation under concurrent load
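
The server tests themselves are written in Python. Purely to illustrate the consistency-under-load idea, here is a C++ sketch in which a fixed number of worker "slots" drain a batch of identical requests and the responses are then compared. `fake_complete` and `responses_consistent` are hypothetical names; a real test would send HTTP requests to the server instead of calling a local function.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical deterministic "completion"; stands in for a server request.
static std::string fake_complete(const std::string & prompt) {
    return prompt + " -> " + std::to_string(std::hash<std::string>{}(prompt) % 1000);
}

// n_slots workers process n_requests identical requests; returns true if
// every response matches the first one.
static bool responses_consistent(int n_requests, int n_slots, const std::string & prompt) {
    assert(n_requests > 0 && n_slots > 0);
    std::vector<std::string> responses(n_requests);
    std::atomic<int> next{0}; // index of the next unclaimed request

    std::vector<std::thread> slots;
    for (int s = 0; s < n_slots; ++s) {
        slots.emplace_back([&]() {
            // Each index is claimed by exactly one worker, so writes never overlap.
            for (int i = next.fetch_add(1); i < n_requests; i = next.fetch_add(1)) {
                responses[i] = fake_complete(prompt);
            }
        });
    }
    for (auto & t : slots) t.join();

    for (const auto & r : responses) {
        if (r != responses[0]) return false;
    }
    return true;
}
```

The atomic index acts as a minimal work queue, so the number of requests and the number of slots can be varied independently, much like the request and slot ranges listed above.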

Build System Updates

  • tests/CMakeLists.txt - Added new test targets using established llama_build_and_test patterns, labeled as 'concurrent' for filtering

Key Design Decisions

  • Threading Model: Each thread creates and manages its own llama_context rather than sharing contexts across threads (the threading model llama.cpp expects)
  • Race Detection: Uses atomic counters and random timing delays (1-10ms) to vary thread interleavings and make race conditions more likely to surface
  • Resource Leak Detection: Tracks context creation/destruction counts to detect leaks
  • CI Compatibility: Limited thread counts and iterations to respect CI resource constraints
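
A minimal sketch of how these decisions can fit together, assuming a stand-in type instead of a real llama_context: each thread creates and destroys its own object, sleeps for a random 1-10 ms per iteration, and the totals are tracked with atomic counters. `fake_context`, `stress_result`, and `run_context_stress` are hypothetical names for this example, not code from the PR.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <random>
#include <thread>
#include <vector>

// Hypothetical stand-in for a llama_context; it only bumps the counters.
struct fake_context {
    std::atomic<int> & destroyed;

    fake_context(std::atomic<int> & created, std::atomic<int> & destroyed_)
        : destroyed(destroyed_) {
        created.fetch_add(1, std::memory_order_relaxed);
    }
    ~fake_context() { destroyed.fetch_add(1, std::memory_order_relaxed); }
};

struct stress_result {
    int created;
    int destroyed;
};

// Each thread owns its contexts; only the two atomic counters are shared.
static stress_result run_context_stress(int n_threads, int n_iters) {
    assert(n_threads > 0 && n_iters >= 0);
    std::atomic<int> created{0};
    std::atomic<int> destroyed{0};

    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&created, &destroyed, n_iters, t]() {
            std::mt19937 rng(1234u + t); // per-thread RNG; the seed is arbitrary
            std::uniform_int_distribution<int> delay_ms(1, 10);
            for (int i = 0; i < n_iters; ++i) {
                fake_context ctx(created, destroyed);
                // Random delay to vary how the threads interleave.
                std::this_thread::sleep_for(std::chrono::milliseconds(delay_ms(rng)));
            } // ctx is destroyed at the end of each iteration
        });
    }
    for (auto & w : workers) w.join();

    return { created.load(), destroyed.load() };
}
```

In a harness like this, a leak would show up as `created != destroyed` once all threads have joined.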

Local Testing Results

✅ test-concurrent-stress: All 3 stress tests passed (80 contexts created/destroyed, 0 errors)
✅ test-kv-cache-concurrent: All 4 KV cache tests passed (28 contexts in mixed operations)
✅ test-thread-safety: Enhanced test with new stress patterns passed
✅ Regression tests: 36/36 existing tests still pass

ThreadSanitizer Integration

The existing LLAMA_SANITIZE_THREAD CMake option enables ThreadSanitizer for automated race detection:

cmake -B build -DLLAMA_SANITIZE_THREAD=ON

Human Review Checklist

🔴 High Priority:

  • Threading model correctness: Verify each thread creates its own context (no shared contexts across threads)
  • Resource cleanup logic: Check atomic counter logic for context creation/destruction tracking
  • Test parameters: Validate thread counts and iterations are appropriate for CI environments

🟡 Medium Priority:

  • Race condition detection: Review random delay strategy and atomic counter usage
  • CMake integration: Confirm new test targets follow established patterns
  • Python test compatibility: Verify server tests handle concurrent scenarios correctly

Notes

  • Python server tests encountered local environment issues (missing wget module) but should work in CI with proper dependencies
  • Tests focus on realistic concurrent usage patterns rather than artificial stress scenarios
  • All tests include comprehensive error reporting and cleanup verification

Link to Devin run: https://app.devin.ai/sessions/d1f6bdf15aa141e3aceec6c8e65e5750
Requested by: @alexpeng-cognition
Related: JIRA ticket AT-102

- Create test-concurrent-stress.cpp for sustained concurrent load testing
  * Rapid context creation/destruction cycles
  * Parallel context operations with batch processing
  * Backend resource allocation stress testing
  * All tests verify no context leaks or errors

- Create test-kv-cache-concurrent.cpp for dedicated KV cache race detection
  * Concurrent KV cache prepare operations
  * Concurrent KV cache update operations with varying context sizes
  * Concurrent sequence operations (copy, remove)
  * Mixed concurrent operations combining all patterns
  * Each thread creates its own context (proper threading model)

- Enhance test-thread-safety.cpp with race condition detection
  * Add rapid context recreation stress test
  * Use random timing delays to increase race condition exposure
  * Track context creation/destruction with atomic counters
  * Verify no resource leaks under stress

- Extend test_completion.py with high-volume concurrent server tests
  * test_completion_high_volume_concurrent: 8-50 concurrent requests
  * test_completion_parallel_decoding: Multiple parallel decode streams
  * test_completion_cache_consistency_concurrent: Cache validation under load

- Update CMakeLists.txt with new test targets
  * Add test-concurrent-stress with appropriate test parameters
  * Add test-kv-cache-concurrent with appropriate test parameters
  * Both use established llama_build_and_test pattern
  * Tests labeled 'concurrent' for easy filtering

Targets critical concurrent areas:
- KV cache prepare() and update() operations
- Context initialization and management under concurrent access
- Server task queue and slot management (Python tests)
- Backend resource allocation under high concurrency

All tests follow proper llama.cpp threading model where each thread
manages its own context rather than sharing contexts across threads.

Tests validated locally:
- test-concurrent-stress: PASSED (80 contexts created/destroyed, 0 errors)
- test-kv-cache-concurrent: PASSED (all 4 test suites, 0 errors)
- test-thread-safety: PASSED (including new stress test)
- Regression tests: 36/36 existing tests passed

ThreadSanitizer integration already exists in CMakeLists.txt via
LLAMA_SANITIZE_THREAD option for automated race detection.

Related to JIRA ticket AT-102

Co-Authored-By: Alex Peng <[email protected]>
@devin-ai-integration

Addresses editorconfig CI check failures by removing trailing whitespace
from all modified test files. No functional changes.

- tests/test-concurrent-stress.cpp
- tests/test-kv-cache-concurrent.cpp
- tests/test-thread-safety.cpp
- tools/server/tests/unit/test_completion.py

Co-Authored-By: Alex Peng <[email protected]>