feat: Comprehensive concurrent testing for AT-102 #16

devin-ai-integration · 2025-09-29T20:23:44Z

Make sure to read the contributing guidelines before submitting a PR

Summary

This PR adds concurrent and multi-threaded tests to llama.cpp to detect race conditions and validate thread safety in critical components, specifically KV cache operations and context management (JIRA ticket AT-102).

Changes

New Test Files

tests/test-concurrent-stress.cpp - Sustained concurrent load testing with 3 test suites:
- Rapid context creation/destruction cycles (20 iterations per thread)
- Parallel context operations with batch processing
- Backend resource allocation stress with varying context parameters
tests/test-kv-cache-concurrent.cpp - Dedicated KV cache race condition testing with 4 test suites:
- Concurrent KV cache prepare operations
- Concurrent KV cache update operations with varying context sizes
- Concurrent sequence operations (copy, remove)
- Mixed concurrent operations combining all patterns

Enhanced Existing Tests

tests/test-thread-safety.cpp - Added rapid context recreation stress test with random timing delays to increase race condition exposure probability
tools/server/tests/unit/test_completion.py - Added 3 high-volume concurrent server tests:
- High-volume concurrent requests (8-50 requests across 4-8 slots)
- Parallel decoding scenarios with multiple streams
- Cache consistency validation under concurrent load
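
The invariant behind the high-volume test (many requests competing for a small number of slots) can be illustrated with a toy model. This is only a sketch: the actual tests are Python and drive a running server, while `run_requests_over_slots` and `SlotRunResult` below are hypothetical names that do not exist in llama.cpp. It models each request as a thread that must acquire one of `n_slots` before doing its work, and records the peak number of requests in flight.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical result type for this sketch (not part of llama.cpp).
struct SlotRunResult {
    int completed;      // requests that finished
    int max_in_flight;  // peak number of requests holding a slot at once
};

// Toy model: n_requests client threads compete for n_slots slots.
SlotRunResult run_requests_over_slots(int n_requests, int n_slots) {
    std::mutex m;
    std::condition_variable cv;
    int free_slots    = n_slots;
    int in_flight     = 0;
    int max_in_flight = 0;
    int completed     = 0;

    std::vector<std::thread> clients;
    clients.reserve(n_requests);
    for (int r = 0; r < n_requests; ++r) {
        clients.emplace_back([&]() {
            {
                // Block until a slot is free, then claim it.
                std::unique_lock<std::mutex> lock(m);
                cv.wait(lock, [&] { return free_slots > 0; });
                --free_slots;
                ++in_flight;
                max_in_flight = std::max(max_in_flight, in_flight);
            }
            // Simulated request processing, done outside the lock.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            {
                // Release the slot and record completion.
                std::lock_guard<std::mutex> lock(m);
                --in_flight;
                ++free_slots;
                ++completed;
            }
            cv.notify_one();
        });
    }
    for (auto & c : clients) {
        c.join();
    }
    return { completed, max_in_flight };
}
```

Calling `run_requests_over_slots(8, 4)` should report 8 completed requests and a peak of at most 4 in flight; a server test can check the analogous property by confirming that every request returns successfully regardless of how many slots are configured.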

Build System Updates

tests/CMakeLists.txt - Added new test targets using established llama_build_and_test patterns, labeled as 'concurrent' for filtering

Key Design Decisions

Threading Model: Each thread creates and manages its own llama_context rather than sharing contexts (proper llama.cpp threading model)
Race Detection: Uses atomic counters and random timing delays (1-10ms) to maximize race condition exposure
Resource Leak Detection: Tracks context creation/destruction counts to detect leaks
CI Compatibility: Limited thread counts and iterations to respect CI resource constraints
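
A minimal sketch of how these decisions fit together, assuming a stand-in type instead of a real context: `FakeContext`, `StressCounters`, and `run_context_stress` are hypothetical names used only for illustration, and the code in the PR's test files may be structured differently. Each thread owns its own object, sleeps for a random 1-10 ms while holding it, and atomic counters record creations and destructions so that a mismatch would indicate a leak.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <random>
#include <thread>
#include <vector>

// Shared counters; atomics make concurrent increments safe.
struct StressCounters {
    std::atomic<int> created{0};
    std::atomic<int> destroyed{0};
    std::atomic<int> errors{0};
};

// Hypothetical stand-in for a llama_context. A real test would create and
// free an actual context here instead.
class FakeContext {
public:
    explicit FakeContext(StressCounters & c) : counters_(c) {
        counters_.created.fetch_add(1);
    }
    ~FakeContext() { counters_.destroyed.fetch_add(1); }
    FakeContext(const FakeContext &) = delete;
    FakeContext & operator=(const FakeContext &) = delete;

private:
    StressCounters & counters_;
};

struct StressResult {
    int created;
    int destroyed;
    int errors;
};

// Each worker thread repeatedly creates and destroys its own context.
StressResult run_context_stress(int n_threads, int n_iters) {
    StressCounters counters;
    std::vector<std::thread> workers;
    workers.reserve(n_threads);
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&counters, n_iters, t]() {
            std::mt19937 rng(1234u + static_cast<unsigned>(t));  // per-thread RNG
            std::uniform_int_distribution<int> delay_ms(1, 10);
            for (int i = 0; i < n_iters; ++i) {
                try {
                    FakeContext ctx(counters);  // never shared with other threads
                    // Random delay varies how the threads interleave.
                    std::this_thread::sleep_for(
                        std::chrono::milliseconds(delay_ms(rng)));
                } catch (...) {
                    counters.errors.fetch_add(1);
                }
            }
        });
    }
    for (auto & w : workers) {
        w.join();
    }
    return { counters.created.load(), counters.destroyed.load(),
             counters.errors.load() };
}
```

After all threads are joined, a leak check reduces to comparing the two counters: `run_context_stress(4, 5)` should report 20 created, 20 destroyed, and 0 errors. Keeping `n_threads` and `n_iters` small bounds the run time, which is the CI trade-off noted above.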

Local Testing Results

✅ test-concurrent-stress: All 3 stress tests passed (80 contexts created/destroyed, 0 errors)
✅ test-kv-cache-concurrent: All 4 KV cache tests passed (28 contexts in mixed operations)
✅ test-thread-safety: Enhanced test with new stress patterns passed
✅ Regression tests: 36/36 existing tests still pass

ThreadSanitizer Integration

The existing LLAMA_SANITIZE_THREAD CMake option can be used for automated race detection:

cmake -B build -DLLAMA_SANITIZE_THREAD=ON

Human Review Checklist

🔴 High Priority:

Threading model correctness: Verify each thread creates its own context (no shared contexts across threads)
Resource cleanup logic: Check atomic counter logic for context creation/destruction tracking
Test parameters: Validate thread counts and iterations are appropriate for CI environments

🟡 Medium Priority:

Race condition detection: Review random delay strategy and atomic counter usage
CMake integration: Confirm new test targets follow established patterns
Python test compatibility: Verify server tests handle concurrent scenarios correctly

Notes

The Python server tests hit a local environment issue (missing wget module); they should work in CI, where the required dependencies are installed
Tests focus on realistic concurrent usage patterns rather than artificial stress scenarios
All tests include comprehensive error reporting and cleanup verification

Link to Devin run: https://app.devin.ai/sessions/d1f6bdf15aa141e3aceec6c8e65e5750
Requested by: @alexpeng-cognition
Related: JIRA ticket AT-102

- Create test-concurrent-stress.cpp for sustained concurrent load testing
  * Rapid context creation/destruction cycles
  * Parallel context operations with batch processing
  * Backend resource allocation stress testing
  * All tests verify no context leaks or errors
- Create test-kv-cache-concurrent.cpp for dedicated KV cache race detection
  * Concurrent KV cache prepare operations
  * Concurrent KV cache update operations with varying context sizes
  * Concurrent sequence operations (copy, remove)
  * Mixed concurrent operations combining all patterns
  * Each thread creates its own context (proper threading model)
- Enhance test-thread-safety.cpp with race condition detection
  * Add rapid context recreation stress test
  * Use random timing delays to increase race condition exposure
  * Track context creation/destruction with atomic counters
  * Verify no resource leaks under stress
- Extend test_completion.py with high-volume concurrent server tests
  * test_completion_high_volume_concurrent: 8-50 concurrent requests
  * test_completion_parallel_decoding: Multiple parallel decode streams
  * test_completion_cache_consistency_concurrent: Cache validation under load
- Update CMakeLists.txt with new test targets
  * Add test-concurrent-stress with appropriate test parameters
  * Add test-kv-cache-concurrent with appropriate test parameters
  * Both use established llama_build_and_test pattern
  * Tests labeled 'concurrent' for easy filtering

Targets critical concurrent areas:
- KV cache prepare() and update() operations
- Context initialization and management under concurrent access
- Server task queue and slot management (Python tests)
- Backend resource allocation under high concurrency

All tests follow proper llama.cpp threading model where each thread manages its own context rather than sharing contexts across threads.

Tests validated locally:
- test-concurrent-stress: PASSED (80 contexts created/destroyed, 0 errors)
- test-kv-cache-concurrent: PASSED (all 4 test suites, 0 errors)
- test-thread-safety: PASSED (including new stress test)
- Regression tests: 36/36 existing tests passed

ThreadSanitizer integration already exists in CMakeLists.txt via LLAMA_SANITIZE_THREAD option for automated race detection.

Related to JIRA ticket AT-102

Co-Authored-By: Alex Peng <[email protected]>

devin-ai-integration · 2025-09-29T20:23:47Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

Disable automatic comment and CI monitoring

Addresses editorconfig CI check failures by removing trailing whitespace from all modified test files. No functional changes.

- tests/test-concurrent-stress.cpp
- tests/test-kv-cache-concurrent.cpp
- tests/test-thread-safety.cpp
- tools/server/tests/unit/test_completion.py

Co-Authored-By: Alex Peng <[email protected]>

github-actions bot added testing examples server python labels Sep 29, 2025
