Add systematic error testing framework (AT-101) #15
Make sure to read the contributing guidelines before submitting a PR
Summary
This PR addresses JIRA ticket AT-101 by expanding test coverage for error handling and edge cases. It creates a systematic error testing framework with three main components:
- test-memory-exhaustion.cpp - Tests controlled OOM scenarios including small/medium/large allocations, buffer overflow, and recovery mechanisms
- test-invalid-inputs.cpp - Tests malformed tensor shapes, type mismatches, dimension limits, and parameter validation
- test-backend-ops.cpp with error injection capabilities
Key Changes
New Test Files (377 lines total)
- tests/test-memory-exhaustion.cpp - 6 systematic memory pressure scenarios with controlled failure injection
- tests/test-invalid-inputs.cpp - 6 input validation scenarios testing tensor creation edge cases
Error Injection Infrastructure
- ggml/src/ggml-alloc.c - Adds environment variable-based error injection:
  - GGML_ALLOC_FAIL_THRESHOLD - Fail allocations >= specified byte size
  - GGML_ALLOC_FAIL_COUNT - Fail after specified number of allocations
Extended Backend Testing
- tests/test-backend-ops.cpp - Adds GGML_TEST_ERRORS environment variable to enable error injection test cases
- tests/CMakeLists.txt - Registers new tests using existing llama_build_and_test pattern
Testing Results
Important Review Areas
🔴 Critical: The error injection mechanism modifies production code (ggml-alloc.c). While gated by environment variables, reviewers should evaluate whether this approach is acceptable or if a different testing strategy should be used.
🟡 Thread Safety: The error injection uses a static counter variable that could have race conditions in multi-threaded scenarios.
🟡 Test Isolation: Tests rely on global environment variables - verify that parallel test execution won't cause interference.
🟢 Integration: CMake integration follows existing patterns and all tests are properly registered.
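One way to limit the test isolation concern is to scope each environment variable to the test that needs it. The block below is a minimal sketch, not code from this PR: `ScopedEnv` is a hypothetical helper, and it assumes a POSIX platform where `setenv` and `unsetenv` are available, plus C++17 for `std::optional`.

```cpp
#include <cstdlib>
#include <optional>
#include <string>
#include <utility>

// Hypothetical RAII helper (not part of the PR): sets an environment variable
// for the lifetime of the object and restores the previous state on destruction.
class ScopedEnv {
public:
    ScopedEnv(std::string name, const std::string & value) : name_(std::move(name)) {
        // Remember the old value, if any, so it can be restored later.
        if (const char * old = std::getenv(name_.c_str())) {
            previous_ = old;
        }
        setenv(name_.c_str(), value.c_str(), 1); // POSIX; overwrite existing value
    }

    ~ScopedEnv() {
        if (previous_) {
            setenv(name_.c_str(), previous_->c_str(), 1);
        } else {
            unsetenv(name_.c_str());
        }
    }

    ScopedEnv(const ScopedEnv &) = delete;
    ScopedEnv & operator=(const ScopedEnv &) = delete;

private:
    std::string name_;
    std::optional<std::string> previous_;
};
```

A test would declare something like `ScopedEnv guard("GGML_ALLOC_FAIL_THRESHOLD", "4096");` at the top of the scenario. This only prevents leakage between scenarios that run sequentially in one process. Tests that run as separate processes already have separate environments, and threads inside one process still share it.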
Human Review Checklist
- Whether error injection in production code (ggml-alloc.c) is acceptable architecture
- Thread safety of the static alloc_count variable in ggml_alloc_should_fail()
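For reviewers weighing the counter's thread safety, the following sketch shows how an environment-gated failure check could use an atomic counter. It is written in C++ for illustration and is not the PR's implementation: `alloc_should_fail_sketch`, `env_to_ll`, and `g_alloc_count` are hypothetical names, and it assumes that "fail after N allocations" means the first N counted calls succeed and later ones fail.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for the PR's alloc_count. An atomic counter lets
// concurrent callers increment it without a data race.
static std::atomic<long long> g_alloc_count{0};

// Parse a non-negative integer from an environment variable.
// Returns -1 if the variable is unset, empty, or not a valid number.
static long long env_to_ll(const char * name) {
    const char * s = std::getenv(name);
    if (s == nullptr || *s == '\0') {
        return -1;
    }
    char * end = nullptr;
    const long long v = std::strtoll(s, &end, 10);
    return (*end == '\0' && v >= 0) ? v : -1;
}

// Returns true if an allocation of `size` bytes should be made to fail.
static bool alloc_should_fail_sketch(std::size_t size) {
    // Size-based injection: fail allocations >= the configured byte size.
    const long long threshold = env_to_ll("GGML_ALLOC_FAIL_THRESHOLD");
    if (threshold >= 0 && size >= static_cast<unsigned long long>(threshold)) {
        return true;
    }
    // Count-based injection (assumed semantics): the first `limit` calls pass.
    const long long limit = env_to_ll("GGML_ALLOC_FAIL_COUNT");
    if (limit >= 0) {
        const long long seen = g_alloc_count.fetch_add(1, std::memory_order_relaxed);
        return seen >= limit;
    }
    return false;
}
```

With neither variable set the function always returns false, so the normal allocation path only pays for two `getenv` lookups. The atomic removes the race on the counter itself, but `getenv` is still unsafe if another thread modifies the environment at the same time, so reading the variables once at startup may be preferable.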
Link to Devin run: https://app.devin.ai/sessions/55de95f979c14d71aa4b6b3125a2ccf1
Requested by: Alex Peng (@alexpeng-cognition)
JIRA: AT-101