Skip to content

Conversation

devin-ai-integration[bot]
Copy link

Make sure to read the contributing guidelines before submitting a PR

Summary

This PR addresses JIRA ticket AT-101 by expanding test coverage for error handling and edge cases. It creates a systematic error testing framework with three main components:

  1. Memory exhaustion testing (test-memory-exhaustion.cpp) - Tests controlled OOM scenarios including small/medium/large allocations, buffer overflow, and recovery mechanisms
  2. Invalid input validation (test-invalid-inputs.cpp) - Tests malformed tensor shapes, type mismatches, dimension limits, and parameter validation
  3. Backend operations error scenarios - Extends existing test-backend-ops.cpp with error injection capabilities

Key Changes

New Test Files (377 lines total)

  • tests/test-memory-exhaustion.cpp - 6 systematic memory pressure scenarios with controlled failure injection
  • tests/test-invalid-inputs.cpp - 6 input validation scenarios testing tensor creation edge cases

Error Injection Infrastructure

  • ggml/src/ggml-alloc.c - Adds environment variable-based error injection:
    • GGML_ALLOC_FAIL_THRESHOLD - Fail allocations >= specified byte size
    • GGML_ALLOC_FAIL_COUNT - Fail after specified number of allocations
    • Non-intrusive design that only activates when env vars are set

Extended Backend Testing

  • tests/test-backend-ops.cpp - Adds GGML_TEST_ERRORS environment variable to enable error injection test cases
  • tests/CMakeLists.txt - Registers new tests using existing llama_build_and_test pattern

Testing Results

  • All 38 existing tests continue to pass (0 regressions)
  • New tests pass all 12 combined scenarios (6 memory + 6 input validation)
  • Error injection properly integrates with existing tensor operations

Important Review Areas

🔴 Critical: The error injection mechanism modifies production code (ggml-alloc.c). While gated by environment variables, reviewers should evaluate whether this approach is acceptable or if a different testing strategy should be used.

🟡 Thread Safety: The error injection uses a static counter variable that could have race conditions in multi-threaded scenarios.

🟡 Test Isolation: Tests rely on global environment variables - verify that parallel test execution won't cause interference.

🟢 Integration: CMake integration follows existing patterns and all tests are properly registered.

Human Review Checklist

  • Evaluate if error injection in production code (ggml-alloc.c) is acceptable architecture
  • Review thread safety of static alloc_count variable in ggml_alloc_should_fail()
  • Verify test scenarios actually test meaningful error conditions vs superficial checks
  • Confirm environment variable cleanup prevents test interference
  • Check if new test files follow existing code style and patterns

Link to Devin run: https://app.devin.ai/sessions/55de95f979c14d71aa4b6b3125a2ccf1
Requested by: Alex Peng (@alexpeng-cognition)

JIRA: AT-101

- Create test-memory-exhaustion.cpp for controlled OOM scenarios
  * Tests small/medium/large allocation failures
  * Tests many allocations and buffer overflow scenarios
  * Includes recovery testing to verify normal operation
  * All 6 test scenarios passing

- Create test-invalid-inputs.cpp for malformed tensor validation
  * Tests zero dimensions, shape mismatches, type incompatibility
  * Tests dimension limits and parameter validation
  * All 6 test scenarios passing

- Extend test-backend-ops.cpp with error injection test cases
  * Adds GGML_TEST_ERRORS environment variable control
  * Enables testing allocation failures during tensor operations
  * Integrates seamlessly with existing test infrastructure

- Add environment variable-based error injection in ggml-alloc.c
  * GGML_ALLOC_FAIL_THRESHOLD: Fail allocations >= specified size
  * GGML_ALLOC_FAIL_COUNT: Fail after specified number of allocations
  * Non-intrusive design that doesn't affect production code paths
  * Proper error logging for debugging

- Update CMakeLists.txt with new test targets
  * Uses existing llama_build_and_test pattern
  * Tests run as part of main test suite
  * No regressions introduced (38/38 tests passing)

Addresses JIRA ticket AT-101 for systematic error scenario testing.
Tests follow handcrafted error pattern from test-gguf.cpp with clear
scenario naming and expected outcomes.

Co-Authored-By: Alex Peng <[email protected]>
@devin-ai-integration
Copy link
Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

- Add portable setenv/unsetenv macros for Windows (_putenv_s)
- Replace all setenv/unsetenv calls with portable versions
- Fix trailing whitespace issues for editorconfig compliance

This fixes Windows and macOS build failures caused by POSIX-only functions.

Co-Authored-By: Alex Peng <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants