75 changes: 75 additions & 0 deletions README.md
@@ -542,6 +542,81 @@ To learn more about model quantization, [read this documentation](tools/quantize
- [Performance troubleshooting](docs/development/token_generation_performance_tips.md)
- [GGML tips & tricks](https://github.com/ggml-org/llama.cpp/wiki/GGML-Tips-&-Tricks)

#### Testing

##### Memory Leak Testing

The repository includes comprehensive memory leak regression tests to ensure proper memory management across various lifecycle scenarios. These tests go beyond the existing AddressSanitizer (ASan) integration by providing dedicated leak detection test suites.

**Running with AddressSanitizer:**

The primary memory leak detection mechanism uses AddressSanitizer, which is configured as a build option:

```bash
# Build with AddressSanitizer enabled
cmake -B build -DLLAMA_SANITIZE_ADDRESS=ON -DCMAKE_BUILD_TYPE=Debug
cmake --build build

# Run the memory leak regression tests
cd build
ctest -R test-memory-leaks --output-on-failure

# Or run directly
./bin/test-memory-leaks
```

Other available sanitizers:
- `LLAMA_SANITIZE_THREAD=ON` - Detects data races (note: runs without OpenMP)
- `LLAMA_SANITIZE_UNDEFINED=ON` - Detects undefined behavior
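The same build-option pattern applies to the other sanitizers. A sketch, assuming separate build trees per sanitizer (sanitizers generally should not be combined in one build):

```shell
# ThreadSanitizer build (data races; OpenMP is disabled in this configuration)
cmake -B build-tsan -DLLAMA_SANITIZE_THREAD=ON -DCMAKE_BUILD_TYPE=Debug
cmake --build build-tsan

# UndefinedBehaviorSanitizer build
cmake -B build-ubsan -DLLAMA_SANITIZE_UNDEFINED=ON -DCMAKE_BUILD_TYPE=Debug
cmake --build build-ubsan
```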

**Running with Valgrind:**

Optional Valgrind integration is available for additional leak checking:

```bash
# Build the tests (Valgrind target is automatically configured if valgrind is installed)
cmake -B build
cmake --build build

# Run memory leak tests with Valgrind
cd build
make test-valgrind
```

The Valgrind target runs with a strict set of leak-detection flags:
- `--leak-check=full` - Detailed leak information
- `--show-leak-kinds=all` - Reports all leak types
- `--track-origins=yes` - Tracks origin of uninitialized values
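Equivalently, the test binary can be run under Valgrind by hand with the same flags (binary path is relative to the build's test directory and may differ in your tree):

```shell
valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes \
         --error-exitcode=1 ./test-memory-leaks
```

The `--error-exitcode=1` flag makes Valgrind's exit status reflect detected errors, so the invocation can gate CI jobs.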

**Test Coverage:**

The `test-memory-leaks.cpp` suite includes 10 tests covering:

1. **Backend initialization cycles** - Repeated `llama_backend_init()` / `llama_backend_free()` cycles
2. **Model load/unload cycles** - Repeated model loading and cleanup (10 iterations)
3. **Context lifecycle** - Context creation and destruction patterns (10 iterations)
4. **Multiple contexts per model** - Creating multiple contexts from the same model (5 contexts)
5. **Sampler lifecycle** - Sampler creation, chain operations, and cleanup
6. **Batch operations** - Batch allocation and deallocation patterns
7. **KV cache clearing** - Memory clearing operations on contexts
8. **Threaded contexts** - Concurrent model usage with multiple threads
9. **Model load cancellation** - Cleanup when canceling model loading mid-process
10. **Error condition cleanup** - Proper cleanup when operations fail (e.g., invalid model path)

All tests follow proper cleanup order: sampler → context → model → backend.
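The cleanup order above can be sketched against the llama.cpp C API. This is a minimal illustration, not the test code itself; the model path is a placeholder, and the function names assume the current public `llama.h`:

```cpp
#include "llama.h"

int main() {
    llama_backend_init();

    // hypothetical path; the real tests read it from LLAMACPP_TEST_MODELFILE
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file("/path/to/model.gguf", mparams);
    if (model == NULL) {
        llama_backend_free();
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx = llama_init_from_model(model, cparams);

    llama_sampler * smpl = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(smpl, llama_sampler_init_greedy());

    // ... exercise the model ...

    // teardown in the order the tests enforce: sampler -> context -> model -> backend
    llama_sampler_free(smpl);
    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```

Freeing in the reverse order of creation ensures no object outlives a resource it depends on, which is exactly what the leak tests exercise across repeated cycles.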

**Environment Variables:**

- `LLAMACPP_TEST_MODELFILE` - Path to test model file (required for running tests)
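For example, pointing the suite at a local GGUF file (the path shown is a placeholder):

```shell
# Run through ctest
LLAMACPP_TEST_MODELFILE=/path/to/test-model.gguf ctest -R test-memory-leaks --output-on-failure

# Or invoke the binary directly
LLAMACPP_TEST_MODELFILE=/path/to/test-model.gguf ./bin/test-memory-leaks
```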

**Continuous Integration:**

The GitHub Actions CI automatically runs all tests with all three sanitizers (ADDRESS, THREAD, UNDEFINED) on every pull request, so memory issues are caught before they are merged.

**Known Issues:**

- `test-opt.cpp` is currently disabled with `LLAMA_SANITIZE_ADDRESS` due to a known memory leak in `ggml_opt_alloc()` called within a loop (see `tests/test-opt.cpp:300`)

#### Seminal papers and background on the models

If your issue is with model generation quality, then please at least scan the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT:
21 changes: 21 additions & 0 deletions tests/CMakeLists.txt
@@ -201,6 +201,7 @@ llama_build_and_test(test-backend-ops.cpp)

llama_build_and_test(test-model-load-cancel.cpp LABEL "model")
llama_build_and_test(test-autorelease.cpp LABEL "model")
llama_build_and_test(test-memory-leaks.cpp LABEL "model")

if (NOT GGML_BACKEND_DL)
# these tests use the backends directly and cannot be built with dynamic loading
@@ -219,3 +220,23 @@ target_link_libraries(${LLAMA_TEST_NAME} PRIVATE mtmd)
get_filename_component(TEST_TARGET test-c.c NAME_WE)
add_executable(${TEST_TARGET} test-c.c)
target_link_libraries(${TEST_TARGET} PRIVATE llama)

# Optional Valgrind target for memory leak checking
find_program(VALGRIND_EXECUTABLE valgrind)
if(VALGRIND_EXECUTABLE)
    add_custom_target(test-valgrind
        COMMAND ${VALGRIND_EXECUTABLE}
            --leak-check=full
            --show-leak-kinds=all
            --track-origins=yes
            --error-exitcode=1
            ${CMAKE_CURRENT_BINARY_DIR}/test-memory-leaks
        DEPENDS test-memory-leaks
        COMMENT "Running memory leak tests with Valgrind"
        WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
    )
    message(STATUS "Valgrind found: ${VALGRIND_EXECUTABLE}")
    message(STATUS "Run 'make test-valgrind' to check for memory leaks with Valgrind")
else()
    message(STATUS "Valgrind not found - install it for additional leak checking")
endif()