Commit 5ad91e8

Add comprehensive E2E test suite for llama.cpp (AT-104)
Implement an end-to-end testing framework extending the existing ServerProcess infrastructure.

Framework Extensions:
- Add PipelineTestProcess class with pipeline testing capabilities
- Implement CLI tool execution wrappers (llama-cli, llama-bench)
- Add methods for context management and KV cache validation
- Create pytest fixtures for E2E test configurations

E2E Test Suites (38 tests total):
- test_pipeline_workflows.py: complete pipeline testing (8 tests)
  - Model download, loading, and inference workflows
  - State transition validation
  - Context management and KV cache behavior
  - Streaming pipeline and embedding model support
- test_tool_integration.py: CLI tool testing (10 tests)
  - llama-cli execution with various parameters
  - llama-bench performance testing
  - Tool parameter validation and error handling
  - Server/CLI coordination
- test_multimodal_workflows.py: multimodal testing (9 tests)
  - Vision + text model integration
  - Image input processing with text completion
  - Cross-modal context management
  - Multimodal streaming and error handling
- test_concurrent_scenarios.py: concurrent testing (11 tests)
  - Multi-user simulation and request queuing
  - Multi-turn conversation with context preservation
  - LoRA adapter switching during active sessions
  - Request slot management under load

Documentation:
- Comprehensive README with usage examples
- Test execution guidelines and configuration
- Best practices and troubleshooting

Jira: AT-104
Co-Authored-By: Alex Peng <[email protected]>
1 parent 661ae31 commit 5ad91e8

File tree

8 files changed: +2029, -4 lines

tools/server/tests/conftest.py

Lines changed: 82 additions & 4 deletions
```diff
@@ -2,14 +2,92 @@
 from utils import *


-# ref: https://stackoverflow.com/questions/22627659/run-code-before-and-after-each-test-in-py-test
 @pytest.fixture(autouse=True)
 def stop_server_after_each_test():
-    # do nothing before each test
     yield
-    # stop all servers after each test
     instances = set(
         server_instances
-    )  # copy the set to prevent 'Set changed size during iteration'
+    )
     for server in instances:
         server.stop()
+
+
+@pytest.fixture
+def pipeline_process():
+    """
+    Fixture providing a PipelineTestProcess instance for E2E testing.
+    Automatically cleaned up after test completion.
+    """
+    process = PipelineTestProcess()
+    yield process
+    if process.process is not None:
+        process.stop()
+
+
+@pytest.fixture
+def e2e_small_model_config():
+    """
+    Fixture providing configuration for a small model suitable for E2E testing.
+    Uses tinyllama for fast execution in CI environments.
+    """
+    return {
+        "model_hf_repo": "ggml-org/models",
+        "model_hf_file": "tinyllamas/stories260K.gguf",
+        "model_alias": "tinyllama-e2e",
+        "n_ctx": 512,
+        "n_batch": 32,
+        "n_slots": 2,
+        "n_predict": 32,
+        "seed": 42,
+        "temperature": 0.8,
+    }
+
+
+@pytest.fixture
+def e2e_embedding_model_config():
+    """
+    Fixture providing configuration for embedding model E2E testing.
+    """
+    return {
+        "model_hf_repo": "ggml-org/models",
+        "model_hf_file": "bert-bge-small/ggml-model-f16.gguf",
+        "model_alias": "bert-e2e",
+        "n_ctx": 512,
+        "n_batch": 128,
+        "n_ubatch": 128,
+        "n_slots": 2,
+        "seed": 42,
+        "server_embeddings": True,
+    }
+
+
+@pytest.fixture
+def e2e_multimodal_model_config():
+    """
+    Fixture providing configuration for multimodal model E2E testing.
+    """
+    return {
+        "model_hf_repo": "ggml-org/tinygemma3-GGUF",
+        "model_hf_file": "tinygemma3-Q8_0.gguf",
+        "mmproj_url": "https://huggingface.co/ggml-org/tinygemma3-GGUF/resolve/main/mmproj-tinygemma3.gguf",
+        "model_alias": "tinygemma3-e2e",
+        "n_ctx": 1024,
+        "n_batch": 32,
+        "n_slots": 2,
+        "n_predict": 16,
+        "seed": 42,
+    }
+
+
+@pytest.fixture
+def concurrent_test_prompts():
+    """
+    Fixture providing a list of prompts for concurrent testing scenarios.
+    """
+    return [
+        "Once upon a time",
+        "In a distant land",
+        "There was a brave knight",
+        "The dragon soared",
+        "Magic filled the air",
+    ]
```
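As a rough illustration of how the `concurrent_test_prompts` fixture above could be consumed, here is a self-contained sketch of the multi-user simulation pattern. The `fake_completion` stub and the `run_concurrently` helper are hypothetical stand-ins; real tests would issue requests through `PipelineTestProcess` against a running server.

```python
from concurrent.futures import ThreadPoolExecutor

# The five prompts provided by the concurrent_test_prompts fixture.
PROMPTS = [
    "Once upon a time",
    "In a distant land",
    "There was a brave knight",
    "The dragon soared",
    "Magic filled the air",
]

def fake_completion(prompt: str) -> dict:
    # Stub standing in for a real /completion request to the server.
    return {"prompt": prompt, "content": prompt + " ..."}

def run_concurrently(prompts):
    # One worker per prompt, mirroring the multi-user simulation
    # exercised by test_concurrent_scenarios.py.
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        return list(pool.map(fake_completion, prompts))

results = run_concurrently(PROMPTS)
assert len(results) == len(PROMPTS)
assert all(r["content"].startswith(r["prompt"]) for r in results)
```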

tools/server/tests/e2e/README.md

Lines changed: 273 additions & 0 deletions
# End-to-End Test Suite

This directory contains end-to-end (E2E) tests for llama.cpp, extending beyond unit-focused API testing to validate complete user workflows and component integration.

## Overview

The E2E test suite covers four areas:

1. **Pipeline Workflows** - Complete model download, loading, and inference workflows
2. **Tool Integration** - CLI tool testing (llama-cli, llama-bench)
3. **Multimodal Workflows** - Vision + text processing coordination
4. **Concurrent Scenarios** - Multi-user simulation and parallel request handling
## Test Files

### test_pipeline_workflows.py

Tests complete pipeline workflows from model acquisition to inference:

- **Model Download & Loading**: Validates HuggingFace model download and loading
- **State Transitions**: Tracks server state progression (INITIAL → LOADING_MODEL → READY → GENERATING)
- **Context Management**: Tests extended inference sessions with context preservation
- **KV Cache Behavior**: Validates cache utilization during workflows
- **Streaming Pipeline**: Tests streaming inference through the complete pipeline
- **Embedding Models**: Validates embedding model pipelines

**Example:**

```bash
./tests.sh e2e/test_pipeline_workflows.py::test_basic_pipeline_workflow
```

### test_tool_integration.py

Tests CLI tool integration and coordination:

- **llama-cli Execution**: Basic and advanced CLI usage patterns
- **llama-bench Testing**: Performance benchmark execution
- **Embedding Generation**: CLI-based embedding workflows
- **Parameter Validation**: Error handling and validation
- **Server/CLI Coordination**: Resource sharing between tools

**Example:**

```bash
./tests.sh e2e/test_tool_integration.py::test_cli_basic_execution
```

### test_multimodal_workflows.py

Tests multimodal (vision + text) processing:

- **Model Loading**: Multimodal model initialization with vision projection
- **Image Processing**: Image input handling with text completion
- **Context Preservation**: Cross-modal context management
- **Sequential Requests**: Mixed text-only and multimodal requests
- **Streaming**: Multimodal streaming responses
- **Error Handling**: Invalid input handling

**Example:**

```bash
./tests.sh e2e/test_multimodal_workflows.py::test_multimodal_chat_with_image
```

### test_concurrent_scenarios.py

Tests concurrent request handling and real-world scenarios:

- **Concurrent Requests**: Multiple simultaneous completion/chat requests
- **Multi-turn Conversations**: Context preservation across conversation turns
- **Slot Management**: Request queuing and slot allocation under load
- **Streaming Concurrency**: Multiple streaming sessions
- **LoRA Switching**: Adapter loading/switching during active sessions
- **Mixed Workloads**: Different request types running concurrently

**Example:**

```bash
./tests.sh e2e/test_concurrent_scenarios.py::test_concurrent_completion_requests
```
## Framework Extensions

### PipelineTestProcess Class

The `PipelineTestProcess` class extends `ServerProcess` with E2E testing capabilities:

```python
from utils import PipelineTestProcess

# Create pipeline test instance
pipeline = PipelineTestProcess()

# Test complete pipeline workflow
results = pipeline.test_full_pipeline({
    "model_hf_repo": "ggml-org/models",
    "model_hf_file": "tinyllamas/stories260K.gguf",
    "n_ctx": 512,
})

# Run CLI commands
result = pipeline.run_cli_command(["-m", model_path, "-p", "Hello", "-n", "16"])

# Run benchmarks
bench_results = pipeline.run_bench_command(model_path, ["-p", "8", "-n", "8"])
```

**Key Methods:**

- `test_full_pipeline(model_config)` - Execute the complete pipeline workflow
- `run_cli_command(args, input_text, timeout)` - Execute llama-cli
- `run_bench_command(model_path, args, timeout)` - Execute llama-bench
- `test_context_management(prompts, max_context)` - Test context handling
- `validate_kv_cache_behavior(context_size, tokens)` - Validate cache usage

### Test Fixtures

New pytest fixtures in `conftest.py`:

- **`pipeline_process`** - `PipelineTestProcess` instance with automatic cleanup
- **`e2e_small_model_config`** - Small model config for fast E2E tests
- **`e2e_embedding_model_config`** - Embedding model configuration
- **`e2e_multimodal_model_config`** - Multimodal model configuration
- **`concurrent_test_prompts`** - Prompts for concurrent testing
## Running E2E Tests

### Run All E2E Tests

```bash
./tests.sh e2e/
```

### Run a Specific Test File

```bash
./tests.sh e2e/test_pipeline_workflows.py
```

### Run a Single Test

```bash
./tests.sh e2e/test_pipeline_workflows.py::test_basic_pipeline_workflow
```

### Run with Verbose Output

```bash
DEBUG=1 ./tests.sh e2e/ -s -v
```

### Run Slow Tests

Some tests are marked as slow and require the `SLOW_TESTS` environment variable:

```bash
SLOW_TESTS=1 ./tests.sh e2e/
```
## Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `LLAMA_CLI_BIN_PATH` | Path to llama-cli binary | `../../../build/bin/llama-cli` |
| `LLAMA_BENCH_BIN_PATH` | Path to llama-bench binary | `../../../build/bin/llama-bench` |
| `LLAMA_CACHE` | Model cache directory | `tmp` |
| `SLOW_TESTS` | Enable slow tests | `0` |
| `DEBUG` | Enable verbose output | `0` |

### Model Selection

E2E tests use smaller models for CI compatibility:

- **Text Generation**: tinyllama (stories260K.gguf) - fast, small footprint
- **Embeddings**: bert-bge-small - efficient embedding generation
- **Multimodal**: tinygemma3 - compact vision+text model

For local testing with larger models, modify the fixture configurations in `conftest.py`.
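One plausible way the defaults in the table above are resolved is an environment lookup with a documented fallback. This is an illustrative sketch only; the helper name `get_default` is hypothetical and not necessarily how the test utilities implement it.

```python
import os

# Defaults mirroring the environment-variable table above.
DEFAULTS = {
    "LLAMA_CLI_BIN_PATH": "../../../build/bin/llama-cli",
    "LLAMA_BENCH_BIN_PATH": "../../../build/bin/llama-bench",
    "LLAMA_CACHE": "tmp",
}

def get_default(var: str) -> str:
    # An explicitly set environment variable wins;
    # otherwise fall back to the documented default.
    return os.environ.get(var, DEFAULTS[var])

cli_path = get_default("LLAMA_CLI_BIN_PATH")
```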
## Writing New E2E Tests

### Example Test Structure

```python
def test_my_e2e_workflow(pipeline_process, e2e_small_model_config):
    """
    Test description here.

    Validates:
    - Point 1
    - Point 2
    """
    # Configure pipeline
    for key, value in e2e_small_model_config.items():
        if hasattr(pipeline_process, key):
            setattr(pipeline_process, key, value)

    # Start server
    pipeline_process.start()

    # Test workflow
    res = pipeline_process.make_request("POST", "/completion", data={
        "prompt": "Test",
        "n_predict": 8,
    })

    # Assertions
    assert res.status_code == 200
    assert "content" in res.body
```

### Best Practices

1. **Use Fixtures**: Leverage existing fixtures for model configs and test data
2. **Small Models**: Use small models for fast execution in CI
3. **Resource Cleanup**: Fixtures handle cleanup automatically
4. **Test Isolation**: Each test should be independent
5. **Descriptive Names**: Use clear, descriptive test names
6. **Documentation**: Include docstrings explaining what is validated
7. **Slow Tests**: Mark expensive tests with `@pytest.mark.skipif(not is_slow_test_allowed())`
## CI Integration

E2E tests are designed to run in CI environments with:

- 4 vCPU GitHub runners
- Limited memory footprint
- Fast model downloads from HuggingFace
- Reasonable timeout configurations

Tests automatically skip slow scenarios unless `SLOW_TESTS=1` is set.
## Troubleshooting

### Tests Time Out

- Increase the timeout in the test: `pipeline_process.start(timeout_seconds=120)`
- Use smaller models in CI
- Check network connectivity for model downloads

### Model Download Issues

- Set `LLAMA_CACHE` to a persistent directory
- Pre-download models before running tests
- Check HuggingFace availability

### CLI Tool Not Found

- Ensure binaries are built: `cmake --build build --target llama-cli llama-bench`
- Set `LLAMA_CLI_BIN_PATH` and `LLAMA_BENCH_BIN_PATH`
- Check binary permissions

### Concurrent Test Failures

- Increase `n_slots` for higher concurrency
- Adjust timing expectations for slower systems
- Enable `server_continuous_batching` for better scheduling
## Contributing

When adding new E2E tests:

1. Place tests in the appropriate file based on category
2. Use existing fixtures when possible
3. Add new fixtures to `conftest.py` if needed
4. Update this README with new test descriptions
5. Ensure tests pass in the CI environment
6. Document any special requirements or configurations

## Related Documentation

- [Main Test README](../README.md) - General testing documentation
- [Server Documentation](../../README.md) - llama-server documentation
- [Contributing Guide](../../../../CONTRIBUTING.md) - Project contribution guidelines

tools/server/tests/e2e/__init__.py

Lines changed: 9 additions & 0 deletions
```python
"""
End-to-end test suite for llama.cpp server.

This module provides comprehensive E2E testing covering:
- Complete pipeline workflows (download, conversion, loading, inference)
- Tool integration testing (llama-cli, llama-bench)
- Multimodal workflows (vision + text)
- Concurrent scenario simulation
"""
```
