Skip to content

Commit 3c8ce02

Browse files
committed
Add comprehensive test suite and CI workflow
Introduces a new tests/ directory with unit and integration tests, a requirements file, and a test runner script. Adds a GitHub Actions workflow for automated testing on multiple Python versions and pull requests. Updates README with detailed testing instructions. Refactors optillm.py and eval_optillmbench.py to improve n parameter handling and test-time compute evaluation logic.
1 parent 1daa2a0 commit 3c8ce02

17 files changed

+1117
-190
lines changed

.github/workflows/test.yml

Lines changed: 86 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,86 @@
1+
name: Run Tests
2+
3+
on:
4+
push:
5+
branches: [ main ]
6+
pull_request:
7+
branches: [ main ]
8+
9+
jobs:
10+
test:
11+
runs-on: ubuntu-latest
12+
strategy:
13+
matrix:
14+
python-version: ['3.10', '3.11', '3.12']
15+
16+
steps:
17+
- uses: actions/checkout@v4
18+
19+
- name: Set up Python ${{ matrix.python-version }}
20+
uses: actions/setup-python@v4
21+
with:
22+
python-version: ${{ matrix.python-version }}
23+
24+
- name: Cache pip packages
25+
uses: actions/cache@v3
26+
with:
27+
path: ~/.cache/pip
28+
key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
29+
restore-keys: |
30+
${{ runner.os }}-pip-
31+
32+
- name: Install dependencies
33+
run: |
34+
python -m pip install --upgrade pip
35+
pip install -r requirements.txt
36+
pip install -r tests/requirements.txt
37+
38+
- name: Run unit tests
39+
run: |
40+
# Run quick CI tests
41+
python tests/test_ci_quick.py
42+
43+
# Run plugin tests with pytest if available
44+
python -m pytest tests/test_plugins.py -v --tb=short || python tests/test_plugins.py
45+
46+
# Run approach tests
47+
python tests/test_approaches.py
48+
49+
integration-test:
50+
runs-on: ubuntu-latest
51+
needs: test
52+
if: github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository
53+
# Only run integration tests on PRs from the same repository (not forks)
54+
# This ensures secrets are available
55+
56+
steps:
57+
- uses: actions/checkout@v4
58+
59+
- name: Set up Python
60+
uses: actions/setup-python@v4
61+
with:
62+
python-version: '3.11'
63+
64+
- name: Install dependencies
65+
run: |
66+
python -m pip install --upgrade pip
67+
pip install -r requirements.txt
68+
69+
- name: Run integration test with OpenAI
70+
if: env.OPENAI_API_KEY != ''
71+
run: |
72+
# Start OptILLM server
73+
python optillm.py &
74+
SERVER_PID=$!
75+
76+
# Wait for server
77+
sleep 5
78+
79+
# Run simple integration test
80+
python tests/test.py --approaches none --single-test "Simple Math Problem" --base-url http://localhost:8000/v1 --model gpt-4o-mini || true
81+
82+
# Stop server
83+
kill $SERVER_PID || true
84+
env:
85+
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
86+
continue-on-error: true

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -170,3 +170,4 @@ cython_debug/
170170

171171
scripts/results/
172172
results/
173+
test_results.json

README.md

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -565,6 +565,46 @@ called patchflows. We saw huge performance gains across all the supported patchf
565565

566566
![Results showing optillm mixture of agents approach used with patchflows](https://raw.githubusercontent.com/codelion/optillm/main/moa-patchwork-results.png)
567567

568+
## Testing
569+
570+
OptILLM includes a comprehensive test suite to ensure reliability and compatibility.
571+
572+
### Running Tests
573+
574+
The main test suite can be run from the project root:
575+
```bash
576+
# Test all approaches with default test cases
577+
python tests/test.py
578+
579+
# Test specific approaches
580+
python tests/test.py --approaches moa bon mcts
581+
582+
# Run a single test
583+
python tests/test.py --single-test "Simple Math Problem"
584+
```
585+
586+
### Unit and Integration Tests
587+
588+
Additional tests are available in the `tests/` directory:
589+
```bash
590+
# Run all tests (requires pytest)
591+
./tests/run_tests.sh
592+
593+
# Run specific test modules
594+
pytest tests/test_plugins.py -v
595+
pytest tests/test_api_compatibility.py -v
596+
```
597+
598+
### CI/CD
599+
600+
All tests are automatically run on pull requests via GitHub Actions. The workflow tests:
601+
- Multiple Python versions (3.10, 3.11, 3.12)
602+
- Unit tests for plugins and core functionality
603+
- API compatibility tests
604+
- Integration tests with various approaches
605+
606+
See `tests/README.md` for more details on the test structure and how to write new tests.
607+
568608
## References
569609
- [Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques](https://arxiv.org/abs/2506.08060)
570610
- [AutoThink: efficient inference for reasoning LLMs](https://dx.doi.org/10.2139/ssrn.5253327) - [Implementation](optillm/autothink)

optillm.py

Lines changed: 4 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -302,9 +302,9 @@ def execute_single_approach(approach, system_prompt, initial_query, client, mode
302302
if hasattr(request, 'json'):
303303
data = request.get_json()
304304
messages = data.get('messages', [])
305-
# Copy all parameters except 'stream', 'model' , 'n' and 'messages'
305+
# Copy all parameters except 'stream', 'model' and 'messages'
306306
kwargs = {k: v for k, v in data.items()
307-
if k not in ['model', 'messages', 'stream', 'n', 'optillm_approach']}
307+
if k not in ['model', 'messages', 'stream', 'optillm_approach']}
308308
response = none_approach(original_messages=messages, client=client, model=model, **kwargs)
309309
# For none approach, we return the response and a token count of 0
310310
# since the full token count is already in the response
@@ -641,17 +641,8 @@ def proxy():
641641
contains_none = any(approach == 'none' for approach in approaches)
642642

643643
if operation == 'SINGLE' and approaches[0] == 'none':
644-
# For none approach with n>1, make n separate calls
645-
if n > 1:
646-
responses = []
647-
completion_tokens = 0
648-
for _ in range(n):
649-
result, tokens = execute_single_approach(approaches[0], system_prompt, initial_query, client, model, request_config)
650-
responses.append(result)
651-
completion_tokens += tokens
652-
result = responses
653-
else:
654-
result, completion_tokens = execute_single_approach(approaches[0], system_prompt, initial_query, client, model, request_config)
644+
# Pass through the request including the n parameter
645+
result, completion_tokens = execute_single_approach(approaches[0], system_prompt, initial_query, client, model, request_config)
655646

656647
logger.debug(f'Direct proxy response: {result}')
657648

0 commit comments

Comments
 (0)