
Commit ffb87d9

viraatc and claude committed
fix: CLI crash on --load-pattern + --target-qps, add fuzz tests and CI
Bug: LoadPattern.type had alias= instead of name= on cyclopts.Parameter, and the class was missing @cyclopts.Parameter(name="*"). This caused any CLI invocation with --load-pattern to crash with IndexError.

Tests:
- Hypothesis fuzz tests auto-discover all CLI flags from cyclopts assemble_argument_collection() and test 4000 random combinations (offline + online/poisson + online/concurrency)
- Added test_concurrency_benchmark with streaming on/off
- hypothesis==6.151.10 added to test deps, schema_fuzz pytest marker

CI & tooling:
- schema-updated CI job: fuzz tests + template validation on schema changes
- regenerate_templates.py: auto-generates YAML templates from schema defaults
- Pre-commit checks templates are up to date (--check mode)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 096c868 commit ffb87d9
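The combination-fuzzing idea from the commit message can be sketched with the standard library alone. Everything below is hypothetical scaffolding: the flag pool and toy parser stand in for the real CLI, which the actual tests drive via Hypothesis after discovering flags from cyclopts' assemble_argument_collection().

```python
import random

# Hypothetical flag pool -- the real tests discover these from
# cyclopts' assemble_argument_collection().
FLAGS = {
    "--load-pattern": ["max_throughput", "poisson", "concurrency"],
    "--target-qps": ["1.0", "10.0"],
    "--target-concurrency": ["1", "32"],
    "--streaming": ["on", "off"],
}


def parse(argv):
    """Toy stand-in for the CLI parser: pair each flag with its value."""
    it = iter(argv)
    return {flag: next(it) for flag in it}


def fuzz(n_cases=4000, seed=0):
    """Run n_cases random flag combinations through the parser."""
    rng = random.Random(seed)
    for _ in range(n_cases):
        # Pick a random subset of flags and a random value for each.
        chosen = rng.sample(sorted(FLAGS), rng.randint(1, len(FLAGS)))
        argv = [tok for flag in chosen for tok in (flag, rng.choice(FLAGS[flag]))]
        parse(argv)  # a regression like the alias= bug would raise here
    return n_cases
```

Any parser crash (the real bug surfaced as IndexError) fails the loop immediately, which is the property the schema_fuzz tests assert.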

File tree

11 files changed: +669 −101 lines


.github/workflows/test.yml

Lines changed: 39 additions & 0 deletions
```diff
@@ -53,3 +53,42 @@ jobs:
           python -m pip install pip==26.0.1
           pip install -e ".[dev,test,performance]"
           pip-audit
+
+  schema-updated:
+    runs-on: ubuntu-latest
+    if: github.event_name == 'pull_request'
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - name: Check for schema changes
+        id: schema
+        run: |
+          CHANGED=$(git diff --name-only origin/${{ github.base_ref }}...HEAD -- \
+            'src/inference_endpoint/config/schema.py' \
+            'src/inference_endpoint/endpoint_client/config.py' \
+            'src/inference_endpoint/commands/benchmark/cli.py')
+          echo "changed=$([[ -n "$CHANGED" ]] && echo true || echo false)" >> "$GITHUB_OUTPUT"
+
+      - name: Set up Python 3.12
+        if: steps.schema.outputs.changed == 'true'
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.12"
+
+      - name: Install dependencies
+        if: steps.schema.outputs.changed == 'true'
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e .[test]
+
+      - name: Run schema fuzz tests
+        if: steps.schema.outputs.changed == 'true'
+        run: |
+          pytest -xv -m schema_fuzz
+
+      - name: Check YAML templates are up to date
+        if: steps.schema.outputs.changed == 'true'
+        run: |
+          python scripts/regenerate_templates.py --check
```
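The boolean step-output idiom in the "Check for schema changes" step can be exercised on its own (shown here in POSIX form; the workflow's bash `[[ -n ... ]]` behaves the same):

```shell
# Emit true when CHANGED is non-empty, false otherwise -- the value
# the workflow appends to $GITHUB_OUTPUT as changed=...
CHANGED="src/inference_endpoint/config/schema.py"
echo "changed=$([ -n "$CHANGED" ] && echo true || echo false)"   # changed=true

CHANGED=""
echo "changed=$([ -n "$CHANGED" ] && echo true || echo false)"   # changed=false
```

Downstream steps then gate on `steps.schema.outputs.changed == 'true'`, so the whole job is a no-op on PRs that don't touch the schema files.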

.pre-commit-config.yaml

Lines changed: 5 additions & 5 deletions
```diff
@@ -35,7 +35,7 @@ repos:
     hooks:
       - id: prettier
         types_or: [yaml, json, markdown]
-        exclude: ^(src/inference_endpoint/openai/openai_types_gen.py|src/inference_endpoint/openai/openapi.yaml)$
+        exclude: ^(src/inference_endpoint/openai/openai_types_gen.py|src/inference_endpoint/openai/openapi.yaml|src/inference_endpoint/config/templates/)
 
 - repo: local
   hooks:
@@ -48,12 +48,12 @@ repos:
         args: ["--tb=short", "--strict-markers"]
         stages: [manual]
 
-      - id: validate-templates
-        name: Validate YAML templates against schema
-        entry: python -c "from pathlib import Path; from inference_endpoint.config.schema import BenchmarkConfig; [BenchmarkConfig.from_yaml_file(f) for f in sorted(Path('src/inference_endpoint/config/templates').glob('*.yaml'))]"
+      - id: check-templates
+        name: Check YAML templates match schema defaults
+        entry: python scripts/regenerate_templates.py --check
         language: system
         pass_filenames: false
-        files: ^src/inference_endpoint/config/(schema\.py|templates/)
+        files: ^(src/inference_endpoint/config/schema\.py|scripts/regenerate_templates\.py)$
 
       - id: add-license-header
         name: Add license headers
```

docs/DEVELOPMENT.md

Lines changed: 12 additions & 0 deletions
````diff
@@ -276,6 +276,18 @@ pytest -s -v
 python -m pdb -m pytest test_file.py
 ```
 
+## 📄 YAML Config Templates
+
+Config templates in `src/inference_endpoint/config/templates/` are auto-generated from schema defaults. When you change `config/schema.py`, regenerate them:
+
+```bash
+python scripts/regenerate_templates.py
+```
+
+Pre-commit and CI will fail if committed templates are out of sync with the schema (`--check` mode).
+
+The script applies overrides (model name, endpoint URL, dataset path) defined in `scripts/regenerate_templates.py` on top of `BenchmarkConfig.create_default_config()` defaults. To change a template override, edit `_COMMON` or `_TEMPLATES` in the script and re-run.
+
 ## 📦 Package Management
 
 ### Adding Dependencies
````

pyproject.toml

Lines changed: 3 additions & 0 deletions
```diff
@@ -97,6 +97,8 @@ test = [
     "aiohttp==3.13.4",
     # Plotting for benchmark sweep mode
     "matplotlib==3.10.8",
+    # Property-based testing (CLI fuzz)
+    "hypothesis==6.151.10",
 ]
 performance = [
     "pytest-benchmark==5.2.3",
@@ -184,6 +186,7 @@ markers = [
     "integration: marks tests as integration tests",
     "unit: marks tests as unit tests",
     "run_explicitly: mark test to only run explicitly",
+    "schema_fuzz: hypothesis CLI fuzz tests (run in CI on schema changes)",
 ]
 filterwarnings = [
     "ignore:Session timeout reached:RuntimeWarning",
```

scripts/regenerate_templates.py

Lines changed: 128 additions & 0 deletions
```python
#!/usr/bin/env python3
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Regenerate YAML config templates from schema defaults + overrides.

Used by pre-commit to keep templates in sync when schema.py changes.
Overrides below make the templates more readable; everything else
uses schema defaults.
"""

from pathlib import Path

import yaml
from inference_endpoint.config.schema import (
    BenchmarkConfig,
    Dataset,
    EndpointConfig,
    LoadPattern,
    LoadPatternType,
    ModelParams,
    OnlineSettings,
    TestType,
)
from inference_endpoint.exceptions import CLIError

TEMPLATES_DIR = Path(__file__).parent.parent / "src/inference_endpoint/config/templates"

_COMMON = {
    "model_params": ModelParams(
        name="<MODEL_NAME eg: meta-llama/Llama-3.1-8B-Instruct>",
        temperature=0.7,
        top_p=0.9,
        max_new_tokens=1024,
    ),
    "endpoint_config": EndpointConfig(
        endpoints=["<ENDPOINT_URL eg: http://localhost:8000>"],
    ),
    "datasets": [
        Dataset(
            name="perf-test",
            type="performance",
            path="<DATASET_PATH eg: tests/datasets/dummy_1k.jsonl>",
            samples=1000,
            parser={"prompt": "text_input"},
        )
    ],
}

_TEMPLATES = {
    "offline": {**_COMMON},
    "online": {
        **_COMMON,
        "settings": OnlineSettings(
            load_pattern=LoadPattern(type=LoadPatternType.POISSON, target_qps=10.0),
        ),
    },
    "concurrency": {
        **_COMMON,
        "settings": OnlineSettings(
            load_pattern=LoadPattern(
                type=LoadPatternType.CONCURRENCY, target_concurrency=32
            ),
        ),
    },
}

_EXCLUDE = {"verbose", "submission_ref", "benchmark_mode"}


def _clean(full: dict) -> dict:
    out = {k: v for k, v in full.items() if k not in _EXCLUDE}
    if "settings" in out and "client" in out["settings"]:
        out["settings"]["client"]["num_workers"] = 4
    return out


def main(check_only: bool = False):
    """Regenerate templates, or check they're up to date (--check flag)."""
    stale = False
    for name, overrides in _TEMPLATES.items():
        test_type = TestType.OFFLINE if name == "offline" else TestType.ONLINE
        try:
            base = BenchmarkConfig.create_default_config(test_type)
            cfg = base.with_updates(**overrides)
        except (CLIError, ValueError) as e:
            print(f"  FAIL: {name} ({e})")
            stale = True
            continue

        expected = yaml.dump(
            _clean(cfg.model_dump(mode="json")),
            default_flow_style=False,
            sort_keys=False,
        )
        path = TEMPLATES_DIR / f"{name}_template.yaml"

        if check_only:
            current = path.read_text() if path.exists() else ""
            if current != expected:
                print(f"  STALE: {path.name}")
                stale = True
            else:
                print(f"  OK: {path.name}")
        else:
            path.write_text(expected)
            print(f"  Generated: {path.name}")

    if stale:
        print("\nRun: python scripts/regenerate_templates.py")
        raise SystemExit(1)


if __name__ == "__main__":
    import sys

    main(check_only="--check" in sys.argv)
```

src/inference_endpoint/config/schema.py

Lines changed: 2 additions & 1 deletion
```diff
@@ -339,6 +339,7 @@ def _validate_durations(self) -> Self:
         return self
 
 
+@cyclopts.Parameter(name="*")
 class LoadPattern(BaseModel):
     """Load pattern configuration.
 
@@ -352,7 +353,7 @@ class LoadPattern(BaseModel):
 
     type: Annotated[
         LoadPatternType,
-        cyclopts.Parameter(alias="--load-pattern", help="Load pattern type"),
+        cyclopts.Parameter(name="--load-pattern", help="Load pattern type"),
     ] = LoadPatternType.MAX_THROUGHPUT
     target_qps: Annotated[
         float | None, cyclopts.Parameter(alias="--target-qps", help="Target QPS")
```
Lines changed: 54 additions & 35 deletions
```diff
@@ -1,50 +1,69 @@
-# Online Concurrency-Based Benchmark (NOT YET IMPLEMENTED)
-# This template shows the future concurrency-based online mode
-name: "concurrency-benchmark"
-version: "1.0"
-type: "online"
-
+name: online_benchmark
+version: '1.0'
+type: online
 model_params:
-  name: "meta-llama/Llama-3.1-8B-Instruct"
+  name: '<MODEL_NAME eg: meta-llama/Llama-3.1-8B-Instruct>'
   temperature: 0.7
+  top_k: null
   top_p: 0.9
+  repetition_penalty: null
   max_new_tokens: 1024
-
+  osl_distribution: null
+  streaming: 'on'
 datasets:
-  - name: "concurrency-test"
-    type: "performance"
-    path: "datasets/queries.jsonl"
-    samples: 500
-
+- name: perf-test
+  type: performance
+  path: '<DATASET_PATH eg: tests/datasets/dummy_1k.jsonl>'
+  format: null
+  samples: 1000
+  eval_method: null
+  parser:
+    prompt: text_input
+  accuracy_config: null
 settings:
   runtime:
-    min_duration_ms: 600000 # 10 minutes
-    max_duration_ms: 1800000 # 30 minutes
-    scheduler_random_seed: 42 # For Poisson/distribution sampling
-    dataloader_random_seed: 42 # For dataset shuffling
-
+    min_duration_ms: 600000
+    max_duration_ms: 0
+    n_samples_to_issue: null
+    scheduler_random_seed: 42
+    dataloader_random_seed: 42
   load_pattern:
-    type: "concurrency" # NOT YET IMPLEMENTED
-    target_concurrency: 32 # Maintain 32 concurrent requests
-    # Note: target_qps is not used in this mode
-    # QPS will be determined by: concurrency / avg_latency
-
+    type: concurrency
+    target_qps: null
+    target_concurrency: 32
   client:
     num_workers: 4
-
+    record_worker_events: false
+    log_level: INFO
+    warmup_connections: -1
+    max_connections: -1
+    transport:
+      type: zmq
+      recv_buffer_size: 16777216
+      send_buffer_size: 16777216
+      io_threads: 4
+      worker_io_threads: 1
+      high_water_mark: 0
+      linger: -1
+      immediate: 1
+    stream_all_chunks: false
+    worker_initialization_timeout: 60.0
+    worker_graceful_shutdown_wait: 0.5
+    worker_force_kill_timeout: 0.5
+    max_idle_time: 4.0
+    min_required_connections: -1
+    worker_gc_mode: relaxed
   metrics:
     collect:
-      - "throughput" # Will be concurrency / avg_latency
-      - "latency" # p50, p90, p95, p99, p999 at this concurrency level
-      - "ttft"
-      - "tpot"
-
+    - throughput
+    - latency
+    - ttft
+    - tpot
 endpoint_config:
   endpoints:
-    - "http://localhost:8000"
+  - '<ENDPOINT_URL eg: http://localhost:8000>'
   api_key: null
-  api_type: "openai" # Options: openai or sglang
-  # How this differs from Poisson mode:
-  # - Poisson: Fixed QPS target, concurrency varies based on latency
-  # - Concurrency: Fixed N requests in-flight, QPS varies based on latency
-  # - Useful for: Measuring latency at specific concurrency levels
+  api_type: openai
+  report_dir: null
+  timeout: null
+  enable_cpu_affinity: true
```
