Commit 86cbf86

feat: use cyclopts to enforce CLI/YAML parity (#193)
1 parent 08476d7 commit 86cbf86

40 files changed (+3289 / -2945 lines)

.pre-commit-config.yaml

Lines changed: 7 additions & 0 deletions
@@ -47,6 +47,13 @@ repos:
   args: ["--tb=short", "--strict-markers"]
   stages: [manual]

+- id: validate-templates
+  name: Validate YAML templates against schema
+  entry: python -c "from pathlib import Path; from inference_endpoint.config.schema import BenchmarkConfig; [BenchmarkConfig.from_yaml_file(f) for f in sorted(Path('src/inference_endpoint/config/templates').glob('*.yaml'))]"
+  language: system
+  pass_filenames: false
+  files: ^src/inference_endpoint/config/(schema\.py|templates/)
+
 - id: add-license-header
   name: Add license headers
   entry: python scripts/add_license_header.py
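The hook's `entry` packs the whole check into a one-line list comprehension. Because of that, it stops at the first template that fails to load and reports only that one. If a longer-form script were preferred, it might look like the minimal sketch below. Like the hook, the sketch assumes that `BenchmarkConfig.from_yaml_file` raises on an invalid file. The loader is passed in as a parameter so the sketch runs without the package installed. `validate_templates` and `report` are hypothetical names and are not part of the repository.

```python
import sys
from pathlib import Path
from typing import Callable, List, Tuple


def validate_templates(
    template_dir: Path, load: Callable[[Path], object]
) -> List[Tuple[Path, Exception]]:
    """Load every *.yaml file in template_dir, collecting failures instead of stopping at the first."""
    failures: List[Tuple[Path, Exception]] = []
    for path in sorted(template_dir.glob("*.yaml")):
        try:
            load(path)  # in the real hook this would be BenchmarkConfig.from_yaml_file
        except Exception as exc:  # any parse or validation error counts as a failure
            failures.append((path, exc))
    return failures


def report(failures: List[Tuple[Path, Exception]]) -> int:
    """Print one line per failing template to stderr; return a pre-commit style exit code."""
    for path, exc in failures:
        print(f"{path.name}: {exc}", file=sys.stderr)
    return 1 if failures else 0
```

To wire this into the hook, a caller would pass `BenchmarkConfig.from_yaml_file` as `load` and `Path("src/inference_endpoint/config/templates")` as the directory, then exit with the value returned by `report(...)`.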

AGENTS.md

Lines changed: 48 additions & 22 deletions
@@ -46,16 +46,16 @@ Dataset Manager --> Load Generator --> Endpoint Client --> External Endpoint

 ### Key Components

-| Component | Location | Purpose |
-| ------------------- | ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------ |
-| **Load Generator** | `src/inference_endpoint/load_generator/` | Central orchestrator: `BenchmarkSession` owns the lifecycle, `Scheduler` controls timing, `LoadGenerator` issues queries |
-| **Endpoint Client** | `src/inference_endpoint/endpoint_client/` | Multi-process HTTP workers communicating via ZMQ IPC. `HTTPEndpointClient` is the main entry point |
-| **Dataset Manager** | `src/inference_endpoint/dataset_manager/` | Loads pickle, HuggingFace, JSONL datasets. `Dataset` base class with `load_sample()`/`num_samples()` interface |
-| **Metrics** | `src/inference_endpoint/metrics/` | `EventRecorder` writes to SQLite, `MetricsReporter` reads and aggregates (QPS, latency, TTFT, TPOT) |
-| **Config** | `src/inference_endpoint/config/` | Pydantic-based YAML schema (`schema.py`), ruleset registry for MLCommons compliance, `RuntimeSettings` for runtime state |
-| **CLI** | `src/inference_endpoint/cli.py` | argparse-based with subcommands dispatched from `commands/` |
-| **Async Utils** | `src/inference_endpoint/async_utils/` | `LoopManager` (uvloop + eager_task_factory), ZMQ transport layer, event publisher |
-| **OpenAI/SGLang** | `src/inference_endpoint/openai/`, `sglang/` | Protocol adapters and response accumulators for different API formats |
+| Component | Location | Purpose |
+| ------------------- | ------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------ |
+| **Load Generator** | `src/inference_endpoint/load_generator/` | Central orchestrator: `BenchmarkSession` owns the lifecycle, `Scheduler` controls timing, `LoadGenerator` issues queries |
+| **Endpoint Client** | `src/inference_endpoint/endpoint_client/` | Multi-process HTTP workers communicating via ZMQ IPC. `HTTPEndpointClient` is the main entry point |
+| **Dataset Manager** | `src/inference_endpoint/dataset_manager/` | Loads pickle, HuggingFace, JSONL datasets. `Dataset` base class with `load_sample()`/`num_samples()` interface |
+| **Metrics** | `src/inference_endpoint/metrics/` | `EventRecorder` writes to SQLite, `MetricsReporter` reads and aggregates (QPS, latency, TTFT, TPOT) |
+| **Config** | `src/inference_endpoint/config/` | Pydantic-based YAML schema (`schema.py`), ruleset registry for MLCommons compliance, `RuntimeSettings` for runtime state |
+| **CLI** | `src/inference_endpoint/main.py`, `commands/benchmark/cli.py` | cyclopts-based, auto-generated from `schema.py` Pydantic models. Flat shorthands via `cyclopts.Parameter(alias=...)` |
+| **Async Utils** | `src/inference_endpoint/async_utils/` | `LoopManager` (uvloop + eager_task_factory), ZMQ transport layer, event publisher |
+| **OpenAI/SGLang** | `src/inference_endpoint/openai/`, `sglang/` | Protocol adapters and response accumulators for different API formats |

 ### Hot-Path Architecture

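According to the new CLI row, flags are generated from the `schema.py` Pydantic models, and flat shorthands are added through `cyclopts.Parameter(alias=...)`. The naming scheme can be approximated in three steps:

1. Take the dotted path of nested field names.
2. Turn underscores into hyphens.
3. Add any alias.

The sketch below is stdlib-only and illustrative:

- Dataclasses stand in for the Pydantic models.
- A `metadata={"alias": ...}` entry stands in for `cyclopts.Parameter(alias=...)`.
- All field names are hypothetical.

The sketch also assumes that an alias is added alongside the dotted flag rather than replacing it. It does not reproduce cyclopts' exact naming rules.

```python
import dataclasses
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical stand-ins for schema.py models.
# metadata["alias"] plays the role of cyclopts.Parameter(alias="--flag").
@dataclass
class RuntimeSettings:
    min_duration_ms: int = 0
    max_duration_ms: int = 0


@dataclass
class Settings:
    runtime: RuntimeSettings = field(default_factory=RuntimeSettings)
    num_workers: int = field(default=-1, metadata={"alias": "--workers"})


@dataclass
class Config:
    model_name: str = field(default="", metadata={"alias": "--model"})
    settings: Settings = field(default_factory=Settings)


def cli_flags(model: type, prefix: str = "") -> Dict[str, List[str]]:
    """Map each leaf field's dotted path to its CLI names: a dotted kebab-case flag plus any alias."""
    flags: Dict[str, List[str]] = {}
    for f in dataclasses.fields(model):
        dotted = f"{prefix}{f.name}"
        if isinstance(f.type, type) and dataclasses.is_dataclass(f.type):
            # Nested model: recurse and extend the dotted prefix.
            flags.update(cli_flags(f.type, prefix=f"{dotted}."))
            continue
        names = ["--" + dotted.replace("_", "-")]
        if "alias" in f.metadata:
            names.append(f.metadata["alias"])  # flat shorthand kept next to the dotted flag
        flags[dotted] = names
    return flags
```

With these hypothetical models, `--settings.runtime.min-duration-ms` and `--workers` would both be valid flags. Every CLI flag then corresponds to a key path in the nested YAML config.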
@@ -69,9 +69,32 @@ Multi-process, event-loop design optimized for throughput:

 ### CLI Modes

-- **CLI mode** (`offline`/`online`): Parameters from command-line arguments
-- **YAML mode** (`from-config`): All config from file, no CLI overrides except `--timeout`
-- **eval**: Accuracy evaluation — subcommand exists but is not yet implemented (raises `NotImplementedError`)
+CLI is auto-generated from `config/schema.py` Pydantic models via cyclopts. Fields annotated with `cyclopts.Parameter(alias="--flag")` get flat shorthands; all other fields get auto-generated dotted flags (kebab-case).
+
+- **CLI mode** (`offline`/`online`): cyclopts constructs `OfflineBenchmarkConfig`/`OnlineBenchmarkConfig` (subclasses in `config/schema.py`) directly from CLI args. Type locked via `Literal`. `--dataset` is repeatable with TOML-style format `[perf|acc:]<path>[,key=value...]` (e.g. `--dataset data.csv,samples=500,parser.prompt=article`). Full accuracy support via `accuracy_config.eval_method=pass_at_1` etc.
+- **YAML mode** (`from-config`): `BenchmarkConfig.from_yaml_file()` loads YAML, resolves env vars, and auto-selects the right subclass via Pydantic discriminated union. Optional `--timeout`/`--mode` overrides via `config.with_updates()`.
+- **eval**: Not yet implemented (raises `NotImplementedError`)
+
+### Config Construction & Validation
+
+Both CLI and YAML produce the same subclass via Pydantic discriminated union on `type`:
+
+```
+CLI offline/online: cyclopts → OfflineBenchmarkConfig/OnlineBenchmarkConfig → with_updates(datasets) → run_benchmark
+YAML from-config: from_yaml_file(path) → discriminated union → same subclass → run_benchmark
+```
+
+`OfflineBenchmarkConfig` and `OnlineBenchmarkConfig` (in `config/schema.py`) inherit `BenchmarkConfig`:
+
+- `type`: locked via `Literal[TestType.OFFLINE]` / `Literal[TestType.ONLINE]`
+- `settings`: `OfflineSettings` (hides load pattern) / `OnlineSettings`
+- `submission_ref`, `benchmark_mode`: `show=False` on base class
+
+Validation is layered:
+
+1. **Field-level** (Pydantic): `Field(ge=0)` on durations, `Field(ge=-1)` on workers, `Literal` on `benchmark_mode`
+2. **Field validators**: `workers != 0` check
+3. **Model validator** (`_resolve_and_validate`): streaming AUTO resolution, model name from `submission_ref`, load pattern vs test type, cross-field duration check, duplicate datasets

 ### Load Patterns

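`--dataset` takes values of the form `[perf|acc:]<path>[,key=value...]`. The minimal sketch below shows one way to split such a value into a prefix, a path, and nested options. It is a plain string parser, not a TOML parser. Any behavior beyond the documented example is assumed:

- A missing prefix is treated as `perf`.
- Integer and `true`/`false` values are coerced.
- Paths do not contain commas.

`DatasetSpec` and `parse_dataset_spec` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class DatasetSpec:
    """Hypothetical parsed form of one --dataset value."""

    path: str
    kind: str = "perf"  # assumption: no prefix means a performance dataset
    options: Dict[str, Any] = field(default_factory=dict)


def _coerce(value: str) -> Any:
    """Best-effort scalar coercion (assumed rules): true/false -> bool, integers -> int."""
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    try:
        return int(value)
    except ValueError:
        return value


def parse_dataset_spec(spec: str) -> DatasetSpec:
    """Parse '[perf|acc:]<path>[,key=value...]'; dotted keys become nested dicts."""
    kind = "perf"
    for prefix in ("perf:", "acc:"):
        if spec.startswith(prefix):
            kind, spec = prefix[:-1], spec[len(prefix):]
            break
    path, *pairs = spec.split(",")  # assumes the path itself contains no commas
    if not path:
        raise ValueError("dataset path is empty")
    options: Dict[str, Any] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        *parents, leaf = key.split(".")
        node = options
        for part in parents:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"option {key!r} conflicts with scalar {part!r}")
            node = child
        node[leaf] = _coerce(value)
    return DatasetSpec(path=path, kind=kind, options=options)
```

Because `--dataset` is repeatable, each occurrence would yield one `DatasetSpec` under this sketch.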
@@ -83,14 +106,17 @@ Multi-process, event-loop design optimized for throughput:

 ```
 src/inference_endpoint/
-├── main.py # Entry point (run())
-├── cli.py # CLI parser & dispatcher
+├── main.py # Entry point + CLI app: cyclopts app, commands, error formatter, run()
 ├── exceptions.py # CLIError, ExecutionError, InputValidationError, SetupError
-├── commands/ # benchmark, eval, probe, info, validate, init
-│ ├── benchmark.py # Core benchmark command implementation
-│ ├── eval.py # Accuracy evaluation command (not yet implemented)
-│ ├── probe.py # Endpoint health checking
-│ └── utils.py # info, validate, init command implementations
+├── commands/ # Command execution logic
+│ ├── benchmark/
+│ │ ├── __init__.py
+│ │ ├── cli.py # benchmark_app: offline, online, from-config subcommands
+│ │ └── execute.py # Phased execution: setup/run_threaded/finalize + BenchmarkContext
+│ ├── probe.py # ProbeConfig + execute_probe()
+│ ├── info.py # execute_info()
+│ ├── validate.py # execute_validate()
+│ └── init.py # execute_init()
 ├── core/types.py # Query, QueryResult, StreamChunk, QueryStatus (msgspec Structs)
 ├── load_generator/
 │ ├── session.py # BenchmarkSession - top-level orchestrator
@@ -126,8 +152,7 @@ src/inference_endpoint/
 │ ├── reporter.py # MetricsReporter (aggregation)
 │ └── metric.py # Metric types (Throughput, etc.)
 ├── config/
-│ ├── schema.py # Pydantic models: LoadPattern, APIType, DatasetType, etc.
-│ ├── yaml_loader.py # YAML config loading
+│ ├── schema.py # Single source of truth: Pydantic models + cyclopts annotations
 │ ├── runtime_settings.py # RuntimeSettings dataclass
 │ ├── ruleset_base.py # BenchmarkSuiteRuleset base
 │ ├── ruleset_registry.py # Ruleset registry
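`schema.py` is now described as the single source of truth. The new "Config Construction & Validation" section explains that both entry points end in the same subclass, selected by `type`, and that validation is layered. Below is a stdlib-only approximation of that flow:

- Frozen dataclasses stand in for the Pydantic models.
- `config_from_dict` plays the role of the discriminated union that `from_yaml_file` relies on.
- `with_updates` rebuilds the object, so validation runs again.

Several details are assumptions: the field names, the `LOCKED_TYPE` mechanism, and the exact cross-field duration rule (min must not exceed max when a max is set). Only a subset of the listed checks is modeled.

```python
import dataclasses
from dataclasses import dataclass
from enum import Enum
from typing import Any, ClassVar, Dict, Optional, Tuple, Type


class TestType(str, Enum):
    OFFLINE = "offline"
    ONLINE = "online"


@dataclass(frozen=True)
class BenchmarkConfig:
    """Hypothetical stand-in for the Pydantic base model (field names are illustrative)."""

    LOCKED_TYPE: ClassVar[Optional[TestType]] = None  # subclasses pin this

    type: TestType
    workers: int = -1
    min_duration_ms: int = 0
    max_duration_ms: int = 0
    datasets: Tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Type lock (the real models use Literal[TestType.OFFLINE] / Literal[TestType.ONLINE]).
        if self.LOCKED_TYPE is not None and self.type is not self.LOCKED_TYPE:
            raise ValueError(f"{self.__class__.__name__} requires type={self.LOCKED_TYPE.value}")
        # 1. Field-level bounds, like Field(ge=0) / Field(ge=-1).
        if self.min_duration_ms < 0 or self.max_duration_ms < 0:
            raise ValueError("durations must be >= 0")
        if self.workers < -1:
            raise ValueError("workers must be >= -1")
        # 2. Field validator: workers != 0.
        if self.workers == 0:
            raise ValueError("workers must not be 0")
        # 3. Model-level cross-field checks (assumed duration rule).
        if self.max_duration_ms and self.min_duration_ms > self.max_duration_ms:
            raise ValueError("min_duration_ms exceeds max_duration_ms")
        if len(set(self.datasets)) != len(self.datasets):
            raise ValueError("duplicate datasets")

    def with_updates(self, **changes: Any) -> "BenchmarkConfig":
        # replace() constructs a new instance, so __post_init__ validates again.
        return dataclasses.replace(self, **changes)


@dataclass(frozen=True)
class OfflineBenchmarkConfig(BenchmarkConfig):
    LOCKED_TYPE: ClassVar[Optional[TestType]] = TestType.OFFLINE
    type: TestType = TestType.OFFLINE


@dataclass(frozen=True)
class OnlineBenchmarkConfig(BenchmarkConfig):
    LOCKED_TYPE: ClassVar[Optional[TestType]] = TestType.ONLINE
    type: TestType = TestType.ONLINE


_BY_TYPE: Dict[TestType, Type[BenchmarkConfig]] = {
    TestType.OFFLINE: OfflineBenchmarkConfig,
    TestType.ONLINE: OnlineBenchmarkConfig,
}


def config_from_dict(data: Dict[str, Any]) -> BenchmarkConfig:
    """Choose the subclass from the `type` tag, the way a discriminated union would."""
    fields = dict(data)
    test_type = TestType(fields.pop("type"))
    if "datasets" in fields:
        fields["datasets"] = tuple(fields["datasets"])  # YAML lists become tuples
    return _BY_TYPE[test_type](type=test_type, **fields)
```

In this sketch, `OfflineBenchmarkConfig(datasets=("data.csv",))` and `config_from_dict({"type": "offline", "datasets": ["data.csv"]})` compare equal. This mirrors the property the diff describes: the CLI and YAML paths end in the same subclass.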
@@ -244,6 +269,7 @@ These apply especially to code in the hot path (load generator, endpoint client,
 | `msgspec` | Fast serialization for core types and ZMQ transport |
 | `pyzmq` | ZMQ IPC between main process and workers |
 | `pydantic` | Configuration validation |
+| `cyclopts` | CLI framework — auto-generates flags from Pydantic |
 | `duckdb` | Data aggregation |
 | `transformers` | Tokenization for OSL reporting |
