.pre-commit-config.yaml: 7 additions, 0 deletions
@@ -47,6 +47,13 @@ repos:
         args: ["--tb=short", "--strict-markers"]
         stages: [manual]
 
+      - id: validate-templates
+        name: Validate YAML templates against schema
+        entry: python -c "from pathlib import Path; from inference_endpoint.config.schema import BenchmarkConfig; [BenchmarkConfig.from_yaml_file(f) for f in sorted(Path('src/inference_endpoint/config/templates').glob('*.yaml'))]"
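
Note on the hook above: the one-liner in `entry` stops at the first template that fails to load. A minimal sketch of the same loop as a standalone function that reports every failing file, assuming a loader with the same contract as `BenchmarkConfig.from_yaml_file` (it raises on invalid input). `validate_templates` and its `load` parameter are hypothetical names, not part of this PR.

```python
from pathlib import Path
from typing import Callable


def validate_templates(
    template_dir: Path, load: Callable[[Path], object]
) -> list[tuple[Path, str]]:
    """Run `load` on every *.yaml file and collect failures instead of stopping early."""
    failures = []
    for path in sorted(template_dir.glob("*.yaml")):
        try:
            # In the hook this call would be BenchmarkConfig.from_yaml_file(path).
            load(path)
        except Exception as exc:  # broad on purpose: any load error is a failure
            failures.append((path, f"{type(exc).__name__}: {exc}"))
    return failures
```

A hook script built on this could print each failure and exit non-zero when the returned list is non-empty, which is the signal pre-commit uses to block a commit.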
-| **Load Generator** | `src/inference_endpoint/load_generator/` | Central orchestrator: `BenchmarkSession` owns the lifecycle, `Scheduler` controls timing, `LoadGenerator` issues queries |
-| **Endpoint Client** | `src/inference_endpoint/endpoint_client/` | Multi-process HTTP workers communicating via ZMQ IPC. `HTTPEndpointClient` is the main entry point |
-| **Dataset Manager** | `src/inference_endpoint/dataset_manager/` | Loads pickle, HuggingFace, JSONL datasets. `Dataset` base class with `load_sample()`/`num_samples()` interface |
-| **Metrics** | `src/inference_endpoint/metrics/` | `EventRecorder` writes to SQLite, `MetricsReporter` reads and aggregates (QPS, latency, TTFT, TPOT) |
-| **Config** | `src/inference_endpoint/config/` | Pydantic-based YAML schema (`schema.py`), ruleset registry for MLCommons compliance, `RuntimeSettings` for runtime state |
-| **CLI** | `src/inference_endpoint/cli.py` | argparse-based with subcommands dispatched from `commands/` |
+| **Load Generator** | `src/inference_endpoint/load_generator/` | Central orchestrator: `BenchmarkSession` owns the lifecycle, `Scheduler` controls timing, `LoadGenerator` issues queries |
+| **Endpoint Client** | `src/inference_endpoint/endpoint_client/` | Multi-process HTTP workers communicating via ZMQ IPC. `HTTPEndpointClient` is the main entry point |
+| **Dataset Manager** | `src/inference_endpoint/dataset_manager/` | Loads pickle, HuggingFace, JSONL datasets. `Dataset` base class with `load_sample()`/`num_samples()` interface |
+| **Metrics** | `src/inference_endpoint/metrics/` | `EventRecorder` writes to SQLite, `MetricsReporter` reads and aggregates (QPS, latency, TTFT, TPOT) |
+| **Config** | `src/inference_endpoint/config/` | Pydantic-based YAML schema (`schema.py`), ruleset registry for MLCommons compliance, `RuntimeSettings` for runtime state |
+| **CLI** | `src/inference_endpoint/main.py`, `commands/benchmark/cli.py` | cyclopts-based, auto-generated from `schema.py` Pydantic models. Flat shorthands via `cyclopts.Parameter(alias=...)` |
 | **OpenAI/SGLang** | `src/inference_endpoint/openai/`, `sglang/` | Protocol adapters and response accumulators for different API formats |

 ### Hot-Path Architecture

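
Note on the Dataset Manager row: it names a `Dataset` base class with a `load_sample()`/`num_samples()` interface. A minimal sketch of what such an interface could look like. The signatures (`load_sample(index)` returning a `dict`) and the `JsonlDataset` class are assumptions for illustration, not the repository's code.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path


class Dataset(ABC):
    """Hypothetical base class: indexed access to samples plus a sample count."""

    @abstractmethod
    def load_sample(self, index: int) -> dict: ...

    @abstractmethod
    def num_samples(self) -> int: ...


class JsonlDataset(Dataset):
    """Hypothetical JSONL-backed dataset: one JSON object per non-empty line."""

    def __init__(self, path: Path) -> None:
        with open(path, encoding="utf-8") as fh:
            self._rows = [json.loads(line) for line in fh if line.strip()]

    def load_sample(self, index: int) -> dict:
        return self._rows[index]

    def num_samples(self) -> int:
        return len(self._rows)
```

Keeping the interface to these two methods would let a load generator iterate over any backing format (pickle, HuggingFace, JSONL) without knowing how the samples are stored.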
@@ -69,9 +69,32 @@ Multi-process, event-loop design optimized for throughput:

 ### CLI Modes

-- **CLI mode** (`offline`/`online`): Parameters from command-line arguments
-- **YAML mode** (`from-config`): All config from file, no CLI overrides except `--timeout`
-- **eval**: Accuracy evaluation — subcommand exists but is not yet implemented (raises `NotImplementedError`)
+CLI is auto-generated from `config/schema.py` Pydantic models via cyclopts. Fields annotated with `cyclopts.Parameter(alias="--flag")` get flat shorthands; all other fields get auto-generated dotted flags (kebab-case).
+
+- **CLI mode** (`offline`/`online`): cyclopts constructs `OfflineBenchmarkConfig`/`OnlineBenchmarkConfig` (subclasses in `config/schema.py`) directly from CLI args. Type locked via `Literal`. `--dataset` is repeatable with TOML-style format `[perf|acc:]<path>[,key=value...]` (e.g. `--dataset data.csv,samples=500,parser.prompt=article`). Full accuracy support via `accuracy_config.eval_method=pass_at_1` etc.
+- **YAML mode** (`from-config`): `BenchmarkConfig.from_yaml_file()` loads YAML, resolves env vars, and auto-selects the right subclass via Pydantic discriminated union. Optional `--timeout`/`--mode` overrides via `config.with_updates()`.
+- **eval**: Not yet implemented (raises `NotImplementedError`)
+
+### Config Construction & Validation
+
+Both CLI and YAML produce the same subclass via Pydantic discriminated union on `type`:
+- `submission_ref`, `benchmark_mode`: `show=False` on base class
+
+Validation is layered:
+
+1. **Field-level** (Pydantic): `Field(ge=0)` on durations, `Field(ge=-1)` on workers, `Literal` on `benchmark_mode`
+2. **Field validators**: `workers != 0` check
+3. **Model validator** (`_resolve_and_validate`): streaming AUTO resolution, model name from `submission_ref`, load pattern vs test type, cross-field duration check, duplicate datasets

 ### Load Patterns

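
Note on the validation layers added in this hunk: the three layers can be illustrated without Pydantic. The sketch below swaps in a plain dataclass with `__post_init__`, so it shows the ordering of the checks, not the schema's actual implementation. The class name, the field names (`min_duration_s`, `max_duration_s`, `workers`), the reading of `-1` as "choose automatically", and the specific cross-field rule are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class SettingsSketch:
    min_duration_s: float = 0.0
    max_duration_s: float = 60.0
    workers: int = -1  # assumption: -1 means "choose automatically"

    def __post_init__(self) -> None:
        # Layer 1, field-level bounds (the role of the Field(ge=...) constraints).
        if self.min_duration_s < 0 or self.max_duration_s < 0:
            raise ValueError("durations must be >= 0")
        if self.workers < -1:
            raise ValueError("workers must be >= -1")
        # Layer 2, a single-field rule that a bound cannot express.
        if self.workers == 0:
            raise ValueError("workers must not be 0")
        # Layer 3, a cross-field rule that needs the whole model.
        if self.min_duration_s > self.max_duration_s:
            raise ValueError("min_duration_s must not exceed max_duration_s")
```

Running the cheap per-field checks first means the cross-field step only ever sees values that are individually valid, which keeps its error messages specific.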
@@ -83,14 +106,17 @@ Multi-process, event-loop design optimized for throughput: