Commit c5e2ed6 (parent 86cbf86)

Documentation cleanup, refactor

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>

27 files changed: +1815 −140 lines

AGENTS.md

Lines changed: 8 additions & 24 deletions
@@ -9,10 +9,7 @@ High-performance benchmarking tool for LLM inference endpoints targeting 50k+ QPS
 ## Common Commands
 
 ```bash
-# Development setup
-python3.12 -m venv venv && source venv/bin/activate
-pip install -e ".[dev,test]"
-pre-commit install
+# Development setup — see docs/DEVELOPMENT.md for full instructions
 
 # Testing
 pytest                       # All tests (excludes slow/performance)
@@ -137,10 +134,13 @@ src/inference_endpoint/
 │   └── utils.py             # Port range helpers
 ├── async_utils/
 │   ├── loop_manager.py      # LoopManager (uvloop + eager_task_factory)
+│   ├── runner.py            # run_async() — uvloop + eager_task_factory entry point for CLI commands
 │   ├── event_publisher.py   # Async event pub/sub
+│   ├── services/
+│   │   ├── event_logger/        # EventLoggerService: writes EventRecords to JSONL/SQLite
+│   │   └── metrics_aggregator/  # MetricsAggregatorService: real-time metrics (TTFT, TPOT, ISL, OSL)
 │   └── transport/           # ZMQ-based IPC transport layer
 │       ├── protocol.py      # Transport protocol definitions
-│       ├── record.py        # Transport records
 │       └── zmq/             # ZMQ implementation (context, pubsub, transport)
 ├── dataset_manager/
 │   ├── dataset.py           # Dataset base class, DatasetFormat enum
@@ -192,25 +192,9 @@ tests/
 
 ## Development Standards
 
-### Code Style
+### Code Style and Pre-commit Hooks
 
-- **Formatter/Linter**: `ruff` (line-length 88, target Python 3.12)
-- **Type checking**: `mypy` (via pre-commit)
-- **Formatting**: `ruff-format` (double quotes, space indent)
-- **License headers**: Required on all Python files (enforced by pre-commit hook `scripts/add_license_header.py`)
-- **Conventional commits**: `feat:`, `fix:`, `docs:`, `test:`, `chore:`
-
-### Pre-commit Hooks
-
-All of these run automatically on commit:
-
-- trailing-whitespace, end-of-file-fixer, check-yaml, check-merge-conflict, debug-statements
-- `ruff` (lint + autofix) and `ruff-format`
-- `mypy` type checking
-- `prettier` for YAML/JSON/Markdown
-- License header enforcement
-
-**Always run `pre-commit run --all-files` before committing.**
+See [Development Guide](docs/DEVELOPMENT.md) for formatting, linting, and pre-commit hook details.
 
 ### Data Types & Serialization
 
@@ -291,7 +275,7 @@ Update AGENTS.md as part of any PR that includes a **significant refactor**, meaning:
 - **Added or removed CLI commands/subcommands** — update CLI Modes and Common Commands
 - **Changed test infrastructure** (new fixtures, changed markers, new test directories) — update Testing section
 - **Added or removed key dependencies** — update Key Dependencies table
-- **Changed build/tooling** (new pre-commit hooks, changed ruff config, new CI steps) — update Code Style and Pre-commit Hooks
+- **Changed build/tooling** (new pre-commit hooks, changed ruff config, new CI steps) — update [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md)
 - **Changed hot-path patterns** (new transport, changed serialization, new performance constraints) — update Performance Guidelines
 
 ### How to Update

CONTRIBUTING.md

Lines changed: 2 additions & 0 deletions
@@ -7,3 +7,5 @@ Generally we encourage people to become MLCommons members if they wish to contribute
 Regardless of whether you are a member, your organization (or you as an individual contributor) needs to sign the MLCommons Contributor License Agreement (CLA). Please submit your GitHub username to the [MLCommons Subscription form](https://mlcommons.org/community/subscribe/) to start that process.
 
 MLCommons project work is tracked with issue trackers and pull requests. Modify the project in your own fork and issue a pull request once you want other developers to take a look at what you have done and discuss the proposed changes. Ensure that cla-bot and other checks pass for your pull requests.
+
+For project-specific development standards (code style, test requirements, pre-commit hooks, commit format), see the [Development Guide](docs/DEVELOPMENT.md).

README.md

Lines changed: 38 additions & 20 deletions
@@ -68,7 +68,7 @@ inference-endpoint benchmark offline \
 
 ```bash
 # Start local echo server
-python -m inference_endpoint.testing.echo_server --port 8765 &
+python3 -m inference_endpoint.testing.echo_server --port 8765 &
 
 # Test with dummy dataset (included in repo)
 inference-endpoint benchmark offline \
@@ -96,33 +96,51 @@ pytest -m "not performance and not run_explicitly"
 
 ## 📚 Documentation
 
+- [AGENTS.md](AGENTS.md) - Architecture, conventions, and AI agent guidelines
 - [CLI Quick Reference](docs/CLI_QUICK_REFERENCE.md) - Command-line interface guide
 - [Local Testing Guide](docs/LOCAL_TESTING.md) - Test with echo server
 - [Development Guide](docs/DEVELOPMENT.md) - How to contribute and develop
+- [Performance Architecture](docs/PERF_ARCHITECTURE.md) - Hot-path design and tuning
+- [Performance Tuning](docs/CLIENT_PERFORMANCE_TUNING.md) - CPU affinity and client tuning
 - [GitHub Setup Guide](docs/GITHUB_SETUP.md) - GitHub authentication and setup
 
+### Component Design Specs
+
+Each top-level component under `src/inference_endpoint/` has a corresponding spec:
+
+| Component         | Spec                                                             |
+| ----------------- | ---------------------------------------------------------------- |
+| Core types        | [docs/core/Design.md](docs/core/Design.md)                       |
+| Load generator    | [docs/load_generator/Design.md](docs/load_generator/Design.md)   |
+| Endpoint client   | [docs/endpoint_client/Design.md](docs/endpoint_client/Design.md) |
+| Metrics           | [docs/metrics/Design.md](docs/metrics/Design.md)                 |
+| Config            | [docs/config/Design.md](docs/config/Design.md)                   |
+| Async utils       | [docs/async_utils/Design.md](docs/async_utils/Design.md)         |
+| Dataset manager   | [docs/dataset_manager/Design.md](docs/dataset_manager/Design.md) |
+| Commands (CLI)    | [docs/commands/Design.md](docs/commands/Design.md)               |
+| OpenAI adapter    | [docs/openai/Design.md](docs/openai/Design.md)                   |
+| SGLang adapter    | [docs/sglang/Design.md](docs/sglang/Design.md)                   |
+| Evaluation        | [docs/evaluation/Design.md](docs/evaluation/Design.md)           |
+| Testing utilities | [docs/testing/Design.md](docs/testing/Design.md)                 |
+| Profiling         | [docs/profiling/Design.md](docs/profiling/Design.md)             |
+| Plugins           | [docs/plugins/Design.md](docs/plugins/Design.md)                 |
+| Utils             | [docs/utils/Design.md](docs/utils/Design.md)                     |
+
 ## 🎯 Architecture
 
 The system follows a modular, event-driven architecture:
 
 ```
-┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
-│     Dataset     │    │      Load       │    │    Endpoint     │
-│     Manager     │───▶│    Generator    │───▶│     Client      │
-└─────────────────┘    └─────────────────┘    └─────────────────┘
-         │                      │                      │
-         ▼                      ▼                      ▼
-┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
-│     Metrics     │    │  Configuration  │    │    Endpoint     │
-│    Collector    │◄───│     Manager     │    │   (External)    │
-└─────────────────┘    └─────────────────┘    └─────────────────┘
+Dataset Manager ──► Load Generator ──► Endpoint Client ──► External Endpoint
+
+Metrics Collector
+(EventRecorder + MetricsReporter)
 ```
 
-- **Load Generator**: Central orchestrator managing query lifecycle
-- **Dataset Manager**: Handles benchmark datasets and preprocessing
-- **Endpoint Client**: Abstract interface for endpoint communication
-- **Metrics Collector**: Performance measurement and analysis
-- **Configuration Manager**: System configuration (TBD)
+- **Dataset Manager**: Loads benchmark datasets and applies transform pipelines
+- **Load Generator**: Central orchestrator — controls timing (scheduler), issues queries, and emits sample events
+- **Endpoint Client**: Multi-process HTTP worker pool communicating over ZMQ IPC
+- **Metrics Collector**: Receives sample events from Load Generator; writes to SQLite (EventRecorder), aggregates after the run (MetricsReporter)
 
 ## Accuracy Evaluation
 
@@ -134,14 +152,13 @@ configuration. Currently, Inference Endpoints provides the following pre-defined
 - LiveCodeBench (default: lite, release_v6)
 
 However, LiveCodeBench will not work out-of-the-box and requires some additional setup. See the
-[LiveCodeBench](src/inference_endpoint/dataset_manager/predefined/livecodebench/README.md) documentation
-for details and explanations.
+[LiveCodeBench](src/inference_endpoint/evaluation/livecodebench/README.md) documentation for
+details and explanations.
 
 ## 🚧 Pending Features
 
 The following features are planned for future releases:
 
-- [ ] **Performance Tuning** - Advanced performance optimization features
 - [ ] **Submission Ruleset Integration** - Full MLPerf submission workflow support
 - [ ] **Documentation Generation and Hosting** - Sphinx-based API documentation with GitHub Pages
 
@@ -168,7 +185,8 @@ We are grateful to these communities for their contributions to LLM benchmarking
 
 ## 📄 License
 
-This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
+This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE.md) file for
+details.
 
 ## 🔗 Links
 
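The reworked architecture bullets describe the Load Generator emitting sample events that the Metrics Collector consumes. A minimal asyncio sketch of that publish/subscribe shape — assuming nothing about the project's actual `event_publisher` API; every name here is hypothetical:

```python
import asyncio

class EventPublisher:
    """Fan out events to any number of subscriber queues (illustrative only)."""

    def __init__(self):
        self._subscribers: list[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        # Each subscriber gets its own queue, so a slow consumer
        # never blocks the publisher's hot path.
        q: asyncio.Queue = asyncio.Queue()
        self._subscribers.append(q)
        return q

    def publish(self, event: dict) -> None:
        # Non-blocking: put_nowait keeps publishing off the await path.
        for q in self._subscribers:
            q.put_nowait(event)

async def demo():
    pub = EventPublisher()
    inbox = pub.subscribe()          # e.g. a metrics-collector subscription
    pub.publish({"type": "sample_completed", "ttft_ms": 12.3})
    return await inbox.get()

event = asyncio.run(demo())
print(event["type"])  # prints "sample_completed"
```

The per-subscriber queue is the key decoupling: the load generator publishes fire-and-forget, while recorder and aggregator drain their queues at their own pace.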
docs/CLI_QUICK_REFERENCE.md

Lines changed: 5 additions & 11 deletions
@@ -1,13 +1,6 @@
 # CLI Quick Reference
 
-## Architecture
-
-The CLI is auto-generated from Pydantic models in `config/schema.py` using
-cyclopts. schema.py is the single source of truth for both YAML configs and CLI flags.
-
-- **All schema fields** available as CLI flags on each subcommand (dotted kebab-case)
-- **Shorthand aliases** declared via `cyclopts.Parameter(alias="--flag")` on schema fields
-- **`${VAR}` interpolation** in YAML files (with `${VAR:-default}` fallback)
+Command-line reference for all `inference-endpoint` subcommands, flags, load patterns, and usage examples.
 
 ## Commands
 
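One of the bullets removed above documented `${VAR}` / `${VAR:-default}` interpolation in YAML files. The semantics can be sketched in a few lines; this is an illustrative re-implementation under assumed semantics, not the project's actual code:

```python
import os
import re

# Matches ${NAME} or ${NAME:-default}
_VAR = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

def interpolate(text: str, env=None) -> str:
    """Expand ${VAR} references, falling back to ${VAR:-default} defaults."""
    env = os.environ if env is None else env

    def repl(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"undefined variable: {name}")

    return _VAR.sub(repl, text)

print(interpolate("endpoint: ${HOST:-localhost}:${PORT:-8765}", env={"PORT": "9000"}))
# prints "endpoint: localhost:9000"
```

Raising on an undefined variable with no default (rather than silently substituting an empty string) surfaces config mistakes early; whether the real implementation does this is an assumption.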
@@ -109,6 +102,8 @@ Flag names shown as `--full.dotted.path --alias`. Both forms work.
 - `--endpoint-config.api-key --api-key` - API authentication
 - `--endpoint-config.api-type --api-type` - API type: openai/sglang (default: openai)
 - `--report-dir` - Report output directory
+  Note: applies to CLI-driven `benchmark offline` / `benchmark online`; `benchmark from-config`
+  currently expects `report_dir` to be set in the YAML.
 - `--timeout` - Global timeout in seconds
 - `--enable-cpu-affinity / --no-cpu-affinity` - NUMA-aware CPU pinning (default: true)
 
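The `--full.dotted.path` convention in the hunk above (nested schema fields exposed as dotted kebab-case flags) can be illustrated with plain dataclasses standing in for the project's Pydantic models; `cli_flags` and both config classes here are hypothetical:

```python
from dataclasses import dataclass, field, fields, is_dataclass

# Hypothetical stand-ins for the schema.py models.
@dataclass
class EndpointConfig:
    api_key: str = ""
    api_type: str = "openai"

@dataclass
class BenchmarkConfig:
    endpoint_config: EndpointConfig = field(default_factory=EndpointConfig)
    report_dir: str = "results"
    timeout: float = 600.0

def cli_flags(cls, prefix: str = "") -> list[str]:
    """Derive dotted kebab-case flag names by walking nested config fields."""
    flags = []
    for f in fields(cls):
        name = prefix + f.name.replace("_", "-")
        if is_dataclass(f.type):
            # Nested config: recurse, joining path segments with dots.
            flags.extend(cli_flags(f.type, prefix=name + "."))
        else:
            flags.append("--" + name)
    return flags

print(cli_flags(BenchmarkConfig))
# prints "['--endpoint-config.api-key', '--endpoint-config.api-type', '--report-dir', '--timeout']"
```

Generating flags from the schema this way keeps YAML keys and CLI flags from drifting apart, since both are derived from the same field definitions.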
@@ -242,10 +237,9 @@ inference-endpoint init submission
 
 # 2. Edit submission_template.yaml (set model, datasets, ruleset, endpoint)
 
-# 3. Run (YAML mode)
+# 3. Run (YAML mode - config-driven; CLI only allows --config, --timeout, and --mode; set report-dir in the YAML)
 inference-endpoint benchmark from-config \
-  --config submission_template.yaml \
-  --report-dir official_results
+  --config submission_template.yaml
 ```
 
 ### Validate First
