
Commit f13db30

#114 doc: Fixup markdown documentation (#180)
* Fix some doc
* Fix 114

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
1 parent 935d055 commit f13db30

File tree

5 files changed (+38, −67 lines)


AGENTS.md

Lines changed: 10 additions & 4 deletions
@@ -70,7 +70,8 @@ Multi-process, event-loop design optimized for throughput:
 ### CLI Modes
 
 - **CLI mode** (`offline`/`online`): Parameters from command-line arguments
-- **YAML mode** (`from-config`): All config from file, no CLI overrides except `--output`
+- **YAML mode** (`from-config`): All config from file, no CLI overrides except `--timeout`
+- **eval**: Accuracy evaluation — subcommand exists but is not yet implemented (raises `NotImplementedError`)
 
 ### Load Patterns
 
@@ -87,7 +88,9 @@ src/inference_endpoint/
 ├── exceptions.py    # CLIError, ExecutionError, InputValidationError, SetupError
 ├── commands/        # benchmark, eval, probe, info, validate, init
 │   ├── benchmark.py # Core benchmark command implementation
-│   └── probe.py     # Endpoint health checking
+│   ├── eval.py      # Accuracy evaluation command (not yet implemented)
+│   ├── probe.py     # Endpoint health checking
+│   └── utils.py     # info, validate, init command implementations
 ├── core/types.py    # Query, QueryResult, StreamChunk, QueryStatus (msgspec Structs)
 ├── load_generator/
 │   ├── session.py   # BenchmarkSession - top-level orchestrator
@@ -104,7 +107,8 @@ src/inference_endpoint/
 │   ├── config.py               # HTTPClientConfig
 │   ├── adapter_protocol.py     # HttpRequestAdapter protocol
 │   ├── accumulator_protocol.py # Response accumulation protocol
-│   └── cpu_affinity.py         # CPU pinning
+│   ├── cpu_affinity.py         # CPU pinning
+│   └── utils.py                # Port range helpers
 ├── async_utils/
 │   ├── loop_manager.py         # LoopManager (uvloop + eager_task_factory)
 │   ├── event_publisher.py      # Async event pub/sub
@@ -127,6 +131,7 @@ src/inference_endpoint/
 │   ├── runtime_settings.py     # RuntimeSettings dataclass
 │   ├── ruleset_base.py         # BenchmarkSuiteRuleset base
 │   ├── ruleset_registry.py     # Ruleset registry
+│   ├── user_config.py          # UserConfig dataclass for ruleset user overrides
 │   ├── rulesets/mlcommons/     # MLCommons-specific rules, datasets, models
 │   └── templates/              # YAML config templates (offline, online, eval, etc.)
 ├── openai/                     # OpenAI-compatible API types and adapters
@@ -146,7 +151,8 @@ src/inference_endpoint/
 └── utils/
     ├── logging.py              # Logging setup
     ├── version.py              # Version info
-    └── dataset_utils.py        # Dataset utilities
+    ├── dataset_utils.py        # Dataset utilities
+    └── benchmark_httpclient.py # HTTP client throughput benchmarking utility
 
 tests/
 ├── conftest.py                 # Shared fixtures (echo/oracle servers, datasets, settings)
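The `from-config` precedence rule changed above (all settings come from the YAML file, with `--timeout` as the only honored CLI override) can be sketched as follows. This is a minimal illustration of the rule, not the project's actual implementation; the function and key names are assumptions.

```python
# Hypothetical sketch of from-config precedence: file values win, and the
# only CLI override that is honored is --timeout. Any other CLI flag is
# rejected rather than silently merged. Names are illustrative only.

def merge_config(file_config: dict, cli_args: dict) -> dict:
    """Return the effective config: the file's values, plus --timeout if given."""
    effective = dict(file_config)
    if cli_args.get("timeout") is not None:
        effective["timeout"] = cli_args["timeout"]
    extras = {k for k, v in cli_args.items() if k != "timeout" and v is not None}
    if extras:
        raise ValueError(f"from-config accepts no CLI overrides except --timeout: {extras}")
    return effective

cfg = merge_config({"target_qps": 100, "timeout": 300}, {"timeout": 600})
print(cfg["timeout"])  # 600
```

Rejecting unexpected flags (rather than merging them) keeps YAML runs reproducible, which is the stated point of the mode.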

docs/CLI_QUICK_REFERENCE.md

Lines changed: 5 additions & 7 deletions
@@ -159,8 +159,7 @@ inference-endpoint benchmark online \
   --target-qps 100 \
   --num-samples 10000 \
   --workers 16 \
-  --output results.json \
-  --report-path production_report \
+  --report-dir production_report \
   -v
 
 # Or with duration (calculates samples from target_qps * duration)
@@ -172,8 +171,7 @@ inference-endpoint benchmark online \
   --target-qps 100 \
   --duration 300 \
   --workers 16 \
-  --output results.json \
-  --report-path production_report \
+  --report-dir production_report \
   -v
 ```
 
@@ -260,8 +258,8 @@ endpoint_config:
 
 - All configuration from YAML file
 - Reproducible, shareable configs
-- No CLI parameter mixing (only --output auxiliary allowed)
-- Example: `benchmark from-config --config file.yaml --output results.json`
+- No CLI parameter mixing (only `--timeout` auxiliary allowed)
+- Example: `benchmark from-config --config file.yaml --timeout 600`
 
 ## Tips
 
@@ -281,6 +279,6 @@ endpoint_config:
 **Best Practices:**
 
 - Share YAML configs for reproducible results across systems
-- Use `--report-path` for detailed metrics with TTFT, TPOT, and token analysis
+- Use `--report-dir` for detailed metrics with TTFT, TPOT, and token analysis
 - Set `HF_TOKEN` environment variable for non-public models
 - Use `--min-output-tokens` and `--max-output-tokens` to control output length
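The `--report-dir` flag this commit standardizes on writes a directory of metrics files rather than a single JSON blob. A minimal Python sketch of that output shape follows; the file name `metrics.json` and its fields are assumptions for illustration, not the tool's documented schema.

```python
import json
import tempfile
from pathlib import Path

# Sketch of a --report-dir style layout: one directory per run holding
# metrics files. The file name and fields here are hypothetical.

def write_report(report_dir: str, metrics: dict) -> Path:
    out = Path(report_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "metrics.json"
    path.write_text(json.dumps(metrics, indent=2))
    return path

root = tempfile.mkdtemp()
p = write_report(f"{root}/production_report",
                 {"qps": 99.7, "ttft_ms": 35.2, "tpot_ms": 8.1})
print(p.exists())  # True
```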

docs/DEVELOPMENT.md

Lines changed: 14 additions & 23 deletions
@@ -23,8 +23,7 @@ python3.12 -m venv venv
 source venv/bin/activate  # On Windows: venv\Scripts\activate
 
 # 3. Install development dependencies
-pip install -e .
-pip install -r requirements/base.txt
+pip install -e ".[dev,test]"
 
 # 4. Install pre-commit hooks
 pre-commit install
@@ -49,7 +48,6 @@ inference-endpoint/
 │   ├── metrics/    # Performance measurement and reporting
 │   ├── openai/     # OpenAI API compatibility
 │   ├── profiling/  # Performance profiling tools
-│   ├── runtime/    # Runtime configuration
 │   ├── testing/    # Test utilities (echo server, etc.)
 │   └── utils/      # Common utilities
 ├── tests/          # Test suite
@@ -59,7 +57,6 @@ inference-endpoint/
 │   └── datasets/   # Test datasets
 ├── docs/           # Documentation
 ├── examples/       # Usage examples
-├── requirements/   # Dependency management
 └── scripts/        # Utility scripts
 ```
 
@@ -112,7 +109,7 @@ class TestQuery:
         assert query.prompt == "Test"
         assert query.model == "test-model"
 
-    @pytest.mark.asyncio
+    @pytest.mark.asyncio(mode="strict")
     async def test_async_operation(self):
         """Test async operations."""
         # Your async test here
@@ -142,22 +139,18 @@ git commit --no-verify
 ### Code Formatting
 
 ```bash
-# Format code with Black
-black src/ tests/
-
-# Sort imports with isort
-isort src/ tests/
+# Format code with ruff
+ruff format src/ tests/
 
 # Check formatting without changing files
-black --check src/ tests/
-isort --check-only src/ tests/
+ruff format --check src/ tests/
 ```
 
 ### Linting
 
 ```bash
-# Run flake8
-flake8 src/ tests/
+# Run ruff linter
+ruff check src/ tests/
 
 # Run mypy for type checking
 mypy src/
@@ -195,7 +188,7 @@ When developing a new component:
 3. **Implement the component** following the established patterns
 4. **Add tests** in the corresponding `tests/unit/` directory
 5. **Update main package** `__init__.py` if needed
-6. **Add dependencies** to appropriate `requirements/` files
+6. **Add dependencies** to `pyproject.toml` under `[project.dependencies]` or `[project.optional-dependencies]`
 
 ### 3. Testing Strategy
 
@@ -287,20 +280,18 @@ python -m pdb -m pytest test_file.py
 
 ### Adding Dependencies
 
-1. **Base Dependencies** (`requirements/base.txt`): Required for package to function, development tools, linters, and pre-commit hooks
-2. **Test Dependencies** (`requirements/test.txt`): Testing framework and utilities (pytest, pytest-asyncio, etc.)
+Add dependencies to `pyproject.toml`:
 
-### Updating Dependencies
+- **Runtime dependencies**: `[project.dependencies]`
+- **Optional groups** (dev, test, etc.): `[project.optional-dependencies]`
+
+Install after updating:
 
 ```bash
-# Update all dependencies
-pip install --upgrade -r requirements/base.txt
+pip install -e ".[dev,test]"
 
 # Check for outdated packages
 pip list --outdated
-
-# Update specific package
-pip install --upgrade package-name
 ```
 
 ## 🚨 Troubleshooting

docs/LOCAL_TESTING.md

Lines changed: 8 additions & 32 deletions
@@ -81,8 +81,7 @@ inference-endpoint -v benchmark offline \
   --dataset tests/datasets/dummy_1k.pkl \
   --num-samples 5000 \
   --workers 4 \
-  --output benchmark_results.json \
-  --report-path benchmark_report
+  --report-dir benchmark_report
 
 # Note: Set HF_TOKEN environment variable if using non-public models
 # export HF_TOKEN=your_huggingface_token
@@ -115,7 +114,7 @@ inference-endpoint -v benchmark online \
   --dataset tests/datasets/dummy_1k.pkl \
   --load-pattern poisson \
   --target-qps 100 \
-  --report-path online_benchmark_report
+  --report-dir online_benchmark_report
 ```
 
 **Expected Output:**
@@ -157,26 +156,7 @@ inference-endpoint benchmark offline \
 
 ### 6. View Results
 
-```bash
-# View benchmark results
-cat benchmark_results.json | jq
-
-# Example output:
-{
-  "config": {
-    "endpoint": "http://localhost:8765",
-    "mode": null,
-    "qps": 10
-  },
-  "results": {
-    "total": 1000,
-    "successful": 1000,
-    "failed": 0,
-    "elapsed_time": 1.8,
-    "qps": 555.6
-  }
-}
-```
+When run with `--report-dir`, a directory is created containing benchmark metrics files (JSON/CSV) with detailed QPS, latency, TTFT, and TPOT data.
 
 ### 7. Stop the Echo Server
 
@@ -262,13 +242,9 @@ inference-endpoint -v benchmark offline \
   --model Qwen/Qwen3-8B \
   --dataset tests/datasets/dummy_1k.pkl \
   --workers 4 \
-  --output benchmark_results.json \
-  --report-path benchmark_report
-
-# 6. Check results
-cat benchmark_results.json | jq '.results'
+  --report-dir benchmark_report
 
-# 7. Stop server
+# 6. Stop server
 pkill -f echo_server
 ```
 
@@ -280,7 +256,7 @@ inference-endpoint benchmark offline \
   --endpoints http://localhost:8765 \
   --model Qwen/Qwen3-8B \
   --dataset tests/datasets/dummy_1k.pkl \
-  --report-path offline_report
+  --report-dir offline_report
 
 # Online (Poisson distribution)
 inference-endpoint benchmark online \
@@ -289,7 +265,7 @@ inference-endpoint benchmark online \
   --dataset tests/datasets/dummy_1k.pkl \
   --load-pattern poisson \
   --target-qps 500 \
-  --report-path online_report
+  --report-dir online_report
 
 # With explicit sample count
 inference-endpoint benchmark offline \
@@ -339,5 +315,5 @@ inference-endpoint benchmark online \
 **Advanced:**
 
 - Streaming: `auto` (default), `on`, or `off` - auto enables for online, disables for offline
-- Use `--report-path` for detailed metrics reports with TTFT, TPOT, and token analysis
+- Use `--report-dir` for detailed metrics reports with TTFT, TPOT, and token analysis
 - Dataset format auto-inferred from file extension
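The `poisson` load pattern used in the online examples above schedules requests with exponentially distributed inter-arrival gaps, so arrivals average out to the target QPS. Here is a generic sketch of that idea, not the project's actual scheduler:

```python
import random

def poisson_gaps(target_qps: float, n: int, seed: int = 0) -> list[float]:
    """Inter-arrival gaps (seconds) whose mean is 1 / target_qps.

    Exponential gaps make the arrival process Poisson, which models
    independent clients issuing requests at a given average rate.
    """
    rng = random.Random(seed)
    return [rng.expovariate(target_qps) for _ in range(n)]

gaps = poisson_gaps(target_qps=100.0, n=10_000)
mean_gap = sum(gaps) / len(gaps)
print(f"mean gap ≈ {mean_gap:.4f}s")  # close to 0.01s for 100 QPS
```

A scheduler would sleep for each gap before dispatching the next request; bursts and lulls are expected, which is what makes this pattern a more realistic stress test than a fixed-interval one.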

examples/07_GPT-OSS-120B_SGLang_Example/README.md

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@ If you already have the model weights or prefer a direct approach, follow the [i
 LiveCodeBench has a few security concerns and dependency conflicts, so it is recommended to run LiveCodeBench via the
 containerized workflow.
 
-Follow the instructions in the [LiveCodeBench README](../../src/inference_endpoint/dataset_manager/predefined/livecodebench/README.md#running-the-container)
+Follow the instructions in the [LiveCodeBench README](../../src/inference_endpoint/evaluation/livecodebench/README.md#running-the-container)
 
 #### Non-containerized run (NOT RECOMMENDED)
