Python 3.10+ async AI benchmarking tool for measuring LLM inference server performance. 9 services communicate via a ZMQ message bus.

Reference documentation:
- `docs/architecture.md` - Three-plane architecture, core components, credit system, data flow, communication patterns
- `docs/dev/patterns.md` - Code examples for CLI commands, services, models, messages, plugins, error handling, logging, testing
- `docs/cli_options.md` - Complete CLI command and option reference
- `docs/environment_variables.md` - All `AIPERF_*` environment variables by subsystem
- `docs/metrics_reference.md` - Metric definitions, formulas, and requirements
- `docs/plugins/plugin-system.md` - Plugin architecture, categories, creation guide
- `CONTRIBUTING.md` - Development setup, available commands, pre-commit hooks, DCO

Coding standards:
- async/await for ALL I/O - no `time.sleep`, no blocking calls.
- `Field(description="...")` on EVERY Pydantic field. Docstrings on dataclass fields.
- Type hints on ALL functions (params and return).
- KISS + DRY: minimal code, optimize for the reader.
- `AIPerfBaseModel` for data, `BaseConfig` for configuration.
- `@dataclass(slots=True)` for hot-path inner models created at high volume (e.g. SSE chunks, parsed responses) where Pydantic overhead matters. Use `__pydantic_config__ = ConfigDict(extra="forbid")` on dataclasses that participate in Pydantic union discrimination.
- `BaseComponentService` for services, `BaseService` for SystemController only.
- Message bus for inter-service communication - no shared mutable state.
- CLI commands: one file per command in `cli_commands/`, lazily loaded via import strings in `cli.py`. See `docs/dev/patterns.md`.
- YAML plugin registry for extensible features (`plugins.yaml`).
- Lambda for expensive logs: `self.debug(lambda: f"{self._x()}")`. Direct string for cheap ones.
- Always `orjson.loads(s)` and `orjson.dumps(d)` for JSON.
- No `Optional[X]` or `Union[X, Y]` - use `X | Y`.
- Comments only for "why?", not "what".
- Enums are string-based - use `MessageType.X` directly, never `.value`.
- Dependencies: always use `uv` (never pip) - `uv add package`, `uv run pytest`.
- Use mermaid diagrams instead of ASCII art in markdown files.
- Do not create markdown files to document code changes or decisions.
- Do not over-comment code. Removing code is fine without adding comments to explain why.
- No emojis in code or comments.

Common commands:
- `make first-time-setup` - Initial environment setup
- `make install` - Install project + mock server
- `uv run pytest tests/unit/ -n auto` - Unit tests (fast, isolated)
- `uv run pytest -m integration -n auto` - Integration tests (real services, multiprocess)
- `uv run pytest -m component_integration -n auto` - Component integration tests (single process)
- `ruff format . && ruff check --fix .` - Format and lint
- `make validate-plugin-schemas` - Validate plugin registry
- `pre-commit run` - Pre-commit on staged files
- `pre-commit run --all-files` - Pre-commit on all files
- `make generate-all-docs` - Regenerate CLI + env var docs
- `make generate-all-plugin-files` - Regenerate plugin enums, overloads, schemas

Run pre-commit after every code change, even before creating commits:
- `pre-commit run` - Staged files only
- `pre-commit run --all-files` - All files (recommended after significant changes)

Hooks: check-ast, debug-statements, detect-private-key, check-added-large-files, check-case-conflict, check-merge-conflict, check-json, check-toml, check-yaml, end-of-file-fixer, trailing-whitespace, codespell, add-license, generate-cli-docs, generate-env-vars-docs, generate-plugin-artifacts, validate-plugin-schemas, test-imports, ruff (lint + format).

Adding a new service:
- Create a class extending `BaseComponentService` with `@on_message` handlers.
- Register it in `plugins.yaml` under the `service` category with `class`, `description`, and `metadata`.
- Add a message type to `common/enums/enums.py` if new messages are needed.
- Create a message class in `messages/` with a `message_type` field.
- Validate with `aiperf plugins --validate`.
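
The real `BaseComponentService` and `@on_message` live in the AIPerf codebase and are not reproduced here. The sketch below is a hypothetical, self-contained illustration of the general idea only: a decorator tags handler methods with a message type, and a base class collects the tagged methods so it can dispatch incoming messages to them. All names (`on_message`, `MiniService`, `EchoService`) are illustrative, and handlers are collected in `__init__` here rather than during an `@on_init` phase.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

Handler = Callable[[Any, dict[str, Any]], Awaitable[None]]


def on_message(message_type: str) -> Callable[[Handler], Handler]:
    """Hypothetical decorator: tag an async method as a handler for one message type."""

    def decorator(func: Handler) -> Handler:
        func._message_type = message_type  # read back by MiniService
        return func

    return decorator


class MiniService:
    """Hypothetical base class that discovers tagged handler methods."""

    def __init__(self) -> None:
        self._handlers: dict[str, Callable[[dict[str, Any]], Awaitable[None]]] = {}
        for name in dir(type(self)):
            attr = getattr(self, name)
            message_type = getattr(attr, "_message_type", None)
            if message_type is not None:
                self._handlers[message_type] = attr  # bound method

    async def dispatch(self, message: dict[str, Any]) -> None:
        await self._handlers[message["message_type"]](message)


class EchoService(MiniService):
    def __init__(self) -> None:
        super().__init__()
        self.seen: list[str] = []

    @on_message("heartbeat")
    async def _on_heartbeat(self, message: dict[str, Any]) -> None:
        self.seen.append(message["service_id"])


service = EchoService()
asyncio.run(service.dispatch({"message_type": "heartbeat", "service_id": "worker-1"}))
```

In the real system the message bus delivers messages and subscription is automatic; this sketch only shows why a decorator plus a discovery pass is enough to wire handlers without manual registration.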

Adding a new message type:
- Add an enum value to `MessageType` in `common/enums/enums.py`.
- Create a message class in `messages/` inheriting from `Message` with the `message_type` field set.
- Add an `@on_message(MessageType.X)` handler in the receiving service.
- Auto-subscription happens during the `@on_init` phase.
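
A minimal sketch of the message-type steps, assuming a string-based enum and a plain dataclass message. The project uses its own `Message` base class and Pydantic models, so `MessageType.HEARTBEAT`, its string value, `HeartbeatMessage`, `MESSAGE_CLASSES`, and `parse_message` are all hypothetical. It also shows why `.value` is unnecessary: a `str` enum member already compares equal to its value.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, ClassVar


class MessageType(str, Enum):
    """String-based enum: members are also plain str instances."""

    HEARTBEAT = "heartbeat"
    STATUS = "status"


@dataclass
class Message:
    # Class-level tag; not an __init__ parameter.
    message_type: ClassVar[MessageType]
    service_id: str


@dataclass
class HeartbeatMessage(Message):
    message_type: ClassVar[MessageType] = MessageType.HEARTBEAT


# Hypothetical lookup from wire-level type to message class.
MESSAGE_CLASSES: dict[MessageType, type[Message]] = {
    cls.message_type: cls for cls in (HeartbeatMessage,)
}


def parse_message(raw: dict[str, Any]) -> Message:
    """Pick the message class from the raw `message_type` string."""
    cls = MESSAGE_CLASSES[MessageType(raw["message_type"])]
    return cls(service_id=raw["service_id"])
```

For example, `parse_message({"message_type": "heartbeat", "service_id": "worker-1"})` yields a `HeartbeatMessage` whose `message_type` is `MessageType.HEARTBEAT`.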

Adding a new plugin:
- Create a plugin class implementing the appropriate base.
- Add an entry to `plugins.yaml` with `class`, `description`, and `metadata`.
- Validate with `make validate-plugin-schemas`.
- Use via `plugins.get_class(PluginType.X, 'name')`.
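
Both the CLI (`cli.py`) and the plugin registry defer imports until a class is actually needed. Assuming a registry entry's `class` value is an import string of the form `module.path:ClassName` (the exact format used in `plugins.yaml` is not stated here and may differ), resolution could look like the sketch below. `load_class` is a hypothetical helper, not `plugins.get_class`.

```python
import importlib
from functools import cache


@cache
def load_class(import_string: str) -> type:
    """Resolve 'package.module:ClassName' to a class, importing the module on first use."""
    module_path, sep, class_name = import_string.partition(":")
    if not sep or not class_name:
        raise ValueError(f"expected 'module:ClassName', got {import_string!r}")
    module = importlib.import_module(module_path)
    cls = getattr(module, class_name)
    if not isinstance(cls, type):
        raise TypeError(f"{import_string!r} does not name a class")
    return cls


# Resolve a stdlib class the same way a registry entry would be resolved.
ordered_dict_cls = load_class("collections:OrderedDict")
```

`functools.cache` makes repeated lookups of the same string return the already-resolved class without re-importing.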

Testing:
- `@pytest.mark.asyncio` for async tests, `@pytest.mark.parametrize` for data-driven tests.
- `from tests.harness import mock_plugin` for plugin mocking.
- Name tests `test_<function>_<scenario>_<expected>`, e.g. `test_parse_config_missing_field_raises_error`.
- Imports at file top, fixtures for setup, one focus per test.
- Auto-fixtures (always active): `asyncio.sleep` runs instantly, RNG seeded with 42, singletons reset between tests.
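
The instant-`asyncio.sleep` auto-fixture is provided by the test harness, and its implementation is not shown in this document. As a rough, stdlib-only sketch of the mechanism (an assumption, not the harness code), patching `asyncio.sleep` with a coroutine that returns immediately makes time-based code finish at once. The test uses `asyncio.run` instead of `@pytest.mark.asyncio` so it runs without pytest plugins; `poll_three_times` and `instant_sleep` are hypothetical.

```python
import asyncio
import time
from unittest.mock import patch


async def instant_sleep(delay: float, result: object = None) -> object:
    """Replacement for asyncio.sleep; unlike sleep(0), it never yields to the loop."""
    return result


async def poll_three_times(interval: float) -> int:
    polls = 0
    for _ in range(3):
        await asyncio.sleep(interval)  # looked up at call time, so the patch applies
        polls += 1
    return polls


def test_poll_three_times_with_instant_sleep_returns_quickly() -> None:
    start = time.perf_counter()
    with patch("asyncio.sleep", instant_sleep):
        polls = asyncio.run(poll_three_times(interval=10.0))
    assert polls == 3
    assert time.perf_counter() - start < 1.0
```

The test name follows the `test_<function>_<scenario>_<expected>` convention.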

Feature branches use the `<username>/feature-name` format, forked from `main`. One PR = one concern.

Gotchas:
- SystemController uses `BaseService` (not `BaseComponentService`) - it's the orchestrator.
- Worker/TimingManager disable GC for latency - see `service_metadata.disable_gc`.
- macOS child processes close terminal FDs to prevent Textual UI corruption.
- Plugin priority resolves conflicts: higher wins, external beats built-in at equal priority.
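
The plugin conflict rule (higher priority wins; at equal priority an external plugin beats a built-in one) amounts to comparing a `(priority, is_external)` tuple. A minimal sketch with a hypothetical `PluginEntry` record; the real registry's fields, and how it breaks a tie on both keys, are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PluginEntry:
    """Hypothetical registry record; real plugins.yaml entries may differ."""

    name: str
    class_path: str
    priority: int = 0
    is_builtin: bool = True


def _rank(entry: PluginEntry) -> tuple[int, bool]:
    # Higher priority first; at equal priority, external (not built-in) ranks higher.
    return (entry.priority, not entry.is_builtin)


def resolve(entries: list[PluginEntry]) -> dict[str, PluginEntry]:
    """Pick one winning entry per plugin name."""
    winners: dict[str, PluginEntry] = {}
    for entry in entries:
        current = winners.get(entry.name)
        # Strict '>' keeps the first entry seen on a full tie (an assumption).
        if current is None or _rank(entry) > _rank(current):
            winners[entry.name] = entry
    return winners
```

Because the comparison is order-independent except on a full tie, registration order does not decide conflicts between entries with different ranks.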
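
How `service_metadata.disable_gc` is applied inside Worker and TimingManager is not documented here. As a generic stdlib sketch of the technique only, a hypothetical `gc_disabled` context manager pauses the cyclic garbage collector for a block and restores the previous setting afterwards. Reference counting still frees most objects immediately; only the periodic cycle collection, which can cause latency spikes, is paused.

```python
import gc
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def gc_disabled() -> Iterator[None]:
    """Pause automatic cyclic GC, restoring the previous enabled state on exit."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()
```

A dedicated latency-sensitive process could instead call `gc.disable()` once at startup; the context manager form keeps the change scoped and reversible.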

Patterns:
- Decorators: `@on_init`, `@on_start`, `@on_stop`, `@on_message`, `@on_command`, `@background_task`, `@on_pull_message`, `@on_request`.
- Communication: `publish()` for broadcast, `@on_message` to subscribe, `send_command_and_wait_for_response()` for synchronous request/response.
- `AIPerfLifecycleMixin` for standalone components: `CREATED` -> `INITIALIZING` -> `INITIALIZED` -> `STARTING` -> `RUNNING` -> `STOPPING` -> `STOPPED`; `FAILED` is terminal.
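
The `AIPerfLifecycleMixin` states form a linear chain plus a terminal `FAILED` state. The sketch below encodes only the transitions listed above; the enum values, `can_transition`, and the rule that any non-`FAILED` state may move to `FAILED` are assumptions, not the mixin's actual enforcement logic.

```python
from enum import Enum


class LifecycleState(str, Enum):
    CREATED = "created"
    INITIALIZING = "initializing"
    INITIALIZED = "initialized"
    STARTING = "starting"
    RUNNING = "running"
    STOPPING = "stopping"
    STOPPED = "stopped"
    FAILED = "failed"


_ORDER = [
    LifecycleState.CREATED,
    LifecycleState.INITIALIZING,
    LifecycleState.INITIALIZED,
    LifecycleState.STARTING,
    LifecycleState.RUNNING,
    LifecycleState.STOPPING,
    LifecycleState.STOPPED,
]
# Each state's single allowed successor on the normal path.
_NEXT: dict[LifecycleState, LifecycleState] = dict(zip(_ORDER, _ORDER[1:]))


def can_transition(current: LifecycleState, target: LifecycleState) -> bool:
    """Return True if moving from `current` to `target` is allowed."""
    if current is LifecycleState.FAILED:
        return False  # FAILED is terminal
    if target is LifecycleState.FAILED:
        return True  # assumption: any other state may fail
    return _NEXT.get(current) is target
```

`STOPPED` has no successor in `_NEXT`, so the normal path also ends there.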

Before committing:
- Review the diff: is every line required?
- `ruff format . && ruff check --fix .`
- `uv run pytest tests/unit/ -n auto`
- Type hints on all functions
- `Field(description=...)` on all Pydantic fields
- `git commit -s`

`CLAUDE.md`, `.github/copilot-instructions.md`, and `.cursor/rules/python.mdc` must contain identical content (only headers/frontmatter differ). When updating one, update all three. Always diff them after editing to confirm they are in sync.

When making changes, update the appropriate documentation files listed below. When adding a new tutorial, also add it to the tutorial index in README.md.
| Change type | Files to update |
|---|---|
| Architecture, components, data flow, communication | docs/architecture.md |
| Coding standards, build commands, new patterns | CLAUDE.md + .github/copilot-instructions.md + .cursor/rules/python.mdc |
| Code patterns, examples, base classes | docs/dev/patterns.md |
| CLI arguments or commands | docs/cli_options.md (auto-generated via make generate-cli-docs) |
| Environment variables | docs/environment_variables.md (auto-generated via make generate-env-vars-docs) |
| Metrics definitions or formulas | docs/metrics_reference.md |
| Plugin system, categories, creation | docs/plugins/plugin-system.md |
| Accuracy benchmarks, graders | docs/accuracy/ |
| Server metrics, schemas | docs/server_metrics/ |
| Benchmark modes, timing, traces | docs/benchmark_modes/ |
| Tokenizer, reference docs | docs/reference/ |
| Dataset synthesis API | docs/api/synthesis.md |
| Dev setup, make targets, pre-commit | CONTRIBUTING.md |
| Contribution process, DCO | CONTRIBUTING.md |
| New services, message types, plugin types | docs/architecture.md + docs/dev/patterns.md |
| Tutorials and feature guides | docs/tutorials/ + README.md tutorial index |

A feature is incomplete until documentation is updated.