93 changes: 93 additions & 0 deletions .claude/skills/msgspec-patterns/SKILL.md
@@ -0,0 +1,93 @@
---
name: msgspec-patterns
description: Reference guide for msgspec.Struct usage patterns and performance tips. Use when writing or reviewing code that defines msgspec Structs, encodes/decodes data, or needs performance optimization for serialization.
user-invocable: false
---

## Use Structs for Structured Data

Always prefer `msgspec.Struct` over `dict`, `dataclasses`, or `attrs` for structured data with a known schema. Per the msgspec documentation, Structs are 5-60x faster than `dataclasses` or `attrs` for common operations.

## Struct Configuration Options

| Option | Description | Default |
| ----------------------- | --------------------------------------------- | ------- |
| `omit_defaults` | Omit fields with default values when encoding | `False` |
| `forbid_unknown_fields` | Error on unknown fields when decoding | `False` |
| `frozen` | Make instances immutable and hashable | `False` |
| `kw_only` | Make all fields keyword-only | `False` |
| `tag` | Enable tagged union support | `None` |
| `array_like` | Encode/decode as arrays instead of objects | `False` |
| `gc` | Enable garbage collector tracking | `True` |

## Omit Default Values

Set `omit_defaults=True` when default values are known on both encoding and decoding ends. Reduces encoded message size and improves performance.

```python
import msgspec

class Config(msgspec.Struct, omit_defaults=True):
    host: str = "localhost"
    port: int = 8080
```

## Avoid Decoding Unused Fields

Define smaller "view" Struct types that only contain the fields you actually need. msgspec skips decoding fields not defined in your Struct.

## Use `encode_into` for Buffer Reuse

In hot loops, use `Encoder.encode_into()` with a pre-allocated `bytearray` instead of `encode()`. Always measure before adopting.

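```python
import msgspec

encoder = msgspec.json.Encoder()
buffer = bytearray(1024)  # allocated once, reused for every message

# encode_into() returns None; it resizes `buffer` to the encoded length.
encoder.encode_into(msg, buffer)
socket.sendall(buffer)
```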

## Use MessagePack for Internal APIs

`msgspec.msgpack` is more compact and can be more performant than `msgspec.json` for internal service communication.

## gc=False

Set `gc=False` on Struct types that will never participate in reference cycles. Reduces GC overhead by up to 75x and saves 16 bytes per instance. See the `msgspec-struct-gc-check` skill for the full safety analysis.

## array_like=True

Set `array_like=True` when both ends know the field schema. Encodes structs as arrays instead of objects, removing field names from the message.

```python
import msgspec

class Point(msgspec.Struct, array_like=True):
    x: float
    y: float

# Encodes as [1.0, 2.0] instead of {"x": 1.0, "y": 2.0}
```

## Tagged Unions

Use `tag=True` on Struct types when handling multiple message types in a single union for efficient type discrimination during decoding.

```python
import msgspec

class GetRequest(msgspec.Struct, tag=True):
    key: str

class PutRequest(msgspec.Struct, tag=True):
    key: str
    value: str

Request = GetRequest | PutRequest
decoder = msgspec.msgpack.Decoder(Request)
```

## NDJSON with encode_into

For line-delimited JSON, use `encode_into()` with `buffer.extend()` to avoid copies:

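```python
import msgspec

encoder = msgspec.json.Encoder()
buffer = bytearray(64)  # grows as needed and is reused for every line

for msg in msgs:
    encoder.encode_into(msg, buffer)  # overwrites buffer from offset 0
    buffer.extend(b"\n")              # append the delimiter in place
    file.write(buffer)                # one write per line, no slicing
```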
76 changes: 76 additions & 0 deletions .claude/skills/msgspec-struct-gc-check/SKILL.md
@@ -0,0 +1,76 @@
---
name: msgspec-struct-gc-check
description: Check whether msgspec.Struct types can safely use gc=False. Use when adding or changing msgspec.Struct definitions, or when reviewing code that uses msgspec structs.
allowed-tools: Read, Grep, Glob
---

# msgspec.Struct gc=False Safety Check

## When to use this skill

- Adding or modifying a class that inherits from `msgspec.Struct`
- Reviewing or refactoring code that defines or uses msgspec structs
- Deciding whether to add or remove `gc=False` on a Struct

## Why gc=False matters

Setting `gc=False` on a Struct means instances are **never tracked** by Python's garbage collector. This reduces GC pressure and can improve performance when many structs are allocated. The **only** risk: if a **reference cycle** involves only gc=False structs (or objects not tracked by GC), that cycle will **never be collected** (memory leak).

## Verified safety constraints

All must hold for gc=False to be safe.

### 1. No reference cycles

- The struct (and any container it references) must never be part of a reference cycle.
- **Multiple variables** pointing to the same struct (`x = s; y = x`) are **safe** — that is not a cycle.
- **Returning** a struct from a function is **safe**. What matters is whether any reference path leads back to the struct (e.g. struct's list contains the struct or something that holds the struct).

### 2. No mutation that could create cycles

- **Do not mutate** struct fields after construction in a way that could introduce a cycle.
- **Frozen structs** (`frozen=True`) prevent field reassignment; `force_setattr` in `__post_init__` is one-time init only, so that's acceptable.
- Assigning **scalars** (int, str, bool, float, None) to fields is safe — they cannot form cycles.

### 3. Mutable containers (list, dict, set) on the struct

- If the struct has list/dict/set fields, either:
- **Never mutate** those containers after creation, and never store in them any object that references the struct, or
- Do not use `gc=False` (conservative).
- **Reading** from containers does not create cycles and is allowed.

### 4. Nested structs

- If a struct holds another Struct, the same rules apply to the whole reference graph: no cycles, no mutation that could create cycles.

### 5. Generic / mixins

- With `gc=False`, the type must be compatible with `__slots__` (e.g. if using `Generic`, the mixin must define `__slots__ = ()`).

## Quick per-struct analysis steps

1. List all fields and their types (scalars vs containers vs nested Structs).
2. Search the codebase for: assignments to this struct's fields, mutations of its container fields (`.append`, `.update`, etc.), and any place the struct instance is stored.
3. If only scalars or immutable types, or frozen with no container mutation -> likely safe for gc=False.
4. If mutable containers and they're never mutated (and never made to reference the struct) -> likely safe; otherwise -> do not use gc=False.

## Risky structs: audit and at-risk comment

A struct is **risky** for gc=False if it has a condition that would normally disallow gc=False (e.g. mutable list/dict/set fields), but that condition might never arise in practice.

### When audit passes

- Set `gc=False` on the struct.
- Add an **at-risk comment** above the class:

`# gc=False: audit YYYY-MM: <condition> is only read, never mutated.`

- Add a docstring note:

`AT-RISK (gc=False): Has <brief condition>. Any change that <what would violate safety> must be audited; if so, remove gc=False.`

### When touching an at-risk struct

1. Re-run the audit for that struct.
2. If your change mutates the at-risk field(s) or creates a cycle, **remove** `gc=False`.
3. If your change does not touch the at-risk field, the existing gc=False remains; you may update the audit date.
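
One way to find the structs that need a re-audit is to scan for the docstring marker. The following is a sketch with a hypothetical helper name (`at_risk_classes`), using only the standard-library `ast` module:

```python
import ast

MARKER = "AT-RISK (gc=False)"


def at_risk_classes(source: str) -> list[str]:
    """Return names of classes whose docstring carries the at-risk marker."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, ast.ClassDef)
        and MARKER in (ast.get_docstring(node) or "")
    ]


example = '''
class QueryResult:
    """Result of a query.

    AT-RISK (gc=False): Has mutable container field `metadata`.
    """

class Plain:
    """No marker here."""
'''

flagged = at_risk_classes(example)
```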
119 changes: 119 additions & 0 deletions .cursor/skills/msgspec-struct-gc-check/SKILL.md
@@ -0,0 +1,119 @@
---
name: msgspec-struct-gc-check
description: Check whether msgspec.Struct types can safely use gc=False. Use when adding or changing msgspec.Struct definitions, or when reviewing code that uses msgspec structs.
---

# msgspec.Struct gc=False Safety Check

## When to use this skill

- Adding or modifying a class that inherits from `msgspec.Struct`
- Reviewing or refactoring code that defines or uses msgspec structs
- Deciding whether to add or remove `gc=False` on a Struct

## Why gc=False matters

Setting `gc=False` on a Struct means instances are **never tracked** by Python's garbage collector. This reduces GC pressure and can improve performance when many structs are allocated. The **only** risk: if a **reference cycle** involves only gc=False structs (or objects not tracked by GC), that cycle will **never be collected** (memory leak).

Reference: [msgspec Structs – Disabling Garbage Collection](https://jcristharif.com/msgspec/structs.html#struct-gc).

## Verified safety constraints

Use these constraints to decide if a Struct can use `gc=False`. All must hold.

### 1. No reference cycles

- The struct (and any container it references) must never be part of a reference cycle.
- **Multiple variables** pointing to the same struct (`x = s; y = x`) are **safe** — that is not a cycle. A cycle is A → B → … → A.
- **Returning** a struct from a function is **safe**. What matters is whether any reference path leads back to the struct (e.g. struct’s list contains the struct or something that holds the struct).
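
The distinction between aliasing and a cycle can be checked mechanically. The sketch below uses hypothetical helper names (`reaches_itself`, `_children`) and a plain class as a stand-in for a Struct with a list field. It follows only containers and instance fields, so it illustrates the definition rather than acting as a complete leak detector:

```python
def _children(obj: object) -> list[object]:
    """Objects directly referenced by `obj` (containers and instance fields only)."""
    if isinstance(obj, dict):
        return [*obj.keys(), *obj.values()]
    if isinstance(obj, (list, tuple, set, frozenset)):
        return list(obj)
    if isinstance(obj, (str, bytes, int, float, bool, type(None))):
        return []
    # msgspec Structs list their field names in __struct_fields__;
    # plain objects fall back to __dict__.
    names = getattr(type(obj), "__struct_fields__", None)
    if names is not None:
        return [getattr(obj, n) for n in names]
    return list(getattr(obj, "__dict__", {}).values())


def reaches_itself(root: object) -> bool:
    """Return True if following fields/containers from `root` leads back to it."""
    seen: set[int] = set()
    stack = _children(root)
    while stack:
        obj = stack.pop()
        if obj is root:
            return True
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        stack.extend(_children(obj))
    return False


class Node:  # plain stand-in for a Struct with a list field
    def __init__(self) -> None:
        self.children: list[object] = []


s = Node()
x = s
y = x                     # aliases only: no path from s back to s
aliases_cycle = reaches_itself(s)

s.children.append(s)      # s -> list -> s
self_cycle = reaches_itself(s)
```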

### 2. No mutation that could create cycles

- **Do not mutate** struct fields after construction in a way that could introduce a cycle (e.g. set a field to an object that references the struct, or append the struct to its own list/dict).
- **Frozen structs** (`frozen=True`) prevent field reassignment; `force_setattr` in `__post_init__` is one-time init only, so that’s acceptable.
- Assigning **scalars** (int, str, bool, float, None) to fields is safe — they cannot form cycles.

### 3. Mutable containers (list, dict, set) on the struct

- If the struct has list/dict/set fields, either:
- **Never mutate** those containers after creation (no `.append`, `.update`, `[...] = ...`, etc.), and never store in them any object that references the struct, or
- Do not use `gc=False` (conservative).
- **Reading** from containers (e.g. `x = struct.foobars[i]`) does not create cycles and is allowed.

### 4. Nested structs

- If a struct holds another Struct (or holds containers that hold Structs), the same rules apply to the whole reference graph: no cycles, no mutation that could create cycles. If any nested Struct uses `gc=False`, the whole graph must still be cycle-free.

### 5. Generic / mixins

- With `gc=False`, the type must be compatible with `__slots__` (e.g. if using `Generic`, the mixin must define `__slots__ = ()`). See msgspec issue #631 / PR #635.

## Checklist for “can use gc=False”

- [ ] Struct and everything it references can never participate in a reference cycle.
- [ ] No mutation of struct fields after construction that could introduce a cycle (frozen or init-only mutation is ok; scalar assignment is ok).
- [ ] Any list/dict/set fields are never mutated after creation, or we do not use gc=False.
- [ ] No storing the struct (or anything that references it) inside its own container fields.
- [ ] If Generic/mixins are used, `__slots__` compatibility is satisfied.

## Checklist for “must NOT use gc=False”

- [ ] Struct is mutated after creation in a way that could create a cycle (e.g. appending self to a list field).
- [ ] Container fields are mutated after creation and could hold the struct or back-references.
- [ ] Struct is used in a pattern where it’s stored in a container that the struct (or its fields) also references.

## Quick per-struct analysis steps

1. List all fields and their types (scalars vs containers vs nested Structs).
2. Search the codebase for: assignments to this struct’s fields, mutations of its container fields (`.append`, `.update`, etc.), and any place the struct instance is stored (e.g. in a list/dict that might be referenced by the struct).
3. If only scalars or immutable types, or frozen with no container mutation → likely safe for gc=False.
4. If mutable containers and they’re never mutated (and never made to reference the struct) → likely safe; otherwise → do not use gc=False.

## Risky structs: audit and at-risk comment

A struct is **risky** for gc=False if it has a condition that would normally disallow gc=False (e.g. mutable list/dict/set fields), but that condition might never arise in practice (e.g. the field is only ever read, never mutated after construction).

### Auditing a risky struct

1. Identify the at-risk condition (e.g. "has `metadata: dict` that could be mutated").
2. Search the codebase for all uses of that struct and of the at-risk field:
- Any assignment to the field: `obj.field = ...`, `obj.field[key] = ...`, `obj.field.append(...)`, `obj.field.update(...)`, etc.
- Any code path that could store the struct (or something holding it) inside that container.
3. If the audit finds **no** such mutation or cycle-creating storage, the condition never arises and gc=False is acceptable **provided** you add the at-risk marker so future changes are re-audited.
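
The search in step 2 can be approximated with the standard-library `ast` module. This is a sketch under stated assumptions: `find_mutations` is a hypothetical helper, it matches on the attribute name alone (so unrelated objects with a same-named attribute are also reported), and it does not cover every mutation form (for example `del obj.field[key]`):

```python
import ast

MUTATING_METHODS = {
    "append", "extend", "insert", "update", "add",
    "pop", "remove", "clear", "setdefault",
}


def find_mutations(source: str, field: str) -> list[tuple[int, str]]:
    """Return (line, kind) for every write through `<expr>.<field>` in `source`."""
    hits = []

    def is_field(node: ast.AST) -> bool:
        return isinstance(node, ast.Attribute) and node.attr == field

    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Assign, ast.AugAssign, ast.AnnAssign)):
            targets = node.targets if isinstance(node, ast.Assign) else [node.target]
            for target in targets:
                if is_field(target):
                    hits.append((node.lineno, "assign"))
                elif isinstance(target, ast.Subscript) and is_field(target.value):
                    hits.append((node.lineno, "item-assign"))
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in MUTATING_METHODS
            and is_field(node.func.value)
        ):
            hits.append((node.lineno, node.func.attr))
    return sorted(hits)


sample = """\
value = result.metadata["model"]
result.metadata["model"] = "x"
result.metadata.update(extra)
result.metadata = {}
"""

findings = find_mutations(sample, "metadata")
```

The plain read on the first line of `sample` is not reported, which matches the rule that reading from a container is allowed.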

### When audit passes

- Set `gc=False` on the struct.
- Add an **at-risk comment** and docstring note:

- **Above the class**: a short comment stating why gc=False is used despite the at-risk condition, and when the audit was done (e.g. `# gc=False: audit YYYY-MM: <condition> is only read, never mutated.`).
- **In the docstring**: a line that signals to future readers and to this skill that changes touching this struct must be re-audited. Use this format:

`AT-RISK (gc=False): Has <brief condition>. Any change that <what would violate safety> must be audited; if so, remove gc=False.`

- Example (for a struct with a `metadata` dict that is only ever read):

```python
# gc=False: audit 2026-03: metadata dict is only ever read, never mutated after construction.
class QueryResult(msgspec.Struct, ..., gc=False):
"""Result of a completed inference query.

AT-RISK (gc=False): Has mutable container field `metadata`. Any change that
mutates `metadata` after construction or stores this struct in a container
referenced by this struct must be audited; if so, remove gc=False.
...
```

### When touching an at-risk struct

If you are adding or changing code that uses a struct marked AT-RISK (gc=False):

1. Re-run the audit for that struct (searches above).
2. If your change mutates the at-risk field(s) or creates a cycle (e.g. stores the struct in its own container), **remove** `gc=False` from the struct and remove the at-risk comment/docstring line.
3. If your change does not touch the at-risk field or create cycles, the existing gc=False and at-risk comment remain; you may add a short note in the at-risk comment if the audit was re-checked (e.g. update the audit date).

## References

- [msgspec Structs – Disabling Garbage Collection](https://jcristharif.com/msgspec/structs.html#struct-gc)
- [msgspec Performance Tips – Use gc=False](https://jcristharif.com/msgspec/perf-tips.html#use-gc-false)
- [msgspec #631 – Generic structs and gc=False](https://github.com/jcrist/msgspec/issues/631)
7 changes: 6 additions & 1 deletion .gitignore
@@ -189,5 +189,10 @@ outputs/
# Example vLLM virtualenv
examples/03_BenchmarkComparison/vllm_venv/

# Cursor artifacts (local development only)
# Agent artifacts (local development only)
.cursor_artifacts/
.claude/agent-memory/

# User-specific local rules (local Docker dev); do not commit
.cursor/rules/local-docker-dev.mdc
CLAUDE.local.md
33 changes: 9 additions & 24 deletions AGENTS.md
@@ -9,10 +9,7 @@ High-performance benchmarking tool for LLM inference endpoints targeting 50k+ QP
## Common Commands

```bash
# Development setup
python3.12 -m venv venv && source venv/bin/activate
pip install -e ".[dev,test]"
pre-commit install
# Development setup — see docs/DEVELOPMENT.md for full instructions

# Testing
pytest # All tests (excludes slow/performance)
@@ -73,7 +70,7 @@ CLI is auto-generated from `config/schema.py` Pydantic models via cyclopts. Fiel

- **CLI mode** (`offline`/`online`): cyclopts constructs `OfflineBenchmarkConfig`/`OnlineBenchmarkConfig` (subclasses in `config/schema.py`) directly from CLI args. Type locked via `Literal`. `--dataset` is repeatable with TOML-style format `[perf|acc:]<path>[,key=value...]` (e.g. `--dataset data.csv,samples=500,parser.prompt=article`). Full accuracy support via `accuracy_config.eval_method=pass_at_1` etc.
- **YAML mode** (`from-config`): `BenchmarkConfig.from_yaml_file()` loads YAML, resolves env vars, and auto-selects the right subclass via Pydantic discriminated union. Optional `--timeout`/`--mode` overrides via `config.with_updates()`.
- **eval**: Not yet implemented (raises `NotImplementedError`)
- **eval**: Not yet implemented (raises `CLIError` with a tracking issue link)
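
To illustrate the `--dataset` value format described above, here is a sketch of how `[perf|acc:]<path>[,key=value...]` could be split into parts. It is not the project's actual parser: the function name, the returned dict shape, and the `perf` default when no prefix is given are all assumptions made for the example.

```python
def parse_dataset_arg(arg: str) -> dict:
    """Parse `[perf|acc:]<path>[,key=value...]` into a nested dict (sketch only)."""
    kind = "perf"  # assumed default when no prefix is given
    for prefix in ("perf", "acc"):
        if arg.startswith(prefix + ":"):
            kind, arg = prefix, arg[len(prefix) + 1:]
            break

    path, *pairs = arg.split(",")
    options: dict = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        *parents, leaf = key.split(".")
        node = options
        for part in parents:      # dotted keys become nested tables
            node = node.setdefault(part, {})
        node[leaf] = value        # values are kept as strings in this sketch
    return {"kind": kind, "path": path, "options": options}


parsed = parse_dataset_arg("data.csv,samples=500,parser.prompt=article")
```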

### Config Construction & Validation

@@ -137,7 +134,11 @@ src/inference_endpoint/
│ └── utils.py # Port range helpers
├── async_utils/
│ ├── loop_manager.py # LoopManager (uvloop + eager_task_factory)
│ ├── runner.py # run_async() — uvloop + eager_task_factory entry point for CLI commands
│ ├── event_publisher.py # Async event pub/sub
│ ├── services/
│ │ ├── event_logger/ # EventLoggerService: writes EventRecords to JSONL/SQLite
│ │ └── metrics_aggregator/ # MetricsAggregatorService: real-time metrics (TTFT, TPOT, ISL, OSL)
│ └── transport/ # ZMQ-based IPC transport layer
│ ├── protocol.py # Transport protocols + TransportConfig base
│ ├── record.py # Transport records
@@ -192,25 +193,9 @@ tests/

## Development Standards

### Code Style
### Code Style and Pre-commit Hooks

- **Formatter/Linter**: `ruff` (line-length 88, target Python 3.12)
- **Type checking**: `mypy` (via pre-commit)
- **Formatting**: `ruff-format` (double quotes, space indent)
- **License headers**: Required on all Python files (enforced by pre-commit hook `scripts/add_license_header.py`)
- **Conventional commits**: `feat:`, `fix:`, `docs:`, `test:`, `chore:`

### Pre-commit Hooks

All of these run automatically on commit:

- trailing-whitespace, end-of-file-fixer, check-yaml, check-merge-conflict, debug-statements
- `ruff` (lint + autofix) and `ruff-format`
- `mypy` type checking
- `prettier` for YAML/JSON/Markdown
- License header enforcement

**Always run `pre-commit run --all-files` before committing.**
See [Development Guide](docs/DEVELOPMENT.md) for formatting, linting, and pre-commit hook details.

### Data Types & Serialization

@@ -291,7 +276,7 @@ Update AGENTS.md as part of any PR that includes a **significant refactor**, mea
- **Added or removed CLI commands/subcommands** — update CLI Modes and Common Commands
- **Changed test infrastructure** (new fixtures, changed markers, new test directories) — update Testing section
- **Added or removed key dependencies** — update Key Dependencies table
- **Changed build/tooling** (new pre-commit hooks, changed ruff config, new CI steps) — update Code Style and Pre-commit Hooks
- **Changed build/tooling** (new pre-commit hooks, changed ruff config, new CI steps) — update [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md)
- **Changed hot-path patterns** (new transport, changed serialization, new performance constraints) — update Performance Guidelines

### How to Update
2 changes: 2 additions & 0 deletions CONTRIBUTING.md
@@ -7,3 +7,5 @@ Generally we encourage people to become MLCommons members if they wish to contri
Regardless of whether you are a member, your organization (or you as an individual contributor) needs to sign the MLCommons Contributor License Agreement (CLA). Please submit your GitHub username to the [MLCommons Subscription form](https://mlcommons.org/community/subscribe/) to start that process.

MLCommons project work is tracked with issue trackers and pull requests. Modify the project in your own fork and issue a pull request once you want other developers to take a look at what you have done and discuss the proposed changes. Ensure that cla-bot and other checks pass for your pull requests.

For project-specific development standards (code style, test requirements, pre-commit hooks, commit format), see the [Development Guide](docs/DEVELOPMENT.md).