doc: Documentation cleanup, refactor by arekay-nv · Pull Request #204 · mlcommons/endpoints

arekay-nv · 2026-03-24T23:35:36Z

What does this PR do?

Update the docs.

Type of change

Bug fix
New feature
Documentation update
Refactor/cleanup

Related issues

Testing

Tests added/updated
All tests pass locally
Manual testing completed

Checklist

Code follows project style
Pre-commit hooks pass
Documentation updated (if needed)

github-actions · 2026-03-24T23:35:45Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

gemini-code-assist · 2026-03-24T23:36:10Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request undertakes a major overhaul of the project's documentation, aiming to provide a more structured, comprehensive, and accessible knowledge base. The changes centralize development guidelines, introduce detailed design specifications for core components, and improve the overall navigation and clarity of the project's technical information. This refactor will significantly aid new contributors in onboarding and help all developers understand the system's architecture and component interactions more deeply.

Highlights

Comprehensive Documentation Refactor: Existing documentation files like AGENTS.md, CONTRIBUTING.md, README.md, CLI_QUICK_REFERENCE.md, DEVELOPMENT.md, GITHUB_SETUP.md, and LOCAL_TESTING.md have been significantly updated and streamlined for clarity and consistency.
Introduction of Component Design Specifications: A new set of detailed Design.md documents has been added for nearly every major component of the system, including async_utils, commands, config, core, dataset_manager, endpoint_client, evaluation, load_generator, metrics, openai, plugins, profiling, sglang, testing, and utils. These provide in-depth architectural and implementation details.
Enhanced Development Guide: The DEVELOPMENT.md file has been substantially expanded to cover development environment setup, project structure, testing, code quality standards (including pre-commit hooks), and a detailed pull request process, serving as a central resource for contributors.
Updated Examples and Architecture Overview: The examples/README.md now includes new example configurations, and the main README.md features a more structured documentation section with links to the new design specs and an updated architecture diagram, improving discoverability and understanding of the system's components.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request significantly updates and expands the project's documentation, introducing detailed design specifications for various components in new Design.md files, revising development setup and contributing guidelines, and clarifying CLI usage. Notable changes include a more comprehensive "Documentation" section in README.md with links to these new design specs, updated development environment setup in DEVELOPMENT.md to reflect new modules and pre-commit hook details, and refined CLI examples in CLI_QUICK_REFERENCE.md and LOCAL_TESTING.md. Review feedback indicates a potential documentation duplication for the "Endpoint client" component, suggesting a need to clarify or deprecate the older document, and also highlights an inconsistency in the CLI_QUICK_REFERENCE.md regarding the --report-dir option for the from-config subcommand, requiring clarification.

gemini-code-assist · 2026-03-24T23:39:24Z

README.md

+| ----------------- | ---------------------------------------------------------------- |
+| Core types        | [docs/core/Design.md](docs/core/Design.md)                       |
+| Load generator    | [docs/load_generator/Design.md](docs/load_generator/Design.md)   |
+| Endpoint client   | [docs/endpoint_client/Design.md](docs/endpoint_client/Design.md) |


This new table of component design specs is a great addition for navigating the project's architecture.

I noticed that a new high-level design doc docs/endpoint_client/Design.md is introduced for the "Endpoint client". However, there's also a very detailed, pre-existing design document at docs/ENDPOINT_CLIENT.md.

Having two design documents for the same component could lead to confusion and maintenance overhead. Is the intention to deprecate or remove docs/ENDPOINT_CLIENT.md in favor of the new structured documentation? If so, it might be good to either remove it as part of this PR or add a note at the top of the old file indicating that it is deprecated and pointing to the new documentation.

docs/CLI_QUICK_REFERENCE.md

Copilot

Pull request overview

This PR performs a broad documentation cleanup and restructuring, adding per-component “Design Spec” documents and refreshing contributor and usage guides to reflect the current repository structure and CLI workflows.

Changes:

Add new component-level design specs under docs/<component>/Design.md to describe architecture, responsibilities, and integration points.
Refresh contributor and usage documentation (local testing, development workflow, CLI reference, GitHub setup) to match current commands and repository links.
Expand/curate example listings and cross-link documentation from the root README.md.

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 10 comments.

Show a summary per file

File	Description
examples/README.md	Adds links/descriptions for new example directories.
docs/utils/Design.md	New utils design spec (currently contains several API/behavior mismatches vs code).
docs/testing/Design.md	New testing utilities design spec (echo/max/variable throughput servers).
docs/sglang/Design.md	New SGLang adapter design spec (currently documents non-existent adapter/accumulator methods).
docs/profiling/Design.md	New profiling design spec (line_profiler + pytest flag).
docs/plugins/Design.md	New plugin namespace design spec (currently overstates existing plugin registration API).
docs/openai/Design.md	New OpenAI adapter design spec (currently documents non-existent adapter/accumulator methods).
docs/metrics/Design.md	New metrics design spec (EventRecorder API signature section currently mismatches implementation).
docs/load_generator/Design.md	New load generator design spec (session/scheduler architecture).
docs/evaluation/Design.md	New evaluation design spec (accuracy scoring + LiveCodeBench).
docs/endpoint_client/Design.md	New endpoint client design spec (worker pool, ZMQ IPC, adapters).
docs/dataset_manager/Design.md	New dataset manager design spec (loader/transforms/presets; format inference list currently incomplete).
docs/core/Design.md	New core types design spec (currently mismatches actual `Query`/`QueryResult` struct fields).
docs/config/Design.md	New config design spec (YAML/CLI → RuntimeSettings, templates, rulesets).
docs/commands/Design.md	New commands layer design spec (CLI layout and command flow).
docs/async_utils/services/metrics_aggregator/Design.md	Adds a clearer one-line summary of the metrics aggregator service.
docs/async_utils/services/event_logger/Design.md	Adds a clearer one-line summary of the event logger service.
docs/async_utils/services/Design.md	Adds a clearer one-line summary of the pub/sub system design doc.
docs/async_utils/Design.md	New async_utils design spec (loop manager, ZMQ transport, pub/sub services).
docs/LOCAL_TESTING.md	Updates local testing instructions (dataset path, init syntax, default duration, supported formats list).
docs/GITHUB_SETUP.md	Updates GitHub workflow descriptions and branch protection checklist.
docs/ENDPOINT_CLIENT.md	Adds a short introductory summary line for the document.
docs/DEVELOPMENT.md	Major rewrite of development workflow guide (fork/upstream flow, tooling, formatting, links).
docs/CLI_QUICK_REFERENCE.md	Replaces intro text and adjusts examples/wording for config-driven usage.
README.md	Updates docs links, architecture diagram, LiveCodeBench link, and minor command snippets.
CONTRIBUTING.md	Adds pointer to `docs/DEVELOPMENT.md` for standards.
AGENTS.md	Simplifies setup section and updates repo structure/tooling references.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

docs/utils/Design.md

docs/plugins/Design.md

Copilot · 2026-03-24T23:41:55Z

docs/metrics/Design.md

+class EventRecorder:
+    LIVE: "EventRecorder | None"   # class variable; set on construction, cleared on close
+
+    def record_event(
+        self,
+        sample_uuid: str,
+        event_type: SampleEvent | SessionEvent,
+        data: dict | None = None,
+    ) -> None
+
+    def wait_for_writes(self, force_commit: bool = True) -> None
+    # Blocks until the background writer thread has flushed all queued events
+
+    @staticmethod
+    def db_path(session_id: str) -> Path
+```


The EventRecorder public interface in this doc doesn’t match the implementation. In metrics/recorder.py, record_event is a @classmethod with signature roughly (ev_type, timestamp_ns, sample_uuid=..., data=..., ...), not an instance method taking (sample_uuid, event_type, data). Please update the signature/parameter descriptions to avoid misleading callers.

docs/sglang/Design.md

docs/dataset_manager/Design.md

docs/utils/Design.md

Copilot · 2026-03-24T23:41:56Z

docs/core/Design.md

+### `Query`
+
+Represents a single inference request issued to an endpoint.
+
+| Field        | Type          | Description                                |
+| ------------ | ------------- | ------------------------------------------ |
+| `id`         | `str`         | UUID hex (32 chars) for result correlation |
+| `prompt`     | `str`         | Input text                                 |
+| `model`      | `str`         | Model identifier                           |
+| `max_tokens` | `int \| None` | Token cap override                         |
+| `stream`     | `bool`        | Whether to request streaming               |
+
+### `QueryResult`
+
+Represents a completed (success or failure) inference response.
+
+| Field        | Type                      | Description                      |
+| ------------ | ------------------------- | -------------------------------- |
+| `id`         | `str`                     | Matches originating `Query.id`   |
+| `output`     | `TextModelOutput \| None` | Response content                 |
+| `error`      | `str \| None`             | Error message on failure         |
+| `ttft_ns`    | `int \| None`             | Time to first token, nanoseconds |
+| `latency_ns` | `int`                     | End-to-end latency, nanoseconds  |
+


The Query/QueryResult field tables in this doc do not reflect the actual structs in core/types.py. For example, Query is {id, data, headers, created_at} (not prompt/model/max_tokens/stream), and QueryResult fields are {id, response_output, metadata, error, completed_at}. Please update the tables (and any examples) to match the current types so downstream component docs stay consistent.

docs/openai/Design.md

Copilot

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

docs/profiling/Design.md

docs/CLI_QUICK_REFERENCE.md

viraatc · 2026-03-25T02:31:29Z

docs/endpoint_client/Design.md

consider moving in ENDPOINT_CLIENT.md into this folder as the main design document,
and then consolidate Design.md to usage / args only?

(also CLIENT_PERFORMANCE_TUNING.md?, PERF_ARCHITECTURE.md?)

nvzhihanj

Review Council — Multi-AI Code Review

Reviewed by: Claude (Codex unavailable) | Depth: thorough

Found 9 issues across 5 files.

Note: 17 existing inline comments were already present. Overlapping issues excluded.

docs/load_generator/Design.md

docs/metrics/Design.md

docs/async_utils/Design.md

docs/load_generator/Design.md

docs/config/Design.md

AGENTS.md

nvzhihanj · 2026-03-30T04:26:44Z

Review Council — Multi-AI Code Review

Reviewed by: Claude (Codex unavailable — branch checkout failed) | Depth: thorough

Found 9 issues across 5 files. All verified against actual source code.

17 existing inline comments from a prior review were already present. Overlapping issues have been excluded from this review.

🔴 Must Fix (high) — 3 issues

Issues where the documentation will actively mislead readers about how the code works.

#	File	Line	Category	Summary
1	`docs/load_generator/Design.md`	130	bug	Event types table uses `SESSION_STARTED`/`SESSION_ENDED` — actual enums are `TEST_STARTED`/`TEST_ENDED`
2	`docs/metrics/Design.md`	131	bug	QPS formula references non-existent `SESSION_ENDED`/`SESSION_STARTED` events; actual uses `TEST_STARTED` and `STOP_PERFORMANCE_TRACKING`
3	`docs/metrics/Design.md`	72	api-contract	`MetricsReporter.__init__` signature wrong — actual params are `(connection_name, client_type)`, not `(db_path, runtime_settings, tokenizer)`

🟡 Should Fix (medium) — 4 issues

Real inaccuracies that could cause confusion under specific circumstances.

#	File	Line	Category	Summary
4	`docs/metrics/Design.md`	79	api-contract	`create_report()` returns `Report` not `BenchmarkReport`; takes `tokenizer` and `tpot_reporting_mode` params
5	`docs/metrics/Design.md`	91	api-contract	Metric classes use `REL_TOL` class var and `self.target`, not `tolerance` field or `target_qps`/`target_ms`/`max_ms` attributes
6	`docs/async_utils/Design.md`	72	api-contract	`EventPublisherService` requires `managed_zmq_context` arg on first construction — doc shows no-arg usage
7	`docs/load_generator/Design.md`	63	api-contract	`SampleIssuer` is an `ABC` (requires subclassing), not a `Protocol` (structural subtyping) — doc says "protocol"

🔵 Consider (low) — 2 issues

Valid improvements that could be follow-ups.

#	File	Line	Category	Summary
8	`docs/config/Design.md`	30	bug	`RuntimeSettings` table missing `metric_target: Metric` field
9	`AGENTS.md`	137	design	CLI row says `argparse-based` but project migrated to `cyclopts` in PR #193 — appears based on stale branch

🤖 Generated with Claude Code — Review Council

Copilot

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

docs/GITHUB_SETUP.md

docs/LOCAL_TESTING.md

docs/dataset_manager/Design.md

docs/testing/Design.md

docs/utils/Design.md

Copilot

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

README.md

- Add Design.md specs for all 15 top-level components under src/inference_endpoint/ - Restructure AGENTS.md: move code style details to DEVELOPMENT.md, update component table with runner.py and async_utils services - Update README.md: add Component Design Specs table, use python3 in examples - Reformat DEVELOPMENT.md: remove emojis, add commit type list, exact-version pinning guidance - Update CLI_QUICK_REFERENCE.md, LOCAL_TESTING.md, ENDPOINT_CLIENT.md, GITHUB_SETUP.md for consistency - Fix stale references: pkl→jsonl throughout, CLIError for eval mode, dataset_manager Design.md reflects current supported formats Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

docs/LOCAL_TESTING.md

arekay-nv · 2026-04-02T01:20:47Z

Follow-up review

All 9 issues from the prior review have been addressed. Verified each against the current source code:

High (must fix) — all 3 fixed:

Event type names in docs/load_generator/Design.md now correctly use TEST_STARTED/TEST_ENDED
QPS formula in docs/metrics/Design.md now references TEST_STARTED/STOP_PERFORMANCE_TRACKING
MetricsReporter.__init__ signature now correctly documents (connection_name, client_type)

Medium (should fix) — all 4 fixed:

create_report() return type and params corrected
Metric classes now document REL_TOL class var and self.target
EventPublisherService constructor now shows required managed_zmq_context arg
SampleIssuer correctly described as ABC, not a structural Protocol

Low (consider) — both fixed:

RuntimeSettings table now includes metric_target: Metric
AGENTS.md CLI row correctly says cyclopts-based

The fix commits (554fd6f, 855d35d) also correct several other pre-existing inaccuracies (echo server latency claim, eval raising CLIError not NotImplementedError, GitHub Actions workflow display names) without introducing new issues.

No issues found in the current state of the PR.

🤖 Generated with Claude Code

_{If this code review was useful, please react with 👍. Otherwise, react with 👎.}

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

docs/LOCAL_TESTING.md

- Add msgspec-patterns and msgspec-struct-gc-check Claude/Cursor skills - Add design docs for services architecture and metrics aggregator - Update .gitignore for new artifact patterns - Update zmq_pubsub_demo.py for new transport API

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 34 out of 35 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (2)

docs/async_utils/services/Design.md:21

This section says in-process subscribers “connect to the publisher’s bind_address”. With the current ZMQ helpers, ZmqEventRecordSubscriber expects the socket name/path (e.g. publisher.bind_path / ev_pub_<uuid>), not the full ipc://... bind address; passing bind_address would result in an invalid constructed address because ManagedZMQContext.connect() prepends socket_dir again. Please update the docs to reference the socket name (bind_path) for in-process subscribers, and reserve bind_address for cases where subscribers connect without ManagedZMQContext address construction.
docs/async_utils/services/Design.md:306
The event_logger lifecycle description claims error events after session.ended are still written. The current EventLoggerService.process() stops writing all subsequent records once _shutdown_received is set, regardless of event type (src/inference_endpoint/async_utils/services/event_logger/__main__.py, process() method). Either update this doc to match current behavior, or adjust EventLoggerService to continue writing ErrorEventType.* records after ENDED if that behavior is intended.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

scripts/zmq_pubsub_demo.py

docs/CLI_DESIGN.md

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 34 out of 35 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (2)

docs/async_utils/services/Design.md:22

This section says in-process subscribers connect using the publisher's bind_address, but the ZMQ APIs take the socket name (publisher bind_path, e.g. ev_pub_<uuid>) and ManagedZMQContext.connect() constructs the full ipc://... address. Using bind_address directly would not work with connect() as implemented. Suggest updating wording to match the actual API (path=publisher.bind_path).
docs/async_utils/services/Design.md:306
Doc/code mismatch: this claims the event logger continues writing error events after session.ended, but EventLoggerService.process() currently drops all subsequent records once _shutdown_received is set (see src/inference_endpoint/async_utils/services/event_logger/__main__.py, where it continues when _shutdown_received). Either update the implementation to allow ErrorEventType through after ENDED, or adjust this doc to reflect current behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-03T03:39:43Z

scripts/zmq_pubsub_demo.py

        EventRecord(
            event_type=SampleEventType.COMPLETE,
            timestamp_ns=10211,
            sample_uuid=uuid1,
-            data={"tokens": 50},
+            data="Hello world",
        ),
        EventRecord(
            event_type=SampleEventType.RECV_NON_FIRST,
            timestamp_ns=10214,
            sample_uuid=uuid2,
        ),
        EventRecord(
            event_type=SampleEventType.RECV_NON_FIRST,
            timestamp_ns=10217,
            sample_uuid=uuid2,
        ),
        EventRecord(
            event_type=SampleEventType.RECV_NON_FIRST,
            timestamp_ns=10219,
            sample_uuid=uuid2,
        ),
        EventRecord(
            event_type=SampleEventType.COMPLETE,
            timestamp_ns=10219,
            sample_uuid=uuid2,
-            data={"tokens": 75},
+            data="Sample output for uuid2",
        ),


EventRecord.data is typed as TextModelOutput | PromptData | ErrorData | None (see core.record.EventRecord), so publishing a plain str here will fail msgspec validation/encoding at runtime. Wrap outputs in TextModelOutput(output=...) (or set data=None if unused) so the demo can actually publish COMPLETE events.

Copilot · 2026-04-03T03:39:43Z

scripts/zmq_pubsub_demo.py

+        # Start event_logger as a subprocess (logs to both JSONL and SQL under event_log_dir).
+        event_logger_cmd = [
+            sys.executable,
+            "-m",
+            "inference_endpoint.async_utils.services.event_logger",
+            "--log-dir",
+            str(event_log_dir),
+            "--socket-dir",
+            str(zmq_ctx.socket_dir),
+            "--socket-name",
+            str(connect_path),
+            "--writers",
+            "jsonl",
+            "sql",
+        ]


The demo always starts event_logger with --writers jsonl sql, but the SQL writer is only available when sqlalchemy is installed (the service omits the sql choice otherwise and will exit immediately with an argparse error). Consider detecting sqlalchemy availability in this script (or falling back to JSONL-only) so the demo works with a base install.

Copilot · 2026-04-03T03:39:44Z

scripts/zmq_pubsub_demo.py

+            try:
+                event_logger_proc.wait(timeout=5)
+            except subprocess.TimeoutExpired:
+                logger.warning("event_logger did not exit within 5s, terminating")
+                event_logger_proc.terminate()
+                event_logger_proc.wait(timeout=2)
        except KeyboardInterrupt:
            logger.info("Interrupted by user")
        finally:
            logger.info("Cleaning up (closing subscribers)...")
            console_sub.close()
-            file_sub.close()
            duration_sub.close()
+            if event_logger_proc.poll() is None:
+                event_logger_proc.terminate()
+                event_logger_proc.wait(timeout=2)
+


Process termination handling can raise subprocess.TimeoutExpired and break cleanup: after terminate(), both event_logger_proc.wait(timeout=2) calls are unguarded. If the subprocess ignores SIGTERM (or is stuck), the demo will raise and exit before closing subscribers / printing verification. Catch the second timeout and escalate to kill() (and/or continue cleanup) to keep teardown reliable.

arekay-nv · 2026-04-03T03:47:23Z

/gemini review

gemini-code-assist

Code Review

This pull request significantly expands the project's documentation by adding detailed design specifications for all core components, including the load generator, metrics aggregator, and endpoint client. It also introduces new AI agent skills for msgspec optimization and garbage collection safety. Review feedback identified a missing required argument in a documentation code example and a missing field in the metrics aggregator's data model description.

gemini-code-assist · 2026-04-03T03:49:05Z

docs/DEVELOPMENT.md

-        query = Query(prompt="Test", model="test-model")
-        assert query.prompt == "Test"
-        assert query.model == "test-model"
+        query = Query(data={"prompt": "Test", "model": "test-model"})


The Query instantiation in this example is missing the headers argument. According to the core types design specification in docs/core/Design.md, headers is a required field of type dict[str, str]. Omitting it will result in a TypeError during execution.

Suggested change

query = Query(data={"prompt": "Test", "model": "test-model"})

query = Query(data={"prompt": "Test", "model": "test-model"}, headers={})

gemini-code-assist · 2026-04-03T03:49:05Z

docs/async_utils/services/metrics_aggregator/Design.md

+SampleRow
+├── sample_uuid: str
+├── issued_ns: int | None           ← set on ISSUED
+├── complete_ns: int | None          ← set on COMPLETE
+├── recv_first_ns: int | None        ← set on RECV_FIRST
+├── last_recv_ns: int | None         ← set on RECV_FIRST, updated on each RECV_NON_FIRST
+├── client_send_ns: int | None       ← set on CLIENT_SEND
+├── client_resp_done_ns: int | None  ← set on CLIENT_RESP_DONE
+├── prompt_text: str | None          ← from ISSUED event data (for ISL tokenization)
+├── first_chunk_text: str | None     ← from RECV_FIRST event data (for TPOT denominator)
+└── output_chunks: list[str]         ← accumulated from RECV_FIRST/RECV_NON_FIRST data


The SampleRow struct definition is missing the first_chunk_tokens (or first_chunk_token_count) field. This field is required for the tpot_ns calculation described in the formula on line 143 and used in the event dispatch logic on line 184. Without storing this count, the aggregator cannot correctly compute the TPOT denominator for streaming responses.

Suggested change

SampleRow

├── sample_uuid: str

├── issued_ns: int | None ← set on ISSUED

├── complete_ns: int | None ← set on COMPLETE

├── recv_first_ns: int | None ← set on RECV_FIRST

├── last_recv_ns: int | None ← set on RECV_FIRST, updated on each RECV_NON_FIRST

├── client_send_ns: int | None ← set on CLIENT_SEND

├── client_resp_done_ns: int | None ← set on CLIENT_RESP_DONE

├── prompt_text: str | None ← from ISSUED event data (for ISL tokenization)

├── first_chunk_text: str | None ← from RECV_FIRST event data (for TPOT denominator)

└── output_chunks: list[str] ← accumulated from RECV_FIRST/RECV_NON_FIRST data

SampleRow

├── sample_uuid: str

├── issued_ns: int | None ← set on ISSUED

├── complete_ns: int | None ← set on COMPLETE

├── recv_first_ns: int | None ← set on RECV_FIRST

├── last_recv_ns: int | None ← set on RECV_FIRST, updated on each RECV_NON_FIRST

├── client_send_ns: int | None ← set on CLIENT_SEND

├── client_resp_done_ns: int | None ← set on CLIENT_RESP_DONE

├── prompt_text: str | None ← from ISSUED event data (for ISL tokenization)

├── first_chunk_text: str | None ← from RECV_FIRST event data (for TPOT denominator)

├── first_chunk_tokens: int | None ← token count of first_chunk_text

└── output_chunks: list[str] ← accumulated from RECV_FIRST/RECV_NON_FIRST data

arekay-nv requested a review from a team as a code owner March 24, 2026 23:35

Copilot AI review requested due to automatic review settings March 24, 2026 23:35

github-actions bot requested a review from nvzhihanj March 24, 2026 23:35

Copilot started reviewing on behalf of arekay-nv March 24, 2026 23:36 View session

gemini-code-assist bot reviewed Mar 24, 2026

View reviewed changes

Copilot AI reviewed Mar 24, 2026

View reviewed changes

arekay-nv force-pushed the arekay/doc-refactor branch from 4ddf686 to c5e2ed6 Compare March 25, 2026 00:20

arekay-nv requested a review from Copilot March 25, 2026 00:35

Copilot started reviewing on behalf of arekay-nv March 25, 2026 00:35 View session

Copilot AI reviewed Mar 25, 2026

View reviewed changes

docs/profiling/Design.md Outdated Show resolved Hide resolved

docs/profiling/Design.md Outdated Show resolved Hide resolved

docs/CLI_QUICK_REFERENCE.md Outdated Show resolved Hide resolved

docs/CLI_QUICK_REFERENCE.md Outdated Show resolved Hide resolved

viraatc reviewed Mar 25, 2026

View reviewed changes

nvzhihanj reviewed Mar 30, 2026

View reviewed changes

Copilot AI review requested due to automatic review settings March 30, 2026 18:18

arekay-nv force-pushed the arekay/doc-refactor branch from 0db4683 to 7196894 Compare March 30, 2026 18:18

Copilot started reviewing on behalf of arekay-nv March 30, 2026 18:18 View session

Copilot AI reviewed Mar 30, 2026

View reviewed changes

docs/GITHUB_SETUP.md Outdated Show resolved Hide resolved

docs/LOCAL_TESTING.md Outdated Show resolved Hide resolved

docs/dataset_manager/Design.md Show resolved Hide resolved

docs/testing/Design.md Outdated Show resolved Hide resolved

docs/utils/Design.md Outdated Show resolved Hide resolved

Copilot AI review requested due to automatic review settings March 31, 2026 14:50

Copilot started reviewing on behalf of arekay-nv March 31, 2026 14:51 View session

Copilot AI reviewed Mar 31, 2026

View reviewed changes

README.md Show resolved Hide resolved

arekay-nv force-pushed the arekay/doc-refactor branch from 465f559 to 9a7697b Compare April 2, 2026 00:37

Merge branch 'main' into arekay/doc-refactor

6b7c1dc

arekay-nv requested a review from Copilot April 2, 2026 00:40

Copilot started reviewing on behalf of arekay-nv April 2, 2026 00:40 View session

Copilot AI reviewed Apr 2, 2026

View reviewed changes

arekay-nv added 2 commits April 1, 2026 20:06

More fixes.

554fd6f

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>

Consolidate

855d35d

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>

Copilot AI review requested due to automatic review settings April 2, 2026 01:13

Copilot started reviewing on behalf of arekay-nv April 2, 2026 01:13 View session

Copilot AI reviewed Apr 2, 2026

View reviewed changes

docs/LOCAL_TESTING.md Outdated Show resolved Hide resolved

arekay-nv added 2 commits April 1, 2026 21:13

events +stream fix

b290b94

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>

Fix error msg

3458697

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>

Copilot AI review requested due to automatic review settings April 2, 2026 02:25

Copilot started reviewing on behalf of arekay-nv April 2, 2026 02:25 View session

Copilot AI reviewed Apr 2, 2026

View reviewed changes

docs/LOCAL_TESTING.md Outdated Show resolved Hide resolved

nv-alicheng and others added 2 commits April 2, 2026 19:48

chore: add skills, design docs, and tooling config

345d040

- Add msgspec-patterns and msgspec-struct-gc-check Claude/Cursor skills - Add design docs for services architecture and metrics aggregator - Update .gitignore for new artifact patterns - Update zmq_pubsub_demo.py for new transport API

Merge

9b186ba

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>

arekay-nv requested a review from Copilot April 3, 2026 01:07

Copilot started reviewing on behalf of arekay-nv April 3, 2026 01:08 View session

Copilot AI reviewed Apr 3, 2026

View reviewed changes

scripts/zmq_pubsub_demo.py Show resolved Hide resolved

docs/CLI_DESIGN.md Show resolved Hide resolved

arekay-nv added 2 commits April 2, 2026 22:24

Minor fixes.

cd22a8c

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>

dup fix.

53ecd26

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>

Copilot AI review requested due to automatic review settings April 3, 2026 03:34

Copilot started reviewing on behalf of arekay-nv April 3, 2026 03:34 View session

Copilot AI reviewed Apr 3, 2026

View reviewed changes

gemini-code-assist bot reviewed Apr 3, 2026

View reviewed changes

arekay-nv requested review from nvzhihanj and viraatc April 3, 2026 21:42

Merge branch 'main' into arekay/doc-refactor

e849839

	query = Query(data={"prompt": "Test", "model": "test-model"})
	query = Query(data={"prompt": "Test", "model": "test-model"}, headers={})

Conversation

arekay-nv commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Type of change

Related issues

Testing

Checklist

Uh oh!

github-actions bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist bot commented Mar 24, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

viraatc Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

nvzhihanj left a comment

Choose a reason for hiding this comment

Review Council — Multi-AI Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

nvzhihanj commented Mar 30, 2026

Review Council — Multi-AI Code Review

🔴 Must Fix (high) — 3 issues

🟡 Should Fix (medium) — 4 issues

🔵 Consider (low) — 2 issues

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

arekay-nv commented Mar 24, 2026 •

edited

Loading

github-actions bot commented Mar 24, 2026 •

edited

Loading