
[feat] Initial warmup with random dataset#170

Open
arekay-nv wants to merge 14 commits into main from arekay/naive_warmup_impl

Conversation

@arekay-nv
Collaborator

@arekay-nv arekay-nv commented Mar 13, 2026

What does this PR do?

Adds a random dataset backed warmup stage.
Closes #86

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

@arekay-nv arekay-nv requested a review from a team as a code owner March 13, 2026 21:53
Copilot AI review requested due to automatic review settings March 13, 2026 21:53
@github-actions

github-actions bot commented Mar 13, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@github-actions github-actions bot requested a review from nvzhihanj March 13, 2026 21:53
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates a crucial warmup phase into the benchmarking process. This phase is designed to mitigate 'cold start' effects by sending a series of randomly generated requests to the target endpoint before actual performance measurements begin. By doing so, it ensures that the system is in a steady state, with connections established and caches populated, leading to more accurate and representative benchmark results. The warmup parameters are fully configurable, and its activities are isolated from the final performance metrics.

Highlights

  • Warmup Phase Introduction: Introduced a new configurable 'warmup' phase to benchmarks, which issues randomly generated requests to prime the endpoint before the timed performance test begins. This helps establish TCP connections, fill KV caches, and trigger JIT compilation.
  • Metrics Exclusion: Ensured that all samples issued during the warmup phase are explicitly excluded from reported performance metrics (throughput, latency, TTFT, TPOT) by recording them before the 'TEST_STARTED' event.
  • Configuration Schema: Added a new WarmupConfig schema to define parameters for the warmup phase, including num_samples, input_seq_length, output_seq_length, range_ratio, and random_seed.
  • Sequential Sample Ordering: Implemented a SequentialSampleOrder to ensure warmup requests are issued in a predictable, non-random sequence, which is useful for controlled priming.
  • New Example and Documentation: Provided a new example (09_Warmup_Example) with detailed documentation (README.md) and configuration files (warmup_offline.yaml, warmup_online.yaml) demonstrating how to use and tune the warmup feature.
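The WarmupConfig fields named in the highlights can be pictured with a plain-dataclass stand-in. The real schema lives in src/inference_endpoint/config/schema.py; the defaults and validation below are illustrative assumptions, not the project's actual definitions.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-in for the WarmupConfig schema described in the PR.
# Field names come from the PR summary; defaults and checks are assumptions.
@dataclass
class WarmupConfig:
    num_samples: int                   # number of warmup requests to issue
    input_seq_length: int              # target input length in tokens
    output_seq_length: int             # target output length in tokens
    range_ratio: float = 0.0           # jitter applied around the target lengths
    random_seed: Optional[int] = None  # fixed seed makes warmup data reproducible

    def __post_init__(self) -> None:
        if self.num_samples <= 0:
            raise ValueError("num_samples must be positive")
        if not 0.0 <= self.range_ratio <= 1.0:
            raise ValueError("range_ratio must be in [0, 1]")

cfg = WarmupConfig(num_samples=8, input_seq_length=128, output_seq_length=64)
```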


Changelog
  • examples/04_GPTOSS120B_Example/Readme.md
    • Updated code block syntax to specify bash or python.
    • Changed MODEL_NAME from gpt-oss-120b to gpt-oss-20b.
    • Adjusted SGLang launch command port from 3000 to 30000.
    • Added instructions for activating the virtual environment before launching benchmarks.
  • examples/04_GPTOSS120B_Example/gptoss_120b_example.yaml
    • Updated benchmark name from gpt-oss-120b-benchmark to gpt-oss-20b-benchmark.
    • Changed model name from openai/gpt-oss-120b to openai/gpt-oss-20b.
    • Updated endpoint from http://localhost:3000 to http://localhost:8000.
    • Adjusted report directory name from gptoss_120b_benchmark_mlperf to gptoss_20b_benchmark_mlperf.
  • examples/04_GPTOSS120B_Example/sglang_gptoss_120b_example.yaml
    • Updated benchmark name from gpt-oss-120b-benchmark to gpt-oss-20b-sglang-benchmark.
    • Changed model name from openai/gpt-oss-120b to openai/gpt-oss-20b.
    • Adjusted report directory name from sglang_gptoss_120b_benchmark_full to sglang_gptoss_20b_benchmark_full.
  • examples/09_Warmup_Example/README.md
    • Added a new README file detailing the warmup phase feature, its purpose, configuration, and usage examples for both echo server and real endpoints.
  • examples/09_Warmup_Example/warmup_offline.yaml
    • Added a new YAML configuration file for an offline throughput benchmark with a warmup phase.
  • examples/09_Warmup_Example/warmup_online.yaml
    • Added a new YAML configuration file for an online (sustained QPS) benchmark with a warmup phase.
  • src/inference_endpoint/commands/benchmark.py
    • Added logic to construct a warmup dataset using RandomDataset.generate if a warmup configuration is present.
    • Integrated the warmup_dataset into the _run_benchmark function call.
  • src/inference_endpoint/config/schema.py
    • Defined a new WarmupConfig class to specify warmup parameters.
    • Added an optional warmup field of type WarmupConfig to the BenchmarkConfig.
  • src/inference_endpoint/load_generator/__init__.py
    • Exported the new SequentialSampleOrder class.
  • src/inference_endpoint/load_generator/scheduler.py
    • Added a SequentialSampleOrder class for issuing samples in their natural order without randomness.
  • src/inference_endpoint/load_generator/session.py
    • Imported LoadPattern, LoadPatternType, and SequentialSampleOrder.
    • Modified the _run_test method to accept and execute a warmup_generator before the main performance test.
    • Added a warmup_dataset parameter to the start method.
    • Implemented the creation of a warmup_generator using MaxThroughputScheduler and SequentialSampleOrder.
  • src/inference_endpoint/metrics/reporter.py
    • Added a test_started_timestamp_ns cached property to retrieve the timestamp of the TEST_STARTED event.
    • Updated derive_TTFT, derive_sample_latency, get_sample_statuses, and get_sample_outputs methods to filter events based on test_started_timestamp_ns, ensuring warmup samples are excluded from metrics.
    • Refactored get_test_started_at to use the new test_started_timestamp_ns property.
  • tests/integration/commands/test_warmup.py
    • Added new integration tests to verify the warmup phase functionality for both offline and online benchmarks against an echo server.
  • tests/unit/load_generator/test_session.py
    • Added a unit test to confirm that the warmup phase correctly utilizes SequentialSampleOrder.
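The idea behind the SequentialSampleOrder mentioned in the changelog can be sketched in a few lines; the interface and names below are assumed for illustration and may not match the project's actual API. The point is that it yields dataset indices in their natural order, making warmup issuance deterministic.

```python
from typing import Iterator

# Hypothetical sketch of a sequential sample order: unlike a shuffled order,
# it yields indices 0..n-1 in natural order, with no randomness involved.
class SequentialSampleOrder:
    def __init__(self, num_samples: int) -> None:
        self.num_samples = num_samples

    def __iter__(self) -> Iterator[int]:
        return iter(range(self.num_samples))

order = list(SequentialSampleOrder(4))
print(order)  # [0, 1, 2, 3]
```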
Activity
  • The pull request was created with a 'WIP' (Work In Progress) tag, indicating that the initial implementation of the warmup feature is being introduced.
  • No specific reviewer comments or further progress updates were provided in the context.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a warmup feature to the benchmarking tool. The implementation adds a configurable warmup stage that runs before the timed performance test, using a randomly generated dataset to prime the inference endpoint. The changes are well-structured, spanning configuration schema, benchmark orchestration, and metrics reporting to correctly exclude warmup samples. The PR also includes clear documentation and tests for the new feature. I have one suggestion to enhance the configuration validation for better robustness.


Copilot AI left a comment


Pull request overview

Adds an initial “warmup” phase to the benchmark flow by generating a random-token dataset, issuing it before TEST_STARTED, and filtering metrics to exclude warmup-issued samples.

Changes:

  • Add warmup configuration (warmup:) and generate/load a RandomDataset for warmup in the benchmark command.
  • Run an optional warmup load generator before the timed performance window, and update metrics queries to only include samples issued after TEST_STARTED.
  • Add unit/integration tests plus example YAML/README updates demonstrating warmup usage.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tests/unit/load_generator/test_session.py Adds a unit test asserting warmup uses sequential sample ordering.
tests/integration/commands/test_warmup.py Adds integration coverage for warmup running before performance mode (offline/online).
src/inference_endpoint/metrics/reporter.py Introduces test_started_timestamp_ns and filters metric derivations/statuses to exclude pre-TEST_STARTED (warmup) samples.
src/inference_endpoint/load_generator/session.py Adds warmup generator creation and execution before recording TEST_STARTED.
src/inference_endpoint/load_generator/scheduler.py Adds SequentialSampleOrder for deterministic dataset traversal (used by warmup).
src/inference_endpoint/load_generator/__init__.py Exports SequentialSampleOrder.
src/inference_endpoint/config/schema.py Adds WarmupConfig and warmup field to BenchmarkConfig.
src/inference_endpoint/commands/benchmark.py Builds and loads warmup random dataset when configured and passes it into BenchmarkSession.start.
examples/09_Warmup_Example/warmup_online.yaml New example config showing warmup for online mode.
examples/09_Warmup_Example/warmup_offline.yaml New example config showing warmup for offline mode.
examples/09_Warmup_Example/README.md Documentation for warmup behavior, configuration, and local echo-server testing.
examples/04_GPTOSS120B_Example/sglang_gptoss_120b_example.yaml Updates example to gpt-oss-20b + endpoint/report_dir changes.
examples/04_GPTOSS120B_Example/gptoss_120b_example.yaml Updates example to gpt-oss-20b + endpoint/report_dir changes.
examples/04_GPTOSS120B_Example/Readme.md Updates commands/docs formatting and ports/model name.


Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
@arekay-nv arekay-nv force-pushed the arekay/naive_warmup_impl branch from 634576c to 4565bb8 Compare March 13, 2026 22:02
Copilot AI review requested due to automatic review settings March 16, 2026 23:11

Copilot AI left a comment


Pull request overview

Adds an initial “warmup” phase to the benchmark flow by generating a random, tokenizer-backed dataset and issuing it before the timed performance window, with reporter-side filtering to exclude warmup samples from performance metrics.

Changes:

  • Introduces warmup config schema and CLI wiring to generate/load a RandomDataset for warmup.
  • Adds a warmup run in BenchmarkSession plus a new SequentialSampleOrder for deterministic warmup issuance.
  • Updates metrics rollups to use TEST_STARTED as the lower bound, and adds unit/integration tests + example configs/docs.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/unit/load_generator/test_session.py Adds a unit test asserting warmup uses sequential sample order.
tests/integration/commands/test_warmup.py Adds integration tests validating warmup runs with the echo server.
src/inference_endpoint/metrics/reporter.py Adds TEST_STARTED lower-bound filtering for key perf queries.
src/inference_endpoint/load_generator/session.py Runs warmup generator before recording TEST_STARTED; wires warmup dataset into start().
src/inference_endpoint/load_generator/scheduler.py Adds SequentialSampleOrder implementation.
src/inference_endpoint/load_generator/__init__.py Exports SequentialSampleOrder.
src/inference_endpoint/config/schema.py Adds WarmupConfig and a warmup field to BenchmarkConfig.
src/inference_endpoint/commands/benchmark.py Builds and loads warmup random dataset when configured; passes it into the session.
examples/09_Warmup_Example/warmup_online.yaml Adds an online warmup example config.
examples/09_Warmup_Example/warmup_offline.yaml Adds an offline warmup example config.
examples/09_Warmup_Example/README.md Documents warmup purpose, configuration, and how to run examples.


Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 17, 2026 00:33

Copilot AI left a comment


Pull request overview

Implements an initial “warmup phase” that issues a configurable number of randomly generated requests prior to the timed performance window, and updates metric derivation to exclude warmup samples from reported performance metrics.

Changes:

  • Add warmup configuration (warmup:) to the benchmark YAML schema and CLI execution path, generating a RandomDataset for warmup.
  • Run a warmup load-generator phase before emitting TEST_STARTED, and exclude pre-TEST_STARTED samples from TTFT/latency/status/output queries.
  • Add unit + integration tests and example configs/docs demonstrating warmup behavior.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/unit/load_generator/test_session.py Adds a unit test asserting warmup uses sequential sample order.
tests/integration/commands/test_warmup.py Adds end-to-end integration tests verifying warmup runs before performance against the echo server.
src/inference_endpoint/metrics/reporter.py Adds TEST_STARTED timestamp lookup and filters several metric queries to exclude warmup samples.
src/inference_endpoint/load_generator/session.py Adds warmup generator creation + warmup execution/drain phase before TEST_STARTED.
src/inference_endpoint/load_generator/scheduler.py Introduces SequentialSampleOrder for deterministic warmup issuance.
src/inference_endpoint/load_generator/__init__.py Exports SequentialSampleOrder.
src/inference_endpoint/config/schema.py Adds WarmupConfig and exposes warmup on BenchmarkConfig.
src/inference_endpoint/commands/benchmark.py Builds/loads a warmup random dataset when configured and passes it into BenchmarkSession.start.
examples/09_Warmup_Example/warmup_online.yaml Adds an online warmup example config.
examples/09_Warmup_Example/warmup_offline.yaml Adds an offline warmup example config.
examples/09_Warmup_Example/README.md Documents warmup behavior and how to run it locally.


@arekay-nv arekay-nv force-pushed the arekay/naive_warmup_impl branch from b728474 to 5c8df76 Compare March 18, 2026 17:59
Copilot AI review requested due to automatic review settings March 18, 2026 18:00
@arekay-nv arekay-nv force-pushed the arekay/naive_warmup_impl branch from 5c8df76 to b65b27a Compare March 18, 2026 18:00

Copilot AI left a comment


Pull request overview

Implements an initial “warmup phase” that issues a set of randomly generated requests before the timed performance window, aiming to prime the endpoint while excluding warmup traffic from reported performance metrics (closes #86).

Changes:

  • Add warmup configuration (YAML schema) and warmup dataset generation using the existing RandomDataset.
  • Extend BenchmarkSession to optionally run a warmup generator before recording TEST_STARTED.
  • Update metrics queries to apply a lower bound at TEST_STARTED for several per-sample rollups, plus add unit/integration tests and examples.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/unit/load_generator/test_session.py Verifies warmup uses sequential sample order.
tests/integration/commands/test_warmup.py Ensures warmup runs (via logs) before performance phase in offline/online modes.
src/inference_endpoint/metrics/reporter.py Adds TEST_STARTED lower-bound filtering to multiple metrics queries.
src/inference_endpoint/load_generator/session.py Adds warmup generator creation and warmup execution/drain prior to TEST_STARTED.
src/inference_endpoint/load_generator/scheduler.py Introduces SequentialSampleOrder for deterministic warmup issuance.
src/inference_endpoint/load_generator/__init__.py Exports SequentialSampleOrder.
src/inference_endpoint/config/schema.py Adds WarmupConfig and optional BenchmarkConfig.warmup.
src/inference_endpoint/commands/benchmark.py Generates/loads warmup random dataset and passes it into BenchmarkSession.start().
examples/09_Warmup_Example/* Adds warmup example configs and README.


Comment on lines 930 to +950
```python
stop_ts = self.stop_performance_tracking_timestamp_ns
start_ts = self.test_started_timestamp_ns

# Build WHERE clause restricting samples to the performance window:
# issued at/after TEST_STARTED and, when tracking has stopped, before stop_ts.
if stop_ts != float("inf"):
    where_clause = f"""
        WHERE sample_uuid IN (
            SELECT sample_uuid FROM events
            WHERE event_type = '{SessionEvent.LOADGEN_ISSUE_CALLED.value}'
              AND timestamp_ns >= {start_ts}
              AND timestamp_ns < {stop_ts}
        )
    """
else:
    where_clause = f"""
        WHERE sample_uuid IN (
            SELECT sample_uuid FROM events
            WHERE event_type = '{SessionEvent.LOADGEN_ISSUE_CALLED.value}'
              AND timestamp_ns >= {start_ts}
        )
    """
```
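The window filter quoted above can be exercised against a toy sqlite events table. The schema and the event-type string here are assumptions reconstructed from the snippet, not the project's actual recorder schema.

```python
import sqlite3

# Minimal stand-in for the events table the reporter queries.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (sample_uuid TEXT, event_type TEXT, timestamp_ns INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("warmup-1", "LOADGEN_ISSUE_CALLED", 50),   # issued before TEST_STARTED
        ("perf-1",   "LOADGEN_ISSUE_CALLED", 150),  # inside the performance window
        ("perf-2",   "LOADGEN_ISSUE_CALLED", 250),  # issued after tracking stopped
    ],
)

start_ts, stop_ts = 100, 200  # TEST_STARTED / STOP_PERFORMANCE_TRACKING timestamps
included = conn.execute(
    """
    SELECT sample_uuid FROM events
    WHERE event_type = 'LOADGEN_ISSUE_CALLED'
      AND timestamp_ns >= ?
      AND timestamp_ns < ?
    """,
    (start_ts, stop_ts),
).fetchall()
print(included)  # [('perf-1',)] — only the in-window sample survives
```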
@arekay-nv arekay-nv force-pushed the arekay/naive_warmup_impl branch from b65b27a to 34c2bd1 Compare March 18, 2026 19:39
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 18, 2026 19:52
@arekay-nv arekay-nv force-pushed the arekay/naive_warmup_impl branch from 34c2bd1 to 9a37881 Compare March 18, 2026 19:52

Copilot AI left a comment


Pull request overview

Implements an initial “warmup” phase that issues randomly generated requests before the timed performance window, and updates metrics derivation to exclude warmup activity from reported performance metrics (closes #86).

Changes:

  • Add optional warmup config block and generate a RandomDataset warmup dataset in the benchmark command.
  • Run an explicit warmup generator before TEST_STARTED, and introduce a sequential sample order for deterministic warmup issuance.
  • Update MetricsReporter queries to filter samples based on TEST_STARTED (and STOP_PERFORMANCE_TRACKING) so warmup samples don’t affect perf metrics; add unit/integration tests and examples.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/unit/metrics/test_reporter.py Adds unit coverage ensuring warmup errors are excluded from error counts.
tests/unit/load_generator/test_session.py Verifies warmup uses sequential sample ordering.
tests/integration/commands/test_warmup.py Adds integration tests validating warmup runs before perf against echo server.
src/inference_endpoint/metrics/reporter.py Introduces test_started_timestamp_ns and applies perf-window filtering across multiple metrics/queries.
src/inference_endpoint/load_generator/session.py Runs warmup issuance/drain before recording TEST_STARTED; wires warmup dataset into session start.
src/inference_endpoint/load_generator/scheduler.py Adds SequentialSampleOrder implementation.
src/inference_endpoint/load_generator/__init__.py Exports SequentialSampleOrder.
src/inference_endpoint/config/schema.py Adds WarmupConfig and optional BenchmarkConfig.warmup.
src/inference_endpoint/commands/benchmark.py Generates and loads warmup dataset via RandomDataset and passes it to BenchmarkSession.start.
examples/09_Warmup_Example/warmup_online.yaml Adds an online-mode YAML example using warmup.
examples/09_Warmup_Example/warmup_offline.yaml Adds an offline-mode YAML example using warmup.
examples/09_Warmup_Example/README.md Documents how warmup works and how to try it locally.


Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
@arekay-nv arekay-nv changed the title [feat] WIP - Initial warmup implementation [feat] Initial warmup with random dataset Mar 18, 2026
Collaborator

@nvzhihanj nvzhihanj left a comment


Review Council — Multi-AI Code Review

Found 2 issues.

```python
    )
except (ValueError, TypeError) as e:
    raise InputValidationError(
        f"Failed to generate warmup dataset from warmup config: {e}"
    )
```
Collaborator


[Codex] medium: Warmup ModelParams is reconstructed from scratch with only name and max_new_tokens, dropping workload-defining flags like streaming, sampling params, chat/template options, and API-specific fields. This means warmup doesn't exercise the same code path as the real test — e.g., online tests use streaming but warmup doesn't. Endpoints that require the full request shape may reject warmup requests entirely. Consider copying config.model_params and only overriding max_new_tokens for warmup.
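Codex's copy-and-override suggestion can be sketched with a stand-in ModelParams. The real class and its full set of fields live in the project's config schema; everything below (field names beyond max_new_tokens, values, the frozen-dataclass choice) is illustrative only.

```python
from dataclasses import dataclass, replace

# Hypothetical ModelParams stand-in with a couple of workload-defining flags.
@dataclass(frozen=True)
class ModelParams:
    name: str
    max_new_tokens: int
    streaming: bool = False
    temperature: float = 1.0

perf_params = ModelParams(name="my-model", max_new_tokens=1024, streaming=True)

# Instead of reconstructing ModelParams from scratch (which drops flags),
# copy the configured params and override only what warmup needs:
warmup_params = replace(perf_params, max_new_tokens=64)
print(warmup_params.streaming)  # True: warmup exercises the same code path
```

With a Pydantic model the same pattern would be `config.model_params.model_copy(update={"max_new_tokens": 64})`.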

@nvzhihanj
Collaborator

Review Council — Multi-AI Code Review Council

Reviewed by: Codex + Claude

Found 2 issues across 2 files:

  • 1 high
  • 1 medium

Each issue is posted as an inline comment on the relevant file and line.

# File Line Severity Reviewer(s) Summary
1 src/inference_endpoint/load_generator/session.py 104 high Codex + Claude Warmup drain timeout proceeds to TEST_STARTED with inflight warmup traffic overlapping the perf test
2 src/inference_endpoint/commands/benchmark.py 476 medium Codex Warmup ModelParams drops streaming/sampling flags — doesn't warm the same codepath

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 19, 2026 06:05

Copilot AI left a comment


Pull request overview

Adds a configurable warmup phase that issues randomly generated requests before the timed performance window, and updates metrics derivation to exclude warmup traffic from reported performance metrics (issue #86).

Changes:

  • Introduce warmup config (random token dataset generation) and wire it into the benchmark command/session execution flow.
  • Add a sequential warmup sample order + session warmup stage that drains in-flight warmup requests before TEST_STARTED.
  • Update MetricsReporter queries to only consider samples issued at/after TEST_STARTED, plus add/extend unit + integration tests and examples.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/inference_endpoint/commands/benchmark.py Generates and loads a RandomDataset for warmup when configured; passes it into BenchmarkSession.
src/inference_endpoint/config/schema.py Adds WarmupConfig and exposes it on BenchmarkConfig.
src/inference_endpoint/load_generator/session.py Runs warmup issuance + drain before recording TEST_STARTED; constructs a warmup load generator in start().
src/inference_endpoint/load_generator/scheduler.py Adds SequentialSampleOrder for deterministic warmup issuance ordering.
src/inference_endpoint/load_generator/__init__.py Exports SequentialSampleOrder.
src/inference_endpoint/metrics/reporter.py Adds test_started_timestamp_ns and filters multiple metrics to exclude warmup samples.
tests/unit/metrics/test_reporter.py Adds a unit test ensuring warmup errors are excluded from get_error_count().
tests/unit/load_generator/test_session.py Adds a unit test asserting warmup uses sequential sample order.
tests/integration/commands/test_warmup.py Adds integration tests for warmup running before offline/online perf runs (currently with a failing log assertion).
examples/09_Warmup_Example/* Adds example YAMLs and README demonstrating warmup usage.


Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 19, 2026 06:19

Copilot AI left a comment


Pull request overview

Adds support for an optional warmup phase that issues randomly generated requests before the timed benchmark window, and updates metric reporting to exclude warmup events from performance results.

Changes:

  • Introduces warmup configuration (random dataset generation + output token override) and wires it into the benchmark command/session lifecycle.
  • Filters metrics to the performance window starting at TEST_STARTED (and ending at STOP_PERFORMANCE_TRACKING when present) so warmup samples don’t affect reported metrics.
  • Adds unit/integration tests and example configs demonstrating warmup behavior.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/inference_endpoint/config/schema.py Adds WarmupConfig and exposes warmup on BenchmarkConfig.
src/inference_endpoint/commands/benchmark.py Generates/loads a RandomDataset-backed warmup dataset and passes it into the session.
src/inference_endpoint/load_generator/session.py Executes warmup issuance + drain before emitting TEST_STARTED; builds a warmup generator/scheduler.
src/inference_endpoint/load_generator/scheduler.py Adds SequentialSampleOrder for deterministic warmup sample ordering.
src/inference_endpoint/load_generator/__init__.py Exports SequentialSampleOrder.
src/inference_endpoint/metrics/reporter.py Adds cached TEST_STARTED timestamp and uses it as a lower bound in multiple metric queries.
src/inference_endpoint/dataset_manager/predefined/random/__init__.py Renames range_ratio to input_range_ratio in random dataset generation.
tests/unit/metrics/test_reporter.py Adds unit test ensuring warmup errors don’t count toward perf error count.
tests/unit/load_generator/test_session.py Adds unit test verifying warmup uses sequential sample order.
tests/unit/dataset_manager/test_data_loader.py Updates random dataset loader test to use input_range_ratio.
tests/integration/commands/test_warmup.py Adds integration tests validating warmup runs before performance phase.
examples/09_Warmup_Example/warmup_offline.yaml Adds offline warmup example configuration.
examples/09_Warmup_Example/warmup_online.yaml Adds online warmup example configuration.
examples/09_Warmup_Example/README.md Documents warmup usage and tuning parameters.


Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
@arekay-nv arekay-nv requested a review from nvzhihanj March 19, 2026 06:44
Comment on lines +25 to +26

Warmup data is generated at runtime from random token IDs using the model
Collaborator


👍

Collaborator


Btw we might want deterministic warmup data for reproducibility and debugging (at least controlled by RNG seed)

Collaborator Author


Given the random_seed in the config - it should be deterministic.
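A minimal illustration of why a fixed random_seed makes the warmup data reproducible; the function name, vocabulary size, and shapes below are hypothetical, not the RandomDataset API.

```python
import random

# Illustrative only: a private RNG seeded from the config yields the same
# warmup token IDs on every run, independent of global random state.
def generate_token_ids(num_samples: int, seq_len: int, seed: int) -> list[list[int]]:
    rng = random.Random(seed)  # isolated RNG, deterministic for a given seed
    vocab_size = 32000         # assumed vocabulary size
    return [
        [rng.randrange(vocab_size) for _ in range(seq_len)]
        for _ in range(num_samples)
    ]

a = generate_token_ids(2, 4, seed=1234)
b = generate_token_ids(2, 4, seed=1234)
print(a == b)  # True: same seed, same dataset
```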

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 23, 2026 17:56

Copilot AI left a comment


Pull request overview

Adds an initial warmup phase (backed by a runtime-generated random dataset) that runs before the timed performance window so endpoints can be “primed” without contaminating performance metrics (closes #86).

Changes:

  • Introduces warmup configuration in the benchmark YAML schema and wires it into the benchmark command to generate/load a random warmup dataset.
  • Runs warmup requests before TEST_STARTED and updates metrics queries/reporting to exclude warmup samples while still reporting warmup wall-clock duration.
  • Adds unit + integration test coverage for warmup sequencing and metric exclusion behavior.
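The sequencing the bullets describe can be sketched with a toy event log. All names here are stand-ins (a counter plays the role of timestamps), not the project's actual recorder API:

```python
from itertools import count

_clock = count()  # fake monotonically increasing timestamp source
events = []       # (timestamp, event_name) pairs

def record(name):
    events.append((next(_clock), name))

def run_requests(n, tag):
    for i in range(n):
        record(f"{tag}_request_{i}")  # stand-in for issuing a request

# Warmup fully drains before the performance markers are recorded.
record("WARMUP_STARTED")
run_requests(3, "warmup")
record("TEST_STARTED")
run_requests(3, "perf")
record("STOP_PERFORMANCE_TRACKING")

# Reporting considers only events at/after TEST_STARTED.
start_ts = next(ts for ts, name in events if name == "TEST_STARTED")
perf_events = [(ts, n) for ts, n in events if ts >= start_ts]
assert all(not n.startswith("warmup") for _, n in perf_events)
assert len(perf_events) == 5  # marker + 3 perf requests + stop marker
```

Because warmup samples are filtered out by timestamp rather than deleted, the warmup wall-clock duration can still be reported from the same log.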

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/unit/metrics/test_reporter.py Adds coverage ensuring warmup errors don’t affect performance error counts; updates expected JSON fields.
tests/unit/metrics/test_recorder.py Increases worker join timeout to reduce test flakiness.
tests/unit/load_generator/test_session.py Adds unit test asserting warmup uses sequential sample ordering.
tests/unit/dataset_manager/test_data_loader.py Renames random dataset arg to input_range_ratio and fixes f-string quoting.
tests/integration/commands/test_warmup.py New integration tests verifying warmup runs before performance phase for offline/online modes.
src/inference_endpoint/metrics/reporter.py Adds warmup_duration_ns to reports and filters metrics to only include events at/after TEST_STARTED.
src/inference_endpoint/load_generator/session.py Implements warmup execution/drain logic and shares a single EventRecorder DB across warmup + perf.
src/inference_endpoint/load_generator/scheduler.py Adds SequentialSampleOrder for deterministic warmup issuing.
src/inference_endpoint/load_generator/__init__.py Re-exports SequentialSampleOrder.
src/inference_endpoint/dataset_manager/predefined/random/__init__.py Renames range_ratio → input_range_ratio in random dataset generation API.
src/inference_endpoint/config/schema.py Adds WarmupConfig and BenchmarkConfig.warmup.
src/inference_endpoint/commands/benchmark.py Generates/loads warmup dataset from config and passes it into BenchmarkSession.start().
examples/09_Warmup_Example/warmup_online.yaml Adds an online warmup-enabled example config.
examples/09_Warmup_Example/warmup_offline.yaml Adds an offline warmup-enabled example config.
examples/09_Warmup_Example/README.md Documents warmup config usage and how to run the examples.
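The SequentialSampleOrder listed above contrasts with a shuffled ordering for measured runs; a minimal hypothetical sketch of the two policies (not the repository's actual classes):

```python
import random

class SequentialSampleOrder:
    """Issue dataset samples 0..n-1 in order (deterministic warmup)."""

    def __init__(self, num_samples: int):
        self.num_samples = num_samples

    def __iter__(self):
        return iter(range(self.num_samples))

class ShuffledSampleOrder:
    """Issue samples in a seeded random order (typical measured run)."""

    def __init__(self, num_samples: int, seed: int):
        self.order = list(range(num_samples))
        random.Random(seed).shuffle(self.order)

    def __iter__(self):
        return iter(self.order)

assert list(SequentialSampleOrder(4)) == [0, 1, 2, 3]
# Shuffled order is a permutation, reproducible for a fixed seed.
assert sorted(ShuffledSampleOrder(8, seed=42).order) == list(range(8))
```

Sequential issuing makes warmup runs trivially reproducible and easy to correlate with the generated dataset when debugging.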


finally:
    # If the thread did not start (warmup stopped early or an exception was
    # raised), we own the recorder and must close it here.
    if not thread_started:
Copilot AI Mar 23, 2026

In the failure paths where the benchmark thread never starts, this close() call can raise EventRecorderSingletonViolation if __enter__() failed (e.g., another recorder is already active). That would mask the original exception. Consider tracking whether the recorder was successfully opened (or checking EventRecorder.LIVE is session.event_recorder) before calling close() here, and avoid raising a secondary exception during cleanup.

Suggested change
-    if not thread_started:
+    # Only close if this session's recorder is the active one to avoid
+    # raising a secondary exception during cleanup.
+    if (not thread_started) and (
+        getattr(EventRecorder, "LIVE", None) is session.event_recorder
+    ):

Copilot uses AI. Check for mistakes.
thread_started = False
try:
    if warmup_generator is not None:
        session._run_warmup(warmup_generator, max_shutdown_timeout_s or 300.0)
Copilot AI Mar 23, 2026

max_shutdown_timeout_s or 300.0 treats an explicit None (documented as “wait indefinitely”) the same as “use a 300s timeout” for warmup only. This makes warmup drain semantics inconsistent with the perf drain in _run_test (which receives None and waits forever). Prefer passing max_shutdown_timeout_s through unchanged (or updating the API/docs so both phases share the same default/meaning).

Suggested change
-        session._run_warmup(warmup_generator, max_shutdown_timeout_s or 300.0)
+        session._run_warmup(warmup_generator, max_shutdown_timeout_s)

Copilot uses AI. Check for mistakes.
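The pitfall Copilot flags is the usual truthiness one: `x or default` substitutes the default for every falsy value, not just `None`. A quick generic illustration (not the project's code):

```python
def buggy_drain_timeout(timeout):
    # `or` replaces every falsy value, not just None:
    return timeout or 300.0

assert buggy_drain_timeout(None) == 300.0  # "wait indefinitely" silently becomes 300 s
assert buggy_drain_timeout(0) == 300.0     # an explicit zero is lost too
assert buggy_drain_timeout(45.0) == 45.0

def drain_timeout(timeout):
    # Pass the value through unchanged so None keeps meaning "no timeout"
    # in both the warmup and performance drains.
    return timeout

assert drain_timeout(None) is None
```

If a real default is wanted, an explicit `300.0 if timeout is None else timeout` (or a shared module-level constant) keeps both phases consistent.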
Comment on lines +983 to 985
start_ts = self.test_started_timestamp_ns
stop_ts = self.stop_performance_tracking_timestamp_ns

Copilot AI Mar 23, 2026

The behavior of this method has changed to also apply a lower bound at TEST_STARTED (via start_ts) when STOP_PERFORMANCE_TRACKING is missing. Please update the docstring above to reflect that pre-TEST_STARTED/warmup errors are excluded even without STOP_PERFORMANCE_TRACKING.

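The filtering behavior under discussion, bounding the query below by TEST_STARTED even when STOP_PERFORMANCE_TRACKING is absent, can be sketched like this (simplified, hypothetical names):

```python
def filter_perf_events(events, start_ts, stop_ts=None):
    """Keep events in [start_ts, stop_ts); a missing stop marker means no upper bound.

    Pre-TEST_STARTED (warmup) events are always excluded.
    """
    return [
        (ts, name) for ts, name in events
        if ts >= start_ts and (stop_ts is None or ts < stop_ts)
    ]

events = [(5, "warmup_error"), (10, "TEST_STARTED"), (15, "perf_ok"), (20, "stop")]
# Even without a stop timestamp, the warmup-phase error is filtered out.
assert filter_perf_events(events, start_ts=10) == events[1:]
```

Updating the docstring to state this lower bound, as the review asks, keeps the contract explicit for future callers.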
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
@arekay-nv
Collaborator Author

@nvzhihanj please review.

Copilot AI review requested due to automatic review settings March 23, 2026 23:52
Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated no new comments.



parser:
  input: prompt

settings:
Collaborator

@viraatc viraatc Mar 25, 2026

should we rename this to benchmark-settings or something a bit more descriptive in a follow-up?
settings seems a bit confusing / out-of-place here?

    random_seed: Seed for reproducible warmup data generation.
    """

    num_samples: int = Field(100, gt=0)
Collaborator

Let's add cyclopts descriptions as well, since these will show up in the CLI.



Development

Successfully merging this pull request may close these issues.

Feature: Warmup runs

4 participants