
[feat] Initial warmup with random dataset#170

Open
arekay-nv wants to merge 14 commits into main from arekay/naive_warmup_impl

Conversation

@arekay-nv
Collaborator

@arekay-nv arekay-nv commented Mar 13, 2026

What does this PR do?

Adds a random dataset backed warmup stage.
Closes #86

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

@arekay-nv arekay-nv requested a review from a team as a code owner March 13, 2026 21:53
Copilot AI review requested due to automatic review settings March 13, 2026 21:53
@github-actions

github-actions bot commented Mar 13, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@github-actions github-actions bot requested a review from nvzhihanj March 13, 2026 21:53
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates a crucial warmup phase into the benchmarking process. This phase is designed to mitigate 'cold start' effects by sending a series of randomly generated requests to the target endpoint before actual performance measurements begin. By doing so, it ensures that the system is in a steady state, with connections established and caches populated, leading to more accurate and representative benchmark results. The warmup parameters are fully configurable, and its activities are isolated from the final performance metrics.

Highlights

  • Warmup Phase Introduction: Introduced a new configurable 'warmup' phase to benchmarks, which issues randomly generated requests to prime the endpoint before the timed performance test begins. This helps establish TCP connections, fill KV caches, and trigger JIT compilation.
  • Metrics Exclusion: Ensured that all samples issued during the warmup phase are explicitly excluded from reported performance metrics (throughput, latency, TTFT, TPOT) by recording them before the 'TEST_STARTED' event.
  • Configuration Schema: Added a new WarmupConfig schema to define parameters for the warmup phase, including num_samples, input_seq_length, output_seq_length, range_ratio, and random_seed.
  • Sequential Sample Ordering: Implemented a SequentialSampleOrder to ensure warmup requests are issued in a predictable, non-random sequence, which is useful for controlled priming.
  • New Example and Documentation: Provided a new example (09_Warmup_Example) with detailed documentation (README.md) and configuration files (warmup_offline.yaml, warmup_online.yaml) demonstrating how to use and tune the warmup feature.
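The WarmupConfig fields named in the highlights can be pictured with a plain-dataclass stand-in. The real schema lives in src/inference_endpoint/config/schema.py; the defaults and validation below are illustrative assumptions, not the project's actual definitions.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-in for the WarmupConfig schema described in the PR.
# Field names come from the PR summary; defaults and checks are assumptions.
@dataclass
class WarmupConfig:
    num_samples: int                   # number of warmup requests to issue
    input_seq_length: int              # target input length in tokens
    output_seq_length: int             # target output length in tokens
    range_ratio: float = 0.0           # jitter applied around the target lengths
    random_seed: Optional[int] = None  # fixed seed makes warmup data reproducible

    def __post_init__(self) -> None:
        if self.num_samples <= 0:
            raise ValueError("num_samples must be positive")
        if not 0.0 <= self.range_ratio <= 1.0:
            raise ValueError("range_ratio must be in [0, 1]")

cfg = WarmupConfig(num_samples=8, input_seq_length=128, output_seq_length=64)
```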


Changelog
  • examples/04_GPTOSS120B_Example/Readme.md
    • Updated code block syntax to specify bash or python.
    • Changed MODEL_NAME from gpt-oss-120b to gpt-oss-20b.
    • Adjusted SGLang launch command port from 3000 to 30000.
    • Added instructions for activating the virtual environment before launching benchmarks.
  • examples/04_GPTOSS120B_Example/gptoss_120b_example.yaml
    • Updated benchmark name from gpt-oss-120b-benchmark to gpt-oss-20b-benchmark.
    • Changed model name from openai/gpt-oss-120b to openai/gpt-oss-20b.
    • Updated endpoint from http://localhost:3000 to http://localhost:8000.
    • Adjusted report directory name from gptoss_120b_benchmark_mlperf to gptoss_20b_benchmark_mlperf.
  • examples/04_GPTOSS120B_Example/sglang_gptoss_120b_example.yaml
    • Updated benchmark name from gpt-oss-120b-benchmark to gpt-oss-20b-sglang-benchmark.
    • Changed model name from openai/gpt-oss-120b to openai/gpt-oss-20b.
    • Adjusted report directory name from sglang_gptoss_120b_benchmark_full to sglang_gptoss_20b_benchmark_full.
  • examples/09_Warmup_Example/README.md
    • Added a new README file detailing the warmup phase feature, its purpose, configuration, and usage examples for both echo server and real endpoints.
  • examples/09_Warmup_Example/warmup_offline.yaml
    • Added a new YAML configuration file for an offline throughput benchmark with a warmup phase.
  • examples/09_Warmup_Example/warmup_online.yaml
    • Added a new YAML configuration file for an online (sustained QPS) benchmark with a warmup phase.
  • src/inference_endpoint/commands/benchmark.py
    • Added logic to construct a warmup dataset using RandomDataset.generate if a warmup configuration is present.
    • Integrated the warmup_dataset into the _run_benchmark function call.
  • src/inference_endpoint/config/schema.py
    • Defined a new WarmupConfig class to specify warmup parameters.
    • Added an optional warmup field of type WarmupConfig to the BenchmarkConfig.
  • src/inference_endpoint/load_generator/__init__.py
    • Exported the new SequentialSampleOrder class.
  • src/inference_endpoint/load_generator/scheduler.py
    • Added a SequentialSampleOrder class for issuing samples in their natural order without randomness.
  • src/inference_endpoint/load_generator/session.py
    • Imported LoadPattern, LoadPatternType, and SequentialSampleOrder.
    • Modified the _run_test method to accept and execute a warmup_generator before the main performance test.
    • Added a warmup_dataset parameter to the start method.
    • Implemented the creation of a warmup_generator using MaxThroughputScheduler and SequentialSampleOrder.
  • src/inference_endpoint/metrics/reporter.py
    • Added a test_started_timestamp_ns cached property to retrieve the timestamp of the TEST_STARTED event.
    • Updated derive_TTFT, derive_sample_latency, get_sample_statuses, and get_sample_outputs methods to filter events based on test_started_timestamp_ns, ensuring warmup samples are excluded from metrics.
    • Refactored get_test_started_at to use the new test_started_timestamp_ns property.
  • tests/integration/commands/test_warmup.py
    • Added new integration tests to verify the warmup phase functionality for both offline and online benchmarks against an echo server.
  • tests/unit/load_generator/test_session.py
    • Added a unit test to confirm that the warmup phase correctly utilizes SequentialSampleOrder.
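The idea behind the SequentialSampleOrder mentioned in the changelog can be sketched in a few lines; the interface and names below are assumed for illustration and may not match the project's actual API. The point is that it yields dataset indices in their natural order, making warmup issuance deterministic.

```python
from typing import Iterator

# Hypothetical sketch of a sequential sample order: unlike a shuffled order,
# it yields indices 0..n-1 in natural order, with no randomness involved.
class SequentialSampleOrder:
    def __init__(self, num_samples: int) -> None:
        self.num_samples = num_samples

    def __iter__(self) -> Iterator[int]:
        return iter(range(self.num_samples))

order = list(SequentialSampleOrder(4))
print(order)  # [0, 1, 2, 3]
```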
Activity
  • The pull request was created with a 'WIP' (Work In Progress) tag, indicating that the initial implementation of the warmup feature is being introduced.
  • No specific reviewer comments or further progress updates were provided in the context.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a warmup feature to the benchmarking tool. The implementation adds a configurable warmup stage that runs before the timed performance test, using a randomly generated dataset to prime the inference endpoint. The changes are well-structured, spanning configuration schema, benchmark orchestration, and metrics reporting to correctly exclude warmup samples. The PR also includes clear documentation and tests for the new feature. I have one suggestion to enhance the configuration validation for better robustness.


Copilot AI left a comment


Pull request overview

Adds an initial “warmup” phase to the benchmark flow by generating a random-token dataset, issuing it before TEST_STARTED, and filtering metrics to exclude warmup-issued samples.

Changes:

  • Add warmup configuration (warmup:) and generate/load a RandomDataset for warmup in the benchmark command.
  • Run an optional warmup load generator before the timed performance window, and update metrics queries to only include samples issued after TEST_STARTED.
  • Add unit/integration tests plus example YAML/README updates demonstrating warmup usage.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tests/unit/load_generator/test_session.py Adds a unit test asserting warmup uses sequential sample ordering.
tests/integration/commands/test_warmup.py Adds integration coverage for warmup running before performance mode (offline/online).
src/inference_endpoint/metrics/reporter.py Introduces test_started_timestamp_ns and filters metric derivations/statuses to exclude pre-TEST_STARTED (warmup) samples.
src/inference_endpoint/load_generator/session.py Adds warmup generator creation and execution before recording TEST_STARTED.
src/inference_endpoint/load_generator/scheduler.py Adds SequentialSampleOrder for deterministic dataset traversal (used by warmup).
src/inference_endpoint/load_generator/__init__.py Exports SequentialSampleOrder.
src/inference_endpoint/config/schema.py Adds WarmupConfig and warmup field to BenchmarkConfig.
src/inference_endpoint/commands/benchmark.py Builds and loads warmup random dataset when configured and passes it into BenchmarkSession.start.
examples/09_Warmup_Example/warmup_online.yaml New example config showing warmup for online mode.
examples/09_Warmup_Example/warmup_offline.yaml New example config showing warmup for offline mode.
examples/09_Warmup_Example/README.md Documentation for warmup behavior, configuration, and local echo-server testing.
examples/04_GPTOSS120B_Example/sglang_gptoss_120b_example.yaml Updates example to gpt-oss-20b + endpoint/report_dir changes.
examples/04_GPTOSS120B_Example/gptoss_120b_example.yaml Updates example to gpt-oss-20b + endpoint/report_dir changes.
examples/04_GPTOSS120B_Example/Readme.md Updates commands/docs formatting and ports/model name.


Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
@arekay-nv arekay-nv force-pushed the arekay/naive_warmup_impl branch from 634576c to 4565bb8 Compare March 13, 2026 22:02
Copilot AI review requested due to automatic review settings March 16, 2026 23:11

Copilot AI left a comment


Pull request overview

Adds an initial “warmup” phase to the benchmark flow by generating a random, tokenizer-backed dataset and issuing it before the timed performance window, with reporter-side filtering to exclude warmup samples from performance metrics.

Changes:

  • Introduces warmup config schema and CLI wiring to generate/load a RandomDataset for warmup.
  • Adds a warmup run in BenchmarkSession plus a new SequentialSampleOrder for deterministic warmup issuance.
  • Updates metrics rollups to use TEST_STARTED as the lower bound, and adds unit/integration tests + example configs/docs.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/unit/load_generator/test_session.py Adds a unit test asserting warmup uses sequential sample order.
tests/integration/commands/test_warmup.py Adds integration tests validating warmup runs with the echo server.
src/inference_endpoint/metrics/reporter.py Adds TEST_STARTED lower-bound filtering for key perf queries.
src/inference_endpoint/load_generator/session.py Runs warmup generator before recording TEST_STARTED; wires warmup dataset into start().
src/inference_endpoint/load_generator/scheduler.py Adds SequentialSampleOrder implementation.
src/inference_endpoint/load_generator/__init__.py Exports SequentialSampleOrder.
src/inference_endpoint/config/schema.py Adds WarmupConfig and a warmup field to BenchmarkConfig.
src/inference_endpoint/commands/benchmark.py Builds and loads warmup random dataset when configured; passes it into the session.
examples/09_Warmup_Example/warmup_online.yaml Adds an online warmup example config.
examples/09_Warmup_Example/warmup_offline.yaml Adds an offline warmup example config.
examples/09_Warmup_Example/README.md Documents warmup purpose, configuration, and how to run examples.


Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 17, 2026 00:33

Copilot AI left a comment


Pull request overview

Implements an initial “warmup phase” that issues a configurable number of randomly generated requests prior to the timed performance window, and updates metric derivation to exclude warmup samples from reported performance metrics.

Changes:

  • Add warmup configuration (warmup:) to the benchmark YAML schema and CLI execution path, generating a RandomDataset for warmup.
  • Run a warmup load-generator phase before emitting TEST_STARTED, and exclude pre-TEST_STARTED samples from TTFT/latency/status/output queries.
  • Add unit + integration tests and example configs/docs demonstrating warmup behavior.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/unit/load_generator/test_session.py Adds a unit test asserting warmup uses sequential sample order.
tests/integration/commands/test_warmup.py Adds end-to-end integration tests verifying warmup runs before performance against the echo server.
src/inference_endpoint/metrics/reporter.py Adds TEST_STARTED timestamp lookup and filters several metric queries to exclude warmup samples.
src/inference_endpoint/load_generator/session.py Adds warmup generator creation + warmup execution/drain phase before TEST_STARTED.
src/inference_endpoint/load_generator/scheduler.py Introduces SequentialSampleOrder for deterministic warmup issuance.
src/inference_endpoint/load_generator/__init__.py Exports SequentialSampleOrder.
src/inference_endpoint/config/schema.py Adds WarmupConfig and exposes warmup on BenchmarkConfig.
src/inference_endpoint/commands/benchmark.py Builds/loads a warmup random dataset when configured and passes it into BenchmarkSession.start.
examples/09_Warmup_Example/warmup_online.yaml Adds an online warmup example config.
examples/09_Warmup_Example/warmup_offline.yaml Adds an offline warmup example config.
examples/09_Warmup_Example/README.md Documents warmup behavior and how to run it locally.


@arekay-nv arekay-nv force-pushed the arekay/naive_warmup_impl branch from b728474 to 5c8df76 Compare March 18, 2026 17:59
Copilot AI review requested due to automatic review settings March 18, 2026 18:00
@arekay-nv arekay-nv force-pushed the arekay/naive_warmup_impl branch from 5c8df76 to b65b27a Compare March 18, 2026 18:00

Copilot AI left a comment


Pull request overview

Implements an initial “warmup phase” that issues a set of randomly generated requests before the timed performance window, aiming to prime the endpoint while excluding warmup traffic from reported performance metrics (closes #86).

Changes:

  • Add warmup configuration (YAML schema) and warmup dataset generation using the existing RandomDataset.
  • Extend BenchmarkSession to optionally run a warmup generator before recording TEST_STARTED.
  • Update metrics queries to apply a lower bound at TEST_STARTED for several per-sample rollups, plus add unit/integration tests and examples.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/unit/load_generator/test_session.py Verifies warmup uses sequential sample order.
tests/integration/commands/test_warmup.py Ensures warmup runs (via logs) before performance phase in offline/online modes.
src/inference_endpoint/metrics/reporter.py Adds TEST_STARTED lower-bound filtering to multiple metrics queries.
src/inference_endpoint/load_generator/session.py Adds warmup generator creation and warmup execution/drain prior to TEST_STARTED.
src/inference_endpoint/load_generator/scheduler.py Introduces SequentialSampleOrder for deterministic warmup issuance.
src/inference_endpoint/load_generator/__init__.py Exports SequentialSampleOrder.
src/inference_endpoint/config/schema.py Adds WarmupConfig and optional BenchmarkConfig.warmup.
src/inference_endpoint/commands/benchmark.py Generates/loads warmup random dataset and passes it into BenchmarkSession.start().
examples/09_Warmup_Example/* Adds warmup example configs and README.


Comment on lines 930 to +950
```python
stop_ts = self.stop_performance_tracking_timestamp_ns
start_ts = self.test_started_timestamp_ns

# Build WHERE clause restricting samples to the performance window:
# issued at/after TEST_STARTED and, when tracking has stopped, before stop_ts.
if stop_ts != float("inf"):
    where_clause = f"""
        WHERE sample_uuid IN (
            SELECT sample_uuid FROM events
            WHERE event_type = '{SessionEvent.LOADGEN_ISSUE_CALLED.value}'
              AND timestamp_ns >= {start_ts}
              AND timestamp_ns < {stop_ts}
        )
    """
else:
    where_clause = f"""
        WHERE sample_uuid IN (
            SELECT sample_uuid FROM events
            WHERE event_type = '{SessionEvent.LOADGEN_ISSUE_CALLED.value}'
              AND timestamp_ns >= {start_ts}
        )
    """
```
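The window filter quoted above can be exercised against a toy sqlite events table. The schema and the event-type string here are assumptions reconstructed from the snippet, not the project's actual recorder schema.

```python
import sqlite3

# Minimal stand-in for the events table the reporter queries.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (sample_uuid TEXT, event_type TEXT, timestamp_ns INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("warmup-1", "LOADGEN_ISSUE_CALLED", 50),   # issued before TEST_STARTED
        ("perf-1",   "LOADGEN_ISSUE_CALLED", 150),  # inside the performance window
        ("perf-2",   "LOADGEN_ISSUE_CALLED", 250),  # issued after tracking stopped
    ],
)

start_ts, stop_ts = 100, 200  # TEST_STARTED / STOP_PERFORMANCE_TRACKING timestamps
included = conn.execute(
    """
    SELECT sample_uuid FROM events
    WHERE event_type = 'LOADGEN_ISSUE_CALLED'
      AND timestamp_ns >= ?
      AND timestamp_ns < ?
    """,
    (start_ts, stop_ts),
).fetchall()
print(included)  # [('perf-1',)] — only the in-window sample survives
```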
@arekay-nv arekay-nv force-pushed the arekay/naive_warmup_impl branch from b65b27a to 34c2bd1 Compare March 18, 2026 19:39
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 18, 2026 19:52
@arekay-nv arekay-nv force-pushed the arekay/naive_warmup_impl branch from 34c2bd1 to 9a37881 Compare March 18, 2026 19:52

Copilot AI left a comment


Pull request overview

Implements an initial “warmup” phase that issues randomly generated requests before the timed performance window, and updates metrics derivation to exclude warmup activity from reported performance metrics (closes #86).

Changes:

  • Add optional warmup config block and generate a RandomDataset warmup dataset in the benchmark command.
  • Run an explicit warmup generator before TEST_STARTED, and introduce a sequential sample order for deterministic warmup issuance.
  • Update MetricsReporter queries to filter samples based on TEST_STARTED (and STOP_PERFORMANCE_TRACKING) so warmup samples don’t affect perf metrics; add unit/integration tests and examples.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/unit/metrics/test_reporter.py Adds unit coverage ensuring warmup errors are excluded from error counts.
tests/unit/load_generator/test_session.py Verifies warmup uses sequential sample ordering.
tests/integration/commands/test_warmup.py Adds integration tests validating warmup runs before perf against echo server.
src/inference_endpoint/metrics/reporter.py Introduces test_started_timestamp_ns and applies perf-window filtering across multiple metrics/queries.
src/inference_endpoint/load_generator/session.py Runs warmup issuance/drain before recording TEST_STARTED; wires warmup dataset into session start.
src/inference_endpoint/load_generator/scheduler.py Adds SequentialSampleOrder implementation.
src/inference_endpoint/load_generator/__init__.py Exports SequentialSampleOrder.
src/inference_endpoint/config/schema.py Adds WarmupConfig and optional BenchmarkConfig.warmup.
src/inference_endpoint/commands/benchmark.py Generates and loads warmup dataset via RandomDataset and passes it to BenchmarkSession.start.
examples/09_Warmup_Example/warmup_online.yaml Adds an online-mode YAML example using warmup.
examples/09_Warmup_Example/warmup_offline.yaml Adds an offline-mode YAML example using warmup.
examples/09_Warmup_Example/README.md Documents how warmup works and how to try it locally.


Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
@arekay-nv arekay-nv changed the title [feat] WIP - Initial warmup implementation [feat] Initial warmup with random dataset Mar 18, 2026
Collaborator

@nvzhihanj nvzhihanj left a comment


Review Council — Multi-AI Code Review

Found 2 issues.

```python
    )
except (ValueError, TypeError) as e:
    raise InputValidationError(
        f"Failed to generate warmup dataset from warmup config: {e}"
    )
```
Collaborator


[Codex] medium: Warmup ModelParams is reconstructed from scratch with only name and max_new_tokens, dropping workload-defining flags like streaming, sampling params, chat/template options, and API-specific fields. This means warmup doesn't exercise the same code path as the real test — e.g., online tests use streaming but warmup doesn't. Endpoints that require the full request shape may reject warmup requests entirely. Consider copying config.model_params and only overriding max_new_tokens for warmup.
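Codex's copy-and-override suggestion can be sketched with a stand-in ModelParams. The real class and its full set of fields live in the project's config schema; everything below (field names beyond max_new_tokens, values, the frozen-dataclass choice) is illustrative only.

```python
from dataclasses import dataclass, replace

# Hypothetical ModelParams stand-in with a couple of workload-defining flags.
@dataclass(frozen=True)
class ModelParams:
    name: str
    max_new_tokens: int
    streaming: bool = False
    temperature: float = 1.0

perf_params = ModelParams(name="my-model", max_new_tokens=1024, streaming=True)

# Instead of reconstructing ModelParams from scratch (which drops flags),
# copy the configured params and override only what warmup needs:
warmup_params = replace(perf_params, max_new_tokens=64)
print(warmup_params.streaming)  # True: warmup exercises the same code path
```

With a Pydantic model the same pattern would be `config.model_params.model_copy(update={"max_new_tokens": 64})`.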

@nvzhihanj
Collaborator

Review Council — Multi-AI Code Review Council

Reviewed by: Codex + Claude

Found 2 issues across 2 files:

  • 1 high
  • 1 medium

Each issue is posted as an inline comment on the relevant file and line.

# File Line Severity Reviewer(s) Summary
1 src/inference_endpoint/load_generator/session.py 104 high Codex + Claude Warmup drain timeout proceeds to TEST_STARTED with inflight warmup traffic overlapping the perf test
2 src/inference_endpoint/commands/benchmark.py 476 medium Codex Warmup ModelParams drops streaming/sampling flags — doesn't warm the same codepath

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 19, 2026 06:05

Copilot AI left a comment


Pull request overview

Adds a configurable warmup phase that issues randomly generated requests before the timed performance window, and updates metrics derivation to exclude warmup traffic from reported performance metrics (issue #86).

Changes:

  • Introduce warmup config (random token dataset generation) and wire it into the benchmark command/session execution flow.
  • Add a sequential warmup sample order + session warmup stage that drains in-flight warmup requests before TEST_STARTED.
  • Update MetricsReporter queries to only consider samples issued at/after TEST_STARTED, plus add/extend unit + integration tests and examples.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/inference_endpoint/commands/benchmark.py Generates and loads a RandomDataset for warmup when configured; passes it into BenchmarkSession.
src/inference_endpoint/config/schema.py Adds WarmupConfig and exposes it on BenchmarkConfig.
src/inference_endpoint/load_generator/session.py Runs warmup issuance + drain before recording TEST_STARTED; constructs a warmup load generator in start().
src/inference_endpoint/load_generator/scheduler.py Adds SequentialSampleOrder for deterministic warmup issuance ordering.
src/inference_endpoint/load_generator/__init__.py Exports SequentialSampleOrder.
src/inference_endpoint/metrics/reporter.py Adds test_started_timestamp_ns and filters multiple metrics to exclude warmup samples.
tests/unit/metrics/test_reporter.py Adds a unit test ensuring warmup errors are excluded from get_error_count().
tests/unit/load_generator/test_session.py Adds a unit test asserting warmup uses sequential sample order.
tests/integration/commands/test_warmup.py Adds integration tests for warmup running before offline/online perf runs (currently with a failing log assertion).
examples/09_Warmup_Example/* Adds example YAMLs and README demonstrating warmup usage.


Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 19, 2026 06:19

Copilot AI left a comment


Pull request overview

Adds support for an optional warmup phase that issues randomly generated requests before the timed benchmark window, and updates metric reporting to exclude warmup events from performance results.

Changes:

  • Introduces warmup configuration (random dataset generation + output token override) and wires it into the benchmark command/session lifecycle.
  • Filters metrics to the performance window starting at TEST_STARTED (and ending at STOP_PERFORMANCE_TRACKING when present) so warmup samples don’t affect reported metrics.
  • Adds unit/integration tests and example configs demonstrating warmup behavior.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/inference_endpoint/config/schema.py Adds WarmupConfig and exposes warmup on BenchmarkConfig.
src/inference_endpoint/commands/benchmark.py Generates/loads a RandomDataset-backed warmup dataset and passes it into the session.
src/inference_endpoint/load_generator/session.py Executes warmup issuance + drain before emitting TEST_STARTED; builds a warmup generator/scheduler.
src/inference_endpoint/load_generator/scheduler.py Adds SequentialSampleOrder for deterministic warmup sample ordering.
src/inference_endpoint/load_generator/__init__.py Exports SequentialSampleOrder.
src/inference_endpoint/metrics/reporter.py Adds cached TEST_STARTED timestamp and uses it as a lower bound in multiple metric queries.
src/inference_endpoint/dataset_manager/predefined/random/__init__.py Renames range_ratio to input_range_ratio in random dataset generation.
tests/unit/metrics/test_reporter.py Adds unit test ensuring warmup errors don’t count toward perf error count.
tests/unit/load_generator/test_session.py Adds unit test verifying warmup uses sequential sample order.
tests/unit/dataset_manager/test_data_loader.py Updates random dataset loader test to use input_range_ratio.
tests/integration/commands/test_warmup.py Adds integration tests validating warmup runs before performance phase.
examples/09_Warmup_Example/warmup_offline.yaml Adds offline warmup example configuration.
examples/09_Warmup_Example/warmup_online.yaml Adds online warmup example configuration.
examples/09_Warmup_Example/README.md Documents warmup usage and tuning parameters.


Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
@arekay-nv arekay-nv requested a review from nvzhihanj March 19, 2026 06:44
Comment on lines +25 to +26

Warmup data is generated at runtime from random token IDs using the model
Collaborator


👍

Collaborator


Btw we might want deterministic warmup data for reproducibility and debugging (at least controlled by RNG seed)

Collaborator Author


Given the random_seed in the config - it should be deterministic.
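A minimal illustration of why a fixed random_seed makes the warmup data reproducible; the function name, vocabulary size, and shapes below are hypothetical, not the RandomDataset API.

```python
import random

# Illustrative only: a private RNG seeded from the config yields the same
# warmup token IDs on every run, independent of global random state.
def generate_token_ids(num_samples: int, seq_len: int, seed: int) -> list[list[int]]:
    rng = random.Random(seed)  # isolated RNG, deterministic for a given seed
    vocab_size = 32000         # assumed vocabulary size
    return [
        [rng.randrange(vocab_size) for _ in range(seq_len)]
        for _ in range(num_samples)
    ]

a = generate_token_ids(2, 4, seed=1234)
b = generate_token_ids(2, 4, seed=1234)
print(a == b)  # True: same seed, same dataset
```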

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 23, 2026 17:56

Copilot AI left a comment


Pull request overview

Adds an initial warmup phase (backed by a runtime-generated random dataset) that runs before the timed performance window so endpoints can be “primed” without contaminating performance metrics (closes #86).

Changes:

  • Introduces warmup configuration in the benchmark YAML schema and wires it into the benchmark command to generate/load a random warmup dataset.
  • Runs warmup requests before TEST_STARTED and updates metrics queries/reporting to exclude warmup samples while still reporting warmup wall-clock duration.
  • Adds unit + integration test coverage for warmup sequencing and metric exclusion behavior.
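The sequencing the bullets describe can be sketched with a toy event log. All names here are stand-ins (a counter plays the role of timestamps), not the project's actual recorder API:

```python
from itertools import count

_clock = count()  # fake monotonically increasing timestamp source
events = []       # (timestamp, event_name) pairs

def record(name):
    events.append((next(_clock), name))

def run_requests(n, tag):
    for i in range(n):
        record(f"{tag}_request_{i}")  # stand-in for issuing a request

# Warmup fully drains before the performance markers are recorded.
record("WARMUP_STARTED")
run_requests(3, "warmup")
record("TEST_STARTED")
run_requests(3, "perf")
record("STOP_PERFORMANCE_TRACKING")

# Reporting considers only events at/after TEST_STARTED.
start_ts = next(ts for ts, name in events if name == "TEST_STARTED")
perf_events = [(ts, n) for ts, n in events if ts >= start_ts]
assert all(not n.startswith("warmup") for _, n in perf_events)
assert len(perf_events) == 5  # marker + 3 perf requests + stop marker
```

Because warmup samples are filtered out by timestamp rather than deleted, the warmup wall-clock duration can still be reported from the same log.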

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/unit/metrics/test_reporter.py Adds coverage ensuring warmup errors don’t affect performance error counts; updates expected JSON fields.
tests/unit/metrics/test_recorder.py Increases worker join timeout to reduce test flakiness.
tests/unit/load_generator/test_session.py Adds unit test asserting warmup uses sequential sample ordering.
tests/unit/dataset_manager/test_data_loader.py Renames random dataset arg to input_range_ratio and fixes f-string quoting.
tests/integration/commands/test_warmup.py New integration tests verifying warmup runs before performance phase for offline/online modes.
src/inference_endpoint/metrics/reporter.py Adds warmup_duration_ns to reports and filters metrics to only include events at/after TEST_STARTED.
src/inference_endpoint/load_generator/session.py Implements warmup execution/drain logic and shares a single EventRecorder DB across warmup + perf.
src/inference_endpoint/load_generator/scheduler.py Adds SequentialSampleOrder for deterministic warmup issuing.
src/inference_endpoint/load_generator/__init__.py Re-exports SequentialSampleOrder.
src/inference_endpoint/dataset_manager/predefined/random/__init__.py Renames range_ratio → input_range_ratio in random dataset generation API.
src/inference_endpoint/config/schema.py Adds WarmupConfig and BenchmarkConfig.warmup.
src/inference_endpoint/commands/benchmark.py Generates/loads warmup dataset from config and passes it into BenchmarkSession.start().
examples/09_Warmup_Example/warmup_online.yaml Adds an online warmup-enabled example config.
examples/09_Warmup_Example/warmup_offline.yaml Adds an offline warmup-enabled example config.
examples/09_Warmup_Example/README.md Documents warmup config usage and how to run the examples.
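The SequentialSampleOrder listed above contrasts with a shuffled ordering for measured runs; a minimal hypothetical sketch of the two policies (not the repository's actual classes):

```python
import random

class SequentialSampleOrder:
    """Issue dataset samples 0..n-1 in order (deterministic warmup)."""

    def __init__(self, num_samples: int):
        self.num_samples = num_samples

    def __iter__(self):
        return iter(range(self.num_samples))

class ShuffledSampleOrder:
    """Issue samples in a seeded random order (typical measured run)."""

    def __init__(self, num_samples: int, seed: int):
        self.order = list(range(num_samples))
        random.Random(seed).shuffle(self.order)

    def __iter__(self):
        return iter(self.order)

assert list(SequentialSampleOrder(4)) == [0, 1, 2, 3]
# Shuffled order is a permutation, reproducible for a fixed seed.
assert sorted(ShuffledSampleOrder(8, seed=42).order) == list(range(8))
```

Sequential issuing makes warmup runs trivially reproducible and easy to correlate with the generated dataset when debugging.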


finally:
    # If the thread did not start (warmup stopped early or an exception was
    # raised), we own the recorder and must close it here.
    if not thread_started:
Copilot AI Mar 23, 2026

In the failure paths where the benchmark thread never starts, this close() call can raise EventRecorderSingletonViolation if __enter__() failed (e.g., another recorder is already active). That would mask the original exception. Consider tracking whether the recorder was successfully opened (or checking EventRecorder.LIVE is session.event_recorder) before calling close() here, and avoid raising a secondary exception during cleanup.

Suggested change
-    if not thread_started:
+    # Only close if this session's recorder is the active one to avoid
+    # raising a secondary exception during cleanup.
+    if (not thread_started) and (
+        getattr(EventRecorder, "LIVE", None) is session.event_recorder
+    ):

Copilot uses AI. Check for mistakes.
thread_started = False
try:
    if warmup_generator is not None:
        session._run_warmup(warmup_generator, max_shutdown_timeout_s or 300.0)
Copilot AI Mar 23, 2026

max_shutdown_timeout_s or 300.0 treats an explicit None (documented as “wait indefinitely”) the same as “use a 300s timeout” for warmup only. This makes warmup drain semantics inconsistent with the perf drain in _run_test (which receives None and waits forever). Prefer passing max_shutdown_timeout_s through unchanged (or updating the API/docs so both phases share the same default/meaning).

Suggested change
-        session._run_warmup(warmup_generator, max_shutdown_timeout_s or 300.0)
+        session._run_warmup(warmup_generator, max_shutdown_timeout_s)

Copilot uses AI. Check for mistakes.
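The pitfall Copilot flags is the usual truthiness one: `x or default` substitutes the default for every falsy value, not just `None`. A quick generic illustration (not the project's code):

```python
def buggy_drain_timeout(timeout):
    # `or` replaces every falsy value, not just None:
    return timeout or 300.0

assert buggy_drain_timeout(None) == 300.0  # "wait indefinitely" silently becomes 300 s
assert buggy_drain_timeout(0) == 300.0     # an explicit zero is lost too
assert buggy_drain_timeout(45.0) == 45.0

def drain_timeout(timeout):
    # Pass the value through unchanged so None keeps meaning "no timeout"
    # in both the warmup and performance drains.
    return timeout

assert drain_timeout(None) is None
```

If a real default is wanted, an explicit `300.0 if timeout is None else timeout` (or a shared module-level constant) keeps both phases consistent.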
Comment on lines +983 to 985
start_ts = self.test_started_timestamp_ns
stop_ts = self.stop_performance_tracking_timestamp_ns

Copilot AI Mar 23, 2026

The behavior of this method has changed to also apply a lower bound at TEST_STARTED (via start_ts) when STOP_PERFORMANCE_TRACKING is missing. Please update the docstring above to reflect that pre-TEST_STARTED/warmup errors are excluded even without STOP_PERFORMANCE_TRACKING.

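The filtering behavior under discussion, bounding the query below by TEST_STARTED even when STOP_PERFORMANCE_TRACKING is absent, can be sketched like this (simplified, hypothetical names):

```python
def filter_perf_events(events, start_ts, stop_ts=None):
    """Keep events in [start_ts, stop_ts); a missing stop marker means no upper bound.

    Pre-TEST_STARTED (warmup) events are always excluded.
    """
    return [
        (ts, name) for ts, name in events
        if ts >= start_ts and (stop_ts is None or ts < stop_ts)
    ]

events = [(5, "warmup_error"), (10, "TEST_STARTED"), (15, "perf_ok"), (20, "stop")]
# Even without a stop timestamp, the warmup-phase error is filtered out.
assert filter_perf_events(events, start_ts=10) == events[1:]
```

Updating the docstring to state this lower bound, as the review asks, keeps the contract explicit for future callers.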
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
@arekay-nv
Collaborator Author

@nvzhihanj please review.

Copilot AI review requested due to automatic review settings March 23, 2026 23:52
Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated no new comments.



parser:
  input: prompt

settings:
Collaborator

@viraatc viraatc Mar 25, 2026

should we rename this to benchmark-settings or something a bit more descriptive in a follow-up?
settings seems a bit confusing / out-of-place here?

    random_seed: Seed for reproducible warmup data generation.
    """

    num_samples: int = Field(100, gt=0)
Collaborator

Let's add cyclopts descriptions as well, since these will show up in the CLI.



Development

Successfully merging this pull request may close these issues.

Feature: Warmup runs

4 participants