feat: add --service-tier-dist for per-request service_tier distribution #675

Open

ajcasagrande wants to merge 3 commits into main from ajc/vip-only
Conversation

@ajcasagrande
Contributor

@ajcasagrande ajcasagrande commented Feb 11, 2026

Allows users to specify a distribution of service_tier values (e.g. default:50;flex:30;priority:20) that get sampled per turn and sent in OpenAI chat/completions payloads. The response service_tier is extracted into record metadata for downstream analysis.

Summary by CodeRabbit

  • New Features

    • Added a CLI option to specify a distribution of service tiers for API requests (format: tier:percentage;tier:percentage).
    • Service tier is now sent with requests when present and captured in response metrics.
  • Validation

    • CLI option is disallowed alongside explicit per-request service_tier and restricted to chat/completions endpoints.
  • Tests

    • Added unit and endpoint tests covering distribution parsing, sampling, payloads, and response extraction.
  • Documentation

    • CLI docs updated with the new option and usage examples.
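
As a sketch of what the feature enables, a request payload might carry a tier sampled from the configured distribution. The helper and payload fields below are illustrative only, not the PR's actual code:

```python
# Hypothetical sketch: sample a service_tier per request from the
# --service-tier-dist percentages and attach it to a chat payload.
import random

def sample_tier(dist: dict[str, float]) -> str:
    """Pick one tier name with probability proportional to its percentage."""
    tiers = list(dist)
    return random.choices(tiers, weights=[dist[t] for t in tiers], k=1)[0]

dist = {"default": 50, "flex": 30, "priority": 20}  # "default:50;flex:30;priority:20"
payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "hello"}],
    "service_tier": sample_tier(dist),  # sampled per turn
}
```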

Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
@github-actions

github-actions bot commented Feb 11, 2026

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@808d652a48346325d7a339dc757fa6b39efc1ea5

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@808d652a48346325d7a339dc757fa6b39efc1ea5

Last updated for commit: 808d652

@ajcasagrande
Contributor Author

Note: this does not yet add support for grouping results by service_tier; I think that fits better with my MetricsAccumulator refactoring.

@coderabbitai

coderabbitai bot commented Feb 11, 2026

Walkthrough

Adds service-tier distribution support: a CLI option to specify tier probabilities, models and parser for distributions with sampling, config validators and accessors, endpoints updated to send/receive service_tier, dataset composer assigns sampled tiers to turns, metadata recorded, and tests/docs added.

Changes

Cohort / File(s) Summary
Configuration
src/aiperf/common/config/input_config.py, src/aiperf/common/config/user_config.py
Added --service-tier-dist CLI field and accessor, mutual-exclusivity and endpoint-type validators (CHAT/COMPLETIONS restriction).
Distribution Model
src/aiperf/common/models/service_tier_distribution.py
New ServiceTierEntry, ServiceTierDistribution, and ServiceTierDistributionParser implementing validated parsing, probability-sum checks, cumulative distribution, and O(log n) sampling.
Data Models
src/aiperf/common/models/dataset_models.py, src/aiperf/common/models/record_models.py
Added optional service_tier field to Turn and MetricRecordMetadata, propagated in copy/creation paths.
Endpoints
src/aiperf/endpoints/openai_chat.py, src/aiperf/endpoints/openai_completions.py
Include service_tier from turns in outgoing payloads and extract service_tier from responses into ParsedResponse metadata.
Dataset Composer
src/aiperf/dataset/composer/base.py
Initialize service-tier distribution from config and sample/assign turn.service_tier during turn finalization.
Record Processing
src/aiperf/records/record_processor_service.py
Extract service_tier from response metadata when building metric record metadata.
Docs
docs/cli_options.md
Documented --service-tier-dist format and accepted tier names with examples.
Tests
tests/unit/common/models/test_service_tier_distribution.py, tests/unit/endpoints/test_openai_chat_completions.py, tests/unit/endpoints/test_openai_completions.py
Added unit tests for distribution parsing/validation/sampling and endpoint payload/response metadata behavior.
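
The parse -> cumulative distribution -> O(log n) sampling flow described for the distribution model can be sketched as follows. This is a minimal illustration; the names are hypothetical and the PR's actual ServiceTierDistributionParser API may differ:

```python
# Illustrative sketch of parsing "tier:pct;tier:pct" into cumulative
# percentages, then sampling with a binary search (O(log n) per draw).
import bisect
import random

def parse_distribution(spec: str) -> tuple[list[str], list[float]]:
    """Parse "tier:pct;tier:pct" into tier names and cumulative percentages."""
    tiers: list[str] = []
    cumulative: list[float] = []
    total = 0.0
    for part in spec.split(";"):
        name, pct = part.split(":")
        total += float(pct)
        tiers.append(name.strip())
        cumulative.append(total)
    if abs(total - 100.0) > 1e-6:
        raise ValueError(f"percentages must sum to 100, got {total}")
    return tiers, cumulative

def sample(tiers: list[str], cumulative: list[float]) -> str:
    """Draw uniformly in [0, 100) and binary-search the cumulative sums."""
    r = random.random() * cumulative[-1]
    return tiers[bisect.bisect_right(cumulative, r)]

tiers, cum = parse_distribution("default:50;flex:30;priority:20")
```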

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 55.36%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title clearly and specifically describes the main change: adding a new CLI option --service-tier-dist for sampling service_tier values per request from a distribution.



No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
src/aiperf/common/models/service_tier_distribution.py (1)

63-91: Nit: redundant list() conversion on Line 86.

_validate_probability_sum accepts list[ServiceTierEntry] but self._entries is already a tuple of the same type. Since the function only iterates via sum(), accepting Sequence or just passing the tuple would avoid the copy.

Proposed fix

Either widen the type hint of _validate_probability_sum:

-def _validate_probability_sum(entries: list[ServiceTierEntry]) -> None:
+def _validate_probability_sum(entries: list[ServiceTierEntry] | tuple[ServiceTierEntry, ...]) -> None:

and drop the conversion:

-        _validate_probability_sum(list(self._entries))
+        _validate_probability_sum(self._entries)

Or use Sequence[ServiceTierEntry] from collections.abc.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@docs/cli_options.md`:
- Around lines 251-253: Make the docs and validation consistent. Either relax the wording in the CLI docs and in the Field(description=...) used in input_config.py to say "Common tiers" or "Example tiers" (listing auto, default, flex, scale, priority), or add whitelist validation in ServiceTierEntry that accepts only those specific strings. Whichever approach is chosen, keep the Field(description=...) in input_config.py and the --service-tier-dist entry in the CLI docs in agreement with ServiceTierEntry's validation so behavior and documentation match.

In `@src/aiperf/common/config/input_config.py`:
- Around lines 377-384: Add an explicit return type to get_service_tier_distribution, annotating it as returning Optional[ServiceTierDistribution] (e.g. def get_service_tier_distribution(self) -> Optional["ServiceTierDistribution"]:) and import typing.Optional. Either import ServiceTierDistribution at the top of the module from aiperf.common.models.service_tier_distribution, or use a forward-reference string to avoid the top-level import. Keep the existing local use of ServiceTierDistributionParser.parse and return the parsed ServiceTierDistribution or None.

In `@src/aiperf/endpoints/openai_chat.py`:
- Around lines 205-210: The truthiness check "if service_tier := json_obj.get('service_tier')" in the block that builds metadata silently drops falsy but valid values (e.g., an empty string). Change the check to mirror format_payload's behavior by testing for None (i.e., "is not None") so service_tier is included whenever it exists in json_obj, then return ParsedResponse(perf_ns=response.perf_ns, data=data, usage=usage, metadata=metadata) as before.

In `@src/aiperf/endpoints/openai_completions.py`:
- Around lines 93-98: The truthiness check "if service_tier := json_obj.get('service_tier'):" mismatches the explicit None check used elsewhere (e.g., format_payload). Change it to "if json_obj.get('service_tier') is not None" (or assign first, then check "is not None") so service_tier is included only when present while falsy-but-valid values (e.g., an empty string) are preserved; update the block that builds metadata and returns ParsedResponse accordingly.
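
The pitfall both endpoint comments describe can be seen in a few lines (json_obj and metadata are hypothetical stand-ins for the endpoint code):

```python
# Minimal illustration of the walrus-truthiness pitfall flagged above.
json_obj = {"service_tier": ""}  # key present, but the value is falsy

metadata: dict[str, str] = {}
if tier := json_obj.get("service_tier"):  # truthiness check: drops ""
    metadata["service_tier"] = tier
assert "service_tier" not in metadata

tier = json_obj.get("service_tier")
if tier is not None:  # explicit None check: keeps ""
    metadata["service_tier"] = tier
assert metadata["service_tier"] == ""
```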
🧹 Nitpick comments (1)
src/aiperf/common/models/service_tier_distribution.py (1)

89-91: Consider using lazy lambda for the debug log.

Per the coding guidelines, expensive logs should use lambda: logger.debug(lambda: f"..."). While this only runs once at construction time, maintaining consistency with the guideline avoids accidental f-string evaluation when debug logging is disabled.

Suggested change
-        logger.debug(
-            f"Created service tier distribution with {len(self._entries)} entries: {self}"
-        )
+        logger.debug(
+            lambda: f"Created service tier distribution with {len(self._entries)} entries: {self}"
+        )

As per coding guidelines: src/**/*.py: Use lambda for expensive logs: self.debug(lambda: f"{self._x()}").

@codecov

codecov bot commented Feb 11, 2026

Contributor
Regarding multi-turn: does anyone switch service tiers mid-conversation?

I would think not, in which case sampling per-conversation seems to make more sense than per-turn.

Thoughts?
