feat: add --service-tier-dist for per-request service_tier distribution (#675)
ajcasagrande wants to merge 3 commits into `main`.
Conversation
Allows users to specify a distribution of `service_tier` values (e.g. `default:50;flex:30;priority:20`) that are sampled per turn and sent in OpenAI chat/completions payloads. The response `service_tier` is extracted into record metadata for downstream analysis.

Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
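The description's `tier:weight;tier:weight` format and per-turn weighted draw can be sketched as follows; `parse_tier_dist` is a hypothetical stand-in for the PR's `ServiceTierDistributionParser`, shown only to illustrate the sampling semantics:

```python
import random

def parse_tier_dist(spec: str) -> tuple[list[str], list[float]]:
    """Parse a 'tier:weight;tier:weight' spec into parallel name/weight lists."""
    tiers: list[str] = []
    weights: list[float] = []
    for entry in spec.split(";"):
        name, _, weight = entry.partition(":")
        tiers.append(name.strip())
        weights.append(float(weight))
    return tiers, weights

tiers, weights = parse_tier_dist("default:50;flex:30;priority:20")
# One weighted draw per turn; random.choices normalizes the weights,
# so they need not sum to 100.
sampled = random.choices(tiers, weights=weights, k=1)[0]
```

With these example weights, roughly half of the turns would carry `service_tier: "default"` over a long run.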
Try out this PR — quick install:

    pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@808d652a48346325d7a339dc757fa6b39efc1ea5

Recommended with a virtual environment (using uv):

    uv venv --python 3.12 && source .venv/bin/activate
    uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@808d652a48346325d7a339dc757fa6b39efc1ea5
Note: this does not yet add support for grouping results by `service_tier`. I think that fits better with my MetricsAccumulator refactoring.
Walkthrough: adds service-tier distribution support. A CLI option specifies tier probabilities; models and a parser handle distributions with sampling; config validators and accessors are added; endpoints are updated to send and receive `service_tier`; the dataset composer assigns sampled tiers to turns; the tier is recorded in metadata; and tests and docs are added.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@docs/cli_options.md`:
- Around lines 251-253: Make the docs and the validation consistent. Either relax the documentation text in the CLI docs and the `Field(description=...)` in `input_config.py` to say "Common tiers" or "Example tiers" (listing `auto`, `default`, `flex`, `scale`, `priority`), or add whitelist validation inside `ServiceTierEntry` that accepts only those strings. Ensure both the `Field(description=...)` string in `input_config.py` and the `--service-tier-dist` entry in the CLI docs match the chosen approach, so behavior and documentation agree with the `ServiceTierEntry` validation.
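If the whitelist route is taken, the validation could look roughly like this. The `ServiceTierEntry` below is a stdlib-only stand-in for the project's actual model (which the review suggests is Pydantic-based), and the allowed set is the tier list named in the comment:

```python
from dataclasses import dataclass

# Tier names taken from the review comment; treat as an assumption.
ALLOWED_TIERS = frozenset({"auto", "default", "flex", "scale", "priority"})

@dataclass
class ServiceTierEntry:
    name: str
    weight: float

    def __post_init__(self) -> None:
        # Reject any tier name outside the documented set.
        if self.name not in ALLOWED_TIERS:
            raise ValueError(
                f"unknown service tier {self.name!r}; "
                f"expected one of {sorted(ALLOWED_TIERS)}"
            )
        if self.weight < 0:
            raise ValueError("weight must be non-negative")
```

The alternative (documenting the list as examples only) keeps the parser open to provider-specific tiers at the cost of catching typos later.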
In `@src/aiperf/common/config/input_config.py`:
- Around lines 377-384: Add an explicit return type to `get_service_tier_distribution`, annotating it as returning `Optional[ServiceTierDistribution]` (e.g. `def get_service_tier_distribution(self) -> Optional["ServiceTierDistribution"]:`), and import `typing.Optional`. Either import `ServiceTierDistribution` at module top from `aiperf.common.models.service_tier_distribution`, or use a forward-reference string to avoid the top-level import. Keep the existing local use of `ServiceTierDistributionParser.parse` and return the parsed `ServiceTierDistribution` or `None`.
In `@src/aiperf/endpoints/openai_chat.py`:
- Around lines 205-210: The truthiness check `if service_tier := json_obj.get("service_tier"):` in the block that builds metadata can silently drop falsy but valid values (e.g. an empty string). Update the check to mirror `format_payload`'s behavior by testing for `None` (i.e. `is not None`), so `service_tier` is included whenever it exists in `json_obj`, then return `ParsedResponse(perf_ns=response.perf_ns, data=data, usage=usage, metadata=metadata)` as before.
In `@src/aiperf/endpoints/openai_completions.py`:
- Around lines 93-98: The truthiness check `if service_tier := json_obj.get("service_tier"):` mismatches the explicit `None` check used elsewhere (e.g. in `format_payload`). Change it to `if json_obj.get("service_tier") is not None` (or assign first, then check `is not None`) so `service_tier` is included only when present while falsy-but-valid values (e.g. an empty string or `0`) are preserved. Update the block that builds `metadata` and returns `ParsedResponse` (reference variables: `json_obj`, `service_tier`, `metadata`, `ParsedResponse`) accordingly.
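The difference between the two checks in both endpoint comments can be shown with a small stand-in for the metadata-building block (the surrounding `ParsedResponse` machinery is omitted; only the field name comes from the review):

```python
def build_metadata(json_obj: dict) -> dict:
    """Include service_tier whenever the key is present and non-None."""
    metadata: dict = {}
    service_tier = json_obj.get("service_tier")
    # An explicit None check, unlike `if service_tier:`, keeps "" and 0.
    if service_tier is not None:
        metadata["service_tier"] = service_tier
    return metadata
```

Here `build_metadata({"service_tier": ""})` returns `{"service_tier": ""}`, whereas the walrus-plus-truthiness form would return `{}` and drop the value.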
🧹 Nitpick comments (1)

`src/aiperf/common/models/service_tier_distribution.py` (1)

- Lines 89-91: Consider using a lazy lambda for the debug log. Per the coding guidelines, expensive logs should use a lambda: `logger.debug(lambda: f"...")`. While this only runs once at construction time, staying consistent with the guideline avoids accidental f-string evaluation when debug logging is disabled. Suggested change:

```diff
-        logger.debug(
-            f"Created service tier distribution with {len(self._entries)} entries: {self}"
-        )
+        logger.debug(
+            lambda: f"Created service tier distribution with {len(self._entries)} entries: {self}"
+        )
```

As per coding guidelines (`src/**/*.py`): use a lambda for expensive logs: `self.debug(lambda: f"{self._x()}")`.
Codecov Report: ❌ patch coverage check failed.
Regarding multi-turn: does anyone switch service tiers mid-conversation? I would think not, in which case sampling per conversation seems to make more sense than per turn. Thoughts?
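The per-conversation variant being suggested would be a small change to the sampling step: draw once per conversation id and let every turn inherit that tier. The function below is illustrative, not the PR's actual API:

```python
import random

def assign_conversation_tiers(
    conversation_ids: list[str],
    tiers: list[str],
    weights: list[float],
) -> dict[str, str]:
    """Sample one service_tier per conversation; all of its turns reuse it."""
    return {
        cid: random.choices(tiers, weights=weights, k=1)[0]
        for cid in conversation_ids
    }

assignments = assign_conversation_tiers(
    ["conv-1", "conv-2"], ["default", "flex"], [70.0, 30.0]
)
```

The distribution is then over conversations rather than turns, so multi-turn conversations never mix tiers mid-stream.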