feat: make rate_matching degradation factors configurable#615

Open
liyuanzhe1991 wants to merge 1 commit into ai-dynamo:main from liyuanzhe1991:feat/configurable-rate-matching

Conversation

Contributor

@liyuanzhe1991 liyuanzhe1991 commented Mar 18, 2026

Convert module-level constants _RATE_MATCHING_PREFILL_DEGRADATION_FACTOR and _RATE_MATCHING_DECODE_DEGRADATION_FACTOR into configurable instance attributes on DisaggInferenceSession with a dedicated setter method. Propagate these parameters through TaskConfig.advanced_tuning_config and disagg_pareto() kwargs, eliminating the need for monkey-patching.
Add unit tests covering default values, setter behavior, and end-to-end parameter forwarding from TaskConfig to disagg_pareto.
Made-with: Cursor

Overview:

Previously, _RATE_MATCHING_PREFILL_DEGRADATION_FACTOR (0.9) and _RATE_MATCHING_DECODE_DEGRADATION_FACTOR (0.92) were module-level constants in picking.py, imported by inference_session.py at module load time. Because a from-import binds a copy of the name into the importing module, the factors could not be overridden at runtime: experiment scripts that set picking._RATE_MATCHING_*_DEGRADATION_FACTOR = 1.0 had no effect on the already-imported bindings inside DisaggInferenceSession.
This PR converts these constants into configurable instance attributes with a proper setter API (set_rate_matching_degradation_factors), and threads the values through TaskConfig.advanced_tuning_config → TaskRunner.run_disagg() → disagg_pareto() → DisaggInferenceSession, making them fully configurable without monkey-patching.
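The failure mode is plain Python import semantics and can be reproduced in isolation. A minimal sketch (the module names below are stand-ins built with types.ModuleType, not the real aiconfigurator files): a from-import copies the binding into the importing module, so rebinding the attribute on the source module later changes nothing for the consumer.

```python
import sys
import types

# Hypothetical stand-ins for picking.py and inference_session.py; this only
# reproduces the import mechanics, not the real modules.
picking = types.ModuleType("picking")
picking._RATE_MATCHING_PREFILL_DEGRADATION_FACTOR = 0.9
sys.modules["picking"] = picking

session = types.ModuleType("inference_session")
exec(
    "from picking import _RATE_MATCHING_PREFILL_DEGRADATION_FACTOR\n"
    "def effective_factor():\n"
    "    return _RATE_MATCHING_PREFILL_DEGRADATION_FACTOR\n",
    session.__dict__,
)

# The attempted monkey-patch from the experiment scripts:
picking._RATE_MATCHING_PREFILL_DEGRADATION_FACTOR = 1.0

# The consumer still sees the value captured at import time.
print(session.effective_factor())  # → 0.9
```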

Details:

  • inference_session.py: Add _rate_matching_prefill_degradation_factor and _rate_matching_decode_degradation_factor as instance attributes initialized in DisaggInferenceSession.__init__ (defaulting to the module constants). Add a set_rate_matching_degradation_factors() setter. Replace all internal references to the module constants with the self._rate_matching_* attributes in _get_disagg_summary_df and _find_best_result_under_constraints.
  • pareto_analysis.py: disagg_pareto() now accepts rate_matching_prefill_degradation_factor and rate_matching_decode_degradation_factor via **kwargs, and calls disagg_sess.set_rate_matching_degradation_factors() when either is provided.
  • task.py: Add rate_matching_prefill_degradation_factor: None and rate_matching_decode_degradation_factor: None to the default advanced_tuning_config. TaskRunner.run_disagg() forwards these values to disagg_pareto().
  • test_inference_session.py: Add TestRateMatchingDegradationFactors class (5 tests): default values, setter with both/partial args, and end-to-end impact on tokens/s/gpu output.
  • test_task.py: Add default-value assertions in test_taskconfig_disagg_default. Add TestRateMatchingFactorsForwarding class (3 tests): None forwarding, custom values, and partial override.
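The attribute-plus-setter pattern described above can be sketched as follows. Only the constant values, attribute names, and setter name come from the PR; the constructor arguments (databases, backends) are elided, so this is an illustrative shape, not the merged code:

```python
# Defaults mirror the former module-level constants in picking.py.
_RATE_MATCHING_PREFILL_DEGRADATION_FACTOR = 0.9
_RATE_MATCHING_DECODE_DEGRADATION_FACTOR = 0.92

class DisaggInferenceSession:
    def __init__(self):  # real constructor arguments elided for brevity
        # Instance attributes start at the old constant values, so behavior
        # is unchanged unless the setter is called.
        self._rate_matching_prefill_degradation_factor = (
            _RATE_MATCHING_PREFILL_DEGRADATION_FACTOR
        )
        self._rate_matching_decode_degradation_factor = (
            _RATE_MATCHING_DECODE_DEGRADATION_FACTOR
        )

    def set_rate_matching_degradation_factors(
        self,
        prefill_degradation_factor: float = _RATE_MATCHING_PREFILL_DEGRADATION_FACTOR,
        decode_degradation_factor: float = _RATE_MATCHING_DECODE_DEGRADATION_FACTOR,
    ) -> None:
        # Omitting an argument resets that factor to its default value.
        self._rate_matching_prefill_degradation_factor = prefill_degradation_factor
        self._rate_matching_decode_degradation_factor = decode_degradation_factor
```

With defaults in the signature, a partial call such as set_rate_matching_degradation_factors(prefill_degradation_factor=1.0) overrides one factor while leaving the other at its default.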

Where should the reviewer start?

  • src/aiconfigurator/sdk/inference_session.py — the core change: new instance attributes and setter method (lines 175–202), and the two call sites that now use self._rate_matching_* (lines 220–221, 765–766).
  • src/aiconfigurator/sdk/pareto_analysis.py — kwargs extraction and conditional setter call (lines 239–247).
  • src/aiconfigurator/sdk/task.py — default config addition (lines 494–495) and forwarding in run_disagg (lines 1346–1351).

Related Issues:

  • Relates to: disagg rate-matching configurability improvement

Summary by CodeRabbit

  • New Features
    • Added customizable rate-matching degradation factors for disaggregated inference optimization. Users can now configure prefill and decode degradation factors through task configuration or runtime API calls to fine-tune performance analysis.

Signed-off-by: Yuanzhe Li <yuanli@nvidia.com>

copy-pr-bot bot commented Mar 18, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.



coderabbitai bot commented Mar 18, 2026

Walkthrough

The pull request introduces runtime-configurable rate-matching degradation factors for disaggregated inference sessions. New instance-level attributes replace hardcoded constants, with a public setter method enabling customization. Changes propagate through the inference session, Pareto analysis, and task configuration layers, with comprehensive test coverage validating the forwarding mechanism across all components.

Changes

  • Inference Session Core Logic (src/aiconfigurator/sdk/inference_session.py): Added _rate_matching_prefill_degradation_factor and _rate_matching_decode_degradation_factor instance fields initialized from the module constants. Introduced the public set_rate_matching_degradation_factors() method for runtime customization. Updated internal methods to use the instance attributes instead of the constants.
  • Configuration & Analysis Threading (src/aiconfigurator/sdk/task.py, src/aiconfigurator/sdk/pareto_analysis.py): Extended the task config with optional degradation factor fields in advanced_tuning_config. Threaded the factors through disaggregated worker config construction and the Pareto analysis invocation, with conditional parameter passing for rate-matching customization.
  • Test Coverage (tests/unit/sdk/task/test_task.py, tests/unit/sdk/test_inference_session.py): Test suites validating degradation factor defaults, custom value forwarding from task config to Pareto analysis, partial overrides, and propagation into inference result comparison logic.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Degradation factors now dance at runtime's command,
No longer trapped in constants, but malleable and grand!
Through sessions, tasks, and analysis they thread with grace,
Customizable throughput metrics in every place.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 78.95%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Title check ✅ Passed: The title directly and clearly describes the main change, making rate_matching degradation factors configurable, which is the primary objective of the PR.
  • Description check ✅ Passed: The description follows the required template with all sections completed: Overview explains the motivation, Details breaks down changes by file with line references, "Where should the reviewer start?" identifies key files, and Related Issues is included.



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
src/aiconfigurator/sdk/inference_session.py (1)

187-203: Add input validation for degradation factors in the public setter.

Line 201 and Line 202 currently accept invalid values (e.g., <= 0 or NaN), which can silently distort or eliminate valid rate-matching outcomes.

Proposed guardrails:

```diff
 def set_rate_matching_degradation_factors(
     self,
     prefill_degradation_factor: float = _RATE_MATCHING_PREFILL_DEGRADATION_FACTOR,
     decode_degradation_factor: float = _RATE_MATCHING_DECODE_DEGRADATION_FACTOR,
 ):
@@
+    for name, value in (
+        ("prefill_degradation_factor", prefill_degradation_factor),
+        ("decode_degradation_factor", decode_degradation_factor),
+    ):
+        if (
+            not isinstance(value, (int, float))
+            or isinstance(value, bool)
+            or pd.isna(value)
+            or value <= 0
+        ):
+            raise ValueError(f"{name} must be a positive finite number, got {value!r}")
+
     self._rate_matching_prefill_degradation_factor = prefill_degradation_factor
     self._rate_matching_decode_degradation_factor = decode_degradation_factor
```
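A standalone version of the suggested guardrail, using math.isfinite instead of pd.isna to avoid a pandas dependency. This is a sketch of the reviewer's proposal, not the merged code; the helper name is hypothetical:

```python
import math

def validate_degradation_factor(name: str, value) -> float:
    """Raise ValueError unless value is a positive, finite real number."""
    # bool is a subclass of int, so reject it explicitly before the
    # numeric checks; also reject NaN/Inf and non-positive values.
    if (
        isinstance(value, bool)
        or not isinstance(value, (int, float))
        or not math.isfinite(value)
        or value <= 0
    ):
        raise ValueError(f"{name} must be a positive finite number, got {value!r}")
    return float(value)
```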

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fbc04f3e-d4f4-4c89-b021-69e8a4157eb9

📥 Commits

Reviewing files that changed from the base of the PR and between 57857b5 and 60ee9e9.

📒 Files selected for processing (5)
  • src/aiconfigurator/sdk/inference_session.py
  • src/aiconfigurator/sdk/pareto_analysis.py
  • src/aiconfigurator/sdk/task.py
  • tests/unit/sdk/task/test_task.py
  • tests/unit/sdk/test_inference_session.py

```python
disagg_sess = DisaggInferenceSession(prefill_database, prefill_backend, decode_database, decode_backend)
disagg_sess.set_latency_correction_scales(prefill_latency_correction_scale, decode_latency_correction_scale)

rate_matching_prefill = kwargs.pop("rate_matching_prefill_degradation_factor", None)
```
Contributor


should we make it to 1.0 by default?
