feat: make rate_matching degradation factors configurable#615
feat: make rate_matching degradation factors configurable#615liyuanzhe1991 wants to merge 1 commit intoai-dynamo:mainfrom
Conversation
Convert module-level constants _RATE_MATCHING_PREFILL_DEGRADATION_FACTOR and _RATE_MATCHING_DECODE_DEGRADATION_FACTOR into configurable instance attributes on DisaggInferenceSession with a dedicated setter method. Propagate these parameters through TaskConfig.advanced_tuning_config and disagg_pareto() kwargs, eliminating the need for monkey-patching. Add unit tests covering default values, setter behavior, and end-to-end parameter forwarding from TaskConfig to disagg_pareto. Signed-off-by: Yuanzhe Li <yuanli@nvidia.com> Made-with: Cursor
WalkthroughThe pull request introduces runtime-configurable rate-matching degradation factors for disaggregated inference sessions. New instance-level attributes replace hardcoded constants, with a public setter method enabling customization. Changes propagate through the inference session, Pareto analysis, and task configuration layers, with comprehensive test coverage validating the forwarding mechanism across all components. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. 📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can use your project's `ruff` configuration to improve the quality of Python code reviews.Add a Ruff configuration file to your project to customize how CodeRabbit runs |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/aiconfigurator/sdk/inference_session.py (1)
187-203: Add input validation for degradation factors in the public setter.Line 201 and Line 202 currently accept invalid values (e.g.,
<= 0orNaN), which can silently distort or eliminate valid rate-matching outcomes.Proposed guardrails
def set_rate_matching_degradation_factors( self, prefill_degradation_factor: float = _RATE_MATCHING_PREFILL_DEGRADATION_FACTOR, decode_degradation_factor: float = _RATE_MATCHING_DECODE_DEGRADATION_FACTOR, ): @@ + for name, value in ( + ("prefill_degradation_factor", prefill_degradation_factor), + ("decode_degradation_factor", decode_degradation_factor), + ): + if ( + not isinstance(value, (int, float)) + or isinstance(value, bool) + or pd.isna(value) + or value <= 0 + ): + raise ValueError(f"{name} must be a positive finite number, got {value!r}") + self._rate_matching_prefill_degradation_factor = prefill_degradation_factor self._rate_matching_decode_degradation_factor = decode_degradation_factor🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/aiconfigurator/sdk/inference_session.py` around lines 187 - 203, The public setter set_rate_matching_degradation_factors currently assigns values that may be <=0 or NaN/Inf; add input validation in that method to ensure prefill_degradation_factor and decode_degradation_factor are floats, greater than 0 and finite (use math.isfinite or equivalent) and raise a ValueError with a clear message if validation fails, and only assign to self._rate_matching_prefill_degradation_factor and self._rate_matching_decode_degradation_factor after the checks pass.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/aiconfigurator/sdk/inference_session.py`:
- Around line 187-203: The public setter set_rate_matching_degradation_factors
currently assigns values that may be <=0 or NaN/Inf; add input validation in
that method to ensure prefill_degradation_factor and decode_degradation_factor
are floats, greater than 0 and finite (use math.isfinite or equivalent) and
raise a ValueError with a clear message if validation fails, and only assign to
self._rate_matching_prefill_degradation_factor and
self._rate_matching_decode_degradation_factor after the checks pass.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fbc04f3e-d4f4-4c89-b021-69e8a4157eb9
📒 Files selected for processing (5)
src/aiconfigurator/sdk/inference_session.pysrc/aiconfigurator/sdk/pareto_analysis.pysrc/aiconfigurator/sdk/task.pytests/unit/sdk/task/test_task.pytests/unit/sdk/test_inference_session.py
| disagg_sess = DisaggInferenceSession(prefill_database, prefill_backend, decode_database, decode_backend) | ||
| disagg_sess.set_latency_correction_scales(prefill_latency_correction_scale, decode_latency_correction_scale) | ||
|
|
||
| rate_matching_prefill = kwargs.pop("rate_matching_prefill_degradation_factor", None) |
There was a problem hiding this comment.
should we make it to 1.0 by default?
Convert module-level constants _RATE_MATCHING_PREFILL_DEGRADATION_FACTOR and _RATE_MATCHING_DECODE_DEGRADATION_FACTOR into configurable instance attributes on DisaggInferenceSession with a dedicated setter method. Propagate these parameters through TaskConfig.advanced_tuning_config and disagg_pareto() kwargs, eliminating the need for monkey-patching.
Add unit tests covering default values, setter behavior, and end-to-end parameter forwarding from TaskConfig to disagg_pareto.
Made-with: Cursor
Overview:
Previously,
_RATE_MATCHING_PREFILL_DEGRADATION_FACTOR(0.9) and_RATE_MATCHING_DECODE_DEGRADATION_FACTOR(0.92) were module-level constants inpicking.py, imported at module load time byinference_session.py. This made them impossible to override at runtime via monkey-patching—experiment scripts that attemptedpicking._RATE_MATCHING_*_DEGRADATION_FACTOR = 1.0had no effect on already-imported copies insideDisaggInferenceSession.This PR converts these constants into configurable instance attributes with a proper setter API (
set_rate_matching_degradation_factors), and threads the values throughTaskConfig.advanced_tuning_config→TaskRunner.run_disagg()→disagg_pareto()→DisaggInferenceSession, making them fully configurable without monkey-patching.Details:
inference_session.py: Add_rate_matching_prefill_degradation_factorand_rate_matching_decode_degradation_factoras instance attributes onDisaggInferenceSession.__init__(defaulting to the module constants). Addset_rate_matching_degradation_factors()setter. Replace all internal references to the module constants withself._rate_matching_*attributes in_get_disagg_summary_dfand_find_best_result_under_constraints.pareto_analysis.py:disagg_pareto()now acceptsrate_matching_prefill_degradation_factorandrate_matching_decode_degradation_factorvia**kwargs, and callsdisagg_sess.set_rate_matching_degradation_factors()when either is provided.task.py: Addrate_matching_prefill_degradation_factor: Noneandrate_matching_decode_degradation_factor: Noneto the defaultadvanced_tuning_config.TaskRunner.run_disagg()forwards these values todisagg_pareto().test_inference_session.py: AddTestRateMatchingDegradationFactorsclass (5 tests): default values, setter with both/partial args, and end-to-end impact ontokens/s/gpuoutput.test_task.py: Add default-value assertions intest_taskconfig_disagg_default. AddTestRateMatchingFactorsForwardingclass (3 tests): None forwarding, custom values, and partial override.Where should the reviewer start?
src/aiconfigurator/sdk/inference_session.py— the core change: new instance attributes and setter method (lines 175–202), and the two call sites that now useself._rate_matching_*(lines 220–221, 765–766).src/aiconfigurator/sdk/pareto_analysis.py— kwargs extraction and conditional setter call (lines 239–247).src/aiconfigurator/sdk/task.py— default config addition (lines 494–495) and forwarding inrun_disagg(lines 1346–1351).Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit