feat: wire V2 saturation analyzer into engine, gated by analyzerName by ev-shindin · Pull Request #695 · llm-d/llm-d-workload-variant-autoscaler

ev-shindin · 2026-02-12T11:11:57Z

Summary

Wire the V2 token-based saturation analyzer into the optimization engine, gated by analyzerName: "saturation" in the saturation scaling config. When active, it replaces the V1 percentage-based analyzer while keeping the rest of the pipeline (enforcer, limiter, decision converter) unchanged via an adapter pattern.
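A minimal sketch of the gate follows. It is not the project's code: SaturationScalingConfig, its AnalyzerName field, and the stub optimizeV1/optimizeV2 bodies are hypothetical stand-ins. In the real engine, analyzerName is read from the global saturation scaling config, and the V1/V2 paths do the actual work.

```go
package main

import "fmt"

// SaturationScalingConfig is a hypothetical stand-in for the saturation scaling
// config; only the analyzerName setting described in this PR is modeled.
type SaturationScalingConfig struct {
	AnalyzerName string
}

// Engine is a hypothetical, trimmed-down engine that only carries its config.
type Engine struct {
	Config SaturationScalingConfig
}

// optimize selects the V2 path only when analyzerName is exactly "saturation";
// any other value (including empty) keeps the existing V1 path.
func (e *Engine) optimize() string {
	if e.Config.AnalyzerName == "saturation" {
		return e.optimizeV2()
	}
	return e.optimizeV1()
}

// Stub paths standing in for the real V1 and V2 pipelines.
func (e *Engine) optimizeV1() string { return "v1" }
func (e *Engine) optimizeV2() string { return "v2" }

func main() {
	for _, name := range []string{"", "saturation"} {
		e := &Engine{Config: SaturationScalingConfig{AnalyzerName: name}}
		fmt.Printf("analyzerName=%q -> %s\n", name, e.optimize())
	}
}
```

In this sketch, an empty analyzerName selects v1 and "saturation" selects v2. Any value other than that exact string keeps the V1 behavior.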

This PR also introduces the CostAwareOptimizer — the first ScalingOptimizer implementation for the V2 pipeline — which handles unlimited-mode multi-variant scaling with cost-based replica allocation.

Base branch: main (after PR #689 merge)
Depends on: PR #689 (saturation_v2 package)

Changes

Engine Integration (`internal/engines/saturation/`)

- V1/V2 split in engine.go: refactor optimize() into optimizeV1() and optimizeV2(), gated by analyzerName == "saturation" from the global config
- V2 fields on Engine struct: saturationV2Analyzer, capacityStore, optimizer, initialized once in NewEngine()
- optimizeV2() three-stage pipeline:
  1. Collect ModelScalingRequests (run the V2 analyzer per model, pre-populate the capacity store from deployments)
  2. Call optimizer.Optimize() across all models
  3. Apply the enforcer per model via bridge functions
- Enforcer bridge (engine_v2.go): extractTargetsFromDecisions, buildVariantAnalysesFromDecisions, and applyEnforcedTargetsToDecisions adapt V2 optimizer output to the existing V1 enforcer interface
- Config pattern: follows upstream's e.Config DI pattern with namespace-aware config loading (SaturationConfigForNamespace, ScaleToZeroConfigForNamespace)
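Stage 3 and its bridge can be sketched in simplified form. The sketch assumes a hypothetical flat VariantDecision type and models the enforcer as a plain function over a variant-to-replicas map. The real bridge goes through []VariantSaturationAnalysis (via buildVariantAnalysesFromDecisions) and Enforcer.EnforcePolicy(), whose signatures are not given here. The keepOneReplica policy is a toy used only for the demo.

```go
package main

import (
	"fmt"
	"sort"
)

// VariantDecision is a hypothetical, simplified per-variant decision from the optimizer.
type VariantDecision struct {
	Model          string
	Variant        string
	TargetReplicas int
}

// EnforceFunc is a hypothetical stand-in for the V1 enforcer: it receives one model's
// variant -> target map and returns the targets after policy enforcement.
type EnforceFunc func(model string, targets map[string]int) map[string]int

// extractTargetsFromDecisions gathers variant -> target replicas for a single model.
func extractTargetsFromDecisions(model string, decisions []VariantDecision) map[string]int {
	targets := make(map[string]int)
	for _, d := range decisions {
		if d.Model == model {
			targets[d.Variant] = d.TargetReplicas
		}
	}
	return targets
}

// applyEnforcedTargetsToDecisions writes enforced targets back into that model's decisions.
func applyEnforcedTargetsToDecisions(model string, enforced map[string]int, decisions []VariantDecision) {
	for i := range decisions {
		if t, ok := enforced[decisions[i].Variant]; ok && decisions[i].Model == model {
			decisions[i].TargetReplicas = t
		}
	}
}

// enforcePerModel runs the enforcer once per model, in sorted model order for determinism.
func enforcePerModel(decisions []VariantDecision, enforce EnforceFunc) {
	seen := make(map[string]bool)
	var models []string
	for _, d := range decisions {
		if !seen[d.Model] {
			seen[d.Model] = true
			models = append(models, d.Model)
		}
	}
	sort.Strings(models)
	for _, m := range models {
		enforced := enforce(m, extractTargetsFromDecisions(m, decisions))
		applyEnforcedTargetsToDecisions(m, enforced, decisions)
	}
}

// keepOneReplica is a toy policy (not the project's): if every variant of a model
// targets zero, keep one replica on the alphabetically first variant.
func keepOneReplica(model string, targets map[string]int) map[string]int {
	total := 0
	variants := make([]string, 0, len(targets))
	for v, n := range targets {
		total += n
		variants = append(variants, v)
	}
	if total == 0 && len(variants) > 0 {
		sort.Strings(variants)
		targets[variants[0]] = 1
	}
	return targets
}

func main() {
	decisions := []VariantDecision{
		{"llama", "a100", 0},
		{"llama", "h100", 0},
		{"mistral", "l4", 3},
	}
	enforcePerModel(decisions, keepOneReplica)
	for _, d := range decisions {
		fmt.Printf("%s/%s -> %d\n", d.Model, d.Variant, d.TargetReplicas)
	}
}
```

The adapter keeps the enforcer unaware of the V2 optimizer. It only sees per-model targets and hands back adjusted ones, which are written into the same decision slice that later stages consume.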

CostAwareOptimizer (`internal/engines/pipeline/`)

- optimizer_interfaces.go: ScalingOptimizer interface and ModelScalingRequest type
- cost_aware_optimizer.go: unlimited-mode optimizer that processes each model independently:
  - Scale-up: allocates replicas to the most cost-efficient variant (lowest cost / perReplicaCapacity). Variants with pending replicas are not skipped: the analyzer already accounts for their capacity in the supply calculation, so RequiredCapacity > 0 means demand exceeds total supply including pending replicas.
  - Scale-down: removes replicas from the most expensive variant (highest absolute cost). The cheapest variant is protected at a minimum of 1 replica only when it is the last variant with replicas. This prevents scale-down deadlocks in which the expensive variant's per-replica capacity exceeds the spare capacity while cheaper replicas could still be removed.
  - Skips variants with zero capacity
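The scale-up and scale-down rules above can be sketched for a single model. Everything here is hypothetical and not the PR's code: the Variant and ModelScalingRequest fields, the separate SpareCapacity input for scale-down, and rounding the scale-up amount with math.Ceil are assumptions made to keep the example self-contained.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Variant is a hypothetical view of one variant of a model.
type Variant struct {
	Name               string
	Cost               float64 // absolute cost of one replica
	PerReplicaCapacity float64 // capacity one replica supplies
	Replicas           int
}

// ModelScalingRequest is a hypothetical simplification: RequiredCapacity > 0 means
// demand exceeds supply (pending replicas already counted); SpareCapacity > 0 means
// supply can be trimmed.
type ModelScalingRequest struct {
	Model            string
	RequiredCapacity float64
	SpareCapacity    float64
	Variants         []Variant
}

// scaleUp adds enough replicas to the variant with the lowest cost per unit of capacity.
func scaleUp(vs []Variant, required float64) {
	best := -1
	for i, v := range vs {
		if v.PerReplicaCapacity <= 0 {
			continue // zero-capacity variants are skipped
		}
		if best < 0 || v.Cost/v.PerReplicaCapacity < vs[best].Cost/vs[best].PerReplicaCapacity {
			best = i
		}
	}
	if best >= 0 {
		vs[best].Replicas += int(math.Ceil(required / vs[best].PerReplicaCapacity))
	}
}

// scaleDown removes replicas from the most expensive variants first while each removal
// fits in the remaining spare; a variant whose replica is too large is passed over
// instead of blocking cheaper ones. The cheapest variant keeps its last replica only
// when no other variant still has replicas.
func scaleDown(vs []Variant, spare float64) {
	var order []int
	cheapest := -1
	for i, v := range vs {
		if v.PerReplicaCapacity <= 0 {
			continue // zero-capacity variants are skipped
		}
		order = append(order, i)
		if cheapest < 0 || v.Cost < vs[cheapest].Cost {
			cheapest = i
		}
	}
	sort.SliceStable(order, func(a, b int) bool { return vs[order[a]].Cost > vs[order[b]].Cost })
	for _, i := range order {
		for vs[i].Replicas > 0 && vs[i].PerReplicaCapacity <= spare {
			if i == cheapest && vs[i].Replicas == 1 && !othersHaveReplicas(vs, i) {
				break // last replica of the last variant with replicas
			}
			vs[i].Replicas--
			spare -= vs[i].PerReplicaCapacity
		}
	}
}

// othersHaveReplicas reports whether any variant other than vs[skip] has replicas.
func othersHaveReplicas(vs []Variant, skip int) bool {
	for i, v := range vs {
		if i != skip && v.Replicas > 0 {
			return true
		}
	}
	return false
}

// optimizeModel processes one model independently and returns the updated variants.
func optimizeModel(req ModelScalingRequest) []Variant {
	vs := append([]Variant(nil), req.Variants...) // copy; leave the request untouched
	if req.RequiredCapacity > 0 {
		scaleUp(vs, req.RequiredCapacity)
	} else if req.SpareCapacity > 0 {
		scaleDown(vs, req.SpareCapacity)
	}
	return vs
}

func main() {
	variants := []Variant{{"a100", 10, 100, 2}, {"h100", 20, 250, 1}, {"l4", 4, 20, 0}}
	up := optimizeModel(ModelScalingRequest{"llama", 300, 0, variants})
	down := optimizeModel(ModelScalingRequest{"llama", 0, 150, variants})
	for i, v := range variants {
		fmt.Printf("%s: up=%d down=%d\n", v.Name, up[i].Replicas, down[i].Replicas)
	}
}
```

With the sample data, scale-up picks h100 (0.08 cost per unit of capacity versus 0.1 for a100) and adds two replicas. Scale-down passes over h100, whose 250-unit replica exceeds the 150 units of spare capacity, and removes one a100 replica instead. This is the deadlock the protection rule is meant to avoid.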

Limiter Infrastructure (`internal/engines/pipeline/`)

- limiter_interfaces.go: new ResourcePool, ResourceConstraints, and ConstraintProvider interface, enabling the V2 limited-mode path (future GreedyBySaturationOptimizer)
- default_limiter.go: DefaultLimiter now implements ConstraintProvider via ComputeConstraints() (V2 path) alongside the existing Limiter.Limit() (V1 path)
- type_inventory.go: adds GetResourcePools() to the Inventory interface and the TypeInventory implementation
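Here is a sketch of how the V2 constraint path could fit together, under simplified shapes. The ResourcePool fields, the map inside ResourceConstraints, the no-argument ComputeConstraints() and GetResourcePools() signatures, and the staticInventory helper are all hypothetical. The real interfaces may take a context or return errors.

```go
package main

import (
	"fmt"
	"sort"
)

// ResourcePool is a hypothetical per-accelerator-type pool of GPUs.
type ResourcePool struct {
	AcceleratorType string
	Total           int
	Used            int
}

// Available returns the unused capacity of the pool, never negative.
func (p ResourcePool) Available() int {
	if p.Used >= p.Total {
		return 0
	}
	return p.Total - p.Used
}

// ResourceConstraints maps accelerator type to the GPUs still available (hypothetical shape).
type ResourceConstraints struct {
	Available map[string]int
}

// ConstraintProvider reports constraints up front (hypothetical signature).
type ConstraintProvider interface {
	ComputeConstraints() ResourceConstraints
}

// Inventory exposes per-type resource pools (hypothetical signature).
type Inventory interface {
	GetResourcePools() []ResourcePool
}

// staticInventory is a fixed, in-memory inventory used only for this sketch.
type staticInventory []ResourcePool

func (s staticInventory) GetResourcePools() []ResourcePool { return s }

// DefaultLimiter derives constraints from its inventory.
type DefaultLimiter struct {
	inventory Inventory
}

// ComputeConstraints sums available GPUs per accelerator type across all pools.
func (l DefaultLimiter) ComputeConstraints() ResourceConstraints {
	c := ResourceConstraints{Available: make(map[string]int)}
	for _, p := range l.inventory.GetResourcePools() {
		c.Available[p.AcceleratorType] += p.Available()
	}
	return c
}

func main() {
	var provider ConstraintProvider = DefaultLimiter{inventory: staticInventory{
		{"A100", 8, 6}, {"H100", 4, 4}, {"A100", 8, 1},
	}}
	c := provider.ComputeConstraints()
	types := make([]string, 0, len(c.Available))
	for t := range c.Available {
		types = append(types, t)
	}
	sort.Strings(types)
	for _, t := range types {
		fmt.Printf("%s: %d available\n", t, c.Available[t])
	}
}
```

Pools of the same type are summed, so the two A100 pools contribute 2 + 7 = 9 available GPUs. Exposing constraints this way presumably lets a future limited-mode optimizer read them before allocating. The V1 GPULimiter.Limit(), by contrast, is applied to decisions after the analysis paths.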

What stays unchanged

| Component | Why |
| --- | --- |
| `Enforcer.EnforcePolicy()` | Consumes `[]VariantSaturationAnalysis`, provided by the bridge adapter |
| `convertSaturationTargetsToDecisions()` | V1-only path, not touched |
| `GPULimiter.Limit()` | Applied globally after both V1 and V2 paths |
| `applySaturationDecisions()` | Consumes `[]VariantDecision`, unchanged |
| `saturation_v2` package | No changes to analyzer logic |

Integrate the V2 token-based saturation analyzer into the optimization engine behind a config gate (analyzerName: "saturation"). When active, it replaces the V1 percentage-based analyzer inside RunSaturationAnalysis while keeping the rest of the pipeline (enforcer, limiter, decision converter) unchanged via an adapter pattern.

Also introduces the CostAwareOptimizer, the first ScalingOptimizer implementation for the V2 pipeline, which handles unlimited-mode multi-variant scaling with cost-based replica allocation.

Engine integration:
- Add saturationV2Analyzer, capacityStore, and optimizer fields to Engine struct, initialized once in NewEngine()
- Gate V2 path in optimize() via analyzerName == "saturation" from global config
- optimizeV2() three-stage pipeline: collect ModelScalingRequests, call optimizer.Optimize(), apply enforcer per-model via bridge
- Enforcer bridge: extractTargetsFromDecisions, buildVariantAnalysesFromDecisions, applyEnforcedTargetsToDecisions

CostAwareOptimizer (unlimited mode):
- Scale-up: allocate to most cost-efficient variant (lowest cost/perReplicaCapacity). Variants with pending replicas are NOT skipped: the analyzer already accounts for their capacity in the supply calculation, so RequiredCapacity > 0 means demand exceeds total supply including pending.
- Scale-down: remove from most expensive variant (highest absolute cost). The cheapest variant is protected at min 1 replica only when it is the last variant with replicas; this prevents scale-down deadlocks where the expensive variant's per-replica capacity exceeds spare but cheaper replicas could be removed.
- Skips variants with zero capacity

Limiter infrastructure:
- ResourcePool, ResourceConstraints, ConstraintProvider interface for future V2 limited-mode path (GreedyBySaturationOptimizer)
- DefaultLimiter implements ConstraintProvider via ComputeConstraints()
- TypeInventory.GetResourcePools() for per-type resource availability

internal/engines/pipeline/cost_aware_optimizer.go

internal/engines/saturation/engine.go

internal/engines/saturation/engine_v2.go


ev-shindin requested a review from lionelvillard February 12, 2026 11:12

ev-shindin assigned ev-shindin and unassigned ev-shindin Feb 12, 2026

ev-shindin force-pushed the saturation-v2-engine-integration-impl branch from b2b903d to 567a44f February 12, 2026 13:48

ev-shindin force-pushed the saturation-v2-engine-integration-impl branch from 567a44f to 74afe4c February 12, 2026 14:58

lionelvillard reviewed Feb 15, 2026

internal/engines/pipeline/cost_aware_optimizer.go (outdated)

lionelvillard reviewed Feb 15, 2026

internal/engines/saturation/engine.go

lionelvillard reviewed Feb 15, 2026

internal/engines/saturation/engine_v2.go

fix: use logging level constants instead of raw numeric values

24b89e5

Replace V(1) calls with V(logging.DEBUG) in cost_aware_optimizer.go, engine.go, and engine_v2.go for better readability per review feedback.
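To illustrate why the named constant reads better, here is a tiny logger loosely modeled on logr's V() levels. The INFO and DEBUG constants are hypothetical stand-ins for the project's logging package. DEBUG = 1 is an assumption based on the change being described as a readability fix for V(1) calls.

```go
package main

import "fmt"

// Hypothetical verbosity constants standing in for the project's logging package.
const (
	INFO  = 0
	DEBUG = 1 // assumed to equal the old raw V(1)
)

// Logger emits V(level) messages only when level <= the configured verbosity.
// Unlike logr, V here sets the level rather than adding to it.
type Logger struct {
	verbosity int
	level     int
	out       *[]string
}

// NewLogger returns a logger that records messages up to the given verbosity.
func NewLogger(verbosity int) Logger {
	return Logger{verbosity: verbosity, out: new([]string)}
}

// V returns a copy of the logger that logs at the given level.
func (l Logger) V(level int) Logger {
	l.level = level
	return l
}

// Info records msg if the logger's level is enabled.
func (l Logger) Info(msg string) {
	if l.level <= l.verbosity {
		*l.out = append(*l.out, msg)
	}
}

func main() {
	log := NewLogger(INFO)
	log.V(DEBUG).Info("scale-down candidate skipped") // intent is clearer than V(1)
	log.V(INFO).Info("optimizer finished")
	fmt.Println(*log.out)
}
```

At verbosity INFO, only the INFO-level message is recorded. The call sites read as intent (DEBUG) rather than as a magic number.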

ev-shindin requested a review from lionelvillard February 15, 2026 07:17

ev-shindin linked an issue Feb 15, 2026 that may be closed by this pull request

Enhance saturation detection in lieu of flow controller enablement in inference scheduler #665

Open

ev-shindin wants to merge 2 commits into llm-d:main from ev-shindin:saturation-v2-engine-integration-impl

ev-shindin commented Feb 12, 2026
