feat: wire V2 saturation analyzer into engine, gated by analyzerName #695
Open
ev-shindin wants to merge 2 commits into llm-d:main
Force-pushed from b2b903d to 567a44f
Integrate the V2 token-based saturation analyzer into the optimization engine behind a config gate (analyzerName: "saturation"). When active, it replaces the V1 percentage-based analyzer inside RunSaturationAnalysis while keeping the rest of the pipeline (enforcer, limiter, decision converter) unchanged via an adapter pattern.

Also introduces the CostAwareOptimizer, the first ScalingOptimizer implementation for the V2 pipeline, which handles unlimited-mode multi-variant scaling with cost-based replica allocation.

Engine integration:
- Add saturationV2Analyzer, capacityStore, and optimizer fields to the Engine struct, initialized once in NewEngine()
- Gate the V2 path in optimize() via analyzerName == "saturation" from the global config
- optimizeV2() three-stage pipeline: collect ModelScalingRequests, call optimizer.Optimize(), apply the enforcer per model via the bridge
- Enforcer bridge: extractTargetsFromDecisions, buildVariantAnalysesFromDecisions, applyEnforcedTargetsToDecisions

CostAwareOptimizer (unlimited mode):
- Scale-up: allocate to the most cost-efficient variant (lowest cost/perReplicaCapacity). Variants with pending replicas are NOT skipped: the analyzer already accounts for their capacity in the supply calculation, so RequiredCapacity > 0 means demand exceeds total supply including pending replicas.
- Scale-down: remove from the most expensive variant (highest absolute cost). The cheapest variant is protected at a minimum of 1 replica only when it is the last variant with replicas. This prevents scale-down deadlocks in which the expensive variant's per-replica capacity exceeds the spare capacity, so none of its replicas can be removed, even though cheaper replicas could be.
- Skip variants with zero capacity

Limiter infrastructure:
- ResourcePool, ResourceConstraints, and the ConstraintProvider interface for the future V2 limited-mode path (GreedyBySaturationOptimizer)
- DefaultLimiter implements ConstraintProvider via ComputeConstraints()
- TypeInventory.GetResourcePools() for per-type resource availability
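The unlimited-mode allocation rules above can be sketched for a single model as follows. This is a minimal illustration, not the PR's `CostAwareOptimizer`: the `VariantState` type, the `allocateReplicas` and `othersWithReplicas` helpers, and the `required`/`spare` inputs are hypothetical simplifications, and "cheapest" is assumed to mean the lowest absolute per-replica cost.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// VariantState is a hypothetical, simplified view of one variant of a model.
// Field names are illustrative; they are not the PR's actual types.
type VariantState struct {
	Name               string
	Cost               float64 // absolute cost of one replica (assumed unit)
	PerReplicaCapacity float64 // capacity one replica contributes
	Replicas           int     // current replica count
}

// allocateReplicas (hypothetical) sketches cost-aware allocation for one model.
// required > 0 means demand exceeds supply; supply already counts pending
// replicas, so variants with pending replicas are not skipped.
// spare > 0 means capacity can be released.
func allocateReplicas(variants []VariantState, required, spare float64) map[string]int {
	targets := make(map[string]int, len(variants))
	usable := make([]VariantState, 0, len(variants))
	for _, v := range variants {
		targets[v.Name] = v.Replicas
		if v.PerReplicaCapacity > 0 { // skip zero-capacity variants
			usable = append(usable, v)
		}
	}
	if len(usable) == 0 {
		return targets
	}

	if required > 0 {
		// Scale up on the variant with the lowest cost per unit of capacity.
		best := usable[0]
		for _, v := range usable[1:] {
			if v.Cost/v.PerReplicaCapacity < best.Cost/best.PerReplicaCapacity {
				best = v
			}
		}
		targets[best.Name] += int(math.Ceil(required / best.PerReplicaCapacity))
		return targets
	}

	// Scale down starting from the most expensive variant (highest absolute cost).
	sort.SliceStable(usable, func(i, j int) bool { return usable[i].Cost > usable[j].Cost })
	cheapest := usable[len(usable)-1].Name
	for _, v := range usable {
		// A variant whose per-replica capacity exceeds the remaining spare is
		// skipped instead of blocking removals from cheaper variants.
		for targets[v.Name] > 0 && spare >= v.PerReplicaCapacity {
			// Protect the cheapest variant at 1 replica only when it is the
			// last variant that still has replicas.
			if v.Name == cheapest && targets[v.Name] == 1 && othersWithReplicas(targets, cheapest) == 0 {
				break
			}
			targets[v.Name]--
			spare -= v.PerReplicaCapacity
		}
	}
	return targets
}

// othersWithReplicas counts variants other than name that still have replicas.
func othersWithReplicas(targets map[string]int, name string) int {
	n := 0
	for k, r := range targets {
		if k != name && r > 0 {
			n++
		}
	}
	return n
}

func main() {
	vs := []VariantState{{"a100", 10, 100, 1}, {"l4", 3, 20, 2}}
	fmt.Println(allocateReplicas(vs, 50, 0)) // map[a100:2 l4:2]
	fmt.Println(allocateReplicas(vs, 0, 60)) // map[a100:1 l4:0]
}
```

In the scale-down call, the single a100 replica (capacity 100) is larger than the 60 units of spare, so it is skipped and both l4 replicas are removed instead. This is why the cheapest variant is only protected when it is the last one with replicas.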
Force-pushed from 567a44f to 74afe4c
Replace V(1) calls with V(logging.DEBUG) in cost_aware_optimizer.go, engine.go, and engine_v2.go for better readability per review feedback.
Summary
Wire the V2 token-based saturation analyzer into the optimization engine, gated by `analyzerName: "saturation"` in the saturation scaling config. When active, it replaces the V1 percentage-based analyzer while keeping the rest of the pipeline (enforcer, limiter, decision converter) unchanged via an adapter pattern.

This PR also introduces the `CostAwareOptimizer`, the first `ScalingOptimizer` implementation for the V2 pipeline, which handles unlimited-mode multi-variant scaling with cost-based replica allocation.

Base branch: `main` (after PR #689 is merged)

Depends on: PR #689 (`saturation_v2` package)
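The gate amounts to a dispatch on the configured analyzer name. The Go sketch below is hypothetical: `SaturationScalingConfig`, `engine`, and the string trace stand in for the real config type, the `Engine` struct, and the actual stage calls, and only mirror the gate condition and the ordering of the three `optimizeV2()` stages described in this PR.

```go
package main

import "fmt"

// SaturationScalingConfig is a hypothetical stand-in for the global saturation
// scaling config; only the analyzerName setting comes from the PR.
type SaturationScalingConfig struct {
	AnalyzerName string
}

// engine is a hypothetical stand-in for Engine. In the PR, the V2 analyzer,
// capacity store, and optimizer are built once in NewEngine(); they are
// omitted here.
type engine struct {
	cfg SaturationScalingConfig
}

// v2AnalyzerName is the config value that enables the V2 path.
const v2AnalyzerName = "saturation"

// optimize selects the pipeline: an exact match on "saturation" enables V2;
// anything else, including "", keeps V1.
func (e *engine) optimize(models []string) []string {
	if e.cfg.AnalyzerName == v2AnalyzerName {
		return e.optimizeV2(models)
	}
	return e.optimizeV1(models)
}

// optimizeV1 stands in for the existing V1 path.
func (e *engine) optimizeV1(models []string) []string {
	trace := make([]string, 0, len(models))
	for _, m := range models {
		trace = append(trace, "v1:"+m)
	}
	return trace
}

// optimizeV2 records the order of the three stages instead of doing real work.
func (e *engine) optimizeV2(models []string) []string {
	var trace []string
	// Stage 1: one ModelScalingRequest per model (the V2 analyzer runs here).
	for _, m := range models {
		trace = append(trace, "analyze:"+m)
	}
	// Stage 2: a single optimizer.Optimize() call across all models.
	trace = append(trace, "optimize")
	// Stage 3: the V1 enforcer is applied per model through the bridge.
	for _, m := range models {
		trace = append(trace, "enforce:"+m)
	}
	return trace
}

func main() {
	v2 := &engine{cfg: SaturationScalingConfig{AnalyzerName: "saturation"}}
	fmt.Println(v2.optimize([]string{"m1", "m2"})) // [analyze:m1 analyze:m2 optimize enforce:m1 enforce:m2]
	fmt.Println((&engine{}).optimize([]string{"m1"})) // [v1:m1]
}
```

Because the check is an exact equality, any other value, including an unset one, falls through to V1, so existing configs keep their current behavior.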
Changes
Engine Integration (`internal/engines/saturation/`)
- `engine.go`: Refactor `optimize()` into `optimizeV1()` and `optimizeV2()`, gated by `analyzerName == "saturation"` from the global config
- New `Engine` fields `saturationV2Analyzer`, `capacityStore`, `optimizer`, initialized once in `NewEngine()`
- `optimizeV2()` three-stage pipeline:
  1. Collect `ModelScalingRequests` (run the V2 analyzer per model, pre-populate the capacity store from deployments)
  2. Call `optimizer.Optimize()` across all models
  3. Apply the enforcer per model via the bridge
- Enforcer bridge (`engine_v2.go`): `extractTargetsFromDecisions`, `buildVariantAnalysesFromDecisions`, `applyEnforcedTargetsToDecisions`, which adapt V2 optimizer output to the existing V1 enforcer interface
- `e.Config` DI pattern with namespace-aware config loading (`SaturationConfigForNamespace`, `ScaleToZeroConfigForNamespace`)

CostAwareOptimizer (`internal/engines/pipeline/`)
- `optimizer_interfaces.go`: `ScalingOptimizer` interface and `ModelScalingRequest` type
- `cost_aware_optimizer.go`: Unlimited-mode optimizer that processes each model independently:
  - Scale-up: allocate to the most cost-efficient variant (lowest `cost / perReplicaCapacity`). Variants with pending replicas are not skipped: the analyzer already accounts for their capacity in the supply calculation, so `RequiredCapacity > 0` means demand exceeds total supply including pending replicas.

Limiter Infrastructure (`internal/engines/pipeline/`)
- `limiter_interfaces.go`: New `ResourcePool`, `ResourceConstraints`, and `ConstraintProvider` interface, enabling the V2 limited-mode path (future `GreedyBySaturationOptimizer`)
- `default_limiter.go`: `DefaultLimiter` now implements `ConstraintProvider` via `ComputeConstraints()` (V2 path) alongside the existing `Limiter.Limit()` (V1 path)
- `type_inventory.go`: Added `GetResourcePools()` to the `Inventory` interface and the `TypeInventory` implementation

What stays unchanged
- `Enforcer.EnforcePolicy()`: `[]VariantSaturationAnalysis`, now provided by the bridge adapter
- `convertSaturationTargetsToDecisions()`
- `GPULimiter.Limit()`
- `applySaturationDecisions()`: `[]VariantDecision`, unchanged
- `saturation_v2` package
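The bridge lets the V1 enforcer stay untouched by translating V2 decisions into the enforcer's inputs and writing its output back. A rough Go sketch of that round trip for one model follows. The three bridge function names come from this PR, but their signatures, the simplified `VariantDecision` and `VariantSaturationAnalysis` shapes, the `EnforceFunc` type, and the toy `keepOneReplica` policy are assumptions for illustration only.

```go
package main

import "fmt"

// Simplified, hypothetical shapes; the real types carry more fields.
type VariantDecision struct {
	ModelID         string
	VariantName     string
	CurrentReplicas int
	TargetReplicas  int
}

type VariantSaturationAnalysis struct {
	VariantName     string
	CurrentReplicas int
}

// EnforceFunc stands in for the unchanged V1 Enforcer.EnforcePolicy();
// this signature is an assumption.
type EnforceFunc func(targets map[string]int, analyses []VariantSaturationAnalysis) map[string]int

// extractTargetsFromDecisions (hypothetical signature): variant -> target replicas.
// Assumes variant names are unique within one model.
func extractTargetsFromDecisions(ds []VariantDecision) map[string]int {
	targets := make(map[string]int, len(ds))
	for _, d := range ds {
		targets[d.VariantName] = d.TargetReplicas
	}
	return targets
}

// buildVariantAnalysesFromDecisions (hypothetical signature): rebuilds the
// analyses the V1 enforcer expects from V2 decisions, preserving order.
func buildVariantAnalysesFromDecisions(ds []VariantDecision) []VariantSaturationAnalysis {
	out := make([]VariantSaturationAnalysis, 0, len(ds))
	for _, d := range ds {
		out = append(out, VariantSaturationAnalysis{VariantName: d.VariantName, CurrentReplicas: d.CurrentReplicas})
	}
	return out
}

// applyEnforcedTargetsToDecisions (hypothetical signature): writes enforced
// targets back onto the decisions in place.
func applyEnforcedTargetsToDecisions(ds []VariantDecision, enforced map[string]int) {
	for i := range ds {
		if t, ok := enforced[ds[i].VariantName]; ok {
			ds[i].TargetReplicas = t
		}
	}
}

// enforceModel runs the bridge round trip for one model's decisions.
func enforceModel(ds []VariantDecision, enforce EnforceFunc) {
	targets := extractTargetsFromDecisions(ds)
	analyses := buildVariantAnalysesFromDecisions(ds)
	applyEnforcedTargetsToDecisions(ds, enforce(targets, analyses))
}

// keepOneReplica is a toy policy, not the real enforcer: if every target is
// zero, it restores one replica on the first variant.
func keepOneReplica(targets map[string]int, analyses []VariantSaturationAnalysis) map[string]int {
	out := make(map[string]int, len(targets))
	total := 0
	for name, t := range targets {
		out[name] = t
		total += t
	}
	if total == 0 && len(analyses) > 0 {
		out[analyses[0].VariantName] = 1
	}
	return out
}

func main() {
	ds := []VariantDecision{{"m1", "a100", 1, 0}, {"m1", "l4", 1, 0}}
	enforceModel(ds, keepOneReplica)
	fmt.Println(ds[0].TargetReplicas, ds[1].TargetReplicas) // 1 0
}
```

Keeping the translation in three small functions means the enforcer, the limiter, and `applySaturationDecisions()` never see V2-specific types.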