Changes to move to new design by asm582 · Pull Request #11 · llm-d/llm-d-workload-variant-autoscaler

asm582 · 2025-06-26T01:07:10Z

This PR implements the inferno controller using controller-runtime, based on community review. It includes a dummy optimizer implementation.

…and amd gpus Signed-off-by: Harold Ship <harold@il.ibm.com>

atantawi · 2025-07-14T20:30:48Z

Tested successfully by running on a local kind cluster.
Ran both the vllme loadgen and the variantautoscaling controller locally, external to the cluster.
Because the Prometheus address is currently hard-coded, two changes had to be made:

vllm_emulator: in client.py base_url="http://localhost:8000/v1"
inferno-autoscaler: in variantautoscaling_controller.go Address: "http://localhost:9090"

/lgtm

asm582 · 2025-07-14T20:38:08Z

Thanks, let me create an issue about it.

atantawi

Tested successfully on local cluster.

clubanderson · 2026-02-14T20:24:25Z

Hey — just a heads up that the NumReplicas int field definition in OptimizedAlloc ended up causing a subtle issue downstream (#731). Because it's a bare int without omitempty, the CRD generator marks it as required. This became a problem when later PRs introduced MergeFrom patches (#460) and scale-to-zero (#585) — the JSON merge patch omits the zero value, and the API server rejects the partial object.

The idiomatic Kubernetes pattern would be NumReplicas *int32 with json:"numReplicas,omitempty" (similar to how HPA defines minReplicas). We've got a short-term fix in PR #721 and will look at aligning the type definition as a follow-up. Just noting it here for context!
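
The difference between the field shapes mentioned above can be seen with plain `encoding/json`. This is only an illustrative sketch: the struct names `bareInt`, `omitInt`, and `ptrInt32` are hypothetical, it does not use the project's actual types, and it does not reproduce the MergeFrom patch path or the API server validation. It only shows how each declaration serializes a zero or unset replica count.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bareInt mirrors the shape described in the comment: a plain int, no omitempty.
type bareInt struct {
	NumReplicas int `json:"numReplicas"`
}

// omitInt adds omitempty to a plain int, so a zero value is dropped on output.
type omitInt struct {
	NumReplicas int `json:"numReplicas,omitempty"`
}

// ptrInt32 follows the suggested pattern: nil means "unset",
// while a pointer to 0 is an explicit zero that is still serialized.
type ptrInt32 struct {
	NumReplicas *int32 `json:"numReplicas,omitempty"`
}

// encode marshals v to a JSON string and panics on error (fine for a sketch).
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	zero := int32(0)
	fmt.Println(encode(bareInt{}))                    // {"numReplicas":0}
	fmt.Println(encode(omitInt{}))                    // {}
	fmt.Println(encode(ptrInt32{}))                   // {}
	fmt.Println(encode(ptrInt32{NumReplicas: &zero})) // {"numReplicas":0}
}
```

The pointer form is the only one of the three that can express both "field not set" and "explicitly zero", which is what a scale-to-zero status needs.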

asm582 and others added 22 commits June 25, 2025 21:02

changes to move to new design

6c5344f

remove duplicate structs

820fd30

add details about inferno

3275a63

add actuator interface

42d1add

update docs

db7a4d3

add optimizer folder

f779478

remo duplicate API structs

f8d3d35

add modelanalyzer folder

ec40337

add actuator folder

afe16e1

updates types.go based on inferno APIs

5e702d3

add sample

c914a4a

changes for new API

4bd1e63

update readme to spawn cluster

14e96b7

change install, fix domain and create inventory in reconcile

d21cd32

wait for deployments to spawn

00e568d

update deploy/local-cluster.sh to 3-node cluster with nvidia, intel, …

3b02c6f

…and amd gpus Signed-off-by: Harold Ship <harold@il.ibm.com>

Merge branch 'fake-gpus' into move_ctrl_rntm

b2f89a0

add comment for grouping

766c565

always requeue and comments

ec49265

use wall-clock time to wake up reconciler

3e57576

add configmap to run periodic opt

6701c45

move collection to collector.go, collect also amd and intel gpu info

752c173

Signed-off-by: Harold Ship <harold@il.ibm.com>

haroldship force-pushed the move_ctrl_rntm branch from 18c394d to 752c173 Compare July 3, 2025 16:53

asm582 and others added 7 commits July 3, 2025 13:42

add configmap

0035507

wait for manager to start

4978600

enable vllm metrics scraping

3326bbc

add sample vllm deployment

44155b0

changes to update status

3df22e2

rem duplicate log imports

496681c

start consuming promethues and query metrics

38d11c8

asm582 and others added 8 commits July 9, 2025 11:36

enable vllme

c197cb0

remove Collector class

6f372fb

Signed-off-by: Harold Ship <harold@il.ibm.com>

add cms and API changes

4e9aa4e

add dummy optimizer

af64b47

read slo from cm

798a047

add actuator

06d4f98

update readme

e269e26

add owner ref

e1e9193

asm582 marked this pull request as ready for review July 13, 2025 20:50

asm582 added 2 commits July 14, 2025 13:46

rename api

5e4df29

remove old deployments

01dbd5f

atantawi self-requested a review July 14, 2025 20:30

add collector to its own folder

1d822cc

asm582 mentioned this pull request Jul 14, 2025

Address comments in this issue #12

Closed

atantawi approved these changes Jul 14, 2025

atantawi merged commit 51e5a39 into llm-d:dev Jul 14, 2025

clubanderson mentioned this pull request Feb 14, 2026

🐛 Scale-to-zero status update fails: numReplicas Required value error with MergeFrom patch #731

Open

atantawi merged 40 commits into llm-d:dev from asm582:move_ctrl_rntm

4 participants
