Skip to content

Changes to move to new design#11

Merged
atantawi merged 40 commits intollm-d:devfrom
asm582:move_ctrl_rntm
Jul 14, 2025
Merged

Changes to move to new design#11
atantawi merged 40 commits intollm-d:devfrom
asm582:move_ctrl_rntm

Conversation

@asm582
Copy link
Collaborator

@asm582 asm582 commented Jun 26, 2025

PR to implement the inferno controller using controller-runtime based on community review. This has a dummy optimizer implementation.

@asm582 asm582 marked this pull request as ready for review July 13, 2025 20:50
@atantawi
Copy link
Collaborator

Tested successfully by running on a local kind cluster.
Ran both the vllme loadgen and the variantautoscaling controller locally, external to the cluster.
Since, currently, the Prometheus address is hard-coded in the code, two changes had to be made:

  • vllm_emulator: in client.py base_url="http://localhost:8000/v1"
  • inferno-autoscaler: in variantautoscaling_controller.go Address: "http://localhost:9090"

/lgtm

@atantawi atantawi self-requested a review July 14, 2025 20:30
@asm582
Copy link
Collaborator Author

asm582 commented Jul 14, 2025

Tested successfully by running on a local kind cluster. Ran both the vllme loadgen and the variantautoscaling controller locally, external to the cluster. Since, currently, the Prometheus address is hard-coded in the code, two changes had to be made:

  • vllm_emulator: in client.py base_url="http://localhost:8000/v1"
  • inferno-autoscaler: in variantautoscaling_controller.go Address: "http://localhost:9090"

/lgtm

Thanks, let me create an issue about it.

Copy link
Collaborator

@atantawi atantawi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested successfully on local cluster.

@clubanderson
Copy link
Contributor

Hey — just a heads up that the NumReplicas int field definition in OptimizedAlloc ended up causing a subtle issue downstream (#731). Because it's a bare int without omitempty, the CRD generator marks it as required. This became a problem when later PRs introduced MergeFrom patches (#460) and scale-to-zero (#585) — the JSON merge patch omits the zero value, and the API server rejects the partial object.

The idiomatic Kubernetes pattern would be NumReplicas *int32 with json:"numReplicas,omitempty" (similar to how HPA defines minReplicas). We've got a short-term fix in PR #721 and will look at aligning the type definition as a follow-up. Just noting it here for context!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants