Decouple WVA controller from model resources to enable multi-model deployments #445
Conversation
…el resource installation Co-authored-by: clubanderson <407614+clubanderson@users.noreply.github.com>
Pull request overview
This PR implements a flexible installation strategy for the WVA Helm chart by introducing an installMode configuration parameter that decouples controller installation from model-specific resources. This enhancement addresses the limitation where multiple llm-d stacks couldn't coexist across different namespaces without overwriting each other's resources.
Key Changes:
- Added `installMode` parameter with three options: `all` (default, backward compatible), `controller-only`, and `model-resources-only` (sketched below)
- Applied conditional templating to separate controller resources from model-specific resources based on install mode
- Updated documentation with comprehensive examples for multi-model deployment scenarios
Reviewed changes
Copilot reviewed 33 out of 33 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| charts/workload-variant-autoscaler/Chart.yaml | Bumped chart version to 0.4.2 |
| charts/workload-variant-autoscaler/values.yaml | Added installMode configuration with documentation |
| charts/workload-variant-autoscaler/values-dev.yaml | Added installMode configuration for development environments |
| charts/workload-variant-autoscaler/README.md | Added comprehensive installation mode documentation with examples |
| charts/workload-variant-autoscaler/templates/manager/* | Wrapped controller resources with installMode conditionals |
| charts/workload-variant-autoscaler/templates/rbac/* | Wrapped RBAC resources with installMode conditionals |
| charts/workload-variant-autoscaler/templates/prometheus-ca-configmap-*.yaml | Wrapped Prometheus CA ConfigMaps with installMode conditionals |
| charts/workload-variant-autoscaler/templates/metrics_service.yaml | Wrapped metrics service with installMode conditionals |
| charts/workload-variant-autoscaler/templates/variantautoscaling.yaml | Wrapped model-specific VariantAutoscaling with installMode conditionals |
| charts/workload-variant-autoscaler/templates/hpa.yaml | Wrapped HPA resource with installMode conditionals |
| charts/workload-variant-autoscaler/templates/vllm-service.yaml | Wrapped vLLM service with installMode conditionals |
| charts/workload-variant-autoscaler/templates/vllm-servicemonitor.yaml | Wrapped vLLM ServiceMonitor with installMode conditionals |
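The "wrapped ... with installMode conditionals" rows above presumably translate to guards along these lines; this is a sketch assuming `or`/`eq` comparisons against `.Values.installMode`, and the resource names, API group, and value keys are illustrative rather than copied from the chart:

```yaml
# Sketch of the gating pattern described above (names and API group are illustrative).
# Controller-side template, e.g. templates/manager/deployment.yaml:
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wva-controller-manager     # the chart would normally derive this via its name helpers
# ... rest of the controller Deployment unchanged ...
{{- end }}
---
# Model-side template, e.g. templates/variantautoscaling.yaml:
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "model-resources-only") }}
apiVersion: llmd.ai/v1alpha1       # hypothetical group/version, for illustration only
kind: VariantAutoscaling
metadata:
  name: my-model                   # placeholder model name
# ... rest of the VariantAutoscaling spec unchanged ...
{{- end }}
```

With this structure, rendering the chart in each of the three modes produces only the corresponding subset of manifests, which matches the testing notes later in the thread.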
> # workload-variant-autoscaler
> [version badges: 0.4.2]
The version badges are updated to 0.4.2, matching the Chart.yaml change. Ensure these badges are auto-generated or consistently updated alongside Chart.yaml to prevent version mismatches in future updates.
…idance Co-authored-by: clubanderson <407614+clubanderson@users.noreply.github.com>
Co-authored-by: clubanderson <407614+clubanderson@users.noreply.github.com>
@copilot this PR should be against the release-0.4.2 branch. This PR will be used to create v0.4.3 of the chart
Co-authored-by: clubanderson <407614+clubanderson@users.noreply.github.com>
Updated the chart version to 0.4.3 and all documentation references in commit e065132. The PR is now ready to be retargeted to the release-0.4.2 branch (base branch change requires GitHub UI access).

@copilot rebase this code to start from branch release-0.4.2
I've successfully rebased the code onto the release-0.4.2 branch (commit ce39598). The 4 feature commits are now cleanly applied on top of release-0.4.2:
All helm template tests pass for the three installation modes. However, completing this rebase requires a force-push to update the remote branch, which needs direct git access or GitHub UI permissions that I don't have. The rebased code is ready locally on the branch.
> The controller and other models remain unaffected.
>
> ## Troubleshooting
Do you know if we need a troubleshooting section? If yes, in which environment does this troubleshooting apply: OpenShift on IBM Cloud, KinD, or GKE?
> - Need to scale different models independently
> - Want to manage model lifecycles separately from the controller
>
> ## Migration Steps
Add comment about API changes
/lgtm

/approve

already covered in #451
WVA Cluster-Wide Usability Enhancement - Rebased onto release-0.4.2
Summary
Successfully rebased the implementation onto the release-0.4.2 branch as requested. This PR will be used to create v0.4.3 of the chart.
Base Branch

release-0.4.2
Changes Summary
Decoupled the WVA controller installation from model-specific resources to enable multi-model deployments across different namespaces.
Implementation Checklist
- Added `installMode` value to `values.yaml` with options: `all`, `controller-only`, `model-resources-only`
- Controller resources rendered in `controller-only` and `all` modes
- Model resources rendered in `model-resources-only` and `all` modes
- Added `installMode` to `values-dev.yaml`

Commits (rebased onto release-0.4.2)
Key Features
Three Installation Modes:

- `all` (default) - Install both controller and model resources (backward compatible)
- `controller-only` - Install only the WVA controller for cluster-wide management
- `model-resources-only` - Install only model-specific resources

Multi-Model Architecture Support:
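As an illustration of that support, the per-release values below show one way the modes could be combined across namespaces; the release layout, namespace names, and model names are made up for the example:

```yaml
# Sketch of a multi-model layout using the three modes (all names are illustrative).
# Release 1: the shared controller, installed once (e.g. in a wva-system namespace)
installMode: controller-only
---
# Release 2: resources for model A, installed in its own namespace (e.g. model-a)
installMode: model-resources-only
# model-specific values (VariantAutoscaling targets, HPA bounds, vLLM Service, ServiceMonitor) go here
---
# Release 3: resources for model B, installed in another namespace (e.g. model-b)
installMode: model-resources-only
```

Because each model release renders only its own VariantAutoscaling, HPA, vLLM Service, and ServiceMonitor, installing or uninstalling one model no longer overwrites another model's resources or the controller.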
Testing Results
✅ All three installation modes render correctly
✅ Controller-only mode excludes model resources
✅ Model-resources-only mode excludes controller resources
✅ All mode includes both controller and model resources
✅ Rebased cleanly onto release-0.4.2
Original prompt
This section details the original issue you should resolve
<issue_title>WVA Limitations of Cluster Wide Usability</issue_title>
<issue_description># Summary
When installing `WVA` cluster wide - as in the only supported mode of installation currently - a user will be limited to using only one llm-d stack. Consequently, if a user decided to install more than one llm-d stack (in a separate namespace from the initial llm-d stack) with hopes of leveraging `WVA`, they will notice their existing scaled model variants disappear, because the helm installation will have overridden those resources.

A proposed solution is to provide two separate, distinct types of installation:
- `WVA` controller
- model variant resources

I propose another solution, but it is a bit more opinionated; I will elaborate on the solutions in the sections below.
Detailed Description of the Problem
Let's start off with what works
The below architecture resembles a cluster wide installation of one instance of the `WVA controller` that can monitor and scale variants of a single model in a single namespace in a single llm-d stack. This works perfectly well with the current `helm` installation, in fact, routinely well, thank you @clubanderson:

WVA Namespace

Model Namespace A:

- `va`
- `hpa`
- `vllm-service`
- `servicemonitor`

Now let's see what does not work
The below architecture resembles a cluster wide installation of one instance of the `WVA controller`. Assume that I am now adding a new model to a new, separate, `llm-d` stack in a separate namespace, and I rerun the automation that exists today for `WVA`; you will then see that the below desired scenario is not possible:

WVA Namespace

Model Namespace A:

- `va`
- `hpa`
- `vllm-service`
- `servicemonitor`

Model Namespace B:
- `va`
- `hpa`
- `vllm-service`
- `servicemonitor`

So here is what really happens. When an existing model variant is already being scaled, in this case `Model-A` in namespace `A`, the `helm` installation to create the variant for `Model B` in the namespace `B` will remove all of the `wva` resources in namespace `A`, thereby leaving the user with the following new architecture:

WVA Namespace
Model Namespace A:
Model Namespace B:
- `va`
- `hpa`
- `vllm-service`
- `servicemonitor`

The problem here is the tight coupling of controller and model variant resources in the same `helm` installation. The consequence is that the current `v0.4.2` `WVA` installation can really only support a single model in a single llm-d stack in a single namespace, as shown above. So let's view the solutions.

Detailed Proposed Solution
Solution A
Decouple the `WVA controller` installation from the `WVA model variant resource` installation. In other words:

- `WVA` controller

Allow a user to just and only install the `WVA controller` - in an effort to prime the cluster to get ready to deploy model variants on an as-needed basis.

Then allow the user to install the model variant resources for a particular model when they are needed. Sometimes they are needed immediately at the time of the `WVA Controller` installation; other times they are needed after the fact. The crux of the matter is - the controller and model variant resources are separate items and should be respected as such during the installation.

Solution B
Pick an installation path and stick to it. Currently - there are talks of allowing multiple `WVA` controllers per cluster by separating them via namespace, i.e., the following:

- `va`
- `hpa`
- `vllm-service`
- `servicemonitor`

The above architecture should actually, in complete theory, solve the limitation of the cluster wide installation, since EACH helm installation of `WVA` and its model-specific resources is scoped to a single llm-d stack and model.

If there is going to be an offering for a cluster wide installation, support will be needed for the items claimed in this issue.

If there is no real need or benefit for a cluster wide installation, why should it be maintained or pro...