4 changes: 2 additions & 2 deletions charts/workload-variant-autoscaler/Chart.yaml
Original file line number Diff line number Diff line change
@@ -2,5 +2,5 @@ apiVersion: v2
name: workload-variant-autoscaler
description: Helm chart for Workload-Variant-Autoscaler (WVA) - GPU-aware autoscaler for LLM inference workloads
type: application
version: 0.4.1
appVersion: "v0.4.1"
version: 0.4.3
appVersion: "v0.4.3"
101 changes: 100 additions & 1 deletion charts/workload-variant-autoscaler/README.md
Original file line number Diff line number Diff line change
@@ -1,13 +1,108 @@
# workload-variant-autoscaler

![Version: 0.4.1](https://img.shields.io/badge/Version-0.4.1-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: v0.4.1](https://img.shields.io/badge/AppVersion-v0.4.1-informational?style=flat-square)
![Version: 0.4.3](https://img.shields.io/badge/Version-0.4.3-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: v0.4.3](https://img.shields.io/badge/AppVersion-v0.4.3-informational?style=flat-square)

Helm chart for Workload-Variant-Autoscaler (WVA) - GPU-aware autoscaler for LLM inference workloads

## Installation Modes

WVA supports three installation modes to enable flexible deployment architectures:

### Mode 1: `all` (Default)
Installs both the WVA controller and model-specific resources in a single installation. This is the traditional mode and is backward compatible with previous versions.

**Use case**: Single llm-d stack with one model.

```bash
helm install workload-variant-autoscaler ./workload-variant-autoscaler \
-n workload-variant-autoscaler-system \
--set installMode=all
```

### Mode 2: `controller-only`
Installs only the WVA controller without any model-specific resources. This enables a cluster-wide controller that can manage multiple models across different namespaces.

**Use case**: Install the controller once for the entire cluster, then deploy model-specific resources separately as needed.

```bash
# Install the controller once
helm install wva-controller ./workload-variant-autoscaler \
-n workload-variant-autoscaler-system \
--create-namespace \
--set installMode=controller-only \
--set wva.namespaceScoped=false
```
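
The same settings can be kept in a values file rather than passed as `--set` flags. The sketch below writes one; the file name and location are arbitrary choices for illustration, while the keys are the ones used in the command above.

```bash
#!/usr/bin/env bash
# Write a values file equivalent to the --set flags above.
# The file path is a hypothetical choice; only the keys come from the chart.
values_file="${TMPDIR:-/tmp}/wva-controller-values.yaml"
cat > "$values_file" <<'EOF'
installMode: controller-only
wva:
  namespaceScoped: false
EOF

# Install with the file instead of repeated --set flags:
# helm install wva-controller ./workload-variant-autoscaler \
#   -n workload-variant-autoscaler-system --create-namespace \
#   -f "$values_file"
```

A values file is easier to review and version than a long command line, which can help when the controller is shared by several teams.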

### Mode 3: `model-resources-only`
Installs only model-specific resources (VariantAutoscaling, HPA, Service, ServiceMonitor) without the controller. Use this mode to add new models when a cluster-wide controller is already running.

**Use case**: Deploy resources for additional models in different namespaces after the controller is installed.

```bash
# Deploy model resources for Model A in namespace-a
helm install model-a-resources ./workload-variant-autoscaler \
-n namespace-a \
--set installMode=model-resources-only \
--set llmd.namespace=namespace-a \
--set llmd.modelName=model-a \
--set llmd.modelID="vendor/model-a"

# Deploy model resources for Model B in namespace-b
helm install model-b-resources ./workload-variant-autoscaler \
-n namespace-b \
--set installMode=model-resources-only \
--set llmd.namespace=namespace-b \
--set llmd.modelName=model-b \
--set llmd.modelID="vendor/model-b"
```
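
Because every model uses the same set of flags, the per-model command can be generated. The helper below is a hypothetical sketch, not part of the chart: it only prints the command so it can be reviewed before running, and it assumes the release namespace and `llmd.namespace` are the same, as in the examples above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the chart): prints the helm command
# for one model's resources so it can be reviewed before it is run.
model_resources_cmd() {
  local release="$1" namespace="$2" model_name="$3" model_id="$4"
  printf 'helm install %s ./workload-variant-autoscaler -n %s' "$release" "$namespace"
  printf ' --set installMode=model-resources-only'
  printf ' --set llmd.namespace=%s' "$namespace"
  printf ' --set llmd.modelName=%s' "$model_name"
  printf ' --set llmd.modelID=%s\n' "$model_id"
}

# One release per model, matching the two examples above.
model_resources_cmd model-a-resources namespace-a model-a vendor/model-a
model_resources_cmd model-b-resources namespace-b model-b vendor/model-b
```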

## Multi-Model Architecture Example

To support multiple llm-d stacks with a single controller, install the controller once and add one release per model:

```bash
# Step 1: Install the WVA controller once (cluster-wide)
helm install wva-controller ./workload-variant-autoscaler \
-n workload-variant-autoscaler-system \
--create-namespace \
--set installMode=controller-only \
--set wva.namespaceScoped=false \
--set wva.prometheus.baseURL="https://prometheus:9090"

# Step 2: Deploy Model A resources
helm install model-a ./workload-variant-autoscaler \
--set installMode=model-resources-only \
--set llmd.namespace=llm-d-model-a \
--set llmd.modelName=ms-model-a-llm-d-modelservice \
--set llmd.modelID="meta-llama/Llama-2-7b-hf" \
--set va.accelerator=H100

# Step 3: Deploy Model B resources (in a different namespace)
helm install model-b ./workload-variant-autoscaler \
--set installMode=model-resources-only \
--set llmd.namespace=llm-d-model-b \
--set llmd.modelName=ms-model-b-llm-d-modelservice \
--set llmd.modelID="mistralai/Mistral-7B-v0.1" \
--set va.accelerator=A100
```

### Important Configuration Notes

**Namespace Scoping:**
- When using `installMode=controller-only` for multi-model deployments, you must set `wva.namespaceScoped=false` to allow the controller to watch all namespaces.
- When using `installMode=all` (default), you can keep `wva.namespaceScoped=true` for single-namespace operation or set it to `false` for cluster-wide operation.
- `installMode=model-resources-only` does not use the `namespaceScoped` setting since it doesn't install the controller.

**Resource Isolation:**
- Each model's resources (VariantAutoscaling, HPA, Service, ServiceMonitor) are deployed in the model's namespace.
- The controller remains in its dedicated namespace (typically `workload-variant-autoscaler-system`).
- Multiple Helm releases can coexist: one for the controller and one per model.
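
The namespace-scoping rules above can be checked before calling Helm. This is a minimal sketch of such a pre-flight check; the function name and messages are made up for illustration and the chart itself does not ship this script.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the chart).
# Usage: check_wva_flags <installMode> <wva.namespaceScoped>
check_wva_flags() {
  local mode="$1" scoped="$2"
  case "$mode" in
    all)
      # Either value is valid: true = single namespace, false = cluster-wide.
      echo "ok" ;;
    controller-only)
      if [ "$scoped" = "false" ]; then
        echo "ok"
      else
        # Multi-model setups need the controller to watch all namespaces.
        echo "warn: controller watches only its own namespace"
      fi ;;
    model-resources-only)
      # No controller is installed, so the setting has no effect.
      echo "ok: namespaceScoped is ignored" ;;
    *)
      echo "error: unknown installMode '$mode'" >&2
      return 1 ;;
  esac
}

check_wva_flags controller-only false
```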

## Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| installMode | string | `"all"` | Installation mode: "all" (controller + model resources), "controller-only" (just controller), "model-resources-only" (just model resources) |
| hpa.enabled | bool | `true` | |
| hpa.maxReplicas | int | `10` | |
| hpa.targetAverageValue | string | `"1"` | |
@@ -24,6 +119,7 @@ Helm chart for Workload-Variant-Autoscaler (WVA) - GPU-aware autoscaler for LLM
| vllmService.scheme | string | `"http"` | |
| wva.enabled | bool | `true` | |
| wva.experimentalHybridOptimization | enum | `off` | supports on, off, and model-only |
| wva.namespaceScoped | bool | `true` | If true, controller watches only its namespace; if false, watches all namespaces (cluster-scoped) |
| wva.image.repository | string | `"ghcr.io/llm-d-incubation/workload-variant-autoscaler"` | |
| wva.image.tag | string | `"latest"` | |
| wva.imagePullPolicy | string | `"Always"` | |
@@ -41,6 +137,9 @@ Helm chart for Workload-Variant-Autoscaler (WVA) - GPU-aware autoscaler for LLM
Autogenerated from chart metadata using [helm-docs v1.14.2](https://github.com/norwoodj/helm-docs/releases/v1.14.2)

### INSTALL (on OpenShift)

> **Note**: The default installation mode is `all`, which installs both the controller and model resources. For multi-model deployments, use `controller-only` mode first, then use `model-resources-only` mode for each model. See the [Installation Modes](#installation-modes) section above for details.

1. Before running, be sure to delete all previous helm installations for workload-variant-scheduler and prometheus-adapter.
2. llm-d must be installed for WVA to do its magic. If you plan on installing llm-d with these instructions, please be sure to remove any other Helm installation of llm-d before proceeding.

2 changes: 1 addition & 1 deletion charts/workload-variant-autoscaler/templates/hpa.yaml
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
{{- if .Values.hpa.enabled }}
{{- if and (or (eq .Values.installMode "all") (eq .Values.installMode "model-resources-only")) .Values.hpa.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
@@ -13,3 +14,4 @@ subjects:
- kind: ServiceAccount
name: {{ .Values.wva.prometheus.serviceAccountName }}
namespace: {{ .Values.wva.prometheus.monitoringNamespace }}
{{- end }}
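
The two template conditions in this diff can be read as a small truth table. The sketch below mirrors them in shell for illustration only; the function names are hypothetical, and of the model-resource templates only `hpa.yaml` is visible in this excerpt, so the other model resources are assumed rather than shown to follow the same gate.

```bash
#!/usr/bin/env bash
# Mirrors: or (eq installMode "all") (eq installMode "controller-only")
controller_renders() {
  case "$1" in
    all|controller-only) echo yes ;;
    *) echo no ;;
  esac
}

# Mirrors: and (or (eq installMode "all") (eq installMode "model-resources-only")) hpa.enabled
hpa_renders() {
  local mode="$1" enabled="$2"
  if [ "$enabled" = "true" ] && { [ "$mode" = "all" ] || [ "$mode" = "model-resources-only" ]; }; then
    echo yes
  else
    echo no
  fi
}

for mode in all controller-only model-resources-only; do
  echo "$mode: controller=$(controller_renders "$mode") hpa=$(hpa_renders "$mode" true)"
done
# all: controller=yes hpa=yes
# controller-only: controller=yes hpa=no
# model-resources-only: controller=no hpa=yes
```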
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
@@ -13,3 +14,4 @@ subjects:
- kind: ServiceAccount
name: workload-variant-autoscaler-controller-manager
namespace: {{ .Release.Namespace }}
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
apiVersion: v1
kind: ConfigMap
metadata:
@@ -34,3 +35,4 @@ data:
"device": "NVIDIA-L40S",
"cost": "32.00"
}
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
apiVersion: v1
kind: ConfigMap
# This configMap defines the set of accelerators available
@@ -35,3 +36,4 @@ data:
- model: meta/llama0-7b
slo-tpot: 150
slo-ttft: 1500
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
apiVersion: v1
kind: ConfigMap
metadata:
@@ -56,3 +57,4 @@ data:
# EPP_METRICS_CACHE_TTL: "15s"
# EPP_METRICS_CACHE_MAX_SIZE: "500"
# EPP_METRICS_CACHE_CLEANUP_INTERVAL: "30s"
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
apiVersion: v1
kind: ConfigMap
# This ConfigMap defines saturation-based scaling thresholds for model variants.
@@ -61,4 +62,4 @@ data:
# model_id: meta/llama-70b
# namespace: production
# kvCacheThreshold: 0.85
# kvSpareTrigger: 0.15
# kvSpareTrigger: 0.15{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
apiVersion: apps/v1
kind: Deployment
metadata:
@@ -144,3 +145,4 @@ spec:
optional: true
serviceAccountName: workload-variant-autoscaler-controller-manager
terminationGracePeriodSeconds: 10
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
apiVersion: v1
kind: ServiceAccount
metadata:
@@ -7,3 +8,4 @@ metadata:
app.kubernetes.io/name: workload-variant-autoscaler
imagePullSecrets:
- name: {{ .Values.wva.imagePullSecret | default "" }}
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
@@ -27,3 +28,4 @@ spec:
matchLabels:
app.kubernetes.io/name: workload-variant-autoscaler
control-plane: controller-manager
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
apiVersion: v1
kind: Secret
metadata:
name: workload-variant-autoscaler-controller-manager-token-manual
namespace: {{ .Release.Namespace }}
annotations:
kubernetes.io/service-account.name: workload-variant-autoscaler-controller-manager
type: kubernetes.io/service-account-token
type: kubernetes.io/service-account-token{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
apiVersion: v1
kind: Service
metadata:
@@ -15,3 +16,4 @@ spec:
selector:
control-plane: controller-manager
app.kubernetes.io/name: workload-variant-autoscaler
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
apiVersion: v1
kind: ConfigMap
metadata:
@@ -10,3 +11,4 @@ data:
{{- else }}
# CA certificate not provided - using system CA bundle
{{- end }}
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
apiVersion: v1
kind: ConfigMap
metadata:
@@ -10,3 +11,4 @@ data:
{{- else }}
# CA certificate not provided - using system CA bundle
{{- end }}
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
# permissions to do leader election.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
@@ -38,3 +39,4 @@ rules:
verbs:
- create
- patch
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
@@ -13,3 +14,4 @@ subjects:
- kind: ServiceAccount
name: workload-variant-autoscaler-controller-manager
namespace: {{ .Release.Namespace }}
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
@@ -15,3 +16,4 @@ rules:
- subjectaccessreviews
verbs:
- create
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
@@ -10,3 +11,4 @@ subjects:
- kind: ServiceAccount
name: workload-variant-autoscaler-controller-manager
namespace: {{ .Release.Namespace}}
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
@@ -7,3 +8,4 @@ rules:
- "/metrics"
verbs:
- get
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
@@ -9,4 +10,4 @@ roleRef:
subjects:
- kind: ServiceAccount
name: workload-variant-autoscaler-controller-manager
namespace: {{ .Release.Namespace }}
namespace: {{ .Release.Namespace }}{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
@@ -14,3 +15,4 @@ subjects:
- kind: ServiceAccount
name: {{ .Values.wva.prometheus.serviceAccountName }}
namespace: {{ .Values.wva.prometheus.monitoringNamespace }}
{{- end }}
2 changes: 2 additions & 0 deletions charts/workload-variant-autoscaler/templates/rbac/role.yaml
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
@@ -83,3 +84,4 @@ rules:
verbs:
- create
- patch
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
@@ -12,3 +13,4 @@ subjects:
- kind: ServiceAccount
name: workload-variant-autoscaler-controller-manager
namespace: {{ .Release.Namespace }}
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
# This rule is not used by the project workload-variant-autoscaler itself.
# It is provided to allow the cluster admin to help manage permissions for users.
#
@@ -24,3 +25,4 @@ rules:
- variantautoscalings/status
verbs:
- get
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
# This rule is not used by the project workload-variant-autoscaler itself.
# It is provided to allow the cluster admin to help manage permissions for users.
#
@@ -30,3 +31,4 @@ rules:
- variantautoscalings/status
verbs:
- get
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }}
# This rule is not used by the project workload-variant-autoscaler itself.
# It is provided to allow the cluster admin to help manage permissions for users.
#
@@ -26,3 +27,4 @@ rules:
- variantautoscalings/status
verbs:
- get
{{- end }}