diff --git a/charts/workload-variant-autoscaler/Chart.yaml b/charts/workload-variant-autoscaler/Chart.yaml index 8e5a64681..7439ad42a 100644 --- a/charts/workload-variant-autoscaler/Chart.yaml +++ b/charts/workload-variant-autoscaler/Chart.yaml @@ -2,5 +2,5 @@ apiVersion: v2 name: workload-variant-autoscaler description: Helm chart for Workload-Variant-Autoscaler (WVA) - GPU-aware autoscaler for LLM inference workloads type: application -version: 0.4.1 -appVersion: "v0.4.1" +version: 0.4.3 +appVersion: "v0.4.3" diff --git a/charts/workload-variant-autoscaler/README.md b/charts/workload-variant-autoscaler/README.md index 2af11e9c8..93d835486 100644 --- a/charts/workload-variant-autoscaler/README.md +++ b/charts/workload-variant-autoscaler/README.md @@ -1,13 +1,108 @@ # workload-variant-autoscaler -![Version: 0.4.1](https://img.shields.io/badge/Version-0.4.1-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: v0.4.1](https://img.shields.io/badge/AppVersion-v0.4.1-informational?style=flat-square) +![Version: 0.4.3](https://img.shields.io/badge/Version-0.4.3-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: v0.4.3](https://img.shields.io/badge/AppVersion-v0.4.3-informational?style=flat-square) Helm chart for Workload-Variant-Autoscaler (WVA) - GPU-aware autoscaler for LLM inference workloads +## Installation Modes + +WVA supports three installation modes to enable flexible deployment architectures: + +### Mode 1: `all` (Default) +Installs both the WVA controller and model-specific resources in a single installation. This is the traditional mode and is backward compatible with previous versions. + +**Use case**: Single llm-d stack with one model. + +```bash +helm install workload-variant-autoscaler ./workload-variant-autoscaler \ + -n workload-variant-autoscaler-system \ + --set installMode=all +``` + +### Mode 2: `controller-only` +Installs only the WVA controller without any model-specific resources. This enables a cluster-wide controller that can manage multiple models across different namespaces. + +**Use case**: Install the controller once for the entire cluster, then deploy model-specific resources separately as needed. + +```bash +# Install the controller once +helm install wva-controller ./workload-variant-autoscaler \ + -n workload-variant-autoscaler-system \ + --create-namespace \ + --set installMode=controller-only \ + --set wva.namespaceScoped=false +``` + +### Mode 3: `model-resources-only` +Installs only model-specific resources (VariantAutoscaling, HPA, Service, ServiceMonitor) without the controller. Use this mode to add new models when a cluster-wide controller is already running. + +**Use case**: Deploy resources for additional models in different namespaces after the controller is installed. 
+ +```bash +# Deploy model resources for Model A in namespace-a +helm install model-a-resources ./workload-variant-autoscaler \ + -n namespace-a \ + --set installMode=model-resources-only \ + --set llmd.namespace=namespace-a \ + --set llmd.modelName=model-a \ + --set llmd.modelID="vendor/model-a" + +# Deploy model resources for Model B in namespace-b +helm install model-b-resources ./workload-variant-autoscaler \ + -n namespace-b \ + --set installMode=model-resources-only \ + --set llmd.namespace=namespace-b \ + --set llmd.modelName=model-b \ + --set llmd.modelID="vendor/model-b" +``` + +## Multi-Model Architecture Example + +For supporting multiple llm-d stacks with a single controller: + +```bash +# Step 1: Install the WVA controller once (cluster-wide) +helm install wva-controller ./workload-variant-autoscaler \ + -n workload-variant-autoscaler-system \ + --create-namespace \ + --set installMode=controller-only \ + --set wva.namespaceScoped=false \ + --set wva.prometheus.baseURL="https://prometheus:9090" + +# Step 2: Deploy Model A resources +helm install model-a ./workload-variant-autoscaler \ + --set installMode=model-resources-only \ + --set llmd.namespace=llm-d-model-a \ + --set llmd.modelName=ms-model-a-llm-d-modelservice \ + --set llmd.modelID="meta-llama/Llama-2-7b-hf" \ + --set va.accelerator=H100 + +# Step 3: Deploy Model B resources (in a different namespace) +helm install model-b ./workload-variant-autoscaler \ + --set installMode=model-resources-only \ + --set llmd.namespace=llm-d-model-b \ + --set llmd.modelName=ms-model-b-llm-d-modelservice \ + --set llmd.modelID="mistralai/Mistral-7B-v0.1" \ + --set va.accelerator=A100 +``` + +### Important Configuration Notes + +**Namespace Scoping:** +- When using `installMode=controller-only` for multi-model deployments, you must set `wva.namespaceScoped=false` to allow the controller to watch all namespaces. +- When using `installMode=all` (default), you can keep `wva.namespaceScoped=true` for single-namespace operation or set it to `false` for cluster-wide operation. +- `installMode=model-resources-only` does not use the `namespaceScoped` setting since it doesn't install the controller. + +**Resource Isolation:** +- Each model's resources (VariantAutoscaling, HPA, Service, ServiceMonitor) are deployed in the model's namespace. +- The controller remains in its dedicated namespace (typically `workload-variant-autoscaler-system`). +- Multiple Helm releases can coexist: one for the controller and one per model. 
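+
+To confirm a multi-release layout, list the releases and inspect what each one rendered (a quick sanity check, assuming the release names used in the examples above):
+
+```bash
+# One release for the controller, one per model
+helm ls -A
+
+# The controller release should render no model resources...
+helm get manifest wva-controller -n workload-variant-autoscaler-system | grep "^kind:" | sort -u
+
+# ...and each model release only model resources (VariantAutoscaling, HPA, Service, ServiceMonitor)
+helm get manifest model-a-resources -n namespace-a | grep "^kind:" | sort -u
+```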
+ ## Values | Key | Type | Default | Description | |-----|------|---------|-------------| +| installMode | string | `"all"` | Installation mode: "all" (controller + model resources), "controller-only" (just controller), "model-resources-only" (just model resources) | | hpa.enabled | bool | `true` | | | hpa.maxReplicas | int | `10` | | | hpa.targetAverageValue | string | `"1"` | | @@ -24,6 +119,7 @@ Helm chart for Workload-Variant-Autoscaler (WVA) - GPU-aware autoscaler for LLM | vllmService.scheme | string | `"http"` | | | wva.enabled | bool | `true` | | | wva.experimentalHybridOptimization | enum | `off` | supports on, off, and model-only | +| wva.namespaceScoped | bool | `true` | If true, controller watches only its namespace; if false, watches all namespaces (cluster-scoped) | | wva.image.repository | string | `"ghcr.io/llm-d-incubation/workload-variant-autoscaler"` | | | wva.image.tag | string | `"latest"` | | | wva.imagePullPolicy | string | `"Always"` | | @@ -41,6 +137,9 @@ Helm chart for Workload-Variant-Autoscaler (WVA) - GPU-aware autoscaler for LLM Autogenerated from chart metadata using [helm-docs v1.14.2](https://github.com/norwoodj/helm-docs/releases/v1.14.2) ### INSTALL (on OpenShift) + +> **Note**: The default installation mode is `all`, which installs both the controller and model resources. For multi-model deployments, use `controller-only` mode first, then use `model-resources-only` mode for each model. See the [Installation Modes](#installation-modes) section above for details. + 1. Before running, be sure to delete all previous helm installations of workload-variant-autoscaler and prometheus-adapter. 2. llm-d must be installed for WVA to do its magic. If you plan on installing llm-d with these instructions, please be sure to remove any other helm installation of llm-d before proceeding. 
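+
+The two new keys can also be pinned in a values file instead of repeated `--set` flags (a sketch; the file name `controller-values.yaml` is illustrative):
+
+```yaml
+# controller-values.yaml -- cluster-wide controller for multi-model use
+installMode: controller-only
+wva:
+  # Watch all namespaces so a single controller can serve every model
+  namespaceScoped: false
+```
+
+```bash
+helm install wva-controller ./workload-variant-autoscaler \
+  -n workload-variant-autoscaler-system \
+  --create-namespace \
+  -f controller-values.yaml
+```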
diff --git a/charts/workload-variant-autoscaler/templates/hpa.yaml b/charts/workload-variant-autoscaler/templates/hpa.yaml index 7ad0c8d66..cdbf06738 100644 --- a/charts/workload-variant-autoscaler/templates/hpa.yaml +++ b/charts/workload-variant-autoscaler/templates/hpa.yaml @@ -1,4 +1,4 @@ -{{- if .Values.hpa.enabled }} +{{- if and (or (eq .Values.installMode "all") (eq .Values.installMode "model-resources-only")) .Values.hpa.enabled }} apiVersion: autoscaling/v2 kind: HorizontalPodAutoscaler metadata: diff --git a/charts/workload-variant-autoscaler/templates/manager/prometheus-clusterrolebinding.yaml b/charts/workload-variant-autoscaler/templates/manager/prometheus-clusterrolebinding.yaml index fce4b3d9b..bd303dff9 100644 --- a/charts/workload-variant-autoscaler/templates/manager/prometheus-clusterrolebinding.yaml +++ b/charts/workload-variant-autoscaler/templates/manager/prometheus-clusterrolebinding.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: @@ -13,3 +14,4 @@ subjects: - kind: ServiceAccount name: {{ .Values.wva.prometheus.serviceAccountName }} namespace: {{ .Values.wva.prometheus.monitoringNamespace }} +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/manager/wva-clusterrolebinding.yaml b/charts/workload-variant-autoscaler/templates/manager/wva-clusterrolebinding.yaml index 58376493c..3fd78ef8d 100644 --- a/charts/workload-variant-autoscaler/templates/manager/wva-clusterrolebinding.yaml +++ b/charts/workload-variant-autoscaler/templates/manager/wva-clusterrolebinding.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: @@ -13,3 +14,4 @@ subjects: - kind: ServiceAccount name: workload-variant-autoscaler-controller-manager namespace: {{ .Release.Namespace }} +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/manager/wva-configmap-accelerator-costs.yaml b/charts/workload-variant-autoscaler/templates/manager/wva-configmap-accelerator-costs.yaml index d83284c11..a52d032ed 100644 --- a/charts/workload-variant-autoscaler/templates/manager/wva-configmap-accelerator-costs.yaml +++ b/charts/workload-variant-autoscaler/templates/manager/wva-configmap-accelerator-costs.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} apiVersion: v1 kind: ConfigMap metadata: @@ -34,3 +35,4 @@ data: "device": "NVIDIA-L40S", "cost": "32.00" } +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/manager/wva-configmap-service-class.yaml b/charts/workload-variant-autoscaler/templates/manager/wva-configmap-service-class.yaml index 49d0cbbe6..19acf3aa5 100644 --- a/charts/workload-variant-autoscaler/templates/manager/wva-configmap-service-class.yaml +++ b/charts/workload-variant-autoscaler/templates/manager/wva-configmap-service-class.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} apiVersion: v1 kind: ConfigMap # This configMap defines the set of accelerators available @@ -35,3 +36,4 @@ data: - model: meta/llama0-7b slo-tpot: 150 slo-ttft: 1500 +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/manager/wva-configmap.yaml b/charts/workload-variant-autoscaler/templates/manager/wva-configmap.yaml index ddc2782d1..6e01d4c07 100644 --- 
a/charts/workload-variant-autoscaler/templates/manager/wva-configmap.yaml +++ b/charts/workload-variant-autoscaler/templates/manager/wva-configmap.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} apiVersion: v1 kind: ConfigMap metadata: @@ -56,3 +57,4 @@ data: # EPP_METRICS_CACHE_TTL: "15s" # EPP_METRICS_CACHE_MAX_SIZE: "500" # EPP_METRICS_CACHE_CLEANUP_INTERVAL: "30s" +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/manager/wva-cp-configmap.yaml b/charts/workload-variant-autoscaler/templates/manager/wva-cp-configmap.yaml index 0a5214cde..9adc56e7b 100644 --- a/charts/workload-variant-autoscaler/templates/manager/wva-cp-configmap.yaml +++ b/charts/workload-variant-autoscaler/templates/manager/wva-cp-configmap.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} apiVersion: v1 kind: ConfigMap # This ConfigMap defines saturation-based scaling thresholds for model variants. @@ -61,4 +62,4 @@ data: # model_id: meta/llama-70b # namespace: production # kvCacheThreshold: 0.85 - # kvSpareTrigger: 0.15 \ No newline at end of file + # kvSpareTrigger: 0.15{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/manager/wva-deployment-controller-manager.yaml b/charts/workload-variant-autoscaler/templates/manager/wva-deployment-controller-manager.yaml index e779b8344..3f4264233 100644 --- a/charts/workload-variant-autoscaler/templates/manager/wva-deployment-controller-manager.yaml +++ b/charts/workload-variant-autoscaler/templates/manager/wva-deployment-controller-manager.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} apiVersion: apps/v1 kind: Deployment metadata: @@ -144,3 +145,4 @@ spec: optional: true serviceAccountName: workload-variant-autoscaler-controller-manager terminationGracePeriodSeconds: 10 +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/manager/wva-serviceaccount.yaml b/charts/workload-variant-autoscaler/templates/manager/wva-serviceaccount.yaml index 10694f1c8..c630b59f5 100644 --- a/charts/workload-variant-autoscaler/templates/manager/wva-serviceaccount.yaml +++ b/charts/workload-variant-autoscaler/templates/manager/wva-serviceaccount.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} apiVersion: v1 kind: ServiceAccount metadata: @@ -7,3 +8,4 @@ metadata: app.kubernetes.io/name: workload-variant-autoscaler imagePullSecrets: - name: {{ .Values.wva.imagePullSecret | default "" }} +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/manager/wva-servicemonitor.yaml b/charts/workload-variant-autoscaler/templates/manager/wva-servicemonitor.yaml index 920ef2d40..a782a941c 100644 --- a/charts/workload-variant-autoscaler/templates/manager/wva-servicemonitor.yaml +++ b/charts/workload-variant-autoscaler/templates/manager/wva-servicemonitor.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} apiVersion: monitoring.coreos.com/v1 kind: ServiceMonitor metadata: @@ -27,3 +28,4 @@ spec: matchLabels: app.kubernetes.io/name: workload-variant-autoscaler control-plane: controller-manager +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/manager/wva-token-secret.yaml b/charts/workload-variant-autoscaler/templates/manager/wva-token-secret.yaml index e42ca8299..dd6ae58f1 100644 --- 
a/charts/workload-variant-autoscaler/templates/manager/wva-token-secret.yaml +++ b/charts/workload-variant-autoscaler/templates/manager/wva-token-secret.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} apiVersion: v1 kind: Secret metadata: @@ -5,4 +6,4 @@ metadata: namespace: {{ .Release.Namespace }} annotations: kubernetes.io/service-account.name: workload-variant-autoscaler-controller-manager -type: kubernetes.io/service-account-token \ No newline at end of file +type: kubernetes.io/service-account-token{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/metrics_service.yaml b/charts/workload-variant-autoscaler/templates/metrics_service.yaml index a89c5b089..dc6275a41 100644 --- a/charts/workload-variant-autoscaler/templates/metrics_service.yaml +++ b/charts/workload-variant-autoscaler/templates/metrics_service.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} apiVersion: v1 kind: Service metadata: @@ -15,3 +16,4 @@ spec: selector: control-plane: controller-manager app.kubernetes.io/name: workload-variant-autoscaler +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/prometheus-ca-configmap-prom.yaml b/charts/workload-variant-autoscaler/templates/prometheus-ca-configmap-prom.yaml index 25d1693ec..a424bd5ab 100644 --- a/charts/workload-variant-autoscaler/templates/prometheus-ca-configmap-prom.yaml +++ b/charts/workload-variant-autoscaler/templates/prometheus-ca-configmap-prom.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} apiVersion: v1 kind: ConfigMap metadata: @@ -10,3 +11,4 @@ data: {{- else }} # CA certificate not provided - using system CA bundle {{- end }} +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/prometheus-ca-configmap-wva.yaml b/charts/workload-variant-autoscaler/templates/prometheus-ca-configmap-wva.yaml index 509913708..bab3f2083 100644 --- a/charts/workload-variant-autoscaler/templates/prometheus-ca-configmap-wva.yaml +++ b/charts/workload-variant-autoscaler/templates/prometheus-ca-configmap-wva.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} apiVersion: v1 kind: ConfigMap metadata: @@ -10,3 +11,4 @@ data: {{- else }} # CA certificate not provided - using system CA bundle {{- end }} +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/rbac/leader_election_role.yaml b/charts/workload-variant-autoscaler/templates/rbac/leader_election_role.yaml index 1771f26ca..3d567c728 100644 --- a/charts/workload-variant-autoscaler/templates/rbac/leader_election_role.yaml +++ b/charts/workload-variant-autoscaler/templates/rbac/leader_election_role.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} # permissions to do leader election. 
apiVersion: rbac.authorization.k8s.io/v1 kind: Role @@ -38,3 +39,4 @@ rules: verbs: - create - patch +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/rbac/leader_election_role_binding.yaml b/charts/workload-variant-autoscaler/templates/rbac/leader_election_role_binding.yaml index a7217dde5..c87a7b510 100644 --- a/charts/workload-variant-autoscaler/templates/rbac/leader_election_role_binding.yaml +++ b/charts/workload-variant-autoscaler/templates/rbac/leader_election_role_binding.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: @@ -13,3 +14,4 @@ subjects: - kind: ServiceAccount name: workload-variant-autoscaler-controller-manager namespace: {{ .Release.Namespace }} +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/rbac/metrics_auth_role.yaml b/charts/workload-variant-autoscaler/templates/rbac/metrics_auth_role.yaml index 1ffc31c62..9d9f00aeb 100644 --- a/charts/workload-variant-autoscaler/templates/rbac/metrics_auth_role.yaml +++ b/charts/workload-variant-autoscaler/templates/rbac/metrics_auth_role.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: @@ -15,3 +16,4 @@ rules: - subjectaccessreviews verbs: - create +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/rbac/metrics_auth_role_binding.yaml b/charts/workload-variant-autoscaler/templates/rbac/metrics_auth_role_binding.yaml index 418ecdc27..836213067 100644 --- a/charts/workload-variant-autoscaler/templates/rbac/metrics_auth_role_binding.yaml +++ b/charts/workload-variant-autoscaler/templates/rbac/metrics_auth_role_binding.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: @@ -10,3 +11,4 @@ subjects: - kind: ServiceAccount name: workload-variant-autoscaler-controller-manager namespace: {{ .Release.Namespace}} +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/rbac/metrics_reader_role.yaml b/charts/workload-variant-autoscaler/templates/rbac/metrics_reader_role.yaml index d1d0e009b..28f896fe1 100644 --- a/charts/workload-variant-autoscaler/templates/rbac/metrics_reader_role.yaml +++ b/charts/workload-variant-autoscaler/templates/rbac/metrics_reader_role.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: @@ -7,3 +8,4 @@ rules: - "/metrics" verbs: - get +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/rbac/metrics_reader_role_binding.yaml b/charts/workload-variant-autoscaler/templates/rbac/metrics_reader_role_binding.yaml index 59d7c44f6..256602dbf 100644 --- a/charts/workload-variant-autoscaler/templates/rbac/metrics_reader_role_binding.yaml +++ b/charts/workload-variant-autoscaler/templates/rbac/metrics_reader_role_binding.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: @@ -9,4 +10,4 @@ roleRef: subjects: - kind: ServiceAccount name: workload-variant-autoscaler-controller-manager - namespace: {{ .Release.Namespace }} \ No newline at end of file + namespace: {{ .Release.Namespace }}{{- end }} diff 
--git a/charts/workload-variant-autoscaler/templates/rbac/prometheus_metrics_auth_role_binding.yaml b/charts/workload-variant-autoscaler/templates/rbac/prometheus_metrics_auth_role_binding.yaml index 3293e0c90..454643489 100644 --- a/charts/workload-variant-autoscaler/templates/rbac/prometheus_metrics_auth_role_binding.yaml +++ b/charts/workload-variant-autoscaler/templates/rbac/prometheus_metrics_auth_role_binding.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: @@ -14,3 +15,4 @@ subjects: - kind: ServiceAccount name: {{ .Values.wva.prometheus.serviceAccountName }} namespace: {{ .Values.wva.prometheus.monitoringNamespace }} +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/rbac/role.yaml b/charts/workload-variant-autoscaler/templates/rbac/role.yaml index 4107ebcdf..790aa7185 100644 --- a/charts/workload-variant-autoscaler/templates/rbac/role.yaml +++ b/charts/workload-variant-autoscaler/templates/rbac/role.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole @@ -83,3 +84,4 @@ rules: verbs: - create - patch +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/rbac/role_binding.yaml b/charts/workload-variant-autoscaler/templates/rbac/role_binding.yaml index da246f23e..e0a8ff010 100644 --- a/charts/workload-variant-autoscaler/templates/rbac/role_binding.yaml +++ b/charts/workload-variant-autoscaler/templates/rbac/role_binding.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: @@ -12,3 +13,4 @@ subjects: - kind: ServiceAccount name: workload-variant-autoscaler-controller-manager namespace: {{ .Release.Namespace }} +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/rbac/variantautoscaling_admin_role.yaml b/charts/workload-variant-autoscaler/templates/rbac/variantautoscaling_admin_role.yaml index 8cfc66882..7587034a5 100644 --- a/charts/workload-variant-autoscaler/templates/rbac/variantautoscaling_admin_role.yaml +++ b/charts/workload-variant-autoscaler/templates/rbac/variantautoscaling_admin_role.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} # This rule is not used by the project workload-variant-autoscaler itself. # It is provided to allow the cluster admin to help manage permissions for users. # @@ -24,3 +25,4 @@ rules: - variantautoscalings/status verbs: - get +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/rbac/variantautoscaling_editor_role.yaml b/charts/workload-variant-autoscaler/templates/rbac/variantautoscaling_editor_role.yaml index d9a59f600..6ea85ed66 100644 --- a/charts/workload-variant-autoscaler/templates/rbac/variantautoscaling_editor_role.yaml +++ b/charts/workload-variant-autoscaler/templates/rbac/variantautoscaling_editor_role.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} # This rule is not used by the project workload-variant-autoscaler itself. # It is provided to allow the cluster admin to help manage permissions for users. 
# @@ -30,3 +31,4 @@ rules: - variantautoscalings/status verbs: - get +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/rbac/variantautoscaling_viewer_role.yaml b/charts/workload-variant-autoscaler/templates/rbac/variantautoscaling_viewer_role.yaml index f5c4c01aa..8318dfae5 100644 --- a/charts/workload-variant-autoscaler/templates/rbac/variantautoscaling_viewer_role.yaml +++ b/charts/workload-variant-autoscaler/templates/rbac/variantautoscaling_viewer_role.yaml @@ -1,3 +1,4 @@ +{{- if or (eq .Values.installMode "all") (eq .Values.installMode "controller-only") }} # This rule is not used by the project workload-variant-autoscaler itself. # It is provided to allow the cluster admin to help manage permissions for users. # @@ -26,3 +27,4 @@ rules: - variantautoscalings/status verbs: - get +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/variantautoscaling.yaml b/charts/workload-variant-autoscaler/templates/variantautoscaling.yaml index 1a7c79d99..d2ec82773 100644 --- a/charts/workload-variant-autoscaler/templates/variantautoscaling.yaml +++ b/charts/workload-variant-autoscaler/templates/variantautoscaling.yaml @@ -1,4 +1,4 @@ -{{- if .Values.va.enabled }} +{{- if and (or (eq .Values.installMode "all") (eq .Values.installMode "model-resources-only")) .Values.va.enabled }} apiVersion: llmd.ai/v1alpha1 # Optimizing a variant, create only when the model is deployed and serving traffic # this is for the collector to collect existing (previous) running metrics of the variant. @@ -58,4 +58,4 @@ spec: maxBatchSize: 4 {{- end}} -{{- end }} \ No newline at end of file +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/vllm-service.yaml b/charts/workload-variant-autoscaler/templates/vllm-service.yaml index ee13b17c5..9e42ea3c2 100644 --- a/charts/workload-variant-autoscaler/templates/vllm-service.yaml +++ b/charts/workload-variant-autoscaler/templates/vllm-service.yaml @@ -1,4 +1,4 @@ -{{- if .Values.vllmService.enabled }} +{{- if and (or (eq .Values.installMode "all") (eq .Values.installMode "model-resources-only")) .Values.vllmService.enabled }} apiVersion: v1 kind: Service metadata: @@ -16,4 +16,4 @@ spec: targetPort: 8200 nodePort: {{ .Values.vllmService.nodePort }} type: NodePort -{{- end }} \ No newline at end of file +{{- end }} diff --git a/charts/workload-variant-autoscaler/templates/vllm-servicemonitor.yaml b/charts/workload-variant-autoscaler/templates/vllm-servicemonitor.yaml index 247999cea..c65b7813f 100644 --- a/charts/workload-variant-autoscaler/templates/vllm-servicemonitor.yaml +++ b/charts/workload-variant-autoscaler/templates/vllm-servicemonitor.yaml @@ -1,4 +1,4 @@ -{{- if .Values.vllmService.enabled }} +{{- if and (or (eq .Values.installMode "all") (eq .Values.installMode "model-resources-only")) .Values.vllmService.enabled }} apiVersion: monitoring.coreos.com/v1 kind: ServiceMonitor metadata: @@ -23,4 +23,4 @@ spec: {{- end }} namespaceSelector: any: true -{{- end }} \ No newline at end of file +{{- end }} diff --git a/charts/workload-variant-autoscaler/values-dev.yaml b/charts/workload-variant-autoscaler/values-dev.yaml index 4138936f9..30ec1ee68 100644 --- a/charts/workload-variant-autoscaler/values-dev.yaml +++ b/charts/workload-variant-autoscaler/values-dev.yaml @@ -1,6 +1,12 @@ # Development values for workload-variant-autoscaler # This file contains development-specific configurations with relaxed security settings +# Installation mode controls which components are installed: +# - "all": Install both controller and 
model-specific resources (default, backward compatible) +# - "controller-only": Install only the WVA controller (for cluster-wide controller management) +# - "model-resources-only": Install only model-specific resources (VA, HPA, Service, ServiceMonitor) +installMode: all + wva: enabled: true replicaCount: 1 diff --git a/charts/workload-variant-autoscaler/values.yaml b/charts/workload-variant-autoscaler/values.yaml index 11472ec04..05b9f8986 100644 --- a/charts/workload-variant-autoscaler/values.yaml +++ b/charts/workload-variant-autoscaler/values.yaml @@ -1,3 +1,9 @@ +# Installation mode controls which components are installed: +# - "all": Install both controller and model-specific resources (default, backward compatible) +# - "controller-only": Install only the WVA controller (for cluster-wide controller management) +# - "model-resources-only": Install only model-specific resources (VA, HPA, Service, ServiceMonitor) +installMode: all + wva: enabled: true @@ -13,6 +19,8 @@ wva: # If true, the controller will only watch the namespace it is deployed in. # If false, the controller will watch all namespaces (cluster-scoped). + # Note: When using installMode="controller-only" for multi-model deployments, + # set this to false to allow the controller to watch all namespaces. namespaceScoped: true reconcileInterval: 60s diff --git a/deploy/README.md b/deploy/README.md index e6601877c..9c63df5d2 100644 --- a/deploy/README.md +++ b/deploy/README.md @@ -219,7 +219,9 @@ The WVA can be deployed as a standalone using Helm, assuming you have: - ServiceMonitors configured - Prometheus Adapter (optional, for HPA) -This method is particularly useful when there is one (or more) existing llm-d infrastructure deployed +This method is particularly useful when one or more llm-d stacks are already deployed. + +> **New in v0.4.3**: The Helm chart now supports three installation modes (`all`, `controller-only`, `model-resources-only`) to enable flexible multi-model deployments. See the [Helm Chart Installation Modes](#helm-chart-installation-modes) section below for details. #### Helm Chart Quick Start @@ -529,10 +531,178 @@ spec: EOF ``` + +#### Helm Chart Installation Modes + +**New in v0.4.3**: The Helm chart supports three installation modes to enable flexible multi-model deployments across multiple namespaces. This addresses the limitation where installing WVA for a new model would overwrite resources from existing models. + +##### Mode 1: `all` (Default - Backward Compatible) + +Installs both the WVA controller and model-specific resources. This is the traditional mode and maintains backward compatibility. + +**Use case**: Single llm-d stack with one model. + +```bash +helm install workload-variant-autoscaler ./charts/workload-variant-autoscaler \ + -n workload-variant-autoscaler-system \ + --create-namespace \ + --set installMode=all # This is the default +``` + +##### Mode 2: `controller-only` + +Installs only the WVA controller (Deployment, ServiceAccount, RBAC, ConfigMaps) without any model-specific resources. + +**Use case**: Install a cluster-wide controller once, then deploy model-specific resources separately as needed. 
+ +```bash +# Step 1: Install the controller once for the entire cluster +helm install wva-controller ./charts/workload-variant-autoscaler \ + -n workload-variant-autoscaler-system \ + --create-namespace \ + --set installMode=controller-only \ + --set wva.namespaceScoped=false \ + --set wva.prometheus.baseURL="https://prometheus-k8s.monitoring.svc:9090" \ + --set wva.prometheus.tls.insecureSkipVerify=false +``` + +##### Mode 3: `model-resources-only` + +Installs only model-specific resources (VariantAutoscaling, HPA, Service, ServiceMonitor) without the controller. + +**Use case**: Deploy resources for additional models in different namespaces after a cluster-wide controller is installed. + +```bash +# Step 2a: Deploy model resources for Model A in namespace-a +helm install model-a-resources ./charts/workload-variant-autoscaler \ + --set installMode=model-resources-only \ + --set llmd.namespace=llm-d-model-a \ + --set llmd.modelName=ms-model-a-llm-d-modelservice \ + --set llmd.modelID="meta-llama/Llama-2-7b-hf" \ + --set va.accelerator=H100 \ + --set va.enabled=true \ + --set hpa.enabled=true \ + --set vllmService.enabled=true + +# Step 2b: Deploy model resources for Model B in namespace-b +helm install model-b-resources ./charts/workload-variant-autoscaler \ + --set installMode=model-resources-only \ + --set llmd.namespace=llm-d-model-b \ + --set llmd.modelName=ms-model-b-llm-d-modelservice \ + --set llmd.modelID="mistralai/Mistral-7B-v0.1" \ + --set va.accelerator=A100 \ + --set va.enabled=true \ + --set hpa.enabled=true \ + --set vllmService.enabled=true +``` + +##### Complete Multi-Model Example + +Here's a complete example showing how to deploy WVA to support multiple llm-d stacks: + +```bash +# Prerequisites: Multiple llm-d stacks already deployed in different namespaces + +# Step 1: Install the WVA controller once (cluster-wide) +helm install wva-controller ./charts/workload-variant-autoscaler \ + -n workload-variant-autoscaler-system \ + --create-namespace \ + --set installMode=controller-only \ + --set wva.namespaceScoped=false \ + --set wva.prometheus.baseURL="https://prometheus-k8s.monitoring.svc:9090" + +# Step 2: Deploy model resources for each llm-d stack +# Model A in llm-d-stack-a namespace +helm install wva-model-a ./charts/workload-variant-autoscaler \ + --set installMode=model-resources-only \ + --set llmd.namespace=llm-d-stack-a \ + --set llmd.modelName=ms-model-a-llm-d-modelservice \ + --set llmd.modelID="meta-llama/Llama-2-7b-hf" \ + --set va.accelerator=H100 + +# Model B in llm-d-stack-b namespace +helm install wva-model-b ./charts/workload-variant-autoscaler \ + --set installMode=model-resources-only \ + --set llmd.namespace=llm-d-stack-b \ + --set llmd.modelName=ms-model-b-llm-d-modelservice \ + --set llmd.modelID="mistralai/Mistral-7B-v0.1" \ + --set va.accelerator=A100 + +# Model C in llm-d-stack-c namespace +helm install wva-model-c ./charts/workload-variant-autoscaler \ + --set installMode=model-resources-only \ + --set llmd.namespace=llm-d-stack-c \ + --set llmd.modelName=ms-model-c-llm-d-modelservice \ + --set llmd.modelID="google/gemma-2b" \ + --set va.accelerator=L40S +``` + +**Architecture with Multiple Models:** + +``` +Cluster +├── workload-variant-autoscaler-system (namespace) +│ └── wva-controller (Deployment) ← Single controller watching all namespaces +│ +├── llm-d-stack-a (namespace) +│ ├── llm-d resources (Gateway, Scheduler, vLLM) +│ └── wva-resources (VA, HPA, Service, ServiceMonitor) for Model A +│ +├── llm-d-stack-b (namespace) +│ ├── llm-d 
resources (Gateway, Scheduler, vLLM) +│ └── wva-resources (VA, HPA, Service, ServiceMonitor) for Model B +│ +└── llm-d-stack-c (namespace) + ├── llm-d resources (Gateway, Scheduler, vLLM) + └── wva-resources (VA, HPA, Service, ServiceMonitor) for Model C +``` + +**Benefits:** +- Single WVA controller manages all models across the cluster +- Each model's resources are isolated in their own namespace +- Adding/removing models doesn't affect other models +- Supports multiple llm-d stacks without resource conflicts + +##### Upgrading Existing Installations + +If you have an existing WVA installation (v0.4.1 or earlier), it will continue to work with the default `all` mode. To migrate to the new multi-model architecture: + +```bash +# 1. Note your current model configuration +kubectl get variantautoscaling -A +kubectl get hpa -A | grep vllm + +# 2. Uninstall the old WVA installation +helm uninstall workload-variant-autoscaler -n workload-variant-autoscaler-system + +# 3. Reinstall with controller-only mode +helm install wva-controller ./charts/workload-variant-autoscaler \ + -n workload-variant-autoscaler-system \ + --create-namespace \ + --set installMode=controller-only \ + --set wva.namespaceScoped=false \ + --set wva.prometheus.baseURL="<prometheus-url>" + +# 4. Reinstall model resources for each model +helm install wva-model-a ./charts/workload-variant-autoscaler \ + --set installMode=model-resources-only \ + --set llmd.namespace=<namespace> \ + --set llmd.modelName=<model-name> \ + --set llmd.modelID="<model-id>" \ + --set va.accelerator=<accelerator> +``` + #### Helm Uninstall ```bash -# Uninstall the release +# Uninstall controller +helm uninstall wva-controller -n workload-variant-autoscaler-system + +# Uninstall model resources (repeat for each model) +helm uninstall wva-model-a +helm uninstall wva-model-b +# ... + +# Or uninstall all-in-one installation helm uninstall workload-variant-autoscaler -n workload-variant-autoscaler-system ``` diff --git a/docs/user-guide/installation.md b/docs/user-guide/installation.md index 5a6fc4777..037314d40 100644 --- a/docs/user-guide/installation.md +++ b/docs/user-guide/installation.md @@ -2,6 +2,8 @@ This guide covers installing Workload-Variant-Autoscaler (WVA) on your Kubernetes cluster. +> **New in v0.4.3**: WVA now supports flexible installation modes for multi-model deployments. See the [Multi-Model Migration Guide](multi-model-migration.md) for details on deploying WVA across multiple llm-d stacks. + ## Prerequisites - Kubernetes v1.32.0 or later @@ -13,7 +15,13 @@ This guide covers installing Workload-Variant-Autoscaler (WVA) on your Kubernete ### Option 1: Helm Installation (Recommended) -The simplest way to install WVA is using Helm: +The simplest way to install WVA is using Helm. 
The Helm chart supports three installation modes: + +- **`all` (default)**: Install both controller and model resources together +- **`controller-only`**: Install only the controller for cluster-wide management +- **`model-resources-only`**: Install only model-specific resources + +**Basic installation (single model):** ```bash # Install WVA with default configuration @@ -28,6 +36,28 @@ helm install workload-variant-autoscaler ./charts/workload-variant-autoscaler \ --values custom-values.yaml ``` +**Multi-model installation:** + +For deploying WVA to manage multiple models across different namespaces: + +```bash +# Step 1: Install controller once +helm install wva-controller ./charts/workload-variant-autoscaler \ + -n workload-variant-autoscaler-system \ + --create-namespace \ + --set installMode=controller-only \ + --set wva.namespaceScoped=false + +# Step 2: Install model resources for each model +helm install wva-model-a ./charts/workload-variant-autoscaler \ + --set installMode=model-resources-only \ + --set llmd.namespace=llm-d-model-a \ + --set llmd.modelName=model-a \ + --set llmd.modelID="meta-llama/Llama-2-7b-hf" +``` + +See the [Multi-Model Migration Guide](multi-model-migration.md) for complete details. + **Verify the installation:** ```bash kubectl get pods -n workload-variant-autoscaler-system diff --git a/docs/user-guide/multi-model-migration.md b/docs/user-guide/multi-model-migration.md new file mode 100644 index 000000000..4c644d024 --- /dev/null +++ b/docs/user-guide/multi-model-migration.md @@ -0,0 +1,311 @@ +# Multi-Model Migration Guide + +This guide helps you migrate from a single-model WVA installation to a multi-model architecture that supports multiple llm-d stacks across different namespaces. + +## Overview + +**Prior to v0.4.3**, the WVA Helm chart installed both the controller and model-specific resources together. This meant that installing WVA for a new model would overwrite the resources from existing models, making it impossible to support multiple llm-d stacks. 
+ +**Starting with v0.4.3**, WVA supports three installation modes that enable you to decouple the controller from model resources: + +- `all` (default) - Install both controller and model resources together +- `controller-only` - Install only the WVA controller +- `model-resources-only` - Install only model-specific resources + +## When to Migrate + +You should consider migrating to the new multi-model architecture if you: + +- Have multiple llm-d stacks in different namespaces +- Want to add models without affecting existing models +- Need to scale different models independently +- Want to manage model lifecycles separately from the controller + +## Migration Steps + +### Step 1: Document Current Configuration + +Before starting the migration, document your current setup: + +```bash +# List existing WVA installations +helm ls -A | grep workload-variant-autoscaler + +# Save current VariantAutoscaling resources +kubectl get variantautoscaling -A -o yaml > /tmp/existing-va.yaml + +# Save current HPA resources +kubectl get hpa -A | grep vllm > /tmp/existing-hpa.txt + +# Save current model configuration +kubectl get variantautoscaling -A -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.namespace}{"\t"}{.spec.modelID}{"\n"}{end}' > /tmp/models.txt +``` + +### Step 2: Backup Current Installation + +Create a backup of your current Helm values: + +```bash +# Get current values +helm get values workload-variant-autoscaler -n workload-variant-autoscaler-system > /tmp/current-values.yaml +``` + +### Step 3: Uninstall Old Installation + +Remove the existing WVA installation: + +```bash +# Uninstall WVA +helm uninstall workload-variant-autoscaler -n workload-variant-autoscaler-system + +# Verify resources are removed +kubectl get pods -n workload-variant-autoscaler-system +kubectl get variantautoscaling -A +kubectl get hpa -A | grep vllm +``` + +**Note**: The VariantAutoscaling and HPA resources will be deleted. This is expected as we will recreate them in the next steps. 
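+Before reinstalling, it can help to confirm the old controller pods have fully terminated so the new release does not race the old one (a minimal check, assuming the same label selector used by the other commands in this guide):
+
+```bash
+# Block until the old controller pods are gone (returns immediately if none match)
+kubectl wait --for=delete pod \
+  -l app.kubernetes.io/name=workload-variant-autoscaler \
+  -n workload-variant-autoscaler-system \
+  --timeout=120s
+```
+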
+ +### Step 4: Install WVA Controller (Cluster-Wide) + +Install the WVA controller once for the entire cluster: + +```bash +# Install controller in controller-only mode +helm install wva-controller ./charts/workload-variant-autoscaler \ + -n workload-variant-autoscaler-system \ + --create-namespace \ + --set installMode=controller-only \ + --set wva.namespaceScoped=false \ + --set wva.prometheus.baseURL="<prometheus-url>" \ + --set wva.prometheus.tls.insecureSkipVerify=<true|false> \ + --set-file wva.prometheus.caCert=<path-to-ca-cert> # If needed + +# Verify controller is running +kubectl get pods -n workload-variant-autoscaler-system +kubectl logs -n workload-variant-autoscaler-system -l app.kubernetes.io/name=workload-variant-autoscaler +``` + +### Step 5: Deploy Model Resources for Each Model + +For each llm-d stack/model, install the model-specific resources: + +#### Example: Model A + +```bash +helm install wva-model-a ./charts/workload-variant-autoscaler \ + --set installMode=model-resources-only \ + --set llmd.namespace=llm-d-model-a \ + --set llmd.modelName=ms-model-a-llm-d-modelservice \ + --set llmd.modelID="meta-llama/Llama-2-7b-hf" \ + --set va.enabled=true \ + --set va.accelerator=H100 \ + --set va.sloTpot=10 \ + --set va.sloTtft=1000 \ + --set hpa.enabled=true \ + --set hpa.maxReplicas=10 \ + --set vllmService.enabled=true \ + --set vllmService.nodePort=30000 \ + --set vllmService.interval=15s +``` + +#### Example: Model B + +```bash +helm install wva-model-b ./charts/workload-variant-autoscaler \ + --set installMode=model-resources-only \ + --set llmd.namespace=llm-d-model-b \ + --set llmd.modelName=ms-model-b-llm-d-modelservice \ + --set llmd.modelID="mistralai/Mistral-7B-v0.1" \ + --set va.enabled=true \ + --set va.accelerator=A100 \ + --set va.sloTpot=8 \ + --set va.sloTtft=800 \ + --set hpa.enabled=true \ + --set hpa.maxReplicas=8 \ + --set vllmService.enabled=true \ + --set vllmService.nodePort=30001 \ + --set vllmService.interval=15s +``` + +**Tip**: Use the information from Step 1 to configure each model correctly. 
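+
+With more than a couple of models, the per-model flags above are easier to manage as one values file per model, applied in a loop (a sketch; the `values/model-*.yaml` layout is hypothetical, and each file is assumed to hold the `llmd.*`, `va.*`, `hpa.*`, and `vllmService.*` keys shown above):
+
+```bash
+# Install or upgrade one release per model values file
+for f in values/model-*.yaml; do
+  release="wva-$(basename "$f" .yaml)"   # e.g. values/model-a.yaml -> wva-model-a
+  helm upgrade --install "$release" ./charts/workload-variant-autoscaler \
+    --set installMode=model-resources-only \
+    -f "$f"
+done
+```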
+ +### Step 6: Verify Migration + +Verify that all components are working correctly: + +```bash +# Check controller +kubectl get pods -n workload-variant-autoscaler-system +kubectl logs -n workload-variant-autoscaler-system -l app.kubernetes.io/name=workload-variant-autoscaler + +# Check model resources for each namespace +kubectl get variantautoscaling -A +kubectl get hpa -A | grep vllm +kubectl get service -A | grep vllm +kubectl get servicemonitor -A | grep vllm + +# Check that the controller is watching all namespaces +kubectl logs -n workload-variant-autoscaler-system -l app.kubernetes.io/name=workload-variant-autoscaler | grep "Starting workers" + +# Verify autoscaling is working +kubectl describe variantautoscaling -n llm-d-model-a +kubectl describe hpa -n llm-d-model-a +``` + +## Post-Migration Architecture + +After migration, your architecture will look like this: + +``` +Cluster +├── workload-variant-autoscaler-system (namespace) +│ └── wva-controller (Deployment) ← Single controller watching all namespaces +│ +├── llm-d-model-a (namespace) +│ ├── llm-d resources (Gateway, Scheduler, vLLM) +│ └── wva-resources +│ ├── VariantAutoscaling +│ ├── HPA +│ ├── Service +│ └── ServiceMonitor +│ +├── llm-d-model-b (namespace) +│ ├── llm-d resources (Gateway, Scheduler, vLLM) +│ └── wva-resources +│ ├── VariantAutoscaling +│ ├── HPA +│ ├── Service +│ └── ServiceMonitor +│ +└── llm-d-model-c (namespace) + ├── llm-d resources (Gateway, Scheduler, vLLM) + └── wva-resources + ├── VariantAutoscaling + ├── HPA + ├── Service + └── ServiceMonitor +``` + +## Adding New Models + +After migration, adding new models is straightforward: + +```bash +# Just install model resources for the new model +helm install wva-model-new ./charts/workload-variant-autoscaler \ + --set installMode=model-resources-only \ + --set llmd.namespace=llm-d-model-new \ + --set llmd.modelName=ms-model-new-llm-d-modelservice \ + --set llmd.modelID="new/model-id" \ + --set va.accelerator=<accelerator> +``` + +The new model resources won't affect existing models! + +## Removing Models + +To remove a model without affecting others: + +```bash +# Uninstall model resources +helm uninstall wva-model-a + +# Verify only that model's resources are removed +kubectl get variantautoscaling -A +kubectl get hpa -A | grep vllm +``` + +The controller and other models remain unaffected. + +## Troubleshooting + +### Controller Not Watching All Namespaces + +**Problem**: Controller only watches its own namespace. + +**Solution**: Ensure `wva.namespaceScoped=false` was set during controller installation: + +```bash +# Check controller arguments +kubectl get deployment -n workload-variant-autoscaler-system workload-variant-autoscaler-controller-manager -o yaml | grep watch-namespace + +# If --watch-namespace is present, reinstall with correct settings +helm uninstall wva-controller -n workload-variant-autoscaler-system +helm install wva-controller ./charts/workload-variant-autoscaler \ + -n workload-variant-autoscaler-system \ + --create-namespace \ + --set installMode=controller-only \ + --set wva.namespaceScoped=false \ + --set wva.prometheus.baseURL="<prometheus-url>" +``` + +### Model Resources Not Reconciling + +**Problem**: VariantAutoscaling resources are not being reconciled. + +**Solution**: +1. Check controller logs for errors +2. Verify the controller has RBAC permissions for the model namespace +3. 
Ensure the VariantAutoscaling resource is correctly configured + +```bash +# Check controller logs +kubectl logs -n workload-variant-autoscaler-system -l app.kubernetes.io/name=workload-variant-autoscaler | grep ERROR + +# Check RBAC +kubectl auth can-i get variantautoscalings --as=system:serviceaccount:workload-variant-autoscaler-system:workload-variant-autoscaler-controller-manager -n llm-d-model-a +``` + +### HPA Shows Unknown Metrics + +**Problem**: HPA shows `<unknown>` for the external metric. + +**Solution**: +1. Verify Prometheus Adapter is installed and configured +2. Check that the VariantAutoscaling resource is emitting metrics +3. Verify the HPA metric selector matches the VariantAutoscaling metric labels + +```bash +# Check external metrics API +kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/llm-d-model-a/inferno_desired_replicas" | jq + +# Check VariantAutoscaling status +kubectl describe variantautoscaling -n llm-d-model-a +``` + +## Rollback + +If you need to roll back to the old installation method: + +```bash +# Uninstall new components +helm uninstall wva-controller -n workload-variant-autoscaler-system +helm uninstall wva-model-a +helm uninstall wva-model-b +# ... uninstall all model releases + +# Reinstall using the old method (installMode=all) +helm install workload-variant-autoscaler ./charts/workload-variant-autoscaler \ + -n workload-variant-autoscaler-system \ + --create-namespace \ + --set installMode=all \ + -f /tmp/current-values.yaml # Use your saved values +``` + +## Benefits of Multi-Model Architecture + +After migration, you'll benefit from: + +- **Isolation**: Each model's resources are independent +- **Flexibility**: Add/remove models without affecting others +- **Scalability**: Scale models independently based on their workload +- **Simplicity**: Simpler management with one controller for all models +- **Reliability**: Failure in one model's resources doesn't affect others + +## Related Documentation + +- [Installation Guide](installation.md) +- [Configuration Guide](configuration.md) +- [Helm Chart README](../../charts/workload-variant-autoscaler/README.md) +- [Deployment Guide](../../deploy/README.md)