30 changes: 10 additions & 20 deletions README.md
@@ -12,7 +12,7 @@ The Workload Variant Autoscaler (WVA) is a Kubernetes-based global autoscaler fo

- **Intelligent Autoscaling**: Optimizes replica count by observing the current state of the system
- **Cost Optimization**: Minimizes infrastructure costs by picking the correct accelerator variant
<!--
- **Performance Modeling**: Uses queueing theory (M/M/1/k, M/G/1 models) for accurate latency and throughput prediction
- **Multi-Model Support**: Manages multiple models with different service classes and priorities -->

@@ -26,17 +26,7 @@ The Workload Variant Autoscaler (WVA) is a Kubernetes-based global autoscaler fo

### Install with Helm (Recommended)

```bash
# Install the WVA chart from a local checkout (a Helm repository is not yet published)
helm upgrade -i workload-variant-autoscaler ./charts/workload-variant-autoscaler \
--namespace workload-variant-autoscaler-system \
--set-file prometheus.caCert=/tmp/prometheus-ca.crt \
--set variantAutoscaling.accelerator=L40S \
--set variantAutoscaling.modelID=unsloth/Meta-Llama-3.1-8B \
--set vllmService.enabled=true \
  --set vllmService.nodePort=30000 \
  --create-namespace
```
Go to the **INSTALL (on OpenShift)** section [here](charts/workload-variant-autoscaler/README.md) for detailed steps.
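
After installing, a quick sanity check can confirm the release and its pods came up. This assumes the release and namespace names used in the Helm command above:

```shell
# Verify the Helm release deployed successfully
helm status workload-variant-autoscaler -n workload-variant-autoscaler-system

# Confirm the controller pods are running
kubectl get pods -n workload-variant-autoscaler-system
```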

### Try it Locally with Kind (No GPU Required!)

@@ -64,7 +54,7 @@ See the [Installation Guide](docs/user-guide/installation.md) for detailed instr
- [CRD Reference](docs/user-guide/crd-reference.md)
- [Multi-Controller Isolation](docs/user-guide/multi-controller-isolation.md)

<!--

### Tutorials
- [Quick Start Demo](docs/tutorials/demo.md)
@@ -76,13 +66,13 @@ See the [Installation Guide](docs/user-guide/installation.md) for detailed instr
- [KEDA Integration](docs/integrations/keda-integration.md)
- [Prometheus Metrics](docs/integrations/prometheus.md)

<!--

### Design & Architecture
- [Architecture Overview](docs/design/modeling-optimization.md)
- [Architecture Diagrams](docs/design/diagrams/) - Visual architecture and workflow diagrams
-->
<!--
### Developer Guide
- [Development Setup](docs/developer-guide/development.md)
- [Contributing](CONTRIBUTING.md)
@@ -101,24 +91,24 @@ WVA consists of several key components:
- **Reconciler**: Kubernetes controller that manages VariantAutoscaling resources
- **Collector**: Gathers cluster state and vLLM server metrics
-->
<!--
- **Model Analyzer**: Performs per-model analysis using queueing theory
- **Optimizer**: Makes global scaling decisions across models
-->
<!--
- **Optimizer**: Capacity model that provides saturation-based scaling driven by a configurable threshold
- **Actuator**: Emits metrics to Prometheus and updates deployment replicas
-->

<!--
For detailed architecture information, see the [design documentation](docs/design/modeling-optimization.md).
-->
## How It Works

1. Platform admin deploys llm-d infrastructure (including model servers) and waits for servers to warm up and start serving requests
2. Platform admin creates a `VariantAutoscaling` CR for the running deployment
3. WVA continuously monitors request rates and server performance via Prometheus metrics
<!--
4. Model Analyzer estimates latency and throughput using queueing models
5. Optimizer solves for minimal cost allocation meeting all SLOs
-->
@@ -127,7 +117,7 @@ For detailed architecture information, see the [design documentation](docs/design/modeling-optimization.md).
6. External autoscaler (HPA/KEDA) reads the metrics and scales the deployment accordingly
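
A minimal `VariantAutoscaling` CR for step 2 might look like the sketch below. The `modelID` and `accelerator` values mirror the Helm values shown earlier; the `apiVersion`, group, and field names here are assumptions, not taken from the CRD reference:

```yaml
# Sketch only: apiVersion/group and field names are assumed;
# consult the CRD Reference for the authoritative schema.
apiVersion: llmd.ai/v1alpha1
kind: VariantAutoscaling
metadata:
  name: llama-3-1-8b-l40s
  namespace: workload-variant-autoscaler-system
spec:
  modelID: unsloth/Meta-Llama-3.1-8B
  accelerator: L40S
```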

**Important Notes**:
<!--
- Create the VariantAutoscaling CR **only after** your deployment is warmed up to avoid immediate scale-down
-->
- WVA handles the creation order gracefully - you can create the VA before or after the deployment
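
As an illustration of step 6 above, a KEDA `ScaledObject` could consume a WVA-emitted metric from Prometheus. The metric name, deployment name, and Prometheus address below are hypothetical; see the KEDA Integration guide for the actual configuration:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: vllm-variant-scaler
spec:
  scaleTargetRef:
    name: vllm-deployment                # hypothetical deployment name
  minReplicaCount: 1
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090   # assumed address
        query: wva_desired_replicas{variant="llama-3-1-8b"}  # hypothetical metric
        threshold: "1"
```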
2 changes: 1 addition & 1 deletion charts/workload-variant-autoscaler/README.md
@@ -59,7 +59,7 @@ helm ls -A
```
export OWNER="llm-d"
export WVA_PROJECT="llm-d-workload-variant-autoscaler"
export WVA_RELEASE="v0.4.1"
export WVA_RELEASE="v0.5.1"
> **Collaborator review comment:** 0.5.1 hasn't been released yet. Please use v0.5.0
export WVA_NS="workload-variant-autoscaler-system"
export MON_NS="openshift-user-workload-monitoring"
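
A hypothetical use of the variables above: check out the repository at the pinned release tag and install the chart from it. The repository URL is derived from `OWNER` and `WVA_PROJECT` using the standard GitHub layout; this is a sketch, not the documented install path:

```shell
# Fetch the repository at the pinned release tag (URL pattern assumed)
git clone --depth 1 --branch "${WVA_RELEASE}" \
  "https://github.com/${OWNER}/${WVA_PROJECT}.git"

# Install the chart from the checkout into the WVA namespace
helm upgrade -i workload-variant-autoscaler \
  "./${WVA_PROJECT}/charts/workload-variant-autoscaler" \
  --namespace "${WVA_NS}" --create-namespace
```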