diff --git a/README.md b/README.md
index c6565428..f86547b7 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ The Workload Variant Autoscaler (WVA) is a Kubernetes-based global autoscaler fo
 - **Intelligent Autoscaling**: Optimizes replica count by observing the current state of the system
 - **Cost Optimization**: Minimizes infrastructure costs by picking the correct accelerator variant
-
@@ -26,17 +26,7 @@ The Workload Variant Autoscaler (WVA) is a Kubernetes-based global autoscaler fo
 ### Install with Helm (Recommended)
-```bash
-# Add the WVA Helm repository (when published)
-helm upgrade -i workload-variant-autoscaler ./charts/workload-variant-autoscaler \
-  --namespace workload-variant-autoscaler-system \
-  --set-file prometheus.caCert=/tmp/prometheus-ca.crt \
-  --set variantAutoscaling.accelerator=L40S \
-  --set variantAutoscaling.modelID=unsloth/Meta-Llama-3.1-8B \
-  --set vllmService.enabled=true \
-  --set vllmService.nodePort=30000
-  --create-namespace
-```
+Go to the **INSTALL (on OpenShift)** section [here](charts/workload-variant-autoscaler/README.md) for detailed steps.
 ### Try it Locally with Kind (No GPU Required!)
@@ -64,7 +54,7 @@ See the [Installation Guide](docs/user-guide/installation.md) for detailed instr
 - [CRD Reference](docs/user-guide/crd-reference.md)
 - [Multi-Controller Isolation](docs/user-guide/multi-controller-isolation.md)
-
-
-
-
-
 ## How It Works
@@ -118,7 +108,7 @@ For detailed architecture information, see the [design documentation](docs/desig
 1. Platform admin deploys llm-d infrastructure (including model servers) and waits for servers to warm up and start serving requests
 2. Platform admin creates a `VariantAutoscaling` CR for the running deployment
 3. WVA continuously monitors request rates and server performance via Prometheus metrics
-
@@ -127,7 +117,7 @@ For detailed architecture information, see the [design documentation](docs/desig
 6. External autoscaler (HPA/KEDA) reads the metrics and scales the deployment accordingly
 **Important Notes**:
-
 - WVA handles the creation order gracefully - you can create the VA before or after the deployment
diff --git a/charts/workload-variant-autoscaler/README.md b/charts/workload-variant-autoscaler/README.md
index 5556f3a0..d1dbf65d 100644
--- a/charts/workload-variant-autoscaler/README.md
+++ b/charts/workload-variant-autoscaler/README.md
@@ -59,7 +59,7 @@ helm ls -A
 ```
 export OWNER="llm-d"
 export WVA_PROJECT="llm-d-workload-variant-autoscaler"
-export WVA_RELEASE="v0.4.1"
+export WVA_RELEASE="v0.5.1"
 export WVA_NS="workload-variant-autoscaler-system"
 export MON_NS="openshift-user-workload-monitoring"
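
Note (not part of the diff): the README now defers to the chart README for install steps, and the chart README hunk bumps `WVA_RELEASE` to v0.5.1. As a rough sketch of how those exported variables might be consumed, the snippet below checks out the tagged release and installs the chart from the local path, reusing the flags from the removed quick-start block. The GitHub repository URL, the `prometheus.caCert` path, and the `variantAutoscaling.*` value names are assumptions carried over from the old README and may differ in v0.5.1; treat charts/workload-variant-autoscaler/README.md as authoritative.

```bash
# Hypothetical usage of the exported variables; verify against the chart README.
export OWNER="llm-d"
export WVA_PROJECT="llm-d-workload-variant-autoscaler"
export WVA_RELEASE="v0.5.1"
export WVA_NS="workload-variant-autoscaler-system"

# Assumes the project is hosted at github.com/$OWNER/$WVA_PROJECT (not stated in this diff).
git clone --branch "$WVA_RELEASE" "https://github.com/$OWNER/$WVA_PROJECT.git"
cd "$WVA_PROJECT"

# Install the chart from the checked-out source, mirroring the removed quick-start block.
helm upgrade -i workload-variant-autoscaler ./charts/workload-variant-autoscaler \
  --namespace "$WVA_NS" \
  --create-namespace \
  --set-file prometheus.caCert=/tmp/prometheus-ca.crt \
  --set variantAutoscaling.accelerator=L40S \
  --set variantAutoscaling.modelID=unsloth/Meta-Llama-3.1-8B
```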