30 changes: 10 additions & 20 deletions README.md
@@ -12,7 +12,7 @@ The Workload Variant Autoscaler (WVA) is a Kubernetes-based global autoscaler fo

- **Intelligent Autoscaling**: Optimizes replica count by observing the current state of the system
- **Cost Optimization**: Minimizes infrastructure costs by picking the correct accelerator variant
<!--
- **Performance Modeling**: Uses queueing theory (M/M/1/k, M/G/1 models) for accurate latency and throughput prediction
- **Multi-Model Support**: Manages multiple models with different service classes and priorities -->

@@ -26,17 +26,7 @@ The Workload Variant Autoscaler (WVA) is a Kubernetes-based global autoscaler fo

### Install with Helm (Recommended)

```bash
# Install the WVA chart from a local checkout (a Helm repository is not yet published)
helm upgrade -i workload-variant-autoscaler ./charts/workload-variant-autoscaler \
--namespace workload-variant-autoscaler-system \
--set-file prometheus.caCert=/tmp/prometheus-ca.crt \
--set variantAutoscaling.accelerator=L40S \
--set variantAutoscaling.modelID=unsloth/Meta-Llama-3.1-8B \
--set vllmService.enabled=true \
  --set vllmService.nodePort=30000 \
  --create-namespace
```
Go to the **INSTALL (on OpenShift)** section [here](charts/workload-variant-autoscaler/README.md) for detailed steps.
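
After installing, a quick sanity check can confirm the release and its pods came up. This assumes the release and namespace names used in the Helm command above:

```shell
# Verify the Helm release deployed successfully
helm status workload-variant-autoscaler -n workload-variant-autoscaler-system

# Confirm the controller pods are running
kubectl get pods -n workload-variant-autoscaler-system
```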

### Try it Locally with Kind (No GPU Required!)

@@ -64,7 +54,7 @@ See the [Installation Guide](docs/user-guide/installation.md) for detailed instr
- [CRD Reference](docs/user-guide/crd-reference.md)
- [Multi-Controller Isolation](docs/user-guide/multi-controller-isolation.md)

<!--

### Tutorials
- [Quick Start Demo](docs/tutorials/demo.md)
@@ -76,13 +66,13 @@ See the [Installation Guide](docs/user-guide/installation.md) for detailed instr
- [KEDA Integration](docs/integrations/keda-integration.md)
- [Prometheus Metrics](docs/integrations/prometheus.md)

<!--

### Design & Architecture
- [Architecture Overview](docs/design/modeling-optimization.md)
- [Architecture Diagrams](docs/design/diagrams/) - Visual architecture and workflow diagrams
-->
<!--
### Developer Guide
- [Development Setup](docs/developer-guide/development.md)
- [Contributing](CONTRIBUTING.md)
@@ -101,24 +91,24 @@ WVA consists of several key components:
- **Reconciler**: Kubernetes controller that manages VariantAutoscaling resources
- **Collector**: Gathers cluster state and vLLM server metrics
-->
<!--
- **Model Analyzer**: Performs per-model analysis using queueing theory
- **Optimizer**: Makes global scaling decisions across models
-->
<!--
- **Optimizer**: Capacity model that provides saturation-based scaling driven by a configurable threshold
- **Actuator**: Emits metrics to Prometheus and updates deployment replicas
-->

<!--
For detailed architecture information, see the [design documentation](docs/design/modeling-optimization.md).
-->
## How It Works

1. Platform admin deploys llm-d infrastructure (including model servers) and waits for servers to warm up and start serving requests
2. Platform admin creates a `VariantAutoscaling` CR for the running deployment
3. WVA continuously monitors request rates and server performance via Prometheus metrics
<!--
4. Model Analyzer estimates latency and throughput using queueing models
5. Optimizer solves for minimal cost allocation meeting all SLOs
-->
@@ -127,7 +117,7 @@ For detailed architecture information, see the [design documentation](docs/design/modeling-optimization.md).
6. External autoscaler (HPA/KEDA) reads the metrics and scales the deployment accordingly
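
A minimal `VariantAutoscaling` CR for step 2 might look like the sketch below. The `modelID` and `accelerator` values mirror the Helm values shown earlier; the `apiVersion`, group, and field names here are assumptions, not taken from the CRD reference:

```yaml
# Sketch only: apiVersion/group and field names are assumed;
# consult the CRD Reference for the authoritative schema.
apiVersion: llmd.ai/v1alpha1
kind: VariantAutoscaling
metadata:
  name: llama-3-1-8b-l40s
  namespace: workload-variant-autoscaler-system
spec:
  modelID: unsloth/Meta-Llama-3.1-8B
  accelerator: L40S
```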

**Important Notes**:
<!--
- Create the VariantAutoscaling CR **only after** your deployment is warmed up to avoid immediate scale-down
-->
- WVA handles the creation order gracefully - you can create the VA before or after the deployment
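
As an illustration of step 6 above, a KEDA `ScaledObject` could consume a WVA-emitted metric from Prometheus. The metric name, deployment name, and Prometheus address below are hypothetical; see the KEDA Integration guide for the actual configuration:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: vllm-variant-scaler
spec:
  scaleTargetRef:
    name: vllm-deployment                # hypothetical deployment name
  minReplicaCount: 1
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090   # assumed address
        query: wva_desired_replicas{variant="llama-3-1-8b"}  # hypothetical metric
        threshold: "1"
```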
2 changes: 1 addition & 1 deletion charts/workload-variant-autoscaler/README.md
@@ -59,7 +59,7 @@ helm ls -A
```
export OWNER="llm-d"
export WVA_PROJECT="llm-d-workload-variant-autoscaler"
export WVA_RELEASE="v0.4.1"
export WVA_RELEASE="v0.5.1"
> **Collaborator review comment:** 0.5.1 hasn't been released yet. Please use v0.5.0
export WVA_NS="workload-variant-autoscaler-system"
export MON_NS="openshift-user-workload-monitoring"
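
A hypothetical use of the variables above: check out the repository at the pinned release tag and install the chart from it. The repository URL is derived from `OWNER` and `WVA_PROJECT` using the standard GitHub layout; this is a sketch, not the documented install path:

```shell
# Fetch the repository at the pinned release tag (URL pattern assumed)
git clone --depth 1 --branch "${WVA_RELEASE}" \
  "https://github.com/${OWNER}/${WVA_PROJECT}.git"

# Install the chart from the checkout into the WVA namespace
helm upgrade -i workload-variant-autoscaler \
  "./${WVA_PROJECT}/charts/workload-variant-autoscaler" \
  --namespace "${WVA_NS}" --create-namespace
```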