Conversation
@lionelvillard pls review. Thanks.

mamy-CS left a comment
Some comments. Thank you for the cleanup @shuynh2017
```
wva:
  controllerInstance: "my-unique-instance-id"
```

When running multiple WVA controllers in the same cluster (e.g., for parallel e2e tests or multi-tenant environments), use the `controllerInstance` configuration to prevent metrics conflicts between controllers. See [Multi-Controller Isolation](../../docs/user-guide/multi-controller-isolation.md) for details configuration.
```
- When running multiple WVA controllers in the same cluster (e.g., for parallel e2e tests or multi-tenant environments), use the `controllerInstance` configuration to prevent metrics conflicts between controllers. See [Multi-Controller Isolation](../../docs/user-guide/multi-controller-isolation.md) for details configuration.
+ When running multiple WVA controllers in the same cluster (e.g., for parallel e2e tests or multi-tenant environments), use the `controllerInstance` configuration to prevent metrics conflicts between controllers. See [Multi-Controller Isolation](../../docs/user-guide/multi-controller-isolation.md) for detailed configuration.
```
```
# - Prometheus and monitoring stack
# - vLLM emulator for testing
# See deploy/kind-emulator/README.md for detailed instructions
make deploy-llm-d-wva-emulated-on-kind
```
The chart README had a cleanup section that was removed. Consider adding the cleanup instructions back here?
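A minimal cleanup sketch that could be restored here, assuming the emulator stack was installed via Helm into `$WVA_NS` on a kind cluster (the release and cluster names are illustrative, not taken from the chart):

```shell
# Remove the WVA Helm release from the target namespace
helm uninstall workload-variant-autoscaler -n "$WVA_NS"

# Delete the namespace once the release is gone (optional)
kubectl delete namespace "$WVA_NS" --ignore-not-found

# Tear down the entire kind cluster (cluster name is illustrative)
kind delete cluster --name wva-emulator
```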
```
helm upgrade -i wva-model-a ./workload-variant-autoscaler \
  -n $WVA_NS \
  --set controller.enabled=false \
  --set va.enabled=true \
  --set hpa.enabled=true \
  --set llmd.namespace=team-a \
  --set llmd.modelName=my-model-a \
  --set llmd.modelID="meta-llama/Llama-3.1-8"
```
This is missing some important configuration options that were in the original chart README, such as `va.accelerator`. Add the complete example:

```
helm upgrade -i wva-model-a ./workload-variant-autoscaler \
  -n $WVA_NS \
  --set controller.enabled=false \
  --set va.enabled=true \
  --set hpa.enabled=true \
  --set va.accelerator=L40S \
  --set llmd.namespace=team-a \
  --set llmd.modelName=my-model-a \
  --set llmd.modelID="meta-llama/Llama-3.1-8" \
  --set vllmService.enabled=true \
  --set vllmService.nodePort=30000
```
```
export WVA_PROJECT=$PWD
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
```
Maybe add a validation step after step 1 to verify that the CA cert file was created, to avoid confusing failures later?
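For example, a check along these lines could follow step 1 (the `/tmp/prometheus-ca.crt` path matches the `--set-file` flag used in the install command; the inspection step assumes `openssl` is on the PATH):

```shell
# Fail fast if the CA cert file is missing or empty
test -s /tmp/prometheus-ca.crt || { echo "CA cert not found at /tmp/prometheus-ca.crt" >&2; exit 1; }

# Confirm the file is a parseable PEM certificate and show its subject
openssl x509 -in /tmp/prometheus-ca.crt -noout -subject
```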
```
  --set va.enabled=false \
  --set hpa.enabled=false \
  --set vllmService.enabled=false
```
Maybe add a verification step after step 3 to confirm the controller is running, before moving further?
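A verification sketch along these lines could work after step 3, assuming the controller Deployment is named `workload-variant-autoscaler` (the Deployment name is an assumption, not taken from the chart):

```shell
# Wait for the controller rollout to complete before continuing
kubectl -n "$WVA_NS" rollout status deployment/workload-variant-autoscaler --timeout=120s

# Spot-check pod status and recent controller logs
kubectl -n "$WVA_NS" get pods
kubectl -n "$WVA_NS" logs deployment/workload-variant-autoscaler --tail=20
```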
```
  ## Installation Methods

- ### Option 1: Helm Installation (Recommended)
+ ### Option 1: Helm Installation (Recommended, on OpenShift)
```
The heading says "Recommended, on OpenShift", but since the content is entirely OpenShift-specific, calling it "Recommended" might be confusing for non-OpenShift users. Clarify here.
```
helm upgrade -i workload-variant-autoscaler ./workload-variant-autoscaler \
  -n $WVA_NS \
  --set-file wva.prometheus.caCert=/tmp/prometheus-ca.crt \
  --set controller.enabled=true \
```
The `--set controller.enabled=true` is explicit but redundant (it's the default). This is fine for clarity, but you could also remove it to reduce verbosity. Either approach works; keeping it makes the intent clear.
@mamy-CS thank you for your comments. @lionelvillard also provided ideas for further organization. I will update the PR.
This PR: