Commit b4f4a0d

clubanderson and claude committed
fix: resolve CI typos, broken links, and PR e2e-openshift failures
- _typos.toml: add "abd" to false positive list (Go module hash in go.sum)
- deploy/README.md: fix broken link to docs/architecture.md (removed), point to docs/design/ instead
- docs/metrics-health-monitoring.md: fix broken link to custom-metrics.md (removed), point to integrations/prometheus.md
- docs/README.md: fix broken link to user-guide/faq.md (never created), point to user-guide/troubleshooting.md
- ci-e2e-openshift.yaml: add MONITORING_NAMESPACE and WVA_METRICS_SECURE env vars to Model A1 and Model B deploy steps, and wva.metrics.secure helm value for Model B. Without these, OpenShift's user-workload monitoring cannot scrape the WVA controller metrics (same root cause as the nightly ServiceMonitor fix).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Andrew Anderson <andy@clubanderson.com>
1 parent 43d9343 commit b4f4a0d

5 files changed: 14 additions, 3 deletions

.github/workflows/ci-e2e-openshift.yaml

Lines changed: 10 additions & 0 deletions
@@ -608,6 +608,12 @@ jobs:
 VLLM_MAX_NUM_SEQS: ${{ env.MAX_NUM_SEQS }}
 # Decode replicas for e2e testing (start with 1 replica, let HPA scale)
 DECODE_REPLICAS: "1"
+# OpenShift uses built-in user-workload monitoring, not a separate namespace
+MONITORING_NAMESPACE: openshift-user-workload-monitoring
+# Disable bearer token auth on WVA /metrics endpoint — OpenShift's
+# user-workload-monitoring cannot authenticate with the controller-manager
+# SA token. The endpoint is still only accessible within the cluster network.
+WVA_METRICS_SECURE: "false"
 run: |
 echo "Deploying WVA and llm-d infrastructure..."
 echo " MODEL_ID: $MODEL_ID"
@@ -666,6 +672,9 @@ jobs:
 VLLM_MAX_NUM_SEQS: ${{ env.MAX_NUM_SEQS }}
 # Decode replicas for e2e testing (start with 1 replica, let HPA scale)
 DECODE_REPLICAS: "1"
+# OpenShift monitoring settings (same as Model A1 deploy)
+MONITORING_NAMESPACE: openshift-user-workload-monitoring
+WVA_METRICS_SECURE: "false"
 run: |
 echo "Deploying Model B infrastructure in $LLMD_NAMESPACE_B..."
 echo " MODEL_ID: $MODEL_ID"
@@ -707,6 +716,7 @@ jobs:
 --set va.accelerator="$ACCELERATOR_TYPE" \
 --set wva.baseName="inference-scheduling" \
 --set wva.prometheus.monitoringNamespace=openshift-user-workload-monitoring \
+--set wva.metrics.secure=false \
 --set wva.controllerInstance="$CONTROLLER_INSTANCE"

 echo "Model B WVA resources deployed"
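
The workflow sets MONITORING_NAMESPACE and WVA_METRICS_SECURE as step-level environment variables and, for the Model B WVA resources, passes wva.metrics.secure=false straight to Helm. How the deploy script consumes the two variables is not shown in this diff. As a rough sketch only, a script could map them onto the same Helm values like this; the function name `metrics_helm_args`, the requirement that the namespace be set, and the default of `true` for the secure flag are all assumptions made for illustration, not the project's actual code.

```shell
# Hypothetical helper, not part of the repository: turn the two environment
# variables from the workflow into the Helm --set flags shown in the diff.
# The body runs in a subshell so its variables do not leak to the caller.
metrics_helm_args() (
  # Assumption: the namespace has no safe default, so require it explicitly.
  if [ -z "${MONITORING_NAMESPACE:-}" ]; then
    echo "MONITORING_NAMESPACE must be set" >&2
    exit 1
  fi
  # Assumption: metrics are secured unless explicitly disabled.
  secure="${WVA_METRICS_SECURE:-true}"
  case "$secure" in
    true|false) ;;
    *)
      echo "WVA_METRICS_SECURE must be 'true' or 'false', got '$secure'" >&2
      exit 1
      ;;
  esac
  # Print one flag or value per line so a caller can build an argument list.
  printf '%s\n' \
    "--set" "wva.prometheus.monitoringNamespace=${MONITORING_NAMESPACE}" \
    "--set" "wva.metrics.secure=${secure}"
)
```

Validating the flag before it reaches Helm means a typo such as "flase" fails the step early instead of silently leaving the metrics endpoint in a state the scraper cannot use.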

_typos.toml

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ parms = "parms"
 # Short code fragments / false positives
 ot = "ot"
 vas = "vas"
+abd = "abd"

 # Pre-existing typos in codebase (to be fixed separately)
 accelarator = "accelarator"

deploy/README.md

Lines changed: 1 addition & 1 deletion
@@ -999,4 +999,4 @@ kubectl get configmap model-accelerator-data -n workload-variant-autoscaler-syst
 - **OpenShift Guide**: [openshift/README.md](openshift/README.md)
 - **Helm Chart**: [charts/workload-variant-autoscaler](../charts/workload-variant-autoscaler/)
 - **API Reference**: [api/v1alpha1](../api/v1alpha1/)
-- **Architecture**: [docs/architecture.md](../docs/architecture.md)
+- **Architecture**: [docs/design](../docs/design/)

docs/README.md

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ Contributing to WVA:

 ## Need Help?

-- Check the [FAQ](user-guide/faq.md) (coming soon)
+- Check the [Troubleshooting Guide](user-guide/troubleshooting.md)
 - Open a [GitHub Issue](https://github.com/llm-d/llm-d-workload-variant-autoscaler/issues)
 - Join community meetings

docs/metrics-health-monitoring.md

Lines changed: 1 addition & 1 deletion
@@ -130,6 +130,6 @@ When metrics are unavailable, WVA implements graceful degradation:

 ## Related Documentation

-- [Custom Metrics](./custom-metrics.md)
+- [Prometheus Integration](./integrations/prometheus.md)
 - [ServiceMonitor Configuration](../config/prometheus)
