fix: resolve CI typos, broken links, and PR e2e-openshift failures

clubanderson · claude · clubanderson · commit b4f4a0dbeb2c · 2026-02-13T16:19:59.000-05:00
- _typos.toml: add "abd" to the false-positive list (it occurs in a Go
  module hash in go.sum)
- deploy/README.md: fix broken link to docs/architecture.md (removed),
  point to docs/design/ instead
- docs/metrics-health-monitoring.md: fix broken link to custom-metrics.md
  (removed), point to integrations/prometheus.md
- docs/README.md: fix broken link to user-guide/faq.md (never created),
  point to user-guide/troubleshooting.md
- ci-e2e-openshift.yaml: add the MONITORING_NAMESPACE
  (openshift-user-workload-monitoring) and WVA_METRICS_SECURE ("false")
  env vars to the Model A1 and Model B deploy steps, and set the
  wva.metrics.secure=false Helm value for Model B. Without these,
  OpenShift's user-workload monitoring cannot scrape the WVA controller
  metrics (same root cause as the nightly ServiceMonitor fix).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Andrew Anderson <andy@clubanderson.com>
diff --git a/.github/workflows/ci-e2e-openshift.yaml b/.github/workflows/ci-e2e-openshift.yaml
@@ -608,6 +608,12 @@ jobs:
           VLLM_MAX_NUM_SEQS: ${{ env.MAX_NUM_SEQS }}
           # Decode replicas for e2e testing (start with 1 replica, let HPA scale)
           DECODE_REPLICAS: "1"
+          # OpenShift uses built-in user-workload monitoring, not a separate namespace
+          MONITORING_NAMESPACE: openshift-user-workload-monitoring
+          # Disable bearer token auth on WVA /metrics endpoint — OpenShift's
+          # user-workload-monitoring cannot authenticate with the controller-manager
+          # SA token. The endpoint is still only accessible within the cluster network.
+          WVA_METRICS_SECURE: "false"
         run: |
           echo "Deploying WVA and llm-d infrastructure..."
           echo "  MODEL_ID: $MODEL_ID"
@@ -666,6 +672,9 @@ jobs:
           VLLM_MAX_NUM_SEQS: ${{ env.MAX_NUM_SEQS }}
           # Decode replicas for e2e testing (start with 1 replica, let HPA scale)
           DECODE_REPLICAS: "1"
+          # OpenShift monitoring settings (same as Model A1 deploy)
+          MONITORING_NAMESPACE: openshift-user-workload-monitoring
+          WVA_METRICS_SECURE: "false"
         run: |
           echo "Deploying Model B infrastructure in $LLMD_NAMESPACE_B..."
           echo "  MODEL_ID: $MODEL_ID"
@@ -707,6 +716,7 @@ jobs:
             --set va.accelerator="$ACCELERATOR_TYPE" \
             --set wva.baseName="inference-scheduling" \
             --set wva.prometheus.monitoringNamespace=openshift-user-workload-monitoring \
+            --set wva.metrics.secure=false \
             --set wva.controllerInstance="$CONTROLLER_INSTANCE"
 
           echo "Model B WVA resources deployed"
diff --git a/_typos.toml b/_typos.toml
@@ -8,6 +8,7 @@ parms = "parms"
 # Short code fragments / false positives
 ot = "ot"
 vas = "vas"
+abd = "abd"
 
 # Pre-existing typos in codebase (to be fixed separately)
 accelarator = "accelarator"
diff --git a/deploy/README.md b/deploy/README.md
@@ -999,4 +999,4 @@ kubectl get configmap model-accelerator-data -n workload-variant-autoscaler-syst
 - **OpenShift Guide**: [openshift/README.md](openshift/README.md)
 - **Helm Chart**: [charts/workload-variant-autoscaler](../charts/workload-variant-autoscaler/)
 - **API Reference**: [api/v1alpha1](../api/v1alpha1/)
-- **Architecture**: [docs/architecture.md](../docs/architecture.md)
+- **Architecture**: [docs/design](../docs/design/)
diff --git a/docs/README.md b/docs/README.md
@@ -63,7 +63,7 @@ Contributing to WVA:
 
 ## Need Help?
 
-- Check the [FAQ](user-guide/faq.md) (coming soon)
+- Check the [Troubleshooting Guide](user-guide/troubleshooting.md)
 - Open a [GitHub Issue](https://github.com/llm-d/llm-d-workload-variant-autoscaler/issues)
 - Join community meetings
 
diff --git a/docs/metrics-health-monitoring.md b/docs/metrics-health-monitoring.md
@@ -130,6 +130,6 @@ When metrics are unavailable, WVA implements graceful degradation:
 
 ## Related Documentation
 
-- [Custom Metrics](./custom-metrics.md)
+- [Prometheus Integration](./integrations/prometheus.md)
 - [ServiceMonitor Configuration](../config/prometheus)