fix: resolve KEDA APIService conflict for external metrics in nightly E2E #721
Changes from all commits: 919e372, 98b92dc, c9610b3, b6e9cff
```diff
@@ -608,6 +608,15 @@ jobs:
           VLLM_MAX_NUM_SEQS: ${{ env.MAX_NUM_SEQS }}
           # Decode replicas for e2e testing (start with 1 replica, let HPA scale)
           DECODE_REPLICAS: "1"
+          # OpenShift uses built-in user-workload monitoring, not a separate namespace
+          MONITORING_NAMESPACE: openshift-user-workload-monitoring
+          # Disable bearer token auth on WVA /metrics endpoint: OpenShift's
+          # user-workload-monitoring cannot authenticate with the controller-manager
+          # SA token. The endpoint is still only accessible within the cluster network.
+          WVA_METRICS_SECURE: "false"
+          # inference-scheduling guide has routing proxy disabled, so vLLM
+          # serves directly on port 8000 (not 8200 behind proxy)
+          VLLM_SVC_PORT: "8000"
         run: |
           echo "Deploying WVA and llm-d infrastructure..."
           echo "  MODEL_ID: $MODEL_ID"
```
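Since VLLM_SVC_PORT depends on whether the routing proxy is enabled, it can be worth confirming the port against the live Services before a run. A minimal sketch, assuming $LLMD_NAMESPACE is set as in the workflow:

```sh
# Print each Service in the llm-d namespace with its exposed port(s), to
# confirm vLLM is reachable on 8000 rather than 8200 behind the proxy.
kubectl get svc -n "$LLMD_NAMESPACE" \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.ports[*].port}{"\n"}{end}'
```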
```diff
@@ -639,9 +648,38 @@ jobs:
       - name: Wait for infrastructure to be ready
         run: |
           echo "Waiting for WVA controller to be ready..."
-          kubectl wait --for=condition=available --timeout=300s deployment -l app.kubernetes.io/name=workload-variant-autoscaler -n "$WVA_NAMESPACE" || true
+          kubectl rollout status deployment -l app.kubernetes.io/name=workload-variant-autoscaler -n "$WVA_NAMESPACE" --timeout=300s || true
           kubectl get pods -n "$WVA_NAMESPACE"
           echo "Waiting for llm-d deployment (Model A1) to be ready..."
+
+          # Ensure the vLLM deployment has the correct replica count.
+          # A previous failed run's "Scale down GPU workloads" step may have set replicas=0
+          # and helmfile doesn't override manually-changed replicas on re-deploy.
+          # kubectl rollout status returns instantly on 0-replica deployments, so we must
+          # ensure replicas > 0 before waiting.
+          DESIRED_REPLICAS="${DECODE_REPLICAS:-1}"
+          CURRENT_REPLICAS=$(kubectl get deployment ms-inference-scheduling-llm-d-modelservice-decode -n "$LLMD_NAMESPACE" -o jsonpath='{.spec.replicas}' 2>/dev/null || echo "0")
+          if [ "$CURRENT_REPLICAS" -eq 0 ]; then
+            echo "WARNING: Model A1 deployment has 0 replicas (likely from previous failed run cleanup)"
+            echo "Scaling to $DESIRED_REPLICAS replica(s)..."
+            kubectl scale deployment/ms-inference-scheduling-llm-d-modelservice-decode -n "$LLMD_NAMESPACE" --replicas="$DESIRED_REPLICAS" || {
+              echo "ERROR: Failed to scale Model A1 deployment"
+              exit 1
+            }
+          fi
+
+          echo "Waiting for Model A1 vLLM deployment to be ready (up to 25 minutes for model loading)..."
+          # kubectl rollout status waits for all replicas to be Ready, unlike
+          # --for=condition=available which is satisfied even at 0 ready replicas.
+          # vLLM model loading takes 15-20 minutes, so we use a 25-minute timeout.
+          kubectl rollout status deployment/ms-inference-scheduling-llm-d-modelservice-decode -n "$LLMD_NAMESPACE" --timeout=1500s || {
+            echo "WARNING: Model A1 deployment not ready after 25 minutes"
+            echo "=== Pod status ==="
+            kubectl get pods -n "$LLMD_NAMESPACE"
+            echo "=== Deployment conditions ==="
+            kubectl get deployment ms-inference-scheduling-llm-d-modelservice-decode -n "$LLMD_NAMESPACE" -o jsonpath='{.status.conditions}' | jq . || true
+            echo "=== Recent events ==="
+            kubectl get events -n "$LLMD_NAMESPACE" --sort-by='.lastTimestamp' | tail -20
+            exit 1
+          }
```
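The comments in this hunk make two behavioral claims about kubectl waits that are easy to verify on a scratch cluster. A throwaway sketch (the wait-demo name and nginx image are arbitrary, not part of this workflow):

```sh
# Create a deployment and scale it to zero, then compare the two wait commands.
kubectl create deployment wait-demo --image=nginx
kubectl scale deployment/wait-demo --replicas=0

# Succeeds immediately: with spec.replicas=0 the Available condition is True.
kubectl wait --for=condition=available --timeout=10s deployment/wait-demo

# Also returns instantly ("successfully rolled out"), since nothing is pending.
# This is why the step above forces replicas > 0 before waiting.
kubectl rollout status deployment/wait-demo --timeout=10s

kubectl delete deployment wait-demo
```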
Copilot (AI) commented on Feb 14, 2026:
This step pipes kubectl jsonpath output into jq (e.g. -o jsonpath='{.metadata.labels}' | jq ...). jsonpath output isn't guaranteed to be valid JSON (it often prints Go map syntax), so jq will fail and the diagnostics will silently be missing. Use kubectl ... -o json | jq ... (or -o jsonpath without jq) to ensure the output is parseable.
```diff
-kubectl get pod "$VLLM_POD" -n "$ns" -o jsonpath='{.metadata.labels}' | jq -r 'to_entries[] | " \(.key)=\(.value)"' 2>/dev/null || true
+kubectl get pod "$VLLM_POD" -n "$ns" -o json | jq -r '.metadata.labels | to_entries[] | " \(.key)=\(.value)"' 2>/dev/null || true
```
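To see the failure mode concretely (exact jsonpath rendering varies by kubectl version, so the first output noted below is illustrative):

```sh
# jsonpath may render the label map in Go syntax, e.g. map[app:vllm],
# which jq rejects with a parse error:
kubectl get pod "$VLLM_POD" -n "$ns" -o jsonpath='{.metadata.labels}' | jq .

# -o json always emits real JSON, so the jq pipeline is reliable:
kubectl get pod "$VLLM_POD" -n "$ns" -o json | jq '.metadata.labels'
```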
Copilot (AI) commented on Feb 14, 2026:
Similarly, kubectl get svc ... -o jsonpath='{.spec.selector}' | jq . will typically not produce valid JSON for jq to parse, so the "Service selector" debug output may be empty. Switch to -o json | jq '.spec.selector' (or drop jq) so this diagnostic is reliable.
```diff
-kubectl get svc "$SVC_NAME" -n "$ns" -o jsonpath='{.spec.selector}' 2>/dev/null | jq . || true
+kubectl get svc "$SVC_NAME" -n "$ns" -o json 2>/dev/null | jq '.spec.selector' || true
```
A second file in the diff adds a single line:

```diff
@@ -0,0 +1 @@
+https://docs.google.com
```
kubectl get deployment ... -o jsonpath='{.status.conditions}' | jq . is unlikely to work because jsonpath output isn't valid JSON for jq. Use kubectl get deployment ... -o json | jq '.status.conditions' so the deployment-condition diagnostics reliably show up when rollouts time out.
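Applied to the rollout-timeout diagnostics in the workflow hunk above, the fix this review suggests would look like:

```sh
# Reliable deployment-condition dump on rollout timeout: -o json feeds jq
# real JSON, and || true keeps the diagnostics block from aborting the step.
kubectl get deployment ms-inference-scheduling-llm-d-modelservice-decode \
  -n "$LLMD_NAMESPACE" -o json | jq '.status.conditions' || true
```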