ARCHVTEAMS-1583 serialize k8s-training CI and fix kuberay test variable #858
Closed
aaronbfagan wants to merge 1 commit into main
Release Notes (Mandatory Description)
This PR introduces a minimal CI hardening change for `k8s-training` only:
- Serialize `k8s-training` Terraform Plan/Test runs across PRs.
- Fix the KubeRay variable name used in `k8s-training` tests.

Problem
`k8s-training` runs in a shared, GPU-constrained test environment. Parallel PR runs can contend for the same limited resources and produce flaky, non-deterministic failures.
Additionally, the KubeRay test suite referenced a deprecated variable name (`enable_kuberay`) that is no longer declared by the module, causing warnings and reducing test signal quality.

Changes
1) CI serialization for `k8s-training` only
Updated the `terraform` matrix job concurrency in `.github/workflows/terraform.yml`:
- `k8s-training` uses a fixed concurrency group: `k8s-training-gpu-ci`
- Non-`k8s-training` solutions keep per-run unique concurrency groups
- `cancel-in-progress: false` ensures queued runs wait instead of canceling active runs
This keeps the change narrowly scoped to the path with known shared-capacity contention.
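The concurrency rules above could be expressed roughly as follows. This is a minimal sketch, not the actual diff: the matrix key `solution`, the matrix entries, the `terraform-` group prefix, and the use of `github.run_id` for per-run uniqueness are all assumptions made for illustration. Only the group name `k8s-training-gpu-ci` and `cancel-in-progress: false` come from the description.

```yaml
# Sketch only: matrix key, matrix entries, and the per-run group format
# are hypothetical and not copied from the real workflow.
jobs:
  terraform:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        solution: [k8s-training, other-solution]  # hypothetical entries
    concurrency:
      # k8s-training shares one fixed group across all PRs;
      # every other solution gets a group unique to this run.
      group: ${{ matrix.solution == 'k8s-training' && 'k8s-training-gpu-ci' || format('terraform-{0}-{1}', matrix.solution, github.run_id) }}
      # Do not cancel the run that is already in progress.
      cancel-in-progress: false
    steps:
      - uses: actions/checkout@v4
```

One design point worth keeping in mind: a GitHub Actions concurrency group holds at most one running and one pending job. With `cancel-in-progress: false` the active run is never canceled, but a newly queued run replaces any run already pending in the same group, so under heavy PR traffic some queued `k8s-training` runs may be canceled before they start.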
2) KubeRay test variable fix
Updated `k8s-training/tests/k8s-training-kuberay.tftest.hcl`:
- Before: `enable_kuberay = true`
- After: `enable_kuberay_cluster = true`
This aligns the test with current module inputs and removes undeclared-variable warnings.
Why this approach
This is intentionally a small, self-contained change set to reduce risk:
- […] `k8s-training` test paths beyond existing run-level behavior

Expected outcome
- `k8s-training` Plan/Test jobs no longer execute concurrently across PRs.
- […] `enable_kuberay`.

Validation performed
- […] `.github/workflows/terraform.yml`
- […] `k8s-training/tests/k8s-training-kuberay.tftest.hcl`
- […] `k8s-training` for global serialization.

DoD alignment
- `k8s-training` CI execution is single-threaded across PRs.