-
Notifications
You must be signed in to change notification settings - Fork 2.8k
Exclude ResourceHealthStatus tests from DRA jobs #35195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exclude ResourceHealthStatus tests from DRA jobs #35195
Conversation
The `ResourceHealthStatus` feature, introduced in kubernetes/kubernetes#130606, adds new node e2e tests that are protected by a feature gate. These new tests are labeled with `FeatureGate:ResourceHealthStatus`. However, existing DRA presubmit jobs (e.g., `pull-kubernetes-node-e2e-crio-cgrpv1-dra`) were selecting them based on the broader `DRA` label. These jobs do not enable the `ResourceHealthStatus` feature gate, causing the new tests to fail. This blocks the corresponding Kubernetes PR. This change modifies the `dra.jinja` template to add `!FeatureGate:ResourceHealthStatus` to the `label-filter` for all `node` and `e2e` job types. This prevents these jobs from selecting the new feature-gated tests, which should only run in their dedicated presubmit job where the feature is enabled. This change regenerates the presubmit, CI, and canary job configurations to apply the updated filter.
this change looks OK. One qq - do we need to include DRA in |
trap atexit EXIT | ||
KUBECONFIG=${HOME}/.kube/config ${ginkgo} run --nodes=8 --timeout=24h --silence-skips --force-newlines --no-color --label-filter="DRA && Feature: isSubsetOf { OffByDefault, DynamicResourceAllocation } && !Alpha && !Flaky && !Slow" ${e2e_test} -- -provider=local -report-dir="${ARTIFACTS}" -report-complete-ginkgo -report-complete-junit & | ||
KUBECONFIG=${HOME}/.kube/config ${ginkgo} run --nodes=8 --timeout=24h --silence-skips --force-newlines --no-color --label-filter="DRA && Feature: isSubsetOf { OffByDefault, DynamicResourceAllocation } && !FeatureGate:ResourceHealthStatus && !Alpha && !Flaky && !Slow" ${e2e_test} -- -provider=local -report-dir="${ARTIFACTS}" -report-complete-ginkgo -report-complete-junit & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one has !Alpha
so we probably don't need to change it? But I guess this is generated from the exciting jinja template ... 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, these changes were auto generated from the jinja
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No matter where this expression appears, this does not look right to me. The test has the Alpha
and Feature:OffByDefault
labels and shouldn't have run. Why did it?
E2eNode Suite: [It] [sig-node] [DRA] [Feature:DynamicResourceAllocation] [FeatureGate:DynamicResourceAllocation] Resource Health [FeatureGate:ResourceHealthStatus] [Alpha] [Feature:OffByDefault] [Serial] should reflect device health changes in the Pod's status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are modifying the wrong job, this is pull-kubernetes-kind-dra-canary
.
Yes, when it goes GA the featuregate labeling of the tests can/should be dropped (not a hard requirement, but it doesn't make sense to have these, especially for conformance tests, which some of them should be promoted to) So we could drop that as it goes GA and that might also solve it. Another thought was there are two types of jobs here:
1) should already be fine, and 2) could be fixed by enabling this featuregate on those jobs instead, which might make more sense, to get more coverage? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Jpsassine, SergeyKanzhelev The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@Jpsassine: Updated the
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
- --gcp-zone=us-central1-b | ||
- --parallelism=1 | ||
- '--label-filter=DRA && Feature: isSubsetOf { OffByDefault, DynamicResourceAllocation } && !Flaky {%- if not ci %} && !Slow {%- endif %}' | ||
- '--label-filter=DRA && Feature: isSubsetOf { OffByDefault, DynamicResourceAllocation } && !FeatureGate:ResourceHealthStatus && !Flaky {%- if not ci %} && !Slow {%- endif %}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is bogus, please remove again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done here in follow up, #35225
{%- endif %} | ||
|
||
KUBECONFIG=${HOME}/.kube/config ${ginkgo} run --nodes=8 --timeout=24h --silence-skips --force-newlines --no-color --label-filter="DRA && Feature: isSubsetOf { OffByDefault, DynamicResourceAllocation } {%- if kubelet_skew|int > 0 %}$kubelet_label_filter{%- endif %} {%- if not all_features %} && !Alpha {%- endif %} && !Flaky {%- if not ci and not allow_slow %} && !Slow {%- endif %}" ${e2e_test} -- -provider=local -report-dir="${ARTIFACTS}" -report-complete-ginkgo -report-complete-junit & | ||
KUBECONFIG=${HOME}/.kube/config ${ginkgo} run --nodes=8 --timeout=24h --silence-skips --force-newlines --no-color --label-filter="DRA && Feature: isSubsetOf { OffByDefault, DynamicResourceAllocation } && !FeatureGate:ResourceHealthStatus {%- if kubelet_skew|int > 0 %}$kubelet_label_filter{%- endif %} {%- if not all_features %} && !Alpha {%- endif %} && !Flaky {%- if not ci and not allow_slow %} && !Slow {%- endif %}" ${e2e_test} -- -provider=local -report-dir="${ARTIFACTS}" -report-complete-ginkgo -report-complete-junit & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a deny-list of all unsupported features is the conceptually wrong approach. It'll break each time someone adds a new feature gate with a corresponding e2e_node test.
The right expression is Feature: isSubsetOf { OffByDefault, DynamicResourceAllocation }
, then the job runs only tests for features that are enabled by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Long-term we need two sets of DRA e2e_node jobs, just as we do now for e2e:
- Only on-by-default features. That can be merge-blocking.
- All features, i.e. including alpha features. Not merge blocking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, apologies, this was a short sighted immediate resolve. I have taken the advice and created a follow-up PR here #35225 which should now properly abide by the exclusionary !Alpha
label. I've also removed the deny-list hacky approach.
This pull request is a follow-up to kubernetes#35195 and addresses feedback from @pohly. The previous PR used a "deny-list" to exclude `ResourceHealthStatus` tests from DRA jobs. This change implements a more robust "allow-list" approach by using the `!Alpha` filter for node jobs, ensuring that only tests for on-by-default features are run. This prevents jobs from breaking when new, unrelated Alpha features are added in the future, improving the long-term maintainability of the test configuration.
The
ResourceHealthStatus
feature, introduced in kubernetes/kubernetes#130606, adds new node e2e tests that are protected by a feature gate.These new tests are labeled with
FeatureGate:ResourceHealthStatus
. However, existing DRA presubmit jobs (e.g.,pull-kubernetes-node-e2e-crio-cgrpv1-dra
) were selecting them based on the broaderDRA
label.These jobs do not enable the
ResourceHealthStatus
feature gate, causing the new tests to fail. This blocks the corresponding Kubernetes PR.This change modifies the
dra.jinja
template to add!FeatureGate:ResourceHealthStatus
to thelabel-filter
for allnode
ande2e
job types. This prevents these jobs from selecting the new feature-gated tests, which should only run in their dedicated presubmit job where the feature is enabled.This change regenerates the presubmit, CI, and canary job configurations to apply the updated filter.