chore(deps): Bump Go 1.25, k8s v1.35, and controller-runtime v0.23.1#3127
chore(deps): Bump Go 1.25, k8s v1.35, and controller-runtime v0.23.1#3127google-oss-prow[bot] merged 6 commits intokubeflow:masterfrom
Conversation
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
|
@andreyvelich: GitHub didn't allow me to assign the following users: robert-bell. Note that only kubeflow members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
Fixes: #3104 |
There was a problem hiding this comment.
Pull request overview
This PR upgrades the project to Go 1.25, Kubernetes 1.35, and controller-runtime v0.23.1, and regenerates API clients/CRDs/OpenAPI artifacts to align with the new Kubernetes APIs and controller-runtime webhook changes.
Changes:
- Bump Go toolchain version and key Kubernetes/controller-runtime dependencies (including kube-openapi, utils, structured-merge-diff, json, cobra, pflag, etc.).
- Update admission webhooks to use the new typed validator interfaces (e.g.,
TrainJobValidator,TrainingRuntimeValidator,ClusterTrainingRuntimeValidator) and adjust webhook setup wiring accordingly. - Regenerate Go clientsets/informers/applyconfigurations, Python OpenAPI models, and CRDs/Swagger to match Kubernetes 1.35 schema updates (new fields, updated descriptions, toleration operators, Job/Pod/PVC semantics, etc.).
Reviewed changes
Copilot reviewed 66 out of 68 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| go.mod / go.sum | Bump Go version to 1.25.0, upgrade Kubernetes libraries to v0.35.0, controller-runtime to v0.23.1, and refresh related dependency versions. |
| pkg/webhooks/trainjob_webhook.go | Replace TrainJobWebhook with typed TrainJobValidator, adapt webhook manager construction to the new controller-runtime API, and keep TrainJob validation logic wired through the runtime registry. |
| pkg/webhooks/trainingruntime_webhook.go | Convert to TrainingRuntimeValidator with typed signatures and new ctrl.NewWebhookManagedBy(mgr, &trainer.TrainingRuntime{}) usage; validation still delegates to validateReplicatedJobs. |
| pkg/webhooks/clustertrainingruntime_webhook.go | Convert to ClusterTrainingRuntimeValidator with typed signatures and updated webhook setup; maintains deprecation warning logic and reused validateReplicatedJobs. |
| pkg/webhooks/setup.go | Update webhook setup wiring so only TrainJobValidator receives the runtime registry; cluster and namespaced TrainingRuntime webhooks no longer depend on it. |
| pkg/webhooks/trainjob_webhook_test.go | Extend TestValidateCreate for TrainJob to cover unsupported and deprecated runtimes, and switch to the new TrainJobValidator type. |
| pkg/webhooks/clustertrainingruntime_webhook_test.go | Update tests to construct ClusterTrainingRuntimeValidator directly. |
| pkg/client/clientset/versioned/fake/clientset_generated.go | Regenerated fake clientset, including a marker for WatchList semantics support and watch reactor updates (auto-generated). |
| pkg/client/informers/externalversions/** | Swap raw &cache.ListWatch{...} with cache.ToListWatcherWithWatchListSemantics to align with new Kubernetes reflector semantics (auto-generated). |
| pkg/client/applyconfiguration/trainer/v1alpha1/** | Regenerated apply-configuration types with richer field-level documentation for Trainer/TrainJob/TrainingRuntime and related types (auto-generated). |
| manifests/base/crds/** & charts/kubeflow-trainer/crds/** | Regenerated CRDs with Kubernetes 1.35 schema updates (toleration operators, volume resize semantics, PodCertificate userAnnotations, workloadRef, etc.). |
| api/python_api/kubeflow_trainer_api/models/** | Regenerated Python client models and added new types (e.g., WorkloadReference, InternalEvent, JobSet volume policies), plus docstring updates to track the new OpenAPI schema. |
| api/openapi-spec/swagger.json | Regenerated swagger spec to reflect the updated Kubernetes and trainer APIs, including new fields and updated descriptions. |
| CONTRIBUTING.md | Update documented minimum Go version requirement from 1.24 to 1.25. |
| .golangci.yaml / .golangci-kal.yml | Drop explicit Go version override and tweak a kal rule list format to remain valid YAML with the newer toolchain. |
| runtimeRefGK := runtime.RuntimeRefToRuntimeRegistryKey(obj.Spec.RuntimeRef) | ||
| runtime, ok := w.runtimes[runtimeRefGK] | ||
| if !ok { | ||
| return nil, fmt.Errorf("unsupported runtime: %s", runtimeRefGK) | ||
| } | ||
| warnings, errors := runtime.ValidateObjects(ctx, nil, trainJob) | ||
| warnings, errors := runtime.ValidateObjects(ctx, nil, obj) |
There was a problem hiding this comment.
ValidateCreate returns a plain fmt.Errorf when the referenced runtime is not found, but the associated test (TestValidateCreate's "unsupported runtime" case) expects an aggregated field.ErrorList marking spec.RuntimeRef as invalid, so this implementation both breaks that test and surfaces a less precise validation error to API consumers.
| runtimeRefGK := runtime.RuntimeRefToRuntimeRegistryKey(newObj.Spec.RuntimeRef) | ||
| runtime, ok := w.runtimes[runtimeRefGK] | ||
| if !ok { | ||
| return nil, fmt.Errorf("unsupported runtime: %s", runtimeRefGK) | ||
| } | ||
| warnings, errors := runtime.ValidateObjects(ctx, oldTrainJob, newTrainJob) | ||
| warnings, errors := runtime.ValidateObjects(ctx, oldObj, newObj) |
There was a problem hiding this comment.
ValidateUpdate mirrors ValidateCreate by returning a plain fmt.Errorf for unsupported runtimes, but for consistency with TestValidateCreate and to provide field-scoped admission errors it should also return an aggregated field.ErrorList that points at spec.RuntimeRef instead of a generic error.
|
|
||
| from __future__ import annotations | ||
| import pprint | ||
| import re # noqa: F401 |
There was a problem hiding this comment.
Import of 're' is not used.
| import re # noqa: F401 |
|
|
||
| from __future__ import annotations | ||
| import pprint | ||
| import re # noqa: F401 |
There was a problem hiding this comment.
Import of 're' is not used.
| import re # noqa: F401 |
|
|
||
| from __future__ import annotations | ||
| import pprint | ||
| import re # noqa: F401 |
There was a problem hiding this comment.
Import of 're' is not used.
| import re # noqa: F401 |
|
|
||
| from __future__ import annotations | ||
| import pprint | ||
| import re # noqa: F401 |
There was a problem hiding this comment.
Import of 're' is not used.
| import re # noqa: F401 |
Pull Request Test Coverage Report for Build 21405922660Details
💛 - Coveralls |
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
| runtimeRec := NewTrainingRuntimeReconciler( | ||
| mgr.GetClient(), | ||
| mgr.GetEventRecorderFor("trainer-trainingruntime-controller"), | ||
| mgr.GetEventRecorder("trainer-trainingruntime-controller"), |
There was a problem hiding this comment.
I also migrated to the new Events API due to this: kubernetes-sigs/controller-runtime#3262
Let me know if that looks good @tenzen-y @astefanutti.
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
9ee9da5 to
4d314e8
Compare
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
1eef167 to
cb1e102
Compare
| version: "2" | ||
|
|
||
| run: | ||
| go: "1.24" |
There was a problem hiding this comment.
why do we need to remove this?
There was a problem hiding this comment.
We don't have it in Kueue: https://github.com/kubernetes-sigs/kueue/blob/main/.golangci.yaml
I was thinking that it is better to maintain Go version in a go.mod file only.
WDYT @tenzen-y?
There was a problem hiding this comment.
I have no preference here. I'm asking if you faced any problems or not.
There was a problem hiding this comment.
It looks to be working fine, so I prefer to remove it from .golangci.yaml
There was a problem hiding this comment.
It defaults to use Go version from the go.mod file so it's probably better to remove it.
|
|
||
| // +kubebuilder:webhook:path=/validate-trainer-kubeflow-org-v1alpha1-trainingruntime,mutating=false,failurePolicy=fail,sideEffects=None,groups=trainer.kubeflow.org,resources=trainingruntimes,verbs=create;update,versions=v1alpha1,name=validator.trainingruntime.trainer.kubeflow.org,admissionReviewVersions=v1 | ||
|
|
||
| var _ webhook.CustomValidator = (*TrainingRuntimeWebhook)(nil) |
There was a problem hiding this comment.
It would be better to keep a type compliance check.
There was a problem hiding this comment.
Sure, added it back.
|
|
||
| // +kubebuilder:webhook:path=/validate-trainer-kubeflow-org-v1alpha1-clustertrainingruntime,mutating=false,failurePolicy=fail,sideEffects=None,groups=trainer.kubeflow.org,resources=clustertrainingruntimes,verbs=create;update,versions=v1alpha1,name=validator.clustertrainingruntime.trainer.kubeflow.org,admissionReviewVersions=v1 | ||
|
|
||
| var _ webhook.CustomValidator = (*ClusterTrainingRuntimeWebhook)(nil) |
There was a problem hiding this comment.
Ditto about the type compliance check.
|
|
||
| // +kubebuilder:webhook:path=/validate-trainer-kubeflow-org-v1alpha1-trainjob,mutating=false,failurePolicy=fail,sideEffects=None,groups=trainer.kubeflow.org,resources=trainjobs,verbs=create;update,versions=v1alpha1,name=validator.trainjob.trainer.kubeflow.org,admissionReviewVersions=v1 | ||
|
|
||
| var _ webhook.CustomValidator = (*TrainJobWebhook)(nil) |
There was a problem hiding this comment.
Ditto about the type compliance check.
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
robert-bell
left a comment
There was a problem hiding this comment.
Thanks @andreyvelich.
I’ve gone through the changes and tested it locally and everything looks solid on my end.
I had just one small comment, but lgtm otherwise.
| message = fmt.Sprintf("%s ...", message) | ||
| } | ||
| r.recorder.Event(&trainJob, corev1.EventTypeWarning, "TrainJobResourcesCreationFailed", message) | ||
| r.recorder.Eventf(&trainJob, nil, corev1.EventTypeWarning, "TrainJobResourcesCreationFailed", "TrainJobResourcesCreationFailed", message) |
There was a problem hiding this comment.
nit: should we use a different value for the action argument rather than duplicating the reason?
Maybe?
| r.recorder.Eventf(&trainJob, nil, corev1.EventTypeWarning, "TrainJobResourcesCreationFailed", "TrainJobResourcesCreationFailed", message) | |
| r.recorder.Eventf(&trainJob, nil, corev1.EventTypeWarning, "TrainJobResourcesCreationFailed", "Reconciling", message) |
There was a problem hiding this comment.
action is what action was taken/failed regarding to the regarding object. It is machine-readable. This field cannot
be empty for new Events and it can have at most 128 characters.
I think, Reconciling should be fine, since we didn't put TrainJob to the Failed state.
Thoughts @tenzen-y @astefanutti ?
There was a problem hiding this comment.
Yes that sounds reasonable to me.
There was a problem hiding this comment.
Updated, let me know if that looks good @astefanutti @tenzen-y @robert-bell
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
|
Thanks @andreyvelich! /lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andreyvelich The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Updating Go to 1.25, k8s to v1.35, and controller-runtime to v0.23.1
Also, updated the validation webhook due to this breaking change: kubernetes-sigs/controller-runtime#3360
/assign @astefanutti @tenzen-y @akshaychitneni @robert-bell
This is needed for JobSet v0.11.0 upgrade
cc @kannon92