chore(deps): Bump Go 1.25, k8s v1.35, and controller-runtime v0.23.1#3127

Merged
google-oss-prow[bot] merged 6 commits into kubeflow:master from andreyvelich:bump-go-1.25
Jan 27, 2026

@andreyvelich
Member

Updating Go to 1.25, k8s to v1.35, and controller-runtime to v0.23.1

Also, updated the validation webhook due to this breaking change: kubernetes-sigs/controller-runtime#3360

/assign @astefanutti @tenzen-y @akshaychitneni @robert-bell

This is needed for JobSet v0.11.0 upgrade
cc @kannon92
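
To illustrate the kind of change the webhook update involves, here is a minimal sketch contrasting an untyped validator with a typed one. Everything in it is a stand-in written for this example: `TrainJob`, `UntypedValidator`, `TypedValidator`, and the validation rule are hypothetical and use only the standard library. They are not the controller-runtime interfaces themselves.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// TrainJob is a hypothetical stand-in for the real API type.
type TrainJob struct {
	Name       string
	RuntimeRef string
}

// UntypedValidator models the older style: the object arrives untyped, so
// every method must type-assert before it can validate anything.
type UntypedValidator interface {
	ValidateCreate(ctx context.Context, obj any) error
}

// TypedValidator models the newer style: the object type is a type
// parameter, so a mismatched type is a compile error, not a runtime one.
type TypedValidator[T any] interface {
	ValidateCreate(ctx context.Context, obj T) error
}

type untypedTrainJobValidator struct{}

func (untypedTrainJobValidator) ValidateCreate(_ context.Context, obj any) error {
	trainJob, ok := obj.(*TrainJob)
	if !ok {
		return fmt.Errorf("expected *TrainJob, got %T", obj)
	}
	return validateTrainJob(trainJob)
}

type typedTrainJobValidator struct{}

func (typedTrainJobValidator) ValidateCreate(_ context.Context, obj *TrainJob) error {
	return validateTrainJob(obj)
}

// validateTrainJob holds the shared, purely illustrative validation rule.
func validateTrainJob(trainJob *TrainJob) error {
	if trainJob.RuntimeRef == "" {
		return errors.New("runtimeRef must be set")
	}
	return nil
}

func main() {
	ctx := context.Background()
	var untyped UntypedValidator = untypedTrainJobValidator{}
	var typed TypedValidator[*TrainJob] = typedTrainJobValidator{}

	job := &TrainJob{Name: "demo", RuntimeRef: "torch-distributed"}
	fmt.Println("typed:", typed.ValidateCreate(ctx, job))
	fmt.Println("untyped, wrong type:", untyped.ValidateCreate(ctx, "not a TrainJob"))
}
```

With the typed form, the type assertion and its failure branch disappear from each validation method, which is roughly why the webhook signatures had to change.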

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Copilot AI review requested due to automatic review settings January 26, 2026 21:27
@google-oss-prow

@andreyvelich: GitHub didn't allow me to assign the following users: robert-bell.

Note that only kubeflow members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide


@andreyvelich andreyvelich changed the title from "chore(deps): Bump Go 1.25, k8s to v1.35, and controller-runtime v0.23.1" to "chore(deps): Bump Go 1.25, k8s v1.35, and controller-runtime v0.23.1" Jan 26, 2026
@andreyvelich andreyvelich added the dependencies label Jan 26, 2026
@andreyvelich
Member Author

Fixes: #3104

Contributor

Copilot AI left a comment

Pull request overview

This PR upgrades the project to Go 1.25, Kubernetes 1.35, and controller-runtime v0.23.1, and regenerates API clients/CRDs/OpenAPI artifacts to align with the new Kubernetes APIs and controller-runtime webhook changes.

Changes:

  • Bump Go toolchain version and key Kubernetes/controller-runtime dependencies (including kube-openapi, utils, structured-merge-diff, json, cobra, pflag, etc.).
  • Update admission webhooks to use the new typed validator interfaces (e.g., TrainJobValidator, TrainingRuntimeValidator, ClusterTrainingRuntimeValidator) and adjust webhook setup wiring accordingly.
  • Regenerate Go clientsets/informers/applyconfigurations, Python OpenAPI models, and CRDs/Swagger to match Kubernetes 1.35 schema updates (new fields, updated descriptions, toleration operators, Job/Pod/PVC semantics, etc.).

Reviewed changes

Copilot reviewed 66 out of 68 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| go.mod / go.sum | Bump Go version to 1.25.0, upgrade Kubernetes libraries to v0.35.0, controller-runtime to v0.23.1, and refresh related dependency versions. |
| pkg/webhooks/trainjob_webhook.go | Replace TrainJobWebhook with typed TrainJobValidator, adapt webhook manager construction to the new controller-runtime API, and keep TrainJob validation logic wired through the runtime registry. |
| pkg/webhooks/trainingruntime_webhook.go | Convert to TrainingRuntimeValidator with typed signatures and new ctrl.NewWebhookManagedBy(mgr, &trainer.TrainingRuntime{}) usage; validation still delegates to validateReplicatedJobs. |
| pkg/webhooks/clustertrainingruntime_webhook.go | Convert to ClusterTrainingRuntimeValidator with typed signatures and updated webhook setup; maintains deprecation warning logic and reused validateReplicatedJobs. |
| pkg/webhooks/setup.go | Update webhook setup wiring so only TrainJobValidator receives the runtime registry; cluster and namespaced TrainingRuntime webhooks no longer depend on it. |
| pkg/webhooks/trainjob_webhook_test.go | Extend TestValidateCreate for TrainJob to cover unsupported and deprecated runtimes, and switch to the new TrainJobValidator type. |
| pkg/webhooks/clustertrainingruntime_webhook_test.go | Update tests to construct ClusterTrainingRuntimeValidator directly. |
| pkg/client/clientset/versioned/fake/clientset_generated.go | Regenerated fake clientset, including a marker for WatchList semantics support and watch reactor updates (auto-generated). |
| pkg/client/informers/externalversions/** | Swap raw &cache.ListWatch{...} with cache.ToListWatcherWithWatchListSemantics to align with new Kubernetes reflector semantics (auto-generated). |
| pkg/client/applyconfiguration/trainer/v1alpha1/** | Regenerated apply-configuration types with richer field-level documentation for Trainer/TrainJob/TrainingRuntime and related types (auto-generated). |
| manifests/base/crds/** & charts/kubeflow-trainer/crds/** | Regenerated CRDs with Kubernetes 1.35 schema updates (toleration operators, volume resize semantics, PodCertificate userAnnotations, workloadRef, etc.). |
| api/python_api/kubeflow_trainer_api/models/** | Regenerated Python client models and added new types (e.g., WorkloadReference, InternalEvent, JobSet volume policies), plus docstring updates to track the new OpenAPI schema. |
| api/openapi-spec/swagger.json | Regenerated swagger spec to reflect the updated Kubernetes and trainer APIs, including new fields and updated descriptions. |
| CONTRIBUTING.md | Update documented minimum Go version requirement from 1.24 to 1.25. |
| .golangci.yaml / .golangci-kal.yml | Drop explicit Go version override and tweak a kal rule list format to remain valid YAML with the newer toolchain. |

Comment on lines +47 to +52
 runtimeRefGK := runtime.RuntimeRefToRuntimeRegistryKey(obj.Spec.RuntimeRef)
 runtime, ok := w.runtimes[runtimeRefGK]
 if !ok {
     return nil, fmt.Errorf("unsupported runtime: %s", runtimeRefGK)
 }
-warnings, errors := runtime.ValidateObjects(ctx, nil, trainJob)
+warnings, errors := runtime.ValidateObjects(ctx, nil, obj)

Copilot AI Jan 26, 2026

ValidateCreate returns a plain fmt.Errorf when the referenced runtime is not found, but the associated test (the "unsupported runtime" case in TestValidateCreate) expects an aggregated field.ErrorList that marks spec.RuntimeRef as invalid. This implementation therefore breaks that test and surfaces a less precise validation error to API consumers.

Comment on lines +60 to +65
 runtimeRefGK := runtime.RuntimeRefToRuntimeRegistryKey(newObj.Spec.RuntimeRef)
 runtime, ok := w.runtimes[runtimeRefGK]
 if !ok {
     return nil, fmt.Errorf("unsupported runtime: %s", runtimeRefGK)
 }
-warnings, errors := runtime.ValidateObjects(ctx, oldTrainJob, newTrainJob)
+warnings, errors := runtime.ValidateObjects(ctx, oldObj, newObj)

Copilot AI Jan 26, 2026

ValidateUpdate mirrors ValidateCreate by returning a plain fmt.Errorf for unsupported runtimes. For consistency with TestValidateCreate, and to provide field-scoped admission errors, it should also return an aggregated field.ErrorList that points at spec.RuntimeRef instead of a generic error.
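
The difference between a plain error and a field-scoped one can be sketched as follows. This is a standard-library-only illustration: `FieldError`, `validateRuntimeRef`, the path string `spec.runtimeRef`, and the registry keys are all hypothetical. In the real webhook the equivalent role would be played by the field.ErrorList type that the comments above refer to (from k8s.io/apimachinery).

```go
package main

import (
	"errors"
	"fmt"
)

// FieldError is a hypothetical stand-in for a field-scoped validation error.
type FieldError struct {
	Path   string // which field was rejected
	Value  string // the offending value
	Detail string // why it was rejected
}

func (e *FieldError) Error() string {
	return fmt.Sprintf("%s: Invalid value: %q: %s", e.Path, e.Value, e.Detail)
}

// validateRuntimeRef reports an unknown runtime as a FieldError that names
// the rejected field, rather than as an unstructured fmt.Errorf.
func validateRuntimeRef(runtimes map[string]struct{}, runtimeRefGK string) error {
	if _, ok := runtimes[runtimeRefGK]; !ok {
		return &FieldError{
			Path:   "spec.runtimeRef", // assumed path, for illustration only
			Value:  runtimeRefGK,
			Detail: "unsupported runtime",
		}
	}
	return nil
}

func main() {
	// Illustrative registry contents; the real keys come from the runtime registry.
	runtimes := map[string]struct{}{"ClusterTrainingRuntime.trainer.kubeflow.org": {}}
	err := validateRuntimeRef(runtimes, "Unknown.example.com")

	// Callers can recover the structured error and inspect the field path.
	var fieldErr *FieldError
	if errors.As(err, &fieldErr) {
		fmt.Println("rejected field:", fieldErr.Path)
	}
	fmt.Println(err)
}
```

A structured error lets a test assert on the rejected field instead of matching an error string, which is the kind of expectation the review comments describe.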


from __future__ import annotations
import pprint
import re # noqa: F401

Copilot AI Jan 26, 2026

Import of 're' is not used.

Suggested change (delete this line)
-import re # noqa: F401

@coveralls

coveralls commented Jan 26, 2026

Pull Request Test Coverage Report for Build 21405922660

Details

  • 11 of 45 (24.44%) changed or added relevant lines in 8 files are covered.
  • 2 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.09%) to 51.217%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| pkg/controller/trainjob_controller.go | 0 | 2 | 0.0% |
| pkg/webhooks/setup.go | 0 | 2 | 0.0% |
| pkg/controller/setup.go | 0 | 3 | 0.0% |
| pkg/webhooks/trainjob_webhook.go | 4 | 12 | 33.33% |
| pkg/webhooks/clustertrainingruntime_webhook.go | 5 | 14 | 35.71% |
| pkg/webhooks/trainingruntime_webhook.go | 0 | 10 | 0.0% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| pkg/webhooks/clustertrainingruntime_webhook.go | 1 | 48.0% |
| pkg/webhooks/trainingruntime_webhook.go | 1 | 62.79% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 21369554754: | 0.09% |
| Covered Lines: | 1241 |
| Relevant Lines: | 2423 |

💛 - Coveralls

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
 runtimeRec := NewTrainingRuntimeReconciler(
     mgr.GetClient(),
-    mgr.GetEventRecorderFor("trainer-trainingruntime-controller"),
+    mgr.GetEventRecorder("trainer-trainingruntime-controller"),
Member Author

I also migrated to the new Events API due to this: kubernetes-sigs/controller-runtime#3262
Let me know if that looks good @tenzen-y @astefanutti.

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
version: "2"

run:
  go: "1.24"
Member

why do we need to remove this?

Member Author

We don't have it in Kueue: https://github.com/kubernetes-sigs/kueue/blob/main/.golangci.yaml
I was thinking that it is better to maintain the Go version in the go.mod file only.
WDYT @tenzen-y?

Member

I have no preference here. I'm asking if you faced any problems or not.

Member Author

It looks to be working fine, so I prefer to remove it from .golangci.yaml

Contributor

It defaults to using the Go version from the go.mod file, so it's probably better to remove it.


// +kubebuilder:webhook:path=/validate-trainer-kubeflow-org-v1alpha1-trainingruntime,mutating=false,failurePolicy=fail,sideEffects=None,groups=trainer.kubeflow.org,resources=trainingruntimes,verbs=create;update,versions=v1alpha1,name=validator.trainingruntime.trainer.kubeflow.org,admissionReviewVersions=v1

var _ webhook.CustomValidator = (*TrainingRuntimeWebhook)(nil)
Member

It would be better to keep a type compliance check.

Member Author

Sure, added it back.
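
For readers unfamiliar with the idiom being discussed, a type compliance check is a blank-identifier assignment that makes the compiler verify a type satisfies an interface. The sketch below shows it against a typed (generic) validator interface. `Validator`, `TrainingRuntime`, and the validation rule are hypothetical stand-ins written for this example; the real interface is provided by controller-runtime and its methods also take a context.

```go
package main

import (
	"errors"
	"fmt"
)

// TrainingRuntime is a hypothetical stand-in for the real API type.
type TrainingRuntime struct {
	Name string
}

// Validator is a hypothetical typed validator interface used only here.
type Validator[T any] interface {
	ValidateCreate(obj T) error
	ValidateUpdate(oldObj, newObj T) error
	ValidateDelete(obj T) error
}

// TrainingRuntimeValidator is the type whose compliance we want to pin down.
type TrainingRuntimeValidator struct{}

// Compile-time compliance check: if TrainingRuntimeValidator ever stops
// satisfying Validator[*TrainingRuntime], this line fails to build.
// It allocates nothing and has no runtime cost.
var _ Validator[*TrainingRuntime] = (*TrainingRuntimeValidator)(nil)

func (*TrainingRuntimeValidator) ValidateCreate(obj *TrainingRuntime) error {
	if obj.Name == "" {
		return errors.New("name must be set")
	}
	return nil
}

func (v *TrainingRuntimeValidator) ValidateUpdate(_, newObj *TrainingRuntime) error {
	// Updates are checked with the same illustrative rule as creates.
	return v.ValidateCreate(newObj)
}

func (*TrainingRuntimeValidator) ValidateDelete(*TrainingRuntime) error {
	return nil
}

func main() {
	var v Validator[*TrainingRuntime] = &TrainingRuntimeValidator{}
	fmt.Println(v.ValidateCreate(&TrainingRuntime{Name: "torch"}))
	fmt.Println(v.ValidateCreate(&TrainingRuntime{}))
}
```

Keeping the check next to the type means a signature drift after a dependency bump shows up as a build error at the declaration, not at the place where the validator is registered.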


// +kubebuilder:webhook:path=/validate-trainer-kubeflow-org-v1alpha1-clustertrainingruntime,mutating=false,failurePolicy=fail,sideEffects=None,groups=trainer.kubeflow.org,resources=clustertrainingruntimes,verbs=create;update,versions=v1alpha1,name=validator.clustertrainingruntime.trainer.kubeflow.org,admissionReviewVersions=v1

var _ webhook.CustomValidator = (*ClusterTrainingRuntimeWebhook)(nil)
Member

Ditto about the type compliance check.


// +kubebuilder:webhook:path=/validate-trainer-kubeflow-org-v1alpha1-trainjob,mutating=false,failurePolicy=fail,sideEffects=None,groups=trainer.kubeflow.org,resources=trainjobs,verbs=create;update,versions=v1alpha1,name=validator.trainjob.trainer.kubeflow.org,admissionReviewVersions=v1

var _ webhook.CustomValidator = (*TrainJobWebhook)(nil)
Member

Ditto about the type compliance check.

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Contributor

@robert-bell robert-bell left a comment

Thanks @andreyvelich.

I’ve gone through the changes and tested it locally and everything looks solid on my end.

I had just one small comment, but lgtm otherwise.

     message = fmt.Sprintf("%s ...", message)
 }
-r.recorder.Event(&trainJob, corev1.EventTypeWarning, "TrainJobResourcesCreationFailed", message)
+r.recorder.Eventf(&trainJob, nil, corev1.EventTypeWarning, "TrainJobResourcesCreationFailed", "TrainJobResourcesCreationFailed", message)
Contributor

nit: should we use a different value for the action argument rather than duplicating the reason?

Maybe?

Suggested change
-r.recorder.Eventf(&trainJob, nil, corev1.EventTypeWarning, "TrainJobResourcesCreationFailed", "TrainJobResourcesCreationFailed", message)
+r.recorder.Eventf(&trainJob, nil, corev1.EventTypeWarning, "TrainJobResourcesCreationFailed", "Reconciling", message)

Member Author

action is what action was taken/failed regarding to the regarding object. It is machine-readable. This field cannot be empty for new Events and it can have at most 128 characters.

I think Reconciling should be fine, since we didn't put the TrainJob into the Failed state.
Thoughts @tenzen-y @astefanutti ?
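
To make the reason/action distinction concrete, here is a small sketch of the call shape discussed in this thread. `EventRecorder` and `fakeRecorder` are hypothetical stand-ins that only mirror the argument order seen in the diff (regarding, related, type, reason, action, note); they are not the real recorder, and the message text is invented for the example.

```go
package main

import "fmt"

// Event captures what the stand-in recorder was asked to emit.
type Event struct {
	Type, Reason, Action, Note string
}

// EventRecorder is a hypothetical stand-in mirroring the argument order of
// the new call in the diff: regarding, related, type, reason, action, note.
type EventRecorder interface {
	Eventf(regarding, related any, eventType, reason, action, note string, args ...any)
}

// fakeRecorder stores events in memory so they can be inspected.
type fakeRecorder struct {
	events []Event
}

func (f *fakeRecorder) Eventf(_, _ any, eventType, reason, action, note string, args ...any) {
	f.events = append(f.events, Event{
		Type:   eventType,
		Reason: reason,
		Action: action,
		Note:   fmt.Sprintf(note, args...),
	})
}

// recordCreationFailure emits a warning whose reason says what went wrong and
// whose action says what the controller was doing at the time. The message is
// passed through "%s" because this stand-in treats note as a format string.
func recordCreationFailure(rec EventRecorder, trainJob any, message string) {
	rec.Eventf(trainJob, nil, "Warning", "TrainJobResourcesCreationFailed", "Reconciling", "%s", message)
}

func main() {
	rec := &fakeRecorder{}
	recordCreationFailure(rec, struct{ Name string }{"demo"}, "failed to create resources ...")
	for _, e := range rec.events {
		fmt.Printf("%s reason=%s action=%s note=%q\n", e.Type, e.Reason, e.Action, e.Note)
	}
}
```

Using distinct values keeps the two fields independently useful: the reason identifies the failure, and the action identifies the activity that was in progress.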

Contributor

Yes that sounds reasonable to me.

Member Author

Updated, let me know if that looks good @astefanutti @tenzen-y @robert-bell

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
@astefanutti
Contributor

Thanks @andreyvelich!

/lgtm

@google-oss-prow google-oss-prow bot added the lgtm label Jan 27, 2026
Member Author

@andreyvelich andreyvelich left a comment

/approve

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich


@google-oss-prow google-oss-prow bot merged commit 5568ea2 into kubeflow:master Jan 27, 2026
31 checks passed
@google-oss-prow google-oss-prow bot added this to the v2.2 milestone Jan 27, 2026
@andreyvelich andreyvelich deleted the bump-go-1.25 branch January 27, 2026 19:55
Labels: approved, dependencies, lgtm, size/XL
