Add migration job to handle mismatched field managers by andrewstucki · Pull Request #1249 · redpanda-data/redpanda-operator

andrewstucki · 2026-01-29T17:31:40Z

Cover Letter

In previous versions of the operator, when some of our synchronization code was ported to the kube library, a bug was introduced that prevented the field manager from being set (see redpanda-data/common-go#126 for the relevant fix in the kube package as it exists today). Additionally, we have been inconsistent in how we set the field manager across our kube.Ctl usage.

This resulted in some really odd behavior, with the Kubernetes API server mangling resources because of conflicting field managers. For example, service ports are merged by the identity of their (protocol, port) tuple. Suppose an old field manager says it owns the service port (tcp, 9092), named "kafka", and we then apply, with the new manager, a version of our CRD in which the port is changed to 19092. Because the field manager names differ, the API server ends up with both ports, (tcp, 9092) and (tcp, 19092), each named "kafka", which fails validation.
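The merge behavior described above can be sketched with a small, self-contained Go model. This is only an illustration of a list merged by a (protocol, port) key, not the API server's actual implementation; the `Port`, `portKey`, `mergePorts`, and `duplicateNames` names are hypothetical and exist only for this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// Port is a simplified, hypothetical stand-in for a Service port entry.
type Port struct {
	Name     string
	Protocol string
	Port     int
}

// portKey is the merge identity described above: the (protocol, port) tuple.
type portKey struct {
	Protocol string
	Port     int
}

// mergePorts models a keyed-list merge. Each element of applied is the set of
// ports owned by one field manager; later sets overwrite earlier ones only
// when the (protocol, port) key matches. Entries with different keys coexist.
func mergePorts(applied [][]Port) []Port {
	merged := map[portKey]Port{}
	for _, ports := range applied {
		for _, p := range ports {
			merged[portKey{p.Protocol, p.Port}] = p
		}
	}
	out := make([]Port, 0, len(merged))
	for _, p := range merged {
		out = append(out, p)
	}
	// Sort for deterministic output.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Port != out[j].Port {
			return out[i].Port < out[j].Port
		}
		return out[i].Protocol < out[j].Protocol
	})
	return out
}

// duplicateNames returns the port names that appear more than once, which is
// the condition that makes the merged Service fail validation.
func duplicateNames(ports []Port) []string {
	counts := map[string]int{}
	for _, p := range ports {
		counts[p.Name]++
	}
	var dups []string
	for name, n := range counts {
		if n > 1 {
			dups = append(dups, name)
		}
	}
	sort.Strings(dups)
	return dups
}

func main() {
	merged := mergePorts([][]Port{
		{{Name: "kafka", Protocol: "TCP", Port: 9092}},  // owned by the old manager
		{{Name: "kafka", Protocol: "TCP", Port: 19092}}, // applied by the new manager
	})
	fmt.Println(merged)                 // [{kafka TCP 9092} {kafka TCP 19092}]
	fmt.Println(duplicateNames(merged)) // [kafka]
}
```

Because the two entries have different keys, neither replaces the other, and the duplicated name only shows up after the merge.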

This has a knock-on effect that is even harder to resolve when the merged resources don't fail validation immediately. For example, StatefulSets will gladly accept duplicated port names in their pod template container definitions, but they then fail when they actually go to provision the Pods.

What this means is that we have to:

1. Clear all of the field managers that are misnamed.
2. Assume ownership over all fields as they currently exist in the resources that we have created via server-side apply, so that
3. when re-reconciliation kicks in, not only will resources that would otherwise fail validation succeed, but resources that were mangled by things like merged pod template container ports will also get cleaned up, because our proper field owner owns all of the relevant spec fields.

This is resolved through a post-upgrade migration job that removes any unwanted field managers from the relevant resources related to Redpanda and Console CRDs and forcibly assumes ownership over their fields with the proper field manager. Subsequently, our reconcilers will pick up and fix any malformed resources.
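The first step of that migration, dropping unwanted managers from an object's managed-fields list, might look roughly like the sketch below. It is not the code in this PR: `ManagedFieldsEntry` here is a trimmed local stand-in for the Kubernetes `metav1.ManagedFieldsEntry` type, and `pruneManagers` as well as the manager names are hypothetical. The follow-up step (a forced server-side apply with the proper field manager) is only noted in a comment, since it needs a real API client.

```go
package main

import "fmt"

// ManagedFieldsEntry is a trimmed, hypothetical stand-in for
// metav1.ManagedFieldsEntry; only the fields this sketch needs are kept.
type ManagedFieldsEntry struct {
	Manager   string
	Operation string // "Apply" or "Update"
}

// pruneManagers splits entries into those to keep and the names of the
// managers that were dropped because they appear in the unwanted set.
func pruneManagers(entries []ManagedFieldsEntry, unwanted map[string]bool) (kept []ManagedFieldsEntry, removed []string) {
	for _, e := range entries {
		if unwanted[e.Manager] {
			removed = append(removed, e.Manager)
			continue
		}
		kept = append(kept, e)
	}
	return kept, removed
}

func main() {
	// All manager names below are assumptions made for illustration.
	entries := []ManagedFieldsEntry{
		{Manager: "legacy-manager", Operation: "Apply"},
		{Manager: "proper-manager", Operation: "Apply"},
		{Manager: "kube-controller-manager", Operation: "Update"},
	}
	kept, removed := pruneManagers(entries, map[string]bool{"legacy-manager": true})
	fmt.Println(len(kept), removed) // 2 [legacy-manager]

	// Next step (not shown): write the pruned list back to the object, then
	// force a server-side apply of the object's current fields using the
	// proper field manager so that it owns any orphaned fields.
}
```

Keeping an explicit set of unwanted names, rather than dropping everything except one manager, leaves entries from unrelated controllers untouched.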

Attached are two quick scripted recreations of what we were experiencing with Services and StatefulSets:

service-demo.sh
statefulset-demo.sh

licenses/third_party.md

operator/chart/testdata/template-cases.golden.txtar

RafalKorepta

You need to add MigrationJobServiceAccount and PostUpgradeMigrationJob in

redpanda-operator/operator/chart/chart.go

Lines 51 to 64 in 0723c18

    
	manifests := []kube.Object{
		Issuer(dot),
		Certificate(dot),
		ConfigMap(dot),
		MetricsService(dot),
		WebhookService(dot),
		MutatingWebhookConfiguration(dot),
		ValidatingWebhookConfiguration(dot),
		ServiceAccount(dot),
		ServiceMonitor(dot),
		Deployment(dot),
		PreInstallCRDJob(dot),
		CRDJobServiceAccount(dot),
	}

andrewstucki · 2026-01-29T18:43:12Z

@RafalKorepta can you take a look again? I'm going to work on wiring up an acceptance/regression test for this now and fixing anything that breaks. In addition, I'm likely going to add a single-pass SSA on any resources that need to be updated, using their field specs as-is, so that our field manager picks up any orphaned fields as part of the migration.

operator/cmd/migration/fieldmanagers.go

gene-redpanda

LGTM!

(cherry picked from commit f1112cb)

# Conflicts:
#	acceptance/go.mod
#	acceptance/go.sum
#	acceptance/steps/register.go
#	charts/connectors/go.mod
#	charts/connectors/go.sum
#	charts/console/go.mod
#	charts/console/go.sum
#	charts/redpanda/go.mod
#	charts/redpanda/go.sum
#	charts/redpanda/render_state_nogotohelm.go
#	flake.nix
#	gen/go.mod
#	gen/go.sum
#	go.work.sum
#	gotohelm/go.mod
#	gotohelm/go.sum
#	gotohelm/testdata/src/example/go.mod
#	gotohelm/testdata/src/example/go.sum
#	harpoon/go.mod
#	harpoon/go.sum
#	licenses/third_party.md
#	operator/chart/rbac.go
#	operator/chart/templates/_chart.go.tpl
#	operator/chart/templates/_rbac.go.tpl
#	operator/chart/testdata/template-cases.golden.txtar
#	operator/cmd/main.go
#	operator/cmd/run/run.go
#	operator/go.mod
#	operator/go.sum
#	operator/multicluster/render_state_nogotohelm.go
#	pkg/go.mod
#	pkg/go.sum

(cherry picked from commit f1112cb)

# Conflicts:
#	acceptance/go.mod
#	acceptance/go.sum
#	charts/connectors/go.mod
#	charts/connectors/go.sum
#	charts/console/go.mod
#	charts/console/go.sum
#	charts/redpanda/go.mod
#	charts/redpanda/go.sum
#	gen/go.mod
#	gen/go.sum
#	go.work.sum
#	gotohelm/go.mod
#	gotohelm/go.sum
#	gotohelm/testdata/src/example/go.mod
#	gotohelm/testdata/src/example/go.sum
#	harpoon/go.mod
#	harpoon/go.sum
#	licenses/third_party.md
#	operator/chart/templates/_rbac.go.tpl
#	operator/chart/testdata/template-cases.golden.txtar
#	operator/cmd/main.go
#	operator/go.mod
#	operator/go.sum
#	operator/multicluster/render_state_nogotohelm.go
#	pkg/go.mod
#	pkg/go.sum

github-actions · 2026-01-30T03:57:24Z

💚 All backports created successfully

Status	Branch	Result
✅	release/v25.1.x
✅	release/v25.2.x
✅	release/v25.3.x

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation and see the Github Action logs for details

…rs (#1249) (#1252)

Co-authored-by: Andrew Stucki <andrew.stucki@redpanda.com>

Add migration job to handle mismatched field managers

a0ab525

andrewstucki requested review from RafalKorepta, chrisseto and gene-redpanda as code owners January 29, 2026 17:31

andrewstucki added v25.1.x v25.2.x labels Jan 29, 2026

paulohtb6 reviewed Jan 29, 2026

licenses/third_party.md (outdated, resolved)

Fix up bad license generation code

71d1b79

RafalKorepta reviewed Jan 29, 2026

operator/chart/testdata/template-cases.golden.txtar (resolved)

RafalKorepta requested changes Jan 29, 2026

Regen with job and account manifest rendering added

080c5dd

andrewstucki requested a review from RafalKorepta January 29, 2026 18:43

Add apply to take over potentially orphaned fields

7d4db8f

RafalKorepta approved these changes Jan 29, 2026

gene-redpanda reviewed Jan 29, 2026

operator/cmd/migration/fieldmanagers.go (outdated, resolved)

gene-redpanda reviewed Jan 29, 2026

operator/cmd/migration/fieldmanagers.go (outdated, resolved)

gene-redpanda reviewed Jan 29, 2026

operator/cmd/migration/fieldmanagers.go (outdated, resolved)

andrewstucki added 2 commits January 29, 2026 16:32

Add tests, upgrade common-go, and put in hacky fix for protobuf library conflicts

a13e63e

Fix typo

2eb4778

andrewstucki added the v25.3.x label Jan 29, 2026

andrewstucki added 2 commits January 29, 2026 17:11

deep copy object before capturing managers

8a344c3

Make sure to bring in proper version of fixed common-go

dc31439

andrewstucki enabled auto-merge (squash) January 29, 2026 22:42

gene-redpanda approved these changes Jan 29, 2026

revert accidentally decreasing number of k3d brokers in acceptance tests

332b3cf

andrewstucki disabled auto-merge January 29, 2026 22:55

andrewstucki enabled auto-merge (squash) January 29, 2026 22:55

andrewstucki merged commit f1112cb into main Jan 30, 2026
10 checks passed

github-actions bot mentioned this pull request Jan 30, 2026

[release/v25.1.x] Add migration job to handle mismatched field managers (#1249) #1250

Closed

github-actions bot mentioned this pull request Jan 30, 2026

[release/v25.2.x] Add migration job to handle mismatched field managers (#1249) #1251

Merged

github-actions bot mentioned this pull request Jan 30, 2026

[release/v25.3.x] Add migration job to handle mismatched field managers (#1249) #1252

Merged

andrewstucki merged 9 commits into main from as/add-migration-job

4 participants
