Skip to content

restart redpanda on changes to the cluster configuration#603

Closed
simon0191 wants to merge 1 commit intomainfrom
ss-restart-cluster-on-config-change
Closed

restart redpanda on changes to the cluster configuration#603
simon0191 wants to merge 1 commit intomainfrom
ss-restart-cluster-on-config-change

Conversation

@simon0191
Copy link
Member

@simon0191 simon0191 commented Mar 31, 2025

In operator v2, we're missing the capability of restarting Redpanda on cluster config changes that require restart.

As an initial implementation, this PR adds the entire cluster configuration to the statefulset-checksum-annotation so that the cluster is restarted in any config change.

Even though this approach is not ideal, as configuration changes are infrequent, this change should not be too disruptive.

A smarter implementation that checks for config properties that require restart based in the config schema is in the roadmap https://redpandadata.atlassian.net/browse/K8S-499.

@CLAassistant
Copy link

CLAassistant commented Mar 31, 2025

CLA assistant check
All committers have signed the CLA.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@simon0191 simon0191 force-pushed the ss-restart-cluster-on-config-change branch from ac35e0f to d3f2f7f Compare March 31, 2025 14:37
@jan-g
Copy link
Contributor

jan-g commented Mar 31, 2025

@simon0191 - looks like you need to run task generate against this.

@simon0191 simon0191 force-pushed the ss-restart-cluster-on-config-change branch from d3f2f7f to 045ed72 Compare April 3, 2025 11:02
@simon0191 simon0191 force-pushed the ss-restart-cluster-on-config-change branch from 045ed72 to c857965 Compare April 3, 2025 11:24
// NB: Seed servers is excluded to avoid a rolling restart when only
// replicas is changed.
dependencies = append(dependencies, RedpandaConfigFile(dot, false))
dependencies = append(dependencies, RedpandaConfigFile(dot, false), RedpandaClusterConfig(dot))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioned in slack but saying it here as well, this won't work as expected. This will trigger a restart before the cluster config is applied.

I think our best option is to do some hackery within the operator when useFlux is false (This is now the default in Azure so we should be good to go).

In broad strokes, you'll want to get the cluster config's version and inject it into an annotation on the redpanda Pods.

In reconcileResources (which is reconcileDeflux in the release/2.3.x and release/2.4.x branch), you should acquire the version and set it:

# Everything here is nullable, so you'll need to do some nasty if chaining.
values.Statefulset.PodTemplate.Annotations["some-magic-key"] = clusterConfigVersion

I see ~2 options for getting the version. You can either make an admin client and pull it directly (1) or you could update reconcileClusterConfig stash the cluster config version onto the Status and/or Condition (2) and then read that within reconcileResources.

Option 1 feels kinda hacky but I think it'll be the fastest way to get this done.

Option 2 "fits" into the operator a bit better IMO. We're it me, I'd update the syncer.Sync to return (ClusterConfigVersion, error) and then stash the version in the Message of the cluster config condition:

	apimeta.SetStatusCondition(rp.GetConditions(), metav1.Condition{
		Type:               redpandav1alpha2.ClusterConfigSynced,
		Status:             metav1.ConditionTrue,
		ObservedGeneration: rp.Generation,
		Reason:             "ConfigSynced",
                 Message:            fmt.Sprintf("ClusterConfig at Version %d", version), 
	})

Then you can set the annotation to the value of message.

I'd strongly recommend adding a test case to RedpandaControllerSuite to make sure this works as expected.

You'll also likely want to work off the release/2.3.x branch as that's what's currently being deployed into cloud.

@simon0191
Copy link
Member Author

Closing in favor of #672

@simon0191 simon0191 closed this Apr 8, 2025
@chrisseto chrisseto deleted the ss-restart-cluster-on-config-change branch May 7, 2025 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants