Basic Lifecycle Management Interfaces + v2 Concrete Implementations #568
andrewstucki merged 8 commits into main
Conversation
RafalKorepta
left a comment
Looks good to me.
In my local environment I found the following test failing.
=== RUN TestClientWatchResources
=== RUN TestClientWatchResources/base
client_test.go:382:
Error Trace: /Users/rafalkorepta/workspace/redpanda/redpanda-operator/operator/internal/lifecycle/client_test.go:382
/Users/rafalkorepta/workspace/redpanda/redpanda-operator/operator/internal/lifecycle/client_test.go:223
Error: Not equal:
expected: "*resources.MockCluster"
actual : "*lifecycle.MockCluster"
Diff:
--- Expected
+++ Actual
@@ -1 +1 @@
-*resources.MockCluster
+*lifecycle.MockCluster
Test: TestClientWatchResources/base
2025-03-26T13:15:12+01:00 INFO Stopping and waiting for non leader election runnables
2025-03-26T13:15:12+01:00 INFO Stopping and waiting for leader election runnables
2025-03-26T13:15:12+01:00 INFO Stopping and waiting for caches
W0326 13:15:12.978273 90427 reflector.go:470] pkg/mod/k8s.io/client-go@v0.30.3/tools/cache/reflector.go:232: watch of *lifecycle.MockCluster ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
W0326 13:15:12.978347 90427 reflector.go:470] pkg/mod/k8s.io/client-go@v0.30.3/tools/cache/reflector.go:232: watch of *v1.CustomResourceDefinition ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
2025-03-26T13:15:12+01:00 INFO Stopping and waiting for webhooks
2025-03-26T13:15:12+01:00 INFO Stopping and waiting for HTTP servers
2025-03-26T13:15:12+01:00 INFO Wait completed, proceeding to shutdown the manager
--- FAIL: TestClientWatchResources/base (5.40s)
=== RUN TestClientWatchResources/cluster-scoped-resources
client_test.go:382:
Error Trace: /Users/rafalkorepta/workspace/redpanda/redpanda-operator/operator/internal/lifecycle/client_test.go:382
/Users/rafalkorepta/workspace/redpanda/redpanda-operator/operator/internal/lifecycle/client_test.go:223
Error: Not equal:
expected: "*resources.MockCluster"
actual : "*lifecycle.MockCluster"
Diff:
--- Expected
+++ Actual
@@ -1 +1 @@
-*resources.MockCluster
+*lifecycle.MockCluster
Test: TestClientWatchResources/cluster-scoped-resources
2025-03-26T13:15:19+01:00 INFO Stopping and waiting for non leader election runnables
2025-03-26T13:15:19+01:00 INFO Stopping and waiting for leader election runnables
2025-03-26T13:15:19+01:00 INFO Stopping and waiting for caches
2025-03-26T13:15:19+01:00 INFO Stopping and waiting for webhooks
2025-03-26T13:15:19+01:00 INFO Stopping and waiting for HTTP servers
2025-03-26T13:15:19+01:00 INFO Wait completed, proceeding to shutdown the manager
--- FAIL: TestClientWatchResources/cluster-scoped-resources (6.22s)
=== RUN TestClientWatchResources/namespace-and-cluster-scoped-resources
client_test.go:382:
Error Trace: /Users/rafalkorepta/workspace/redpanda/redpanda-operator/operator/internal/lifecycle/client_test.go:382
/Users/rafalkorepta/workspace/redpanda/redpanda-operator/operator/internal/lifecycle/client_test.go:223
Error: Not equal:
expected: "*resources.MockCluster"
actual : "*lifecycle.MockCluster"
Diff:
--- Expected
+++ Actual
@@ -1 +1 @@
-*resources.MockCluster
+*lifecycle.MockCluster
Test: TestClientWatchResources/namespace-and-cluster-scoped-resources
2025-03-26T13:15:24+01:00 INFO Stopping and waiting for non leader election runnables
2025-03-26T13:15:24+01:00 INFO Stopping and waiting for leader election runnables
2025-03-26T13:15:24+01:00 INFO Stopping and waiting for caches
W0326 13:15:24.952195 90427 reflector.go:470] pkg/mod/k8s.io/client-go@v0.30.3/tools/cache/reflector.go:232: watch of *lifecycle.MockCluster ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
W0326 13:15:24.952203 90427 reflector.go:470] pkg/mod/k8s.io/client-go@v0.30.3/tools/cache/reflector.go:232: watch of *v1.CustomResourceDefinition ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
2025-03-26T13:15:24+01:00 INFO Stopping and waiting for webhooks
2025-03-26T13:15:24+01:00 INFO Stopping and waiting for HTTP servers
2025-03-26T13:15:24+01:00 INFO Wait completed, proceeding to shutdown the manager
--- FAIL: TestClientWatchResources/namespace-and-cluster-scoped-resources (5.75s)
--- FAIL: TestClientWatchResources (17.37s)
```go
// OutOfDateReplicas is the number of replicas that don't currently
// match their node pool definitions. If OutOfDateReplicas is not 0
// it should mean that the operator will soon roll this many pods.
```
Historically, a rolling update should be blocked/paused if Redpanda cannot enter maintenance mode. In the Redpanda Helm-based deployment this is not the case, as the lifecycle hook script will wait for Redpanda to enter maintenance mode for half of the terminationGracePeriod in the Redpanda Pod definition. In operator v1 it is enforced.
@RafalKorepta thanks -- totally forgot to change the string after I renamed the package 😅 -- I responded to a few comments and will try to add some more doc comments soonish.
chrisseto
left a comment
Nothing blocking and nothing that can't be deferred.
```go
if err := r.patchOwnedResource(ctx, owner, resource); err != nil {
	errs = append(errs, err)
}
delete(toDelete, gvkObject{
```
Super nit: can this be hoisted above the patchOwnedResource line? Every time I see it something in me starts screaming that there's an edge in the error case 😓
```go
	})
}

for _, resource := range toDelete {
```
Can we add in logs for deletions? It shouldn't be noisy but if it is that would very likely indicate a problem, no?
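A hedged sketch of what such deletion logging might look like, using only the standard library (the `resource` type, `deleteAll` helper, and injected `del` func are all invented for illustration; the real code would go through the controller-runtime client and logger):

```go
package main

import (
	"fmt"
	"log"
)

// resource is a hypothetical identity triple for an object slated for GC.
type resource struct{ kind, namespace, name string }

// deleteAll mirrors the cleanup loop: log each pending deletion so that an
// unexpectedly noisy garbage-collection pass shows up in the operator logs.
func deleteAll(toDelete []resource, del func(resource) error) []error {
	var errs []error
	for _, r := range toDelete {
		log.Printf("deleting out-of-date resource %s %s/%s", r.kind, r.namespace, r.name)
		if err := del(r); err != nil {
			errs = append(errs, err)
		}
	}
	return errs
}

func main() {
	errs := deleteAll(
		[]resource{{"Service", "default", "old-svc"}},
		func(resource) error { return nil },
	)
	fmt.Println(len(errs))
}
```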
```go
for _, resource := range toDelete {
	if err := r.client.Delete(ctx, resource); err != nil {
		if !k8sapierrors.IsNotFound(err) {
			errs = append(errs, err)
```
Let's annotate these errors with some information about which resource we're trying to delete.
```go
// WatchResources configures resource watching for the given cluster, including StatefulSets and other resources.
func (r *ResourceClient[T, U]) WatchResources(builder Builder, cluster U) error {
	// set that this is for the cluster
	builder.For(cluster)
```
Does builder mutate the instance you call the methods on? Seems like this should be builder = builder.For(cluster)
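A toy fluent builder illustrates why reassignment is the defensive convention: `b = b.For(...)` is correct whether the method mutates its receiver or returns a fresh value, so the caller doesn't have to know the contract. (This `Builder` is a stand-in, not controller-runtime's.)

```go
package main

import "fmt"

// Builder is a minimal fluent builder. Whether For mutates the receiver or
// returns a new value is an API contract detail; reassigning the result is
// safe under either contract.
type Builder struct{ forType string }

func (b *Builder) For(t string) *Builder {
	b.forType = t
	return b
}

func main() {
	b := &Builder{}
	b = b.For("Cluster") // safe regardless of mutation semantics
	fmt.Println(b.forType)
}
```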
```go
	return resources, nil
}

// patchOwnedResource applies a patch to a resource owned by the cluster.
```
Semantically, an apply is different from a patch. Could we name this applyOwnedResource to indicate that it's an Apply (I guess PUT like?) operation and distinct from a standard PATCH.
```go
// V2SimpleResourceRenderer represents a simple resource renderer for v2 clusters.
type V2SimpleResourceRenderer struct {
	kubeConfig clientcmdapi.Config
```
This will need to be a rest.Config soon, if it doesn't already. You may be racing with Rafal's update PR. Not too difficult to fix in a conflict thankfully.
```go
}

// Update updates the given Redpanda v2 cluster with the given cluster status.
func (m *V2ClusterStatusUpdater) Update(cluster *redpandav1alpha2.Redpanda, status ClusterStatus) bool {
```
This could totally be a closure :P
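The shape the reviewer is hinting at, as a standalone sketch (the `status` type and change-detection logic are invented placeholders, not the operator's `ClusterStatus`): an updater can be a plain closure that reports whether anything changed, with no single-method struct needed.

```go
package main

import "fmt"

// status is a hypothetical cluster status value.
type status struct{ ready bool }

// newStatusUpdater returns an updater closure: it applies the desired status
// and reports whether the current status actually changed.
func newStatusUpdater() func(current *status, desired status) bool {
	return func(current *status, desired status) bool {
		if *current == desired {
			return false // nothing to do
		}
		*current = desired
		return true
	}
}

func main() {
	update := newStatusUpdater()
	s := &status{}
	fmt.Println(update(s, status{ready: true})) // status changed
	fmt.Println(update(s, status{ready: true})) // already up to date
}
```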
```go
}

// SyncAll synchronizes the simple resources associated with the given cluster,
// cleaning up any resources that should no longer exist.
```
Can you include the phrase "garbage collect" or "GC" in here, or even in a comment on the struct itself? We and Kubernetes refer to this process as garbage collection; adding the key words here will make it a lot easier to find for those who aren't intimately familiar with this section of the code base.
```go
// since resources are cluster-scoped we need to call a Watch on them with some
// custom mappings
builder.Watches(resourceType, handler.EnqueueRequestsFromMapFunc(func(ctx context.Context, o client.Object) []reconcile.Request {
```
Tricky, I would have reached for filter predicates at first, but I see why that doesn't work.
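The reason predicates don't fit, in miniature: a filter predicate can only accept or reject a single event, while a map function can fan one cluster-scoped event out into reconcile requests for every namespaced cluster that uses the object. A self-contained stand-in for the body of such a map function (the `request` type and `clustersUsing` lookup are hypothetical; the real code would query the manager's cache):

```go
package main

import "fmt"

// request is a stand-in for reconcile.Request's namespace/name pair.
type request struct{ namespace, name string }

// clustersUsing stands in for the cache lookup the real map function would
// perform: find every cluster that references the cluster-scoped object
// whose event just fired, and enqueue a request for each.
func clustersUsing(object string, refs map[string][]request) []request {
	return refs[object]
}

func main() {
	refs := map[string][]request{
		"shared-crd": {{"ns-a", "cluster-a"}, {"ns-b", "cluster-b"}},
	}
	// One cluster-scoped event fans out into two reconcile requests.
	for _, req := range clustersUsing("shared-crd", refs) {
		fmt.Printf("%s/%s\n", req.namespace, req.name)
	}
}
```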
This PR is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Force-pushed from e1a13ff to 82f6d67
This adds the basic interfaces for StatefulSet-based lifecycle management for initially the v2 operator. Currently it is not leveraged in a controller, as that needs to be subsequently broken out from the original PoC.
Included in all of this is a fairly large amount of unit testing that puts the entire package at ~90% coverage. Aside from incorporating some of the feedback from #526, it also moves the subpackage from `resources` to `lifecycle`.