Implement alternative sidecar broker and pvc decommissioner #342
Conversation
Also, just FYI, this is what the events look like while the test is running:
Also, with the new ghost broker test, here's an elided stream of events from watching the test run, with some rough annotations as to what's happening when:

```text
# Initial 5 broker cluster deployed
Controller ID: 0 All nodes: [0 1 2 3 4] Nodes down: []
# Helm upgrade to 4 broker cluster
Controller ID: 0 All nodes: [0 1 2 3 4] Nodes down: [4]
# Rolling Restart
delete Pod basic-3 in StatefulSet basic successful
# decommissioner picks up old broker and PVC
brokers needing decommissioning: [4], decommissioning: 4
unbound persistent volume claims: [testenv-n99nf/datadir-basic-4], decommissioning: testenv-n99nf/datadir-basic-4
Controller ID: 0 All nodes: [0 1 2 3] Nodes down: [3]
create Pod basic-3 in StatefulSet basic successful
delete Pod basic-2 in StatefulSet basic successful
Controller ID: 0 All nodes: [0 1 2 3] Nodes down: [2]
create Pod basic-2 in StatefulSet basic successful
delete Pod basic-1 in StatefulSet basic successful
Controller ID: 0 All nodes: [0 1 2 3] Nodes down: [1]
create Pod basic-1 in StatefulSet basic successful
delete Pod basic-0 in StatefulSet basic successful
create Pod basic-0 in StatefulSet basic successful
Controller ID: 2 All nodes: [0 1 2 3] Nodes down: [0]
Controller ID: 2 All nodes: [0 1 2 3] Nodes down: []
# rolling restart completes, node is tainted, pod evicted and unschedulable
create Pod basic-0 in StatefulSet basic successful
0/5 nodes are available: 1 node(s) had untolerated taint {decommission-test: }. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling.
Controller ID: 2 All nodes: [0 1 2 3] Nodes down: [0]
# PVC and pod are manually deleted like PVC unbinder
create Claim datadir-basic-0 Pod basic-0 in StatefulSet basic success
create Pod basic-0 in StatefulSet basic successful
Successfully provisioned volume pvc-70d93e27-c1fc-4913-a7c8-627e83e17f29
Successfully assigned testenv-n99nf/basic-0 to k3d-decommissioning-agent-2
# New broker comes up, but not yet healthy
Controller ID: 2 All nodes: [0 1 2 3 5] Nodes down: [0 5]
# New broker healthy, old broker has ordinal collision
Controller ID: 2 All nodes: [0 1 2 3 5] Nodes down: [0]
# decommissioner detects broker needing decommissioning
brokers needing decommissioning: [0], decommissioning: 0
Controller ID: 2 All nodes: [1 2 3 5] Nodes down: []
```
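To make the annotated membership check concrete, here's a rough, self-contained sketch of the decision the decommissioner is effectively making. The struct and helpers are hypothetical, named to mirror the fields in the log lines rather than the actual types in this PR:

```go
package main

import "fmt"

// healthOverview mirrors the fields printed in the event stream above
// (Controller ID, All nodes, Nodes down); it is a stand-in, not the
// real Admin API response type.
type healthOverview struct {
	ControllerID int
	AllNodes     []int
	NodesDown    []int
}

// brokersToDecommission returns the brokers the cluster still reports as
// members but that no longer map to a live StatefulSet replica, either
// because the StatefulSet was scaled down or because the broker is a
// "ghost" left behind by an ordinal collision.
func brokersToDecommission(health healthOverview, desiredReplicas int, ordinalOf func(nodeID int) (int, bool)) []int {
	var candidates []int
	for _, id := range health.AllNodes {
		ordinal, ok := ordinalOf(id)
		if !ok || ordinal >= desiredReplicas {
			candidates = append(candidates, id)
		}
	}
	return candidates
}

func main() {
	// Mirrors the "Helm upgrade to 4 broker cluster" step: node 4 is
	// still a member but the StatefulSet now has only 4 replicas.
	health := healthOverview{ControllerID: 0, AllNodes: []int{0, 1, 2, 3, 4}, NodesDown: []int{4}}
	identity := func(id int) (int, bool) { return id, true } // assume node ID == pod ordinal here
	fmt.Println(brokersToDecommission(health, 4, identity))  // [4]
}
```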
chrisseto
left a comment
Not a full review; just flushing comments for now.
chrisseto
left a comment
A couple of nits on documentation. Blocking comment on MergeMaps.
```go
cmd.Flags().StringVar(&metricsAddr, "metrics-bind-address", ":8080", "The address the metric endpoint binds to.")
cmd.Flags().StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
cmd.Flags().StringVar(&pprofAddr, "pprof-bind-address", ":8082", "The address the pprof endpoint binds to.")
```
Nit: port 8082 in the default Redpanda Helm chart deployment is used by the HTTP listener. Could you add a test that deploys the Redpanda Helm chart with this sidecar?
I'll add the test once I get the rest of the sidecar command finished off with the configwatcher port and pvcunbinder. Will leave this comment open so I don't forget!
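For what it's worth, a minimal sketch of moving the default off :8082; :8083 is an arbitrary assumed choice, not something this PR settles on:

```go
package main

import "github.com/spf13/cobra"

func newSidecarCommand() *cobra.Command {
	var metricsAddr, probeAddr, pprofAddr string
	cmd := &cobra.Command{Use: "sidecar"}
	cmd.Flags().StringVar(&metricsAddr, "metrics-bind-address", ":8080", "The address the metric endpoint binds to.")
	cmd.Flags().StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
	// :8083 is an assumed default chosen only to avoid the chart's HTTP
	// listener on :8082; pick whatever port is actually free.
	cmd.Flags().StringVar(&pprofAddr, "pprof-bind-address", ":8083", "The address the pprof endpoint binds to.")
	return cmd
}

func main() {
	_ = newSidecarCommand().Execute()
}
```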
```go
return nil
}

if !strings.HasPrefix(claim.Name, datadirVolume+"-") {
```
What about Pods that have two local-path persistent volumes (datadir and shadow-index)? The second one, shadow-index, is used in cloud (operator v1) to separate the data directory from the tiered storage cache.
I think I misunderstood which controller (decommission) would be replaced, but cloud still uses tiered storage with the PersistentVolume mount type in operator v2 mode:
https://github.com/redpanda-data/helm-charts/blob/b0a8f611127d405d0811eae052fe75d56a70799d/charts/redpanda/statefulset.go#L854-L865
https://github.com/redpanda-data/helm-charts/blob/b0a8f611127d405d0811eae052fe75d56a70799d/charts/redpanda/statefulset.go#L1068-L1076
https://github.com/redpanda-data/helm-charts/blob/b0a8f611127d405d0811eae052fe75d56a70799d/charts/redpanda/statefulset.go#L1216-L1255
Ah, I think that's a good call-out: we should likely handle cleaning up the PVCs that we're optionally creating as well. The original code was based purely on what the decommission controller was doing previously, which was just datadir, but I think it'd make sense to check for any of our optionally provisioned PVCs too.
I think I'm just going to leave this as-is for now, since we don't try to decommission non-datadir PVCs right now anyway. But it'd make sense to do so in a future iteration.
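As a sketch of that future iteration, the single-prefix check could become a loop over every volume prefix we provision. The prefix list below is illustrative; beyond datadir, the names are assumptions rather than the chart's actual volume names:

```go
package main

import (
	"fmt"
	"strings"
)

// decommissionablePVCPrefixes lists the volume-name prefixes whose PVCs
// the decommissioner would be allowed to clean up. Only datadir is
// handled today; the other entry is a hypothetical placeholder for an
// optionally provisioned volume like the tiered storage cache.
var decommissionablePVCPrefixes = []string{
	"datadir-",
	"shadow-index-cache-", // hypothetical; not necessarily the real volume name
}

// isDecommissionableClaim reports whether a PVC name matches any of the
// prefixes we provision and is therefore safe to consider for cleanup.
func isDecommissionableClaim(claimName string) bool {
	for _, prefix := range decommissionablePVCPrefixes {
		if strings.HasPrefix(claimName, prefix) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isDecommissionableClaim("datadir-basic-4")) // true
	fmt.Println(isDecommissionableClaim("config-basic-4"))  // false
}
```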
(Resolved review thread on operator/internal/decommissioning/statefulset_decommissioner_test.go.)
```go
values = functional.MergeMaps(values, overrides)
}

release, err := s.helm.Install(s.ctx, "redpandadata/redpanda", helm.InstallOptions{
```
Not a problem: a better option might be to reference the chart path within this repo, since we already have the charts in the redpanda-operator repository.
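Circling back to the blocking comment on MergeMaps above: the usual question is whether the merge is deep or shallow. Here's a sketch of a recursive merge in which nested maps are merged and overrides win on scalar conflicts; this is illustrative, not necessarily how the repo's functional.MergeMaps behaves:

```go
// Package functional is a hypothetical placement mirroring the
// functional.MergeMaps call quoted above.
package functional

// MergeMaps deep-merges override into base without mutating either map.
// Nested map[string]any values are merged recursively; for any other
// conflicting key, the override value wins.
func MergeMaps(base, override map[string]any) map[string]any {
	out := make(map[string]any, len(base))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range override {
		if ov, ok := v.(map[string]any); ok {
			if bv, ok := out[k].(map[string]any); ok {
				out[k] = MergeMaps(bv, ov)
				continue
			}
		}
		out[k] = v
	}
	return out
}
```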
RafalKorepta
left a comment
LGTM, besides the potential panic.
Force-pushed from 541a4d6 to 90ffa2b.
This is the first work towards unifying any sort of long-lived containers into a unified sidecar entrypoint. It implements the entrypoint via a new alternative Decommissioner, though I may have missed some corner cases. Let me know if there are any additional tests you'd want to add beyond the basic scale down/node failure decommission case that's there.
EDIT:
Added support for an additional fetching mechanism so that we support something other than just grabbing Helm values and converting them to a Redpanda CR to get a client. In the process, added support for getting an rpk Profile and handing that to our factory to initialize all of our connections -- so now the factory can be used to initialize low-level Schema Registry/Kafka/Admin API clients via a redpanda.yaml and environment variables too, which should be perfect for sidecar usages that don't really use Helm (i.e. ArgoCD).
Also, if we see that the FS read operations are too heavy for initializing TLS, etc., the factory and fetchers can have a filesystem specified and use something like an afero.CacheOnReadFs, though I figure that's too premature to introduce for now.
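A minimal sketch of the afero.CacheOnReadFs idea, assuming the factory accepted an afero.Fs: repeated reads of TLS material would be served from an in-memory layer instead of hitting disk each time. The path and cache duration below are arbitrary:

```go
package main

import (
	"time"

	"github.com/spf13/afero"
)

// cachedFS wraps the real OS filesystem with an in-memory read cache so
// repeated reads (e.g. CA bundles during client initialization) are
// served from memory once loaded.
func cachedFS() afero.Fs {
	base := afero.NewOsFs()
	cache := afero.NewMemMapFs()
	return afero.NewCacheOnReadFs(base, cache, 30*time.Second) // 30s is an arbitrary TTL
}

func main() {
	fs := cachedFS()
	// First read hits disk; subsequent reads within the TTL hit the cache.
	// The path is illustrative only.
	_, _ = afero.ReadFile(fs, "/etc/tls/ca.crt")
}
```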