Skip to content

Conversation

@andrewstucki
Copy link
Contributor

@andrewstucki andrewstucki commented Dec 4, 2024

This is the first work towards unifying any sort of long-lived containers into a unified sidecar entrypoint. It implements the entrypoint via a new alternative Decommissioner, which, though I may have missed some corner cases:

  1. simplifies some of the old decommissioner code
  2. removes the code that patches StatefulSets (instead publishing decommissioning events tied to the set)
  3. calls out some current arguably buggy edge cases that k8s core saves us on (i.e. prematurely deleting pvcs)
  4. tries to be both more robust and guarded about which brokers are actually allowed to be decommissioned
  5. removes the reliance on calling the helm-cli internals, but just pulls and decodes helm release info directly

Let me know if there are any additional tests you'd want to add other than the basic scale down/node failure decommission case that's there?

EDIT:
Added support for an additional fetching mechanism so that we support something other than just grabbing helm values and converting them to a Redpanda CR for getting a client. In the process, added support for getting an rpk Profile and handing that to our factory to initialize all of our connections in the factory -- so now the factory can be used for initializing low-level Schema Registry/Kafka/Admin API clients via a redpanda.yaml and environment variables too, which should be perfect for sidecar utilizations that don't really use Helm (i.e. ArgoCD).

Also, if we see that the FS read operations are too heavy for initializing TLS, etc. the factory and fetchers can have a filesystem specified and use something like an afero.CacheOnReadFs, though I figure that's too premature to introduce for now.

@andrewstucki
Copy link
Contributor Author

Also, just FYI, this is what the events look like while the test is running:

Name:               basic
Namespace:          testenv-4wn2h
...
Events:
  Type    Reason                                        Age                 From                    Message
  ----    ------                                        ----                ----                    -------
...
  Normal  DecommissioningBroker                         47s                 broker-decommissioner   brokers needing decommissioning: [4], decommissioning: 4
  Normal  DecommissioningUnboundPersistentVolumeClaims  42s                 broker-decommissioner   unbound persistent volume claims: [testenv-4wn2h/datadir-basic-4], decommissioning: testenv-4wn2h/datadir-basic-4
...

@andrewstucki
Copy link
Contributor Author

andrewstucki commented Dec 5, 2024

Also, with the new ghost broker test here's an ellided stream of events while watching the test run with some rough annotations as to what's happening when:

# Initial 5 broker cluster deployed
Controller ID: 0 All nodes: [0 1 2 3 4] Nodes down: []
# Helm upgrade to 4 broker cluster
Controller ID: 0 All nodes: [0 1 2 3 4] Nodes down: [4]
# Rolling Restart
delete Pod basic-3 in StatefulSet basic successful
# decommissioner picks up old broker and PVC
brokers needing decommissioning: [4], decommissioning: 4
unbound persistent volume claims: [testenv-n99nf/datadir-basic-4], decommissioning: testenv-n99nf/datadir-basic-4
Controller ID: 0 All nodes: [0 1 2 3] Nodes down: [3]
create Pod basic-3 in StatefulSet basic successful
delete Pod basic-2 in StatefulSet basic successful
Controller ID: 0 All nodes: [0 1 2 3] Nodes down: [2]
create Pod basic-2 in StatefulSet basic successful
delete Pod basic-1 in StatefulSet basic successful
Controller ID: 0 All nodes: [0 1 2 3] Nodes down: [1]
create Pod basic-1 in StatefulSet basic successful
delete Pod basic-0 in StatefulSet basic successful
create Pod basic-0 in StatefulSet basic successful
Controller ID: 2 All nodes: [0 1 2 3] Nodes down: [0]
Controller ID: 2 All nodes: [0 1 2 3] Nodes down: []
# rolling restart completes, node is tainted, pod evicted and unschedulable
create Pod basic-0 in StatefulSet basic successful
0/5 nodes are available: 1 node(s) had untolerated taint {decommission-test: }. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling.
Controller ID: 2 All nodes: [0 1 2 3] Nodes down: [0]
# PVC and pod are manually deleted like PVC unbinder
create Claim datadir-basic-0 Pod basic-0 in StatefulSet basic success
create Pod basic-0 in StatefulSet basic successful
Successfully provisioned volume pvc-70d93e27-c1fc-4913-a7c8-627e83e17f29
Successfully assigned testenv-n99nf/basic-0 to k3d-decommissioning-agent-2
# New broker comes up, but not yet healthy
Controller ID: 2 All nodes: [0 1 2 3 5] Nodes down: [0 5]
# New broker healthy, old broker has ordinal collision
Controller ID: 2 All nodes: [0 1 2 3 5] Nodes down: [0]
# decommissioner detects broker needing decommissioning
brokers needing decommissioning: [0], decommissioning: 0
Controller ID: 2 All nodes: [1 2 3 5] Nodes down: []

Copy link
Contributor

@chrisseto chrisseto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fully review, just flushing comments for now.

Copy link
Contributor

@chrisseto chrisseto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple nits on documentation. Blocking comment on MergeMaps


cmd.Flags().StringVar(&metricsAddr, "metrics-bind-address", ":8080", "The address the metric endpoint binds to.")
cmd.Flags().StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
cmd.Flags().StringVar(&pprofAddr, "pprof-bind-address", ":8082", "The address the metric endpoint binds to.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: port 8082 in default Redpanda helm chart deployment is used by HTTP listener. Could you add test to deploy Redpanda helm chart with this sidecar?

https://github.com/redpanda-data/helm-charts/blob/b0a8f611127d405d0811eae052fe75d56a70799d/charts/redpanda/ci/16-controller-sidecar-values.yaml

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add the test once I get the rest of the sidecar command finished off with the configwatcher port and pvcunbinder. Will leave this comment open so I don't forget!

return nil
}

if !strings.HasPrefix(claim.Name, datadirVolume+"-") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about Pods that have 2 local-path persistent volumes (datadir and shadow-index). The second shadow-index is used in cloud (operator v1) to separate data directory with tiered storage cache.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think that's a good call-out that we should likely handle cleaning up the PVCs that we're optionally creating as well. The original code was based purely off of what the decommission controller was doing previously, which was for just datadir, but I think it'd make sense to check for any of our optionally provisioned PVCs as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think I'm probably just going to leave this as-is for now since we don't try and decommission PVCs for non-datadir right now anyway. But I think it'd make sense to do so as a future iteration.

values = functional.MergeMaps(values, overrides)
}

release, err := s.helm.Install(s.ctx, "redpandadata/redpanda", helm.InstallOptions{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a problem: What would be better option as we have charts in redpanda-operator repository, so it could just reference path within this repo.

Copy link
Contributor

@RafalKorepta RafalKorepta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, beside the potential panic.

@andrewstucki andrewstucki force-pushed the sidecar-decommissioner branch from 541a4d6 to 90ffa2b Compare December 10, 2024 22:15
@andrewstucki andrewstucki merged commit 8f29324 into main Dec 11, 2024
5 checks passed
@andrewstucki andrewstucki deleted the sidecar-decommissioner branch December 11, 2024 02:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants