11 changes: 11 additions & 0 deletions operator/config/rbac/bases/operator/role.yaml
@@ -16,6 +16,17 @@ rules:
- patch
- update
- watch
- apiGroups:
- ""
resources:
- endpoints
- limitranges
- pods/log
- replicationcontrollers
- resourcequotas
verbs:
- get
- list
- apiGroups:
- ""
resources:
16 changes: 16 additions & 0 deletions operator/config/rbac/v2-manager-role/role.yaml
@@ -14,6 +14,22 @@ rules:
- get
- list
- watch
- apiGroups:
- ""
resources:
- endpoints
- events
- limitranges
- persistentvolumeclaims
- pods
- pods/log
- replicationcontrollers
- resourcequotas
- serviceaccounts
- services
verbs:
- get
- list
- apiGroups:
- ""
resources:
@@ -140,6 +140,10 @@ type RedpandaReconciler struct {
// Leases are used by controller-runtime in the sidecar. The operator's main reconciliation needs leases permissions in order to create a Role with the same permissions.
// +kubebuilder:rbac:groups=coordination.k8s.io,namespace=default,resources=leases,verbs=get;list;watch;create;update;patch;delete

// rpk bundle additional RBAC permissions.
// When the Redpanda chart values set `rbac.enabled = true`, the operator needs elevated permissions to create the corresponding ClusterRole.
// +kubebuilder:rbac:groups=core,resources=endpoints;events;limitranges;persistentvolumeclaims;pods;pods/log;replicationcontrollers;resourcequotas;serviceaccounts;services,verbs=get;list
Contributor:
Hmmm, I would have expected some test in the helm chart to start failing with this change. I bet the chart is still pointed at a static version.

So this change will make the rbac.createRPKBundleCRs flag, eventually, always be true.

I thought we had a corresponding flag in the redpanda helm chart but I guess we don't 🤔

I don't love the idea of the operator having the permissions required for rpk debug bundle all the time. I guess we can expand the scope until someone complains as there's not a great alternative :/

Contributor (Author):
> I would have expected some test in the helm chart to start failing with this change.

You are right; it will fail when we bump the operator version:

- https://raw.githubusercontent.com/redpanda-data/redpanda-operator/v2.3.8-24.3.6/operator/config/rbac/leader-election-role/role.yaml

> I bet the chart is still pointed at a static version

The RBAC definition in the operator helm chart is created manually. Then we assert equivalence between kustomize and helm:

func TestHelmKustomizeEquivalence(t *testing.T) {
	ctx := testutil.Context(t)

	client, err := helm.New(helm.Options{ConfigHome: testutil.TempDir(t)})
	require.NoError(t, err)

	kustomization, err := os.ReadFile("testdata/kustomization.yaml")
	require.NoError(t, err)
	require.Containsf(t, string(kustomization), Chart.Metadata().AppVersion, "kustomization.yaml should reference the current appVersion: %s", Chart.Metadata().AppVersion)

	values := PartialValues{
		FullnameOverride: ptr.To("redpanda"),
		RBAC:             &PartialRBAC{CreateAdditionalControllerCRs: ptr.To(true)},
	}

	rendered, err := client.Template(ctx, ".", helm.TemplateOptions{
		Name:      "redpanda",
		Namespace: "",
		Values:    values,
	})
	require.NoError(t, err)

	fSys := filesys.MakeFsOnDisk()
	buffy := new(bytes.Buffer)
	cmd := build.NewCmdBuild(fSys, build.MakeHelp(konfig.ProgramName, "build"), buffy)
	require.NoError(t, cmd.RunE(cmd, []string{"testdata"}))

	helmObjs, err := kube.DecodeYAML(rendered, Scheme)
	require.NoError(t, err)

	require.NoError(t, apiextensionsv1.AddToScheme(Scheme))
	kustomizeObjs, err := kube.DecodeYAML(buffy.Bytes(), Scheme)
	require.NoError(t, err)

	helmClusterRoleRules, helmRoleRules := ExtractRules(helmObjs)
	kClusterRoleRules, kRoleRules := ExtractRules(kustomizeObjs)

	assert.JSONEq(t, jsonStr(helmRoleRules), jsonStr(kRoleRules), "difference in Roles\n--- Helm / Missing from Kustomize\n+++ Kustomize / Missing from Helm")
	assert.JSONEq(t, jsonStr(helmClusterRoleRules), jsonStr(kClusterRoleRules), "difference in ClusterRoles\n--- Helm / Missing from Kustomize\n+++ Kustomize / Missing from Helm")
}

That kustomize definition is static.

> So this change will make the rbac.createRPKBundleCRs flag, eventually, always be true.

Agreed, and I don't like it either.

> I thought we had a corresponding flag in the redpanda helm chart but I guess we don't

It's the umbrella flag rbac.enabled. Until the sidecar, Redpanda didn't need any Kubernetes RBAC rules. Since we use controller-runtime in the sidecar, we always create the necessary Roles and ClusterRoles for it. This could be improved to differentiate the sidecar (and its modes: decommission, PVC unbinder, "simple" health check) from rpk-bundle.

Nevertheless, the operator deployment needs the same or more permissions than rpk-bundle needs. Users that want to use rpk-bundle would need to perform an operator helm chart upgrade (rpk-bundle Roles and ClusterRoles), then a Redpanda resource upgrade (rbac.enabled=true); only then would rpk bundle work on the Redpanda container/pod. I'm starting to think that rpk bundle should be aware of KUBECONFIG, so that it could run outside the Kubernetes cluster, like kubectl.

Another option would be a standalone YAML that holds the RBAC and Pod definitions for rpk bundle to create an artifact from within the cluster, so that the operator and redpanda helm charts stay much simpler.

> I don't love the idea of the operator having the permissions required for rpk debug bundle all the time. I guess we can expand the scope until someone complains as there's not a great alternative :/

Since I agree with you, I'm proposing a small cheat: I could add the rpk-bundle RBAC to the test suite manually, as we already do for the port-forward capability:

role.Rules = append(role.Rules, rbacv1.PolicyRule{
	APIGroups: []string{""},
	Resources: []string{"pods/portforward"},
	Verbs:     []string{"*"},
})


// SetupWithManager sets up the controller with the Manager.
func (r *RedpandaReconciler) SetupWithManager(ctx context.Context, mgr ctrl.Manager) error {
if err := registerHelmReferencedIndex(ctx, mgr, "statefulset", &appsv1.StatefulSet{}); err != nil {
@@ -731,7 +731,7 @@ func (s *RedpandaControllerSuite) TestStableUIDAndGeneration() {
flipped := s.snapshotCluster(filter)
s.compareSnapshot(fresh, flipped, isStable)

-	s.T().Logf("toggling useFlux: %t -> %t", useFlux, !useFlux)
+	s.T().Logf("toggling useFlux: %t -> %t", !useFlux, useFlux)
rp.Spec.ChartRef.UseFlux = ptr.To(useFlux)
s.applyAndWait(rp)

@@ -743,12 +743,18 @@
// HelmRelease and HelmChart are checked explicitly here, but any test that leaves one of the mentioned resources
// behind will prevent namespace deletion. In other words, if the test suite cannot delete a namespace, then with high
// probability a resource with a finalizer is preventing the namespace deletion.
-	var hr v2beta2.HelmRelease
-	err := s.client.Get(s.ctx, types.NamespacedName{Name: rp.GetHelmReleaseName(), Namespace: rp.Namespace}, &hr)
-	s.True(apierrors.IsNotFound(err))

// In the flux-based deployment the HelmRelease will be deleted after Redpanda is fully removed from
// the Kubernetes API server. The ownerReference ties Redpanda to the HelmRelease, but there is a
// delay between HelmRelease and Redpadna custom resource deletion.
Contributor:

Suggested change:
-// delay between HelmRelease and Redpadna custom resource deletion.
+// delay between HelmRelease and Redpanda custom resource deletion.

Contributor:

You could also change the deletion policy to foreground, which should stall the delete until all "children" are GC'd.

+	s.EventuallyWithT(func(t *assert.CollectT) {
+		var hr v2beta2.HelmRelease
+		err := s.client.Get(s.ctx, types.NamespacedName{Name: rp.GetHelmReleaseName(), Namespace: rp.Namespace}, &hr)
+		assert.True(t, apierrors.IsNotFound(err))
+	}, time.Minute, time.Second, "HelmRelease not GC'd")

var hc sourcecontrollerv1beta2.HelmChart
-	err = s.client.Get(s.ctx, types.NamespacedName{Name: rp.Namespace + "-" + rp.Name, Namespace: rp.Namespace}, &hc)
+	err := s.client.Get(s.ctx, types.NamespacedName{Name: rp.Namespace + "-" + rp.Name, Namespace: rp.Namespace}, &hc)
s.True(apierrors.IsNotFound(err))
}
}
@@ -958,7 +964,7 @@ func (s *RedpandaControllerSuite) minimalRP(useFlux bool) *redpandav1alpha2.Redp
},
},
RBAC: &redpandav1alpha2.RBAC{
-	Enabled: ptr.To(false),
+	Enabled: ptr.To(true),
},
},
},
16 changes: 16 additions & 0 deletions operator/internal/controller/redpanda/role.yaml
@@ -14,6 +14,22 @@ rules:
- get
- list
- watch
- apiGroups:
- ""
resources:
- endpoints
- events
- limitranges
- persistentvolumeclaims
- pods
- pods/log
- replicationcontrollers
- resourcequotas
- serviceaccounts
- services
verbs:
- get
- list
- apiGroups:
- ""
resources: