Skip to content

Fix integration tests suite#541

Closed
RafalKorepta wants to merge 1 commit intorelease/v2.3.xfrom
rk/fix-integration-tests
Closed

Fix integration tests suite#541
RafalKorepta wants to merge 1 commit intorelease/v2.3.xfrom
rk/fix-integration-tests

Conversation

@RafalKorepta
Copy link
Contributor

@RafalKorepta RafalKorepta commented Mar 20, 2025

Based on #539

operator: Include ClusterRole permission for redpanda controller

In the redpanda package the kubebuilder comment does not have all possible
variants of ClusterRole permissions neccessery to handle creation of all
Redpanda helm chart resources. When rpk bundle ClusterRole was included to
the Redpanda helm chart deployment, then the integration test suite failed with
flux reporting the following:

creation of clusterroles.rbac.authorization.k8s.io "rp-9gd31r-rpk-bundle" is forbidden:
user "system:serviceaccount:testenv-g5jfk:testenv-pzy3ce"
(groups=["system:serviceaccounts" "system:serviceaccounts:testenv-g5jfk" "system:authenticated"])
is attempting to grant RBAC permissions not currently held:

{APIGroups:[""], Resources:["endpoints"], Verbs:["get" "list"]}
{APIGroups:[""], Resources:["events"], Verbs:["get" "list"]}
{APIGroups:[""], Resources:["limitranges"], Verbs:["get" "list"]}
{APIGroups:[""], Resources:["persistentvolumeclaims"], Verbs:["get" "list"]}
{APIGroups:[""], Resources:["pods"], Verbs:["get" "list"]}
{APIGroups:[""], Resources:["pods/log"], Verbs:["get" "list"]}
{APIGroups:[""], Resources:["replicationcontrollers"], Verbs:["get" "list"]}
{APIGroups:[""], Resources:["resourcequotas"], Verbs:["get" "list"]}
{APIGroups:[""], Resources:["serviceaccounts"], Verbs:["get" "list"]}
{APIGroups:[""], Resources:["services"], Verbs:["get" "list"]}

The setup of integration test suite included only permissions defined in redpanda package.
Kustomize and Operator helm chart includes those missing permissions.

Reference

https://redpandadata.atlassian.net/browse/K8S-425

In the redpanda package the kubebuilder comment does not have all possible
variants of ClusterRole permissions neccessery to handle creation of all
Redpanda helm chart resources. When rpk bundle ClusterRole was included to
the Redpanda helm chart deployment, then the integration test suite failed with
flux reporting the following:
```
creation of clusterroles.rbac.authorization.k8s.io "rp-9gd31r-rpk-bundle" is forbidden:
user "system:serviceaccount:testenv-g5jfk:testenv-pzy3ce"
(groups=["system:serviceaccounts" "system:serviceaccounts:testenv-g5jfk" "system:authenticated"])
is attempting to grant RBAC permissions not currently held:

{APIGroups:[""], Resources:["endpoints"], Verbs:["get" "list"]}
{APIGroups:[""], Resources:["events"], Verbs:["get" "list"]}
{APIGroups:[""], Resources:["limitranges"], Verbs:["get" "list"]}
{APIGroups:[""], Resources:["persistentvolumeclaims"], Verbs:["get" "list"]}
{APIGroups:[""], Resources:["pods"], Verbs:["get" "list"]}
{APIGroups:[""], Resources:["pods/log"], Verbs:["get" "list"]}
{APIGroups:[""], Resources:["replicationcontrollers"], Verbs:["get" "list"]}
{APIGroups:[""], Resources:["resourcequotas"], Verbs:["get" "list"]}
{APIGroups:[""], Resources:["serviceaccounts"], Verbs:["get" "list"]}
{APIGroups:[""], Resources:["services"], Verbs:["get" "list"]}
```

The setup of integration test suite included only permissions defined in redpanda package.
Kustomize and Operator helm chart includes those missing permissions.
@RafalKorepta RafalKorepta force-pushed the rk/fix-integration-tests branch from 90e5fcc to 7a39c1a Compare March 20, 2025 19:19

// In the flux base deployment the HelmRelease will be deleted after Redpanda is fully removed from
// Kubernetes API server. The ownerReference tight together Redpanda with HelmRelease, but there is
// delay between HelmRelease and Redpadna custom resource deletion.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// delay between HelmRelease and Redpadna custom resource deletion.
// delay between HelmRelease and Redpanda custom resource deletion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also change the deletion policy to foreground, which should stall the delete until all "children" as GC'd.


// rpk bundle additional rbac permissions
// When Redpanda chart values has set `rbac.enabled = true`, then operator needs to have elevated permissions to create ClusterRole
// +kubebuilder:rbac:groups=core,resources=endpoints;events;limitranges;persistentvolumeclaims;pods;pods/log;replicationcontrollers;resourcequotas;serviceaccounts;services,verbs=get;list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I would have expected some test in the helm chart to start failing with this change. I bet the chart is still pointed at a static version.

So this change will make the rbac.createRPKBundleCRs flag, eventually, always be true.

I thought we had a corresponding flag in the redpanda helm chart but I guess we don't 🤔

I don't love the idea of the operator having the permissions required for rpk debug bundle all the time. I guess we can expand the scope until someone complains as there's not a great alternative :/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected some test in the helm chart to start failing with this change.

You are right it will fail when we bump operator version

- https://raw.githubusercontent.com/redpanda-data/redpanda-operator/v2.3.8-24.3.6/operator/config/rbac/leader-election-role/role.yaml

I bet the chart is still pointed at a static version

The rbac definition in operator helm chart is created manually. Then we assert compatibility between kustomize and helm

func TestHelmKustomizeEquivalence(t *testing.T) {
ctx := testutil.Context(t)
client, err := helm.New(helm.Options{ConfigHome: testutil.TempDir(t)})
require.NoError(t, err)
kustomization, err := os.ReadFile("testdata/kustomization.yaml")
require.NoError(t, err)
require.Containsf(t, string(kustomization), Chart.Metadata().AppVersion, "kustomization.yaml should reference the current appVersion: %s", Chart.Metadata().AppVersion)
values := PartialValues{FullnameOverride: ptr.To("redpanda"), RBAC: &PartialRBAC{CreateAdditionalControllerCRs: ptr.To(true)}}
rendered, err := client.Template(ctx, ".", helm.TemplateOptions{
Name: "redpanda",
Namespace: "",
Values: values,
})
require.NoError(t, err)
fSys := filesys.MakeFsOnDisk()
buffy := new(bytes.Buffer)
cmd := build.NewCmdBuild(
fSys, build.MakeHelp(konfig.ProgramName, "build"), buffy)
require.NoError(t, cmd.RunE(cmd, []string{"testdata"}))
helmObjs, err := kube.DecodeYAML(rendered, Scheme)
require.NoError(t, err)
require.NoError(t, apiextensionsv1.AddToScheme(Scheme))
kustomizeObjs, err := kube.DecodeYAML(buffy.Bytes(), Scheme)
require.NoError(t, err)
helmClusterRoleRules, helmRoleRules := ExtractRules(helmObjs)
kClusterRoleRules, kRoleRules := ExtractRules(kustomizeObjs)
assert.JSONEq(t, jsonStr(helmRoleRules), jsonStr(kRoleRules), "difference in Roles\n--- Helm / Missing from Kustomize\n+++ Kustomize / Missing from Helm")
assert.JSONEq(t, jsonStr(helmClusterRoleRules), jsonStr(kClusterRoleRules), "difference in ClusterRoles\n--- Helm / Missing from Kustomize\n+++ Kustomize / Missing from Helm")
}

That kustomize definition is static.

So this change will make the rbac.createRPKBundleCRs flag, eventually, always be true.

Agree and I don't like it too.

I thought we had a corresponding flag in the redpanda helm chart but I guess we don't

It's uber flag rbac.enabled. Redpanda until sidecar didn't need any Kubernetes RBAC rules. As we use controller-runtime in sidecar we always create necessary Roles and ClusterRoles for sidecar. It could be improved to differentiate sidecar (and it's modes: decommission, pvc un-binder, "simple" health check) with rpk-bundle.

Never the less operator deployment needs to have same or more permissions than rpk-bundle needs. Users that want to use rpk-bundle would need to perform an operator helm chart upgrade (rpk-bundle Roles and ClusterRoles), then Redpanda resource upgrade (rbac.enabled=true) only then rpk bundle on Redpanda container/pod would work. I start to think that rpk bundle should be aware of KUBECONFIG, so that it could be run outside of the Kubernetes cluster like kubectl.

Another option would be standalone yaml that holds RBAC and Pod definition for rpk bundle to create an artefact from within cluster, so that operator and redpanda helm charts are much simpler.

I don't love the idea of the operator having the permissions required for rpk debug bundle all the time. I guess we can expand the scope until someone complains as there's not a great alternative :/

As I agree with you I'm proposing a small cheat. I could add rpk-bundle RBAC to the test suite manually as we are doing with port-forward capability.

role.Rules = append(role.Rules, rbacv1.PolicyRule{
APIGroups: []string{""},
Resources: []string{"pods/portforward"},
Verbs: []string{"*"},
})

@github-actions
Copy link

This PR is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Mar 27, 2025
@RafalKorepta
Copy link
Contributor Author

@chrisseto I believe you will fix this issue soon by adding RBAC to run rpk bundle by default.

@github-actions github-actions bot removed the stale label Mar 29, 2025
@github-actions
Copy link

github-actions bot commented Apr 3, 2025

This PR is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@RafalKorepta
Copy link
Contributor Author

This PR is obsolete as #580 addressed the issue

@chrisseto chrisseto deleted the rk/fix-integration-tests branch May 7, 2025 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants