feat: create agent RBAC resources on the control plane side#7268
feat: create agent RBAC resources on the control plane side#7268Park-Jiyeonn wants to merge 1 commit intokarmada-io:masterfrom
Conversation
|
Welcome @Park-Jiyeonn! It looks like this is your first PR to karmada-io/karmada 🎉 |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the security posture of Karmada by revamping the RBAC management for pull-mode Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request enhances security by shifting the creation of karmada-agent RBAC resources from client-side bootstrap tokens to the cluster-controller on the control plane, which now provisions least-privilege RBAC for new pull-mode clusters. However, the current implementation introduces critical and high-severity RBAC misconfigurations. The cluster_controller allows agents to specify arbitrary namespaces for their RBAC resources, risking privilege escalation into sensitive namespaces like kube-system. Additionally, agents are granted excessive cluster-wide permissions to list all clusters and manage all leases, leading to information disclosure and potential Denial of Service. These vulnerabilities require stricter validation of agent-provided namespaces and adherence to the principle of least privilege for generated RBAC rules. Furthermore, a new function needs refactoring to align with the project's style guide regarding parameter count and to improve maintainability through documentation.
| func getClusterNamespace(cluster *clusterv1alpha1.Cluster) string { | ||
| if cluster.Spec.SecretRef != nil && cluster.Spec.SecretRef.Namespace != "" { | ||
| return cluster.Spec.SecretRef.Namespace | ||
| } | ||
| if cluster.Spec.ImpersonatorSecretRef != nil && cluster.Spec.ImpersonatorSecretRef.Namespace != "" { | ||
| return cluster.Spec.ImpersonatorSecretRef.Namespace | ||
| } | ||
| return defaultPullModeClusterNamespace | ||
| } |
There was a problem hiding this comment.
The cluster_controller determines the namespace for agent-specific RBAC resources based on the SecretRef.Namespace or ImpersonatorSecretRef.Namespace fields in the Cluster object's specification. Since the karmada-agent has permission to create and potentially update its own Cluster object, a compromised agent can specify an arbitrary namespace (e.g., kube-system or karmada-system). The cluster_controller will then create a Role and RoleBinding in that namespace, granting the agent permissions to create any Secret and manage specific secrets in that sensitive namespace. This allows an attacker to escalate privileges by interfering with system components or accessing sensitive credentials.
There was a problem hiding this comment.
This namespace-selection model predates this PR. --cluster-namespace is an existing user-facing option, and the previous karmadactl register flow also generated the agent secret Role/RoleBinding in that namespace.
| APIGroups: []string{"cluster.karmada.io"}, | ||
| Resources: []string{"clusters"}, | ||
| Verbs: []string{"create", "list", "watch"}, | ||
| }, | ||
| { |
There was a problem hiding this comment.
The generateClusterRole function grants every agent list and watch permissions on all clusters resources at the cluster scope. This allows any registered agent to discover and monitor all other clusters in the Karmada federation, leaking sensitive metadata such as API endpoints, cluster IDs, and status information. This violates the principle of least privilege, as an agent only needs access to its own Cluster object.
There was a problem hiding this comment.
The previous karmadactl-generated agent ClusterRole already granted the same clusters access, and this PR preserves that existing steady-state RBAC instead of broadening it.
| APIGroups: []string{"coordination.k8s.io"}, | ||
| Resources: []string{"leases"}, | ||
| Verbs: []string{"get", "create", "update"}, | ||
| }, | ||
| { |
There was a problem hiding this comment.
The generateClusterRole function grants every agent get, create, and update permissions on all leases resources across all namespaces. Karmada uses leases for cluster heartbeats and potentially for leader election. An attacker with control over one agent can use these permissions to modify or delete leases belonging to other clusters or system components, leading to a Denial of Service (by making other clusters appear offline) or interfering with the control plane's operational logic. These permissions should be restricted using ResourceNames to only allow the agent to manage its own lease.
There was a problem hiding this comment.
This lease permission also existed in the previous agent RBAC and is preserved by this PR rather than newly introduced by it.
| - apiGroups: | ||
| - cluster.karmada.io | ||
| resources: | ||
| - clusters | ||
| verbs: | ||
| - create |
There was a problem hiding this comment.
The RBAC configuration grants the system:karmada:agents group the create permission on all clusters resources. Since every registered agent's certificate belongs to this group, any compromised agent can create arbitrary new Cluster objects in the control plane. This can be used to clutter the system, block the registration of legitimate clusters by pre-empting their names, or trigger the automatic RBAC generation for fake clusters. The create permission should be restricted to the bootstrap token group used during initial registration.
There was a problem hiding this comment.
Previously, the Cluster object was created during karmadactl register. In this PR, it is created by karmada-agent after certificate bootstrap so that the privileged system:karmada:agent:rbac-generator path can be removed entirely.
I agree that constraining cluster creation further would require an additional mechanism beyond RBAC alone (for example admission-level validation or a different bootstrap handshake), so I see that as follow-up scope rather than part of this PR.
| { | ||
| APIGroups: []string{"cluster.karmada.io"}, | ||
| Resources: []string{"clusters"}, | ||
| ResourceNames: []string{clusterName}, | ||
| Verbs: []string{"get", "delete"}, | ||
| }, | ||
| { |
There was a problem hiding this comment.
The generateClusterRole function grants get and delete permissions on the agent's own Cluster resource but omits the update and patch verbs. However, the karmada-agent implementation attempts to update its own Cluster object's specification (e.g., in pkg/util/cluster.go). Without these permissions, the agent will be unable to update its metadata (like API endpoints or provider info) after the initial registration, leading to a functional failure or a state where the control plane has stale information about the cluster.
There was a problem hiding this comment.
This comment was based on an earlier revision. In the current implementation the agent no longer updates its own Cluster spec after registration;
cmd/agent/app/agent.go
Outdated
| return true, nil | ||
| } | ||
|
|
||
| func waitForPullModeAgentRBACReady(ctx context.Context, controlPlaneKubeClient kubeclientset.Interface, karmadaClient karmadaclientset.Interface, clusterName, clusterNamespace string, timeout time.Duration) error { |
There was a problem hiding this comment.
This function has 6 parameters, which exceeds the recommended limit of 5 according to the repository style guide. To improve readability and maintainability, consider encapsulating these parameters into a struct.
Additionally, this function contains key logic for the agent's startup sequence but lacks a doc comment. Adding a comment explaining its purpose—polling until the control plane has provisioned the necessary RBAC permissions for the agent—would improve code clarity.
For example:
type rbacReadyOptions struct {
controlPlaneKubeClient kubeclientset.Interface
karmadaClient karmadaclientset.Interface
clusterName string
clusterNamespace string
timeout time.Duration
}
// waitForPullModeAgentRBACReady polls the control plane to check if the required RBAC resources
// for the pull-mode agent have been created. It performs dry-run checks for cluster, secret,
// and work access until they are all permitted or the timeout is reached.
func waitForPullModeAgentRBACReady(ctx context.Context, opts rbacReadyOptions) error {
// ...
}References
- A function should not have more than 5 parameters. If it exceeds this, consider refactoring the function or encapsulating the parameters. (link)
- Add doc comments to unexported functions that contain key logic to improve maintainability.
There was a problem hiding this comment.
This no longer applies to the current revision because waitForPullModeAgentRBACReady() was removed during the refactor.`
fa62429 to
afda1ce
Compare
There was a problem hiding this comment.
Pull request overview
This PR shifts pull-mode karmada-agent RBAC resource creation/cleanup from edge-side CLI operations to the Karmada control plane, reducing the privileges required during cluster registration and removing the legacy “RBAC generator” user flow.
Changes:
- Add control-plane
cluster_controllerlogic to generate and clean up per-agent RBAC resources for pull-mode clusters. - Update
karmada-agentstartup flow to create theClusterobject first and wait until its RBAC becomes available before proceeding with registration steps that require those permissions. - Remove RBAC-generator-related logic from
karmadactl register/unregisterand update bootstrap-token RBAC manifests to the new, narrower permissions model.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/util/cluster.go | Skips duplicate-cluster validation for agents on Forbidden list errors; refactors create-or-update to “create-first then fallback to get/update”. |
| pkg/util/cluster_test.go | Adds/updates tests for the new agent validation behavior and create-first semantics. |
| pkg/karmadactl/unregister/unregister.go | Removes CLI-side deletion/manual reporting of karmada-agent RBAC resources on the control plane. |
| pkg/karmadactl/unregister/unregister_test.go | Adjusts unregister test setup to reflect removed RBAC resource handling. |
| pkg/karmadactl/register/register.go | Removes legacy RBAC-generator code path from pull-mode registration. |
| pkg/karmadactl/cmdinit/karmada/rbac.go | Removes “agent RBAC generator” cluster-admin-like RBAC creation. |
| pkg/karmadactl/cmdinit/karmada/rbac_test.go | Drops tests for removed RBAC-generator grant function. |
| pkg/karmadactl/cmdinit/karmada/deploy.go | Adds creation of RBAC allowing agents to create Cluster objects during initial registration. |
| pkg/karmadactl/cmdinit/bootstraptoken/agent/tlsbootstrap.go | Adds RBAC helper to allow karmada agents to create Cluster objects. |
| pkg/karmadactl/cmdinit/bootstraptoken/agent/tlsbootstrap_test.go | Adds tests verifying the new cluster-role/cluster-rolebinding creation/update behavior. |
| pkg/controllers/cluster/cluster_controller.go | Ensures pull-mode agent RBAC on sync; cleans it up on cluster removal. |
| pkg/controllers/cluster/cluster_controller_test.go | Adds tests for ensuring and cleaning up agent RBAC resources. |
| pkg/controllers/cluster/agent_rbac.go | Introduces control-plane RBAC resource generation/cleanup for pull-mode agents. |
| cmd/agent/app/agent.go | Creates the Cluster object early; polls until RBAC is usable before continuing registration. |
| charts/karmada/templates/_karmada_bootstrap_token_configuration.tpl | Updates bootstrap RBAC manifests from RBAC-generator user to agent cluster creation permissions. |
| artifacts/deploy/bootstrap-token-configuration.yaml | Updates the static bootstrap-token RBAC manifest accordingly. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
cmd/agent/app/agent.go
Outdated
| secret := &corev1.Secret{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| Namespace: clusterNamespace, | ||
| Name: clusterName, | ||
| }, | ||
| Data: map[string][]byte{ | ||
| clusterv1alpha1.SecretTokenKey: []byte("dummy-token"), | ||
| clusterv1alpha1.SecretCADataKey: []byte("dummy-ca-data"), | ||
| }, | ||
| } | ||
| _, err = controlPlaneKubeClient.CoreV1().Secrets(clusterNamespace).Create(ctx, secret, metav1.CreateOptions{ | ||
| DryRun: []string{metav1.DryRunAll}, | ||
| }) | ||
| if apierrors.IsForbidden(err) || apierrors.IsNotFound(err) { | ||
| return false, nil | ||
| } | ||
| if err != nil && !apierrors.IsAlreadyExists(err) { | ||
| return false, err | ||
| } | ||
|
|
There was a problem hiding this comment.
waitForPullModeAgentRBACReady() always does a dry-run Secret create in clusterNamespace to decide RBAC readiness. If the Cluster was created without SecretRef/ImpersonatorSecretRef (e.g. --report-secrets=None), the controller will generate secret RBAC in its default namespace (karmada-cluster), but this check still targets the user-supplied clusterNamespace and can remain Forbidden forever when a non-default --cluster-namespace is used. Consider skipping the Secret check when secrets aren’t being reported, or deriving the namespace from the Cluster spec (same logic as the controller) before probing permissions.
There was a problem hiding this comment.
This no longer applies to the current revision because waitForPullModeAgentRBACReady() was removed during the refactor.`
| if err := c.Client.Create(ctx, obj); err != nil && !apierrors.IsAlreadyExists(err) { | ||
| return err | ||
| } | ||
| return nil |
There was a problem hiding this comment.
createObjectIfNotExists() treats AlreadyExists as success and never reconciles spec/rules. If a per-cluster RBAC object already exists with outdated or broader rules (e.g., upgrades from older versions), the controller will leave it unchanged even though syncCluster() calls this an “ensure”. Consider switching to a create-or-update/patch approach so the generated RBAC converges to the intended minimal rules.
| if err := c.Client.Create(ctx, obj); err != nil && !apierrors.IsAlreadyExists(err) { | |
| return err | |
| } | |
| return nil | |
| // Try to create the object first. | |
| if err := c.Client.Create(ctx, obj); err == nil { | |
| return nil | |
| } else if !apierrors.IsAlreadyExists(err) { | |
| // If the error is something other than AlreadyExists, surface it. | |
| return err | |
| } | |
| // If the object already exists, fetch it and reconcile its spec/rules | |
| // with the desired state represented by obj. | |
| key := client.ObjectKeyFromObject(obj) | |
| existingObj, ok := obj.DeepCopyObject().(client.Object) | |
| if !ok { | |
| // Should not happen for well-formed client.Object types, but if it does | |
| // we fail fast rather than silently skipping reconciliation. | |
| return fmt.Errorf("failed to deep-copy object for reconciliation: %T", obj) | |
| } | |
| if err := c.Client.Get(ctx, key, existingObj); err != nil { | |
| if apierrors.IsNotFound(err) { | |
| // Object was deleted after the initial AlreadyExists; try to create again. | |
| return c.Client.Create(ctx, obj) | |
| } | |
| return err | |
| } | |
| switch desired := obj.(type) { | |
| case *rbacv1.Role: | |
| current, ok := existingObj.(*rbacv1.Role) | |
| if !ok { | |
| return fmt.Errorf("existing object type mismatch for Role: %T", existingObj) | |
| } | |
| current.Rules = desired.Rules | |
| current.Labels = desired.Labels | |
| current.Annotations = desired.Annotations | |
| case *rbacv1.ClusterRole: | |
| current, ok := existingObj.(*rbacv1.ClusterRole) | |
| if !ok { | |
| return fmt.Errorf("existing object type mismatch for ClusterRole: %T", existingObj) | |
| } | |
| current.Rules = desired.Rules | |
| current.Labels = desired.Labels | |
| current.Annotations = desired.Annotations | |
| case *rbacv1.RoleBinding: | |
| current, ok := existingObj.(*rbacv1.RoleBinding) | |
| if !ok { | |
| return fmt.Errorf("existing object type mismatch for RoleBinding: %T", existingObj) | |
| } | |
| current.Subjects = desired.Subjects | |
| current.RoleRef = desired.RoleRef | |
| current.Labels = desired.Labels | |
| current.Annotations = desired.Annotations | |
| case *rbacv1.ClusterRoleBinding: | |
| current, ok := existingObj.(*rbacv1.ClusterRoleBinding) | |
| if !ok { | |
| return fmt.Errorf("existing object type mismatch for ClusterRoleBinding: %T", existingObj) | |
| } | |
| current.Subjects = desired.Subjects | |
| current.RoleRef = desired.RoleRef | |
| current.Labels = desired.Labels | |
| current.Annotations = desired.Annotations | |
| default: | |
| // For object kinds not explicitly handled here, fall back to a no-op | |
| // on updates to avoid unintentionally mutating unknown types. | |
| return nil | |
| } | |
| return c.Client.Update(ctx, existingObj) |
There was a problem hiding this comment.
This is already addressed in the current revision. The controller now reconciles agent RBAC with CreateOrUpdate instead of treating AlreadyExists as success, so existing objects are converged to the desired rules.`
| func generateAgentRBACResources(clusterName, clusterNamespace string) *agentRBACResources { | ||
| return &agentRBACResources{ | ||
| clusterRoles: []*rbacv1.ClusterRole{generateClusterRole(clusterName)}, | ||
| clusterRoleBindings: []*rbacv1.ClusterRoleBinding{generateClusterRoleBinding(clusterName)}, | ||
| roles: []*rbacv1.Role{generateSecretAccessRole(clusterName, clusterNamespace), generateWorkAccessRole(clusterName)}, | ||
| roleBindings: []*rbacv1.RoleBinding{generateSecretAccessRoleBinding(clusterName, clusterNamespace), generateWorkAccessRoleBinding(clusterName)}, | ||
| } |
There was a problem hiding this comment.
generateAgentRBACResources() always includes secret-related Role/RoleBinding, even when the Cluster has no SecretRef/ImpersonatorSecretRef. That grants secret create/get/patch permissions in a (potentially shared) namespace even when the agent isn’t configured to report or use secrets. Consider only generating the secret RBAC resources when at least one of the secret refs is set on the Cluster spec, to better match the “minimal RBAC” goal.
There was a problem hiding this comment.
The previous karmadactl flow also generated the secret-related Role/RoleBinding as part of the agent RBAC bundle. This PR significantly improves the security posture by removing the bootstrap-time all-privileged rbac-generator identity; it does not attempt to fully minimize every steady-state agent permission in the same change.
| func generateClusterRole(clusterName string) *rbacv1.ClusterRole { | ||
| clusterRoleName := fmt.Sprintf("system:karmada:%s:agent", clusterName) | ||
| return &rbacv1.ClusterRole{ | ||
| TypeMeta: metav1.TypeMeta{ | ||
| APIVersion: rbacv1.SchemeGroupVersion.String(), | ||
| Kind: "ClusterRole", |
There was a problem hiding this comment.
The RBAC generation logic here largely duplicates pkg/karmadactl/register/register.go (GenerateClusterRole/Binding, GenerateSecretAccessRole, etc.). Having the same rule sets defined in two places increases drift risk when permissions evolve. Consider extracting the shared RBAC builder into a non-CLI package (e.g. pkg/util/cluster/...) that both the controller and karmadactl can reuse.
There was a problem hiding this comment.
This was addressed during the refactor. The agent RBAC builders were extracted into a shared utility package, and karmadactl no longer carries its own copy of those control-plane RBAC rules.`
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7268 +/- ##
==========================================
+ Coverage 42.03% 42.15% +0.11%
==========================================
Files 874 875 +1
Lines 53551 53731 +180
==========================================
+ Hits 22511 22648 +137
- Misses 29349 29376 +27
- Partials 1691 1707 +16
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
some tests failed, I'll fix it |
Previously, joining a pull-mode cluster required the bootstrap token to have full admin-like permissions (via the `system:karmada:agent:rbac-generator` role) just so `karmadactl register` could create the necessary RBAC resources for the agent. This was a security risk. This commit shifts that responsibility to the control plane: - Removed the over-privileged `rbac-generator` role. - Granted the bootstrap token minimal permissions (only allowed to create `Cluster` objects). - The `karmada-agent` now creates a `Cluster` object on startup and waits for its RBAC to be ready. - The `cluster_controller` on the control plane watches for new pull-mode clusters and automatically provisions (and cleans up) the specific RBAC resources for that agent. - Cleaned up the old RBAC generation logic from `karmadactl`. Signed-off-by: Park.Jiyeon <jiyeonnn2@icloud.com>
afda1ce to
30833a7
Compare
|
The latest commit has been pushed and the related unit tests are now passing ( |
|
This PR includes a small e2e update because it changes |
|
Hey @RainbowMango @zhzhuang-zju @XiShanYongYe-Chang , could you please help review this PR? |
|
They have gone to attend the KubeCon conference, so it might take some time. |
What this PR does:
This PR improves the security of pull-mode cluster bootstrap, as discussed in #7147.
Previously, joining a pull-mode cluster required
karmadactl registerto use a bootstrap credential bound to the highly privilegedsystem:karmada:agent:rbac-generatorrole so it could generate the control-plane RBAC needed bykarmada-agent. If a member cluster were compromised, that credential could be abused.So, here is the new flow:
system:karmada:agent:rbac-generatorrole is removed.karmada-agentcreates the pull-modeClusterobject and waits for control-plane bootstrap to complete.cluster_controllerwatches pull-modeClusterobjects and automatically creates the cluster-specific RBAC resources required by that agent.cluster_controlleralso cleans up those RBAC resources.karmadactl registerandkarmadactl unregister.Which issue this PR fixes:
Closes #7147