Skip to content

Conversation

rashmigottipati
Copy link
Member

This PR proposes the deprecation of the serviceAccount field in the OLMv1 ClusterExtension API.

Key points:

  • Deprecates the serviceAccount field in the ClusterExtension API
  • Emphasizes that only cluster-admins should have write access to the ClusterExtension API
  • Discusses RBAC implications and security considerations

@openshift-ci openshift-ci bot requested review from celebdor and sdodson October 9, 2025 19:37
Copy link
Contributor

openshift-ci bot commented Oct 9, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign stleerh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rashmigottipati rashmigottipati changed the title Deprecate serviceAccount field in OLMv1 ClusterExtension API OPRUN-4172: Deprecate serviceAccount field in OLMv1 ClusterExtension API Oct 9, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Oct 9, 2025

@rashmigottipati: This pull request references OPRUN-4172 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

This PR proposes the deprecation of the serviceAccount field in the OLMv1 ClusterExtension API.

Key points:

  • Deprecates the serviceAccount field in the ClusterExtension API
  • Emphasizes that only cluster-admins should have write access to the ClusterExtension API
  • Discusses RBAC implications and security considerations

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 9, 2025
@rashmigottipati
Copy link
Member Author

cc @trgeiger @grokspawn

@everettraven
Copy link
Contributor

This looks like it makes changes to APIs on OpenShift but doesn't include an API reviewer in the set of approvers. Please reach out on the #forum-api-review channel in slack to request an API reviewer be assigned to review this enhancement.

Thanks!

@rashmigottipati
Copy link
Member Author

Thanks @everettraven. I’ve reached out in #forum-api-review on Slack to request an API reviewer for this enhancement. I’ll update the PR’s approvers list once a reviewer is assigned.

- "@grokspawn"
approvers:
- "@trgeiger"
- "@grokspawn"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- "@grokspawn"
- "@grokspawn"
- "@joelanford"


Deprecate the `.spec.serviceAccount` field from the ClusterExtension API in the operator-controller. This field was originally introduced to enforce least privilege by requiring users to provide a ServiceAccount with the necessary RBAC permissions to manage extension content. This proposal removes that requirement and simplifies controller logic and behavior.

**Note**: This deprecation is an interim step. While the intent is to eventually remove the `.spec.serviceAccount` field, we will keep it as optional and mark it as deprecated during this transition period spanning multiple releases. This approach allows users time to adapt without disruption before the field is fully removed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the graduation plan is a good place to discuss this, and maybe I'm just imagining that we're going to end up having a lively discussion on the path forward and not wanting to have to update two places.
Maybe it's just me... it seems like a good callout aside from that. 😄

## Motivation

The original intent of `.spec.serviceAccount` was to support a least-privilege model by allowing users to provide custom `ServiceAccount` with fine-grained permissions. In practice, this design introduced considerable operational and technical complexity, including:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd understood that it also arose from the need to implement a rollout revision approach (using boxcutter) which required cluster admin permissions anyway, so we couldn't reasonably reduce the privileges of a "package manager" from cluster admin levels.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you specifically need to use boxcutter for a revision rollout approach?

What are the limitations in doing revision rollouts, regardless of underlying implementation, that makes it impossible for you to continue to adhere to the least-privilege principle?


- Make `.spec.serviceAccount` field **optional** in the `ClusterExtension` API.
- Update the controller to **ignore** the field during reconciliation.
- Log a warning when `.spec.serviceAccount` is set.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to adopt an approach similar to how k8s tracks deprecation (see: https://kubernetes.io/blog/2020/09/03/warnings/#metrics). For example, we could implement a clusterextension_requested_deprecated_api metric, or an admission webhook to warn when we try to create a ClusterExtension with a serviceaccount.

// Deprecated: This field is ignored and will be removed in a future release.
// +optional
// +kubebuilder:validation:XValidation:message="serviceAccount is deprecated and ignored"
ServiceAccount *ClusterExtensionServiceAccount `json:"serviceAccount,omitempty"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We considered moving this to a pointer type to be a potentially breaking change we could avoid, so we're using go1.24's omitzero capability to omit this field from the struct if it is set to only 'default' values: https://github.com/operator-framework/operator-controller/pull/2242/files#diff-387659018fbee6e27fcf5ab561b45489e14b62ae73ecf00fed3a0f4c017714d1R75

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beforehand, was an empty string ("") a valid value here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It would've failed the DNS1123 label check.

- The controller now uses a static `rest.Config` created from its own identity (its default ServiceAccount).
- The RestConfigMapper will now be directly set to `ClusterAdminRestConfigMapper(mgr.GetConfig())`, i.e. the controller always uses its own identity (cluster-admin level config) for all reconciliation and watching operations.
This config is passed to the helm.ActionConfigGetter, ensuring that all Helm operations (install, upgrade, uninstall) use the same identity.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to include the removal of the preflight permissions checks here, or is that not required because those are not GA'd?


### Graduation Criteria / Deprecation Plan

We will deprecate the .spec.serviceAccount field over the course of three OpenShift releases following the kubernetes' deprecation policy:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's where we really need arch guidance. We've been informed that GA'd OCP APIs have functionally never been removed, even though nominally we could -- after a reasonable period -- up-version the API and remove the field, including a conversion webhook to handle upgrades.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure what our process here for an up-version of an API is in OpenShift, but if there is one, I imagine it wouldn't be too dissimilar from Kubernetes (https://kubernetes.io/docs/reference/using-api/deprecation-policy/#rest-resources-aka-api-objects).

You cannot remove the API version until it has "aged out" - in which case you are confident that all users have switched over to the newer, non-deprecated API. In the upstream case it looks like release N+9. You also must have a field in the next version of the API to map the old field value to handle round-tripping without data loss.

As far as I know, OpenShift has never removed a stable API version so I would default to not being able to do this.

@JoelSpeed may have more familiarity with whether or not this is something we might have a way to allow.

I know OLM is an upstream project that we sync the APIs downstream, and we generally aren't as restrictive there because we don't entirely control the API surface, but IIUC we consider OLM a core component of the OpenShift payload so I would expect that we end up treating it with a bit more caution.

- The controller’s ServiceAccount needs broader permissions, potentially increasing risk if compromised.
- Cluster administrators must carefully manage the controller’s permissions.
- Possible security concerns when consolidating privileges into a single SA as opposed to separate, per-extension identities.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lost signal if an operator upgrade requires permissions not provided by the existing, configured serviceaccount (since OLMv1 can do anything, we might perpetrate the upgrade and then have the operator fail).

Signed-off-by: Rashmi Gottipati <[email protected]>
Copy link
Contributor

openshift-ci bot commented Oct 15, 2025

@rashmigottipati: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/markdownlint b1627bf link true /test markdownlint

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Comment on lines +11 to +13
- "@everettraven"
- "@trgeiger"
- "@grokspawn"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, there should generally be one approver that is responsible for helping drive to consensus amongst the set of stakeholders (generally represented in the reviewers section).

With multiple approvers it gets difficult to know who is responsible for actually helping to get this "over the line".

The approver should generally be someone from your team, usually a team lead or a staff engineer.

I should also be placed under a specific subgroup of approvers called api-approvers

- Custom `rest.Config` generation and dynamic HTTP clients
- Token expiration edge cases

Given the limited benefit and high complexity of this approach, we propose simplifying the model:
Copy link
Contributor

@everettraven everettraven Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful if you could clarify what the "limited benefit" here is.

You start this section talking about how the intention was originally for following a least-privilege principle which, on the surface, sounds like it would actually be a pretty big benefit.

After all, looking at the documentation for OLMv1 it seems like "secure by default" is a core tenant tenet: https://github.com/operator-framework/operator-controller/blob/292c0dbc236747e57f77fd914830a865e87fad6b/docs/project/olmv1_design_decisions.md?plain=1#L179-L186

As a reviewer, I'd like to better understand:

  • Why this tenant tenet was originally seen as a win for end users
  • Why this tenant tenet turned out to not be a win for end users

## Motivation

The original intent of `.spec.serviceAccount` was to support a least-privilege model by allowing users to provide custom `ServiceAccount` with fine-grained permissions. In practice, this design introduced considerable operational and technical complexity, including:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you specifically need to use boxcutter for a revision rollout approach?

What are the limitations in doing revision rollouts, regardless of underlying implementation, that makes it impossible for you to continue to adhere to the least-privilege principle?

Comment on lines +39 to +41
- Token acquisition, rotation, and error handling
- Custom `rest.Config` generation and dynamic HTTP clients
- Token expiration edge cases
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you actively experiencing issues with being able to handle these scenarios?

From what I remember this logic is all already implemented and is flexed during testing. I'm curious how many times the logic to do these things have been problematic from both an end-user perspective and a maintenance perspective.

- Optional
- Deprecated via struct tags and documentation
- Marked as deprecated in the CRD schema using OpenAPI extensions:
- x-kubernetes-deprecated: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this exists?

As far as I am aware, there is not yet a way to mark a specific field in a CRD schema as deprecated. https://pkg.go.dev/k8s.io/apiextensions-apiserver/pkg/apis/apiextensions#JSONSchemaProps would be your canonical source of things you can set for a particular schema property.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like maybe it hasn't landed yet: kubernetes/kubernetes#131817
So the only option is to comment that these are deprecated (in addition to programmatic capture of "a deprecated thing is being used").

// Deprecated: This field is ignored and will be removed in a future release.
// +optional
// +kubebuilder:validation:XValidation:message="serviceAccount is deprecated and ignored"
ServiceAccount *ClusterExtensionServiceAccount `json:"serviceAccount,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beforehand, was an empty string ("") a valid value here?


## Drawbacks
- Users lose fine-grained permission control per extension via separate ServiceAccounts.
- The controller’s ServiceAccount needs broader permissions, potentially increasing risk if compromised.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but there are likely other ServiceAccounts on the cluster that already have broad enough permissions to do damage. To do that damage you have to first get access to the cluster.

This could certainly be considered a drawback to increase the permissions associated with the OLM Service Account, but I think a bigger drawback I don't see explicitly mentioned here is the potential for OLM to be the thing that injected a malicious actor into the cluster because it no longer will fail to stamp out permissions it wasn't explicitly told it could.

While it may still be difficult, we've seen in the past malicious actors sneak something into a popular project that is then shipped out en masse. With this change, OLM could be used to get that access to a cluster.


1. Keep `.spec.serviceAccount` and Improve Token Handling
Instead of removing the field, we could improve token management (like automatic refresh and better error handling).
- However, this would add more complexity and maintenance work without fixing the main issue: impersonation is complicated and rarely helpful.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading through this, I'm not convinced on the "rarely helpful" part. Please do update this document appropriately to explicitly state why this has been found to be rarely helpful.

- Cluster administrators must carefully manage the controller’s permissions.
- Possible security concerns when consolidating privileges into a single SA as opposed to separate, per-extension identities.

## Alternatives (Not Implemented)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some alternatives I've not seen mentioned as being considered. Curious if you've thought of them:

  • Building better client tooling that makes managing permission requirements easier for users looking to install cluster extensions.
  • Building a permission requesting system that makes it easier for users to request installation of a cluster extension and OLM creating a "permission request" that an admin can either approve or deny to automatically stamp out the RBAC needed to complete the installation.

- This would make the API more complex, go against common Kubernetes practices, and could confuse users.

3. Do Nothing – Keep Supporting the Field as Is
- The added complexity and risks from impersonation outweigh the benefits, so leaving it unchanged is not a good option.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading through this, I still don't have a solid understanding of how painful this added complexity and risks of impersonation actually are

- Remove all logic that:
- Token Acquisition: Eliminate use of TokenRequest API to fetch short-lived tokens for user-provided ServiceAccounts.
- Rest Config Mapping: Remove the ServiceAccountRestConfigMapper, which dynamically generated rest.Config objects for impersonation.
- Synthetic Permissions: Remove conditional logic for SyntheticPermissions that depended on impersonated clients.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do "Synthetic Permissions" play a role in service account based permissions? What are "Synthetic Permissions"?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants