Contributor

@ndk ndk commented Aug 4, 2025

#1469 · feat: declarative Grafana Service Account management

Design proposal: #003


Why

The Grafana Operator lets you manage Grafana through Kubernetes CRs, but service accounts were still a manual step (via the GUI or the HTTP API). This PR lets you declare service accounts in a dedicated GrafanaServiceAccount CR so that the operator can:

  • Manage service accounts in Grafana
  • Generate API tokens
  • Store each token in a Kubernetes Secret
  • Clean everything up when the CR changes or is removed

What's inside

  • New GrafanaServiceAccount CR.
  • Full management flow for service accounts, tokens, and secrets.
  • Operator records managed items in status.serviceAccounts and exposes conditions.
  • Chainsaw e2e: tests/e2e/grafanaserviceaccount.

Design notes

  • orgId defaults to the default organization

Out of scope (for now)

  • Multi-org support. All calls target the default Grafana org.
  • Cross-namespace Secret writes.
  • Automatic token rotation (expires).
  • Permissions (it's an Enterprise/Cloud-only feature).

Known limitations

  • Token secrets are always written to the same namespace as the GrafanaServiceAccount.

TODO

CR example

apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaServiceAccount
metadata:
  name: mysa
spec:
  name: my-service-account
  role: Admin
  instanceName: grafana
  isDisabled: false
  tokens:
    - name: my-token
      secretName: ddd
      expires: 2029-12-31T14:00:00+02:00
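
For readers mapping the example onto code, here is a minimal Go sketch of a spec shape that would accept these fields. Only the JSON field names are taken from the example above; the type names, the helper `parseSpec`, and the use of the standard library's `time.Time` are assumptions for illustration, and the operator's real API types may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// TokenSpec mirrors one entry under spec.tokens in the example.
// Hypothetical type: the operator's real definition may differ.
type TokenSpec struct {
	Name       string     `json:"name"`
	SecretName string     `json:"secretName,omitempty"`
	Expires    *time.Time `json:"expires,omitempty"`
}

// ServiceAccountSpec mirrors the spec block of the example CR.
// Hypothetical type, named for this sketch only.
type ServiceAccountSpec struct {
	Name         string      `json:"name"`
	Role         string      `json:"role"`
	InstanceName string      `json:"instanceName"`
	IsDisabled   bool        `json:"isDisabled"`
	Tokens       []TokenSpec `json:"tokens,omitempty"`
}

// exampleSpecJSON is the spec from the CR example, rewritten as JSON.
const exampleSpecJSON = `{
  "name": "my-service-account",
  "role": "Admin",
  "instanceName": "grafana",
  "isDisabled": false,
  "tokens": [
    {"name": "my-token", "secretName": "ddd", "expires": "2029-12-31T14:00:00+02:00"}
  ]
}`

// parseSpec decodes a JSON-encoded spec into the sketch types.
func parseSpec(data []byte) (ServiceAccountSpec, error) {
	var spec ServiceAccountSpec
	err := json.Unmarshal(data, &spec)
	return spec, err
}

func main() {
	spec, err := parseSpec([]byte(exampleSpecJSON))
	if err != nil {
		panic(err)
	}
	fmt.Println(spec.Name, spec.Role, len(spec.Tokens))
}
```

The `expires` value is an RFC 3339 timestamp, which `time.Time` decodes directly, including the `+02:00` offset.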

@github-actions github-actions bot added the documentation and feature labels Aug 4, 2025
@ndk ndk force-pushed the feat_sa2 branch 9 times, most recently from e2c8577 to 12800fc, August 8, 2025 11:09
Collaborator

@Baarsgaard Baarsgaard left a comment

I will leave it at this for now and review the remaining functions when I have more time to familiarize myself with the service accounts API :)

@ndk ndk force-pushed the feat_sa2 branch 4 times, most recently from 5dd705f to 27ddc60, August 11, 2025 11:06
@ndk ndk force-pushed the feat_sa2 branch 3 times, most recently from 2c92519 to 079dbac, August 13, 2025 09:56
Collaborator

@theSuess theSuess left a comment

Overall code & flow looks good to me! Left some minor points of improvement but nothing major. Good work @ndk!

Collaborator

@Baarsgaard Baarsgaard left a comment

If it's not too much to ask, would you include an example of using the new CR under examples/serviceaccounts/?

@ndk ndk requested a review from Baarsgaard August 26, 2025 13:57
Collaborator

@theSuess theSuess left a comment

LGTM! Let's see if @Baarsgaard has any last comments - other than that, I'm happy to merge this

Collaborator

@Baarsgaard Baarsgaard left a comment

While testing this I found a minor oversight, which might be a regression from an earlier version.
The operator currently fails to take ownership if a service account with the same name already exists in Grafana:

status:
  conditions:
  - lastTransitionTime: "2025-08-30T21:23:29Z"
    message: |-
      ServiceAccount failed to be applied for 1 out of 1 instances. Errors:
      - grafana: upserting service account: creating service account: [POST /serviceaccounts][400] createServiceAccountBadRequest {"message":"service account already exists"}
    observedGeneration: 1
    reason: ApplyFailed
    status: "False"
    type: ServiceAccountSynchronized

Sidenote @theSuess: does it ever make sense to create a GrafanaServiceAccount with an empty token list?
Currently, if the token list is empty, the operator deletes any manually created tokens, as outlined in the proposal. But is there any use for service accounts that have no tokens?
If not, should there be a MinItems validation on the .spec.tokens field?
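
To make the pruning behaviour described above concrete, here is a minimal Go sketch of the set difference involved. It is not the operator's code: the function name `tokensToDelete` and the use of plain token names are assumptions for illustration. It only shows why an empty `.spec.tokens` list marks every token found in Grafana for deletion.

```go
package main

import (
	"fmt"
	"sort"
)

// tokensToDelete returns the names of tokens that exist in Grafana but are
// not declared in the spec. With an empty desired list, every existing token
// is returned. Hypothetical helper, not taken from the operator.
func tokensToDelete(desired, existing []string) []string {
	want := make(map[string]struct{}, len(desired))
	for _, name := range desired {
		want[name] = struct{}{}
	}

	var stale []string
	for _, name := range existing {
		if _, ok := want[name]; !ok {
			stale = append(stale, name)
		}
	}
	sort.Strings(stale) // stable order for logs and tests
	return stale
}

func main() {
	// Empty spec list: the manually created token is marked for deletion.
	fmt.Println(tokensToDelete(nil, []string{"manual-token"}))
	// Declared tokens are kept; only undeclared ones are marked.
	fmt.Println(tokensToDelete([]string{"my-token"}, []string{"my-token", "manual-token"}))
}
```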

@Baarsgaard
Collaborator

Everything else I tested worked flawlessly! Really nice work @ndk :D

@Baarsgaard
Collaborator

Baarsgaard commented Aug 30, 2025

Last note:
Most other CRs that have a .spec.title or .spec.name treat it as optional and derive the name from .metadata.name if it is omitted.
Should the same be done here, with the difference that .spec.name remains immutable and cannot be updated after creation?
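
The suggestion above could look roughly like the following Go sketch. The helper names `effectiveName` and `validateNameUnchanged` are hypothetical and not part of the operator; the sketch only illustrates falling back to `.metadata.name` and rejecting a later rename.

```go
package main

import (
	"errors"
	"fmt"
)

// errNameImmutable is returned when an update would change the name.
var errNameImmutable = errors.New("spec.name is immutable")

// effectiveName returns spec.name when it is set and falls back to
// metadata.name otherwise. Hypothetical helper for illustration.
func effectiveName(specName, metadataName string) string {
	if specName != "" {
		return specName
	}
	return metadataName
}

// validateNameUnchanged rejects an update whose effective name differs
// from the one the resource was created with.
func validateNameUnchanged(oldName, newName string) error {
	if oldName != newName {
		return errNameImmutable
	}
	return nil
}

func main() {
	created := effectiveName("", "mysa")
	fmt.Println(created)

	// A later edit that sets spec.name to something else would be rejected.
	updated := effectiveName("my-service-account", "mysa")
	fmt.Println(validateNameUnchanged(created, updated))
}
```

In a CRD, the immutability half would more likely be expressed as a schema validation rule (for example a CEL transition rule such as `self == oldSelf`) than as Go code in the reconciler; that choice is left open here.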

@ndk
Contributor Author

ndk commented Aug 30, 2025

> While testing this I found a minor oversight, which might be a regression from an earlier version. The operator currently fails to take ownership if a service account with the same name already exists in Grafana:
>
> status:
>   conditions:
>   - lastTransitionTime: "2025-08-30T21:23:29Z"
>     message: |-
>       ServiceAccount failed to be applied for 1 out of 1 instances. Errors:
>       - grafana: upserting service account: creating service account: [POST /serviceaccounts][400] createServiceAccountBadRequest {"message":"service account already exists"}
>     observedGeneration: 1
>     reason: ApplyFailed
>     status: "False"
>     type: ServiceAccountSynchronized
>
> Sidenote @theSuess: does it ever make sense to create a GrafanaServiceAccount with an empty token list? Currently, if the token list is empty, the operator deletes any manually created tokens, as outlined in the proposal. But is there any use for service accounts that have no tokens? If not, should there be a MinItems validation on the .spec.tokens field?

On the existing service account case: there's no perfect solution. If we take ownership, a simple typo in the name could end up wiping tokens and breaking existing workflows. I'd rather surface the conflict so the user decides what to do. Maybe a flag to control this behavior (takeOwnership: true) could work.

As for empty token lists: I'd stick to mirroring Grafana’s API. If someone wants a service account without tokens, why block it? Adding minItems doesn't really help UX; it just limits flexibility.
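
As a rough illustration of the trade-off around an opt-in flag, the decision could be sketched in Go as below. Everything here is hypothetical: `takeOwnership` is only the idea floated in this thread, and the `action` values and `decide` function are invented for the sketch rather than taken from the operator.

```go
package main

import "fmt"

// action is what a reconciler would do for one service account.
type action string

const (
	actionCreate   action = "create"   // no account with this name exists yet
	actionUpdate   action = "update"   // the operator already manages the account
	actionAdopt    action = "adopt"    // pre-existing account, user opted in
	actionConflict action = "conflict" // pre-existing account, surface an error
)

// decide picks an action from three inputs:
//   - exists: an account with the same name is present in Grafana
//   - managed: the operator has already recorded that account as its own
//   - takeOwnership: the hypothetical opt-in flag discussed in this thread
func decide(exists, managed, takeOwnership bool) action {
	switch {
	case !exists:
		return actionCreate
	case managed:
		return actionUpdate
	case takeOwnership:
		return actionAdopt
	default:
		return actionConflict
	}
}

func main() {
	// The case reported in this thread: the account exists, is unmanaged,
	// and no opt-in was given.
	fmt.Println(decide(true, false, false))
	// The same case with the hypothetical flag set.
	fmt.Println(decide(true, false, true))
}
```

With the flag defaulting to false, a typo in the name leads to a visible conflict instead of silently adopting another account and pruning its tokens.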

@Baarsgaard
Collaborator

Baarsgaard commented Sep 1, 2025

> On the existing service account case: there's no perfect solution. If we take ownership, a simple typo in the name could end up wiping tokens and breaking existing workflows. I'd rather surface the conflict so the user decides what to do. Maybe a flag to control this behavior (takeOwnership: true) could work.

We discussed this in the weekly a while back. If I recall correctly, the reasoning is twofold:

  1. The same risk/behaviour is present for all the other CRs, and this one behaving differently would be weird.
  2. In a disaster recovery scenario where external Grafanas are in play, the user will have to manually clean up old service accounts if the cluster state is lost, either by patching all CRs with the suggested field or by deleting the accounts by hand or script.

I checked the revised proposal and it's included under Scopes and limitations.

> As for empty token lists: I'd stick to mirroring Grafana’s API. If someone wants a service account without tokens, why block it? Adding minItems doesn't really help UX; it just limits flexibility.

It depends on which behaviour is considered more confusing.
If service accounts have no use without a token, then accepting service accounts without tokens might lead users to think they have to create a token manually, and that token would then be removed automatically on the next reconcile.

Whereas the alternative is that they cannot apply the resource without at least one token defined.
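
For reference, the stricter option discussed here would typically be declared with a kubebuilder validation marker on the API type. The Go sketch below uses a trimmed-down, hypothetical `Spec` type (not the operator's) and adds a plain function, `validateTokens`, that mirrors what the generated schema rule would enforce when the resource is applied.

```go
package main

import (
	"errors"
	"fmt"
)

// Spec is a trimmed-down, hypothetical spec used only for this sketch.
type Spec struct {
	// Tokens lists the token names to manage. With the marker below,
	// controller-gen emits minItems: 1 into the CRD schema.
	// +kubebuilder:validation:MinItems=1
	Tokens []string `json:"tokens"`
}

// validateTokens mirrors the schema rule in plain Go so the effect can be
// seen without a cluster. Hypothetical helper for illustration.
func validateTokens(s Spec) error {
	if len(s.Tokens) < 1 {
		return errors.New("spec.tokens: must have at least 1 item")
	}
	return nil
}

func main() {
	fmt.Println(validateTokens(Spec{}))
	fmt.Println(validateTokens(Spec{Tokens: []string{"my-token"}}))
}
```

Leaving the marker out keeps the behaviour argued for earlier in the thread, where a service account without tokens is accepted.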

@Baarsgaard Baarsgaard enabled auto-merge September 1, 2025 10:42
@Baarsgaard Baarsgaard added this pull request to the merge queue Sep 1, 2025
Merged via the queue into grafana:master with commit 9cc2eb4 Sep 1, 2025
13 of 15 checks passed