Add opa policy to restrict PDBs, always allow at least 1 disruption #2459

Merged
viktor-f merged 2 commits into main from vf/restrict-pdb-policy
Mar 25, 2025

Conversation


@viktor-f viktor-f commented Mar 11, 2025

Warning

This is a public repository, ensure not to disclose:

  • personal data beyond what is necessary for interacting with this pull request, nor
  • business confidential information, such as customer names.

What kind of PR is this?

Required: Mark one of the following that is applicable:

  • kind/feature
  • kind/improvement
  • kind/deprecation
  • kind/documentation
  • kind/clean-up
  • kind/bug
  • kind/other

Optional: Mark one or more of the following that are applicable:

Important

Breaking changes should be marked kind/admin-change or kind/dev-change depending on type
Critical security fixes should be marked with kind/security

  • kind/admin-change
  • kind/dev-change
  • kind/security
  • [kind/adr](set-me)

Platform Administrator notice

A new gatekeeper policy has been added that will deny any PodDisruptionBudget and connected Pod controller if the PodDisruptionBudget does not allow at least 1 Pod disruption. Note that this applies in both sc and wc, and it also applies to namespaces even if they have the label owner=operator.

Application Developer notice

A new gatekeeper policy has been added that will deny any PodDisruptionBudget and connected Pod controller if the PodDisruptionBudget does not allow at least 1 Pod disruption.

What does this PR do / why do we need this PR?

This adds a new gatekeeper policy that will deny any PodDisruptionBudget and connected Pod controller if the PodDisruptionBudget does not allow at least 1 Pod disruption. Note that this applies in both sc and wc, and it also applies to namespaces even if they have the label owner=operator.

Pod controllers only include: Deployment, ReplicaSet, StatefulSet, and ReplicationController.

The general logic for this is:

```
If creating or modifying a PDB:
  If maxUnavailable == 0 or "0%":
    deny request
  If minAvailable is set:
    If any matching pod controller:
      If minAvailable >= number of replicas:
        deny request
If creating or modifying a pod controller:
  If any matching PDB:
    Apply the PDB logic above
```

In sentences this means that we want to stop either PDBs or pod controllers if together they do not allow for any disruption. If the PDB uses maxUnavailable, this happens when it is set to 0 or 0%, regardless of the replicas in the pod controller. If the PDB uses minAvailable, this happens when it is equal to or higher than the replicas in the pod controller. The policy includes logic to stop requests that would create or edit either PDBs or pod controllers.
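As a concrete illustration (resource names here are hypothetical, not taken from the PR), the following pair would be denied, since minAvailable equals the Deployment's replica count and therefore allows zero disruptions:

```yaml
# Hypothetical example of a PDB/Deployment pair the policy would deny:
# minAvailable: 2 with replicas: 2 leaves no room for a single disruption.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: myapp
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: nginx
```

Changing minAvailable to 1 (or using maxUnavailable: 1 instead) would make the pair pass.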

There is also an extra check that allows ReplicaSets to violate this policy if they are controlled by a Deployment. This is because any matching PDB will look at all of the Pods for the Deployment, not any single ReplicaSet.

The logic has been based on an upstream policy, but it has almost entirely been reworked. The logic of matching selectors has been based on the logic in our networkpolicy gatekeeper policy.

I was planning on adding more checks to this. But this took longer to implement than planned, so I'm now planning on turning the rest into a separate task. The things that were not implemented are:

  • Denying pods without pod controllers that are matching a PDB. This is not a pattern we want to allow.
  • Denying PDBs and matching daemonsets/jobs/cronjobs if the PDB is matching a daemonsets/jobs/cronjobs. This is not a pattern we want to allow.
  • Denying PDBs and pods/podcontrollers if the PDB is matching multiple pod controllers or pods without controllers. This is not something we want to allow because "The eviction API will disallow eviction of any pod covered by multiple PDBs, so most users will want to avoid overlapping selectors." (ref)

This PR adds more resources that will be synced (cached) by gatekeeper. This is needed so that we can compare PDBs to pod controllers (otherwise the policy only has access to the object that is being validated).
But adding resources here will increase the resource usage (primarily memory) for gatekeeper. Syncing PDBs will likely not increase the usage significantly, since there are relatively few PDBs in a cluster. However, pod controllers are a lot more common and will noticeably increase the memory usage, but as I show below I think that the usage has not increased so much that it should prevent us from adding this feature. I did some testing to see how much the resource usage increased. Note that some of the testing included pods, because that would be needed for one of the extra features mentioned above that I was planning on adding but am now skipping.
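For reference, resource syncing in gatekeeper is configured through a Config resource along these lines (a sketch, assuming the default gatekeeper-system namespace; the exact resource list in this PR may differ):

```yaml
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: gatekeeper-system
spec:
  sync:
    syncOnly:
      # PDBs and the pod controllers the policy compares them against
      - group: "policy"
        version: "v1"
        kind: "PodDisruptionBudget"
      - group: "apps"
        version: "v1"
        kind: "Deployment"
      - group: "apps"
        version: "v1"
        kind: "StatefulSet"
      - group: "apps"
        version: "v1"
        kind: "ReplicaSet"
      - group: ""
        version: "v1"
        kind: "ReplicationController"
```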

Number of resources in the cluster:

  • Pods: 240
  • Deployments: 55
  • StatefulSets: 10
  • ReplicaSets: 73
  • ReplicationControllers: 0

Note: many small pod manifests were used in this test; actual usage with "real pods" is likely higher (tested in scenarios 4 and 5 below).

  1. Without pods or controllers: Gatekeeper controllers 140 MB; Gatekeeper audit 160 MB, 215 MB spikes
  2. With controllers: Gatekeeper controllers 160 MB; Gatekeeper audit 175 MB idle, 250 MB spikes
  3. With pods and controllers: Gatekeeper controllers 180 MB; Gatekeeper audit 200 MB idle, 300 MB spikes
  4. With pods and controllers and extra annotations (to get larger pod manifests): Gatekeeper controllers 190 MB; Gatekeeper audit 210 MB idle, 320 MB spikes
  5. With pods and controllers and extra annotations on a larger cluster (1000 pods, 140 deployments, 160 replicasets): Gatekeeper controllers 290 MB idle, 370 MB spikes; Gatekeeper audit 310 MB idle, 520 MB spikes

Even though this is part of an autoscaling task, this policy is good to have for any type of cluster, since PDBs can disrupt any scenario where we want to drain nodes or similar.

Information to reviewers

As of creating the PR I have not yet added tests or public documentation. I will work on that next, but the code and config should be ok to review for now.

This PR also included some whitespace fixes that I stumbled upon. I hope it is ok that it is fixed in this PR. Otherwise let me know and I will move that to a separate PR.

Checklist

  • Proper commit message prefix on all commits
  • Change checks:
    • The change is transparent
    • The change is disruptive
    • The change requires no migration steps
    • The change requires migration steps
    • The change updates CRDs
    • The change updates the config and the schema
  • Documentation checks:
  • Metrics checks:
    • The metrics are still exposed and present in Grafana after the change
    • The metrics names didn't change (Grafana dashboards and Prometheus alerts required no updates)
    • The metrics names did change (Grafana dashboards and Prometheus alerts required an update)
  • Logs checks:
    • The logs do not show any errors after the change
  • PodSecurityPolicy checks:
    • Any changed Pod is covered by Kubernetes Pod Security Standards
    • Any changed Pod is covered by Gatekeeper Pod Security Policies
    • The change does not cause any Pods to be blocked by Pod Security Standards or Policies
  • NetworkPolicy checks:
    • Any changed Pod is covered by Network Policies
    • The change does not cause any dropped packets in the NetworkPolicy Dashboard
  • Audit checks:
    • The change does not cause any unnecessary Kubernetes audit events
    • The change requires changes to Kubernetes audit policy
  • Falco checks:
    • The change does not cause any alerts to be generated by Falco
  • Bug checks:
    • The bug fix is covered by regression tests

@viktor-f viktor-f added the app/opa-gatekeeper Open Policy Agent Gatekeeper label Mar 11, 2025
@viktor-f viktor-f self-assigned this Mar 11, 2025
@viktor-f viktor-f requested a review from a team as a code owner March 11, 2025 10:53
@davidumea davidumea changed the title Add opa policy to restrics PDBs, allways allow at least 1 disruption Add opa policy to restrict PDBs, always allow at least 1 disruption Mar 11, 2025

@Xartos Xartos left a comment


Super nice! This should definitely fix some issues we have in some clusters


davidumea commented Mar 19, 2025

What happens if a pdb that violates any of the rules already exists? Will it have to be manually removed?

@viktor-f
Contributor Author

What happens if a pdb that violates any of the rules already exists? Will it have to be manually removed?

That is a good point. The PDB will be left in place, but you cannot edit it, or any deployment, in a way that continues to violate these rules. There should not be any issue with modifying the PDB or deployment so that it becomes valid. Alternatively, remove the PDB and start over.
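For example (hypothetical names), an existing violating PDB with minAvailable: 2 over a 2-replica deployment could be edited into a valid state like this:

```yaml
# Hypothetical remediation: switching to maxUnavailable: 1 (or lowering
# minAvailable below the replica count) allows one disruption, so the
# PDB and its deployment can be edited again without being denied.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: myapp
```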

@davidumea
Contributor

Denying PDBs and matching daemonsets/jobs/cronjobs if the PDB is matching a daemonsets/jobs/cronjobs. This is not a pattern we want to allow.

This was one of the patterns you mentioned that was not included in this PR, was it written correctly? If yes, could you expand?


I like that the PR description is very extensive and informative, thanks for that! I think it would be nice if you also wrote, in text format, which patterns to not allow were added in this PR.

This adds a new gatekeeper policy that will deny any PodDisruptionBudget and connected Pod controller if the PodDisruptionBudget does not allow at least 1 Pod disruption.

I think this is good information but it doesn't explain what is actually happening under the hood 🙂

@viktor-f
Contributor Author

Denying PDBs and matching daemonsets/jobs/cronjobs if the PDB is matching a daemonsets/jobs/cronjobs. This is not a pattern we want to allow.

This was one of the patterns you mentioned that was not included in this PR, was it written correctly? If yes, could you expand?

I think the text is accurate, but maybe not very easy to read. Regardless, the idea is that I think PDBs should not be allowed for DaemonSets, CronJobs, and Jobs.
Part of the reasoning is that PDBs do not have full functionality on these resources (or on any resource that does not implement the scale API).
For DaemonSets it is also odd to have a PDB, since the number of pods changes with the number of nodes, and the number of nodes might not be controlled by the users that specify the PDB.
For Jobs and CronJobs the purpose is for the jobs to complete at some point; the number of pods then naturally decreases and will eventually violate the PDB.

Does that make sense?

I like that the PR description is very extensive and informative, thanks for that! I think it would be nice if you also wrote, in text format, which patterns to not allow were added in this PR.

This adds a new gatekeeper policy that will deny any PodDisruptionBudget and connected Pod controller if the PodDisruptionBudget does not allow at least 1 Pod disruption.

I think this is good information but it doesn't explain what is actually happening under the hood 🙂

Thanks, I will try to clarify this.

@viktor-f
Contributor Author

Thanks, I will try to clarify this.

I have now updated the PR description with some more details. Please let me know if this explains it well or if I should clarify further.

Contributor

@davidumea davidumea left a comment


This PR also included some whitespace fixes that I stumbled upon. I hope it is ok that it is fixed in this PR. Otherwise let me know and I will move that to a separate PR.

I think this is fine, thanks for the cleanup!

I have now updated the PR description with some more details. Please let me know if this explains it well or if I should clarify further.

Thanks it's crystal clear now!

I plan on continuing my review tomorrow

@viktor-f viktor-f force-pushed the vf/restrict-pdb-policy branch from e6ee32a to 8d6c09e Compare March 20, 2025 16:05
@viktor-f
Contributor Author

Added a bunch of tests now. I have probably missed some tests, but this should cover most things. I at least have full code coverage:

```
opa test restrict-pod-disruption-budgets.rego tests/restrict-pod-disruption-budgets.rego -v -c
{
  "files": {
    "restrict-pod-disruption-budgets.rego": {
      ...
      "covered_lines": 98,
      "coverage": 100
    },
    "tests/restrict-pod-disruption-budgets.rego": {
      ...
      "covered_lines": 257,
      "coverage": 100
    }
  },
  "covered_lines": 355,
  "not_covered_lines": 0,
  "coverage": 100
}
```

Comment on lines +71 to +73
```rego
input_wrap(obj) = input {
  input := {"review": {"object": obj}}
}
```
Contributor


Nice detail 🙂

Comment on lines +79 to +83
```rego
pod_controller_groups_kinds := [
  {"group": "apps/v1", "kind": "Deployment"},
  {"group": "apps/v1", "kind": "StatefulSet"},
  {"group": "apps/v1", "kind": "ReplicaSet"},
  {"group": "v1", "kind": "ReplicationController"}
]
```
Contributor


You only use the group here in one place (below); you might want to split these into separate functions, one for group and one for kind.

```rego
objs := [controllers | controllers := data.inventory.namespace[pdb.metadata.namespace][pod_controller_group_kind.group][pod_controller_group_kind.kind]]
```

Or do they need to be together to make sure it's the same object? If I understand it correctly, it takes one object at a time, so even if the information were fetched from two functions that shouldn't be a problem.

Contributor Author


The idea was to ensure that it just uses these pairs of group and kind. This was the easiest way that I could think of to do this, but there are probably other ways.
I could also just have two separate lists of groups and kinds and then let it go through all combinations. I assume that is slightly less efficient, but probably not significantly.

IMO the current version feels nice to read and understand which groups and kinds belong to each other. But I think we should use the version that most people find easiest to read and understand. So I'm ok with changing it if that is what you and others want.

Contributor

@davidumea davidumea Mar 21, 2025


Okay, I would be fine with leaving it, but I don't think it reads super well when referencing this: pod_controller_group_kind.group

Contributor Author


I will keep it like this for now then.

Contributor

@davidumea davidumea left a comment


I think the code looks good. Really nice work 🙂

I assume the plan is to add public docs and update the links here before you want to merge this?

@viktor-f
Contributor Author

Public docs PR is now up as well: https://github.com/elastisys/welkin/pull/1073
The code in this PR has been updated with links to the new public docs page (they will not work until the public docs page is merged).
So I think that should be the last thing needed, except for any additional comments you reviewers might have.

Contributor

@davidumea davidumea left a comment


Really nice start on this 🙌

@viktor-f viktor-f force-pushed the vf/restrict-pdb-policy branch from 1f6d705 to 80516c9 Compare March 25, 2025 10:21
@viktor-f viktor-f linked an issue Mar 25, 2025 that may be closed by this pull request
@viktor-f
Contributor Author

Task for the scenarios that were not covered in this PR: https://github.com/elastisys/welkin-apps/issues/68

@viktor-f viktor-f merged commit 7ebda68 into main Mar 25, 2025
12 checks passed
@viktor-f viktor-f deleted the vf/restrict-pdb-policy branch March 25, 2025 10:42

Development

Successfully merging this pull request may close these issues.

[3] Add Safeguard for pods that prevent CAPI cluster-autoscaler
