CNTRLPLANE-180: check for user-based SCCs causing PSA violations #1881
Open

ibihim wants to merge 15 commits into openshift:main from ibihim:ibihim/2025-07-30_check-for-user-scc-violation-git-commit-history
Commits (15)
9fc9063 podsecurityreadinesscontroller: refactor, return test PSS, fix err ret (ibihim)
a2def2c podsecurityreadinesscontroller: create classification, add README (ibihim)
10fcc51 podsecurityreadinesscontroller: wire classification, simplify conditions (ibihim)
8b0d2bb podsecurityreadinesscontroller: ignore restricted SCCs, drop debuggin… (ibihim)
465832d go mod vendor (ibihim)
d6adece podsecurityreadinesscontroller: wire down psapi.ParseLevel to classific (ibihim)
3a06aeb podsecurityreadinesscontroller: create filter without side-effects (ibihim)
b6899d2 podsecurityreadinesscontroller: change type from cu to unclassified (ibihim)
ca7b9ab podsecurityreadinesscontroller: change actual CFE name (ibihim)
177afc1 podsecurityreadinesscontroller: fix tests after renaming (ibihim)
13a3228 podsecurityreadinesscontroller: add fine-grained classification (ibihim)
7fb29f9 podsecurityreadinesscontroller: fix tests to match new logic (ibihim)
0f47326 podsecurityreadinesscontroller: remove tautological condition (ibihim)
fa2281e podsecurityreadinesscontroller: return err on no violating pod (ibihim)
6c0f461 podsecurityreadinesscontroller: adjust unit tests to reflect change (ibihim)
podsecurityreadinesscontroller README (new file, 113 additions)
# Pod Security Readiness Controller

The Pod Security Readiness Controller evaluates namespace compatibility with Pod Security Admission (PSA) enforcement in clusters.

## Purpose

This controller performs dry-run PSA evaluations to determine which namespaces would experience pod creation failures if PSA enforcement labels were applied.

The controller generates telemetry data for `ClusterFleetEvaluation` and helps us to understand PSA compatibility before enabling enforcement.
## Implementation

The controller follows this evaluation algorithm (a sketch of how the steps compose follows the list):

1. **Namespace Discovery** - Find namespaces without PSA enforcement
2. **PSA Level Determination** - Predict what enforcement level would be applied
3. **Dry-Run Evaluation** - Test the namespace against the predicted PSA level
4. **Violation Classification** - Categorize any violations found for telemetry
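For orientation, here is a minimal sketch of how these steps might compose in the controller's sync loop. `determineEnforceLevel`, `isNamespaceViolating`, `addInconclusive`, and `updateConditions` are hypothetical helper names; `classifyViolatingNamespace` is the function added in this PR.

```go
func (c *PodSecurityReadinessController) sync(ctx context.Context) error {
	// 1. Namespace discovery: only namespaces without an enforce label.
	namespaces, err := c.kubeClient.CoreV1().Namespaces().List(ctx, metav1.ListOptions{
		LabelSelector: "!pod-security.kubernetes.io/enforce",
	})
	if err != nil {
		return err
	}

	conditions := podSecurityOperatorConditions{}
	for i := range namespaces.Items {
		ns := &namespaces.Items[i]

		// 2. Predict the level that enforcement would apply.
		level, err := determineEnforceLevel(ns) // hypothetical helper
		if err != nil {
			conditions.addInconclusive(ns) // hypothetical helper
			continue
		}

		// 3. Server-side dry run: would labeling the namespace emit PSA warnings?
		violating, err := c.isNamespaceViolating(ctx, ns, level) // hypothetical helper
		if err != nil {
			conditions.addInconclusive(ns)
			continue
		}
		if !violating {
			continue
		}

		// 4. Attribute the violation to a telemetry bucket.
		if err := c.classifyViolatingNamespace(ctx, &conditions, ns, level); err != nil {
			conditions.addInconclusive(ns)
		}
	}

	// Report the accumulated buckets as operator conditions.
	return c.updateConditions(ctx, &conditions) // hypothetical helper
}
```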
### Namespace Discovery

Selects namespaces without PSA enforcement labels:

```go
selector := "!pod-security.kubernetes.io/enforce"
```
### PSA Level Determination

The controller determines the effective PSA enforcement level using this precedence (see the sketch after this list):

1. The `security.openshift.io/MinimallySufficientPodSecurityStandard` annotation
2. The most restrictive of the existing `pod-security.kubernetes.io/warn` or `pod-security.kubernetes.io/audit` labels, if owned by the PSA label syncer
3. The kube-apiserver's future global default: `restricted`
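A minimal sketch of that precedence, assuming hypothetical helpers `isOwnedByLabelSyncer` and `mostRestrictive`, and using `psapi.ParseLevel` from k8s.io/pod-security-admission/api, which this PR wires into the classification:

```go
func determineEnforceLevel(ns *corev1.Namespace) (psapi.Level, error) {
	// 1. An explicit annotation wins.
	if std, ok := ns.Annotations["security.openshift.io/MinimallySufficientPodSecurityStandard"]; ok {
		return psapi.ParseLevel(std)
	}

	// 2. Fall back to syncer-owned warn/audit labels, taking the most
	// restrictive of the two (hypothetical helpers).
	if isOwnedByLabelSyncer(ns) {
		warn := ns.Labels["pod-security.kubernetes.io/warn"]
		audit := ns.Labels["pod-security.kubernetes.io/audit"]
		if level, ok := mostRestrictive(warn, audit); ok {
			return level, nil
		}
	}

	// 3. Otherwise assume the future global default.
	return psapi.LevelRestricted, nil
}
```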
### Dry-Run Evaluation

The controller performs the equivalent of this oc command:

```bash
oc label --dry-run=server --overwrite namespace $NAMESPACE_NAME \
    pod-security.kubernetes.io/enforce=$POD_SECURITY_STANDARD
```

PSA warnings during the dry run indicate that the namespace contains violating workloads.
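A sketch of the same dry run from Go with client-go. It assumes the client was built from a `rest.Config` whose `WarningHandler` is the collector shown here, so that PSA warnings returned by the API server can be inspected; the collector type and `isNamespaceViolating` are illustrative, not this PR's exact implementation.

```go
import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	psapi "k8s.io/pod-security-admission/api"
)

// warningCollector records warnings returned by the API server.
// It satisfies client-go's rest.WarningHandler interface.
type warningCollector struct {
	warnings []string
}

func (w *warningCollector) HandleWarningHeader(code int, agent string, text string) {
	w.warnings = append(w.warnings, text)
}

func isNamespaceViolating(ctx context.Context, client kubernetes.Interface, collector *warningCollector, ns *corev1.Namespace, level psapi.Level) (bool, error) {
	nsCopy := ns.DeepCopy()
	if nsCopy.Labels == nil {
		nsCopy.Labels = map[string]string{}
	}
	nsCopy.Labels[psapi.EnforceLevelLabel] = string(level)

	// Server-side dry run: nothing is persisted, but admission runs and
	// PSA emits a warning for each workload that would violate the level.
	_, err := client.CoreV1().Namespaces().Update(ctx, nsCopy, metav1.UpdateOptions{
		DryRun: []string{metav1.DryRunAll},
	})
	if err != nil {
		return false, err
	}
	return len(collector.warnings) > 0, nil
}
```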
### Violation Classification

Violating namespaces are categorized for telemetry analysis:

| Classification   | Criteria                                                        | Purpose                                |
|------------------|-----------------------------------------------------------------|----------------------------------------|
| `runLevelZero`   | Core namespaces: `kube-system`, `default`, `kube-public`        | Platform infrastructure tracking       |
| `openshift`      | Namespaces with `openshift-` prefix                             | OpenShift component tracking           |
| `disabledSyncer` | Label `security.openshift.io/scc.podSecurityLabelSync: "false"` | Intentionally excluded namespaces      |
| `userSCC`        | Contains user workloads that violate PSA                        | SCC vs PSA policy conflicts            |
| `customer`       | All other violating namespaces                                  | Customer workload compatibility issues |
| `inconclusive`   | Evaluation failed due to API errors                             | Operational problems                   |
#### User SCC Detection

The PSA label syncer bases its evaluation exclusively on a ServiceAccount's SCCs, ignoring a user's SCCs. When a pod's SCC assignment comes from user permissions rather than its ServiceAccount, the syncer's predicted PSA level may be incorrect. Therefore, we need to evaluate the affected pods (if any) against the target PSA level.
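The check mirrors the classification code in this PR: a violating pod whose SCC admission was validated against the requesting user, rather than the pod's ServiceAccount, is attributed to a user SCC.

```go
// The annotation records whether the SCC that admitted the pod was
// matched via the user or the pod's ServiceAccount.
if pod.Annotations[securityv1.ValidatedSCCSubjectTypeAnnotation] == "user" {
	violatingUserSCCPods = append(violatingUserSCCPods, pod)
}
```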
### Inconclusive Handling

When the evaluation process fails, namespaces are marked as `inconclusive`.

Common causes for inconclusive results:

- **API server unavailable** - Network timeouts, etcd issues
- **Resource conflicts** - Concurrent namespace modifications
- **Invalid PSA levels** - Malformed enforcement level strings
- **Pod listing failures** - RBAC issues or resource pressure

High rates of inconclusive results across the fleet may indicate systematic issues that require investigation.
## Output

The controller updates `OperatorStatus` conditions for each violation type:

```go
type podSecurityOperatorConditions struct {
	violatingRunLevelZeroNamespaces   []string
	violatingOpenShiftNamespaces      []string
	violatingDisabledSyncerNamespaces []string
	violatingCustomerNamespaces       []string
	userSCCViolationNamespaces        []string
	inconclusiveNamespaces            []string
}
```

Conditions follow the pattern:

- `PodSecurity{Type}EvaluationConditionsDetected`
- Status: `True` (violations found) / `False` (no violations)
- Message includes the list of violating namespaces
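For illustration, a condition for the `customer` bucket might be assembled like this. This is a sketch using the openshift/api operator v1 types; the reason strings are assumptions, not this PR's exact values.

```go
import (
	"fmt"
	"strings"

	operatorv1 "github.com/openshift/api/operator/v1"
)

func customerCondition(violating []string) operatorv1.OperatorCondition {
	if len(violating) == 0 {
		return operatorv1.OperatorCondition{
			Type:   "PodSecurityCustomerEvaluationConditionsDetected",
			Status: operatorv1.ConditionFalse,
			Reason: "AsExpected", // illustrative reason
		}
	}
	return operatorv1.OperatorCondition{
		Type:    "PodSecurityCustomerEvaluationConditionsDetected",
		Status:  operatorv1.ConditionTrue,
		Reason:  "PSViolationsDetected", // illustrative reason
		Message: fmt.Sprintf("Violations detected in namespaces: %s", strings.Join(violating, ", ")),
	}
}
```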
## Configuration

The controller runs on a configurable interval (default: 4 hours) and uses a rate-limited client to avoid overwhelming the API server:

```go
kubeClientCopy.QPS = 2
kubeClientCopy.Burst = 2
```
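A sketch of how the interval and the rate-limited client could be wired, assuming library-go's controller factory; the constructor shape is an assumption, only the QPS/Burst values come from this README.

```go
import (
	"context"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"

	"github.com/openshift/library-go/pkg/controller/factory"
	"github.com/openshift/library-go/pkg/operator/events"
)

func newRateLimitedClient(config *rest.Config) (kubernetes.Interface, error) {
	// Copy the config so the low QPS only affects this controller.
	kubeClientCopy := rest.CopyConfig(config)
	kubeClientCopy.QPS = 2
	kubeClientCopy.Burst = 2
	return kubernetes.NewForConfig(kubeClientCopy)
}

func newController(c *PodSecurityReadinessController, recorder events.Recorder) factory.Controller {
	return factory.New().
		WithSync(func(ctx context.Context, _ factory.SyncContext) error {
			return c.sync(ctx)
		}).
		ResyncEvery(4 * time.Hour).
		ToController("PodSecurityReadinessController", recorder)
}
```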
## Integration Points

- **PSA Label Syncer**: Reads syncer-managed PSA labels to predict enforcement levels
- **Cluster Operator**: Reports status through standard operator conditions
- **Telemetry**: Violation data feeds into cluster fleet analysis systems
pkg/operator/podsecurityreadinesscontroller/classification.go (new file, 101 additions)
package podsecurityreadinesscontroller

import (
	"context"
	"errors"
	"strings"

	securityv1 "github.com/openshift/api/security/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/sets"
	"k8s.io/klog/v2"
	psapi "k8s.io/pod-security-admission/api"
	"k8s.io/pod-security-admission/policy"
)

var (
	runLevelZeroNamespaces = sets.New[string](
		"default",
		"kube-system",
		"kube-public",
		"kube-node-lease",
	)
	errNoViolatingPods = errors.New("no violating pods in violating namespace")
)
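// classifyViolatingNamespace attributes a violating namespace to one of the
// telemetry buckets: run-level-zero, openshift-prefixed, syncer-disabled,
// user-SCC, or unclassified. Pod-level evaluation only happens for the last
// two buckets.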
func (c *PodSecurityReadinessController) classifyViolatingNamespace(
	ctx context.Context,
	conditions *podSecurityOperatorConditions,
	ns *corev1.Namespace,
	enforceLevel psapi.Level,
) error {
	if runLevelZeroNamespaces.Has(ns.Name) {
		conditions.addViolatingRunLevelZero(ns)
		return nil
	}
	if strings.HasPrefix(ns.Name, "openshift") {
		conditions.addViolatingOpenShift(ns)
		return nil
	}
	if ns.Labels[labelSyncControlLabel] == "false" {
		conditions.addViolatingDisabledSyncer(ns)
		return nil
	}

	// Evaluate by individual pod.
	allPods, err := c.kubeClient.CoreV1().Pods(ns.Name).List(ctx, metav1.ListOptions{})
	if err != nil {
		// Will end up in inconclusive as we couldn't diagnose the violation
		// root cause.
		klog.V(2).ErrorS(err, "Failed to list pods in namespace", "namespace", ns.Name)
		return err
	}

	isViolating := createPodViolationEvaluator(c.psaEvaluator, enforceLevel)
	violatingPods := []corev1.Pod{}
	for _, pod := range allPods.Items {
		if isViolating(pod) {
			violatingPods = append(violatingPods, pod)
		}
	}
	if len(violatingPods) == 0 {
		klog.V(2).ErrorS(errNoViolatingPods, "failed to find violating pod", "namespace", ns.Name)
		return errNoViolatingPods
	}

	violatingUserSCCPods := []corev1.Pod{}
	for _, pod := range violatingPods {
		if pod.Annotations[securityv1.ValidatedSCCSubjectTypeAnnotation] == "user" {
			violatingUserSCCPods = append(violatingUserSCCPods, pod)
		}
	}
	if len(violatingUserSCCPods) > 0 {
		conditions.addViolatingUserSCC(ns)
	}
	if len(violatingUserSCCPods) != len(violatingPods) {
		conditions.addUnclassifiedIssue(ns)
	}

	return nil
}
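// createPodViolationEvaluator returns a predicate that reports whether a pod
// would be rejected at the given PSA enforcement level, using the shared PSA
// policy evaluator.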
func createPodViolationEvaluator(evaluator policy.Evaluator, enforcement psapi.Level) func(pod corev1.Pod) bool {
	return func(pod corev1.Pod) bool {
		results := evaluator.EvaluatePod(
			psapi.LevelVersion{
				Level:   enforcement,
				Version: psapi.LatestVersion(),
			},
			&pod.ObjectMeta,
			&pod.Spec,
		)

		for _, result := range results {
			if !result.Allowed {
				return true
			}
		}
		return false
	}
}
Comment:

Is it possible that any of these can be true and the evaluations further below also apply? That is, is it possible to have a user-created pod running in a violating run-level zero namespace, an openshift namespace, or a namespace where the syncer is disabled? Would it be valuable to still continue classification even if one of these is true?
Reply:

The issues above are very profound: they sit in the space of "someone consciously did something", while below we are in the space of exploration, of figuring out what the root cause is.

Run-level zero is mostly excluded a layer above, so this actually doesn't happen anymore; it occurred up until 4.14 or so. If the openshift classification is triggered, a team needs to evaluate what they did and set the PSA level for their namespace accordingly. If a customer disables the syncer, we simply don't care and assume they take ownership.
Comment:

Do we prevent users from creating workloads in these namespaces somehow? If not, I could imagine getting reports from a cluster where an openshift-* namespace has violations from a user-created workload, but because we don't root-cause these violations, we are oblivious to that fact. If this is something we consider extremely rare and unlikely to affect our metrics in any significant way, I'm fine with this as-is, but I think it is worth noting.
Reply:

We don't prevent users from creating openshift- namespaces. Some Red Hat departments even suggest creating openshift-prefixed namespaces when backing up etcd 😄

In most cases, openshift- namespaces belong to teams that haven't yet adjusted to the PSA enforcement.