
Commit f1c62f9

podsecurityreadinesscontroller: create classification, add README
1 parent 893dc07 commit f1c62f9

File tree

4 files changed

+730
-5
lines changed

Lines changed: 113 additions & 0 deletions
# Pod Security Readiness Controller

The Pod Security Readiness Controller evaluates namespace compatibility with Pod Security Admission (PSA) enforcement in clusters.

## Purpose

This controller performs dry-run PSA evaluations to determine which namespaces would experience pod creation failures if PSA enforcement labels were applied.

The controller generates telemetry data for `ClusterFleetEvaluation` and helps us understand PSA compatibility before enabling enforcement.

## Implementation

The controller follows this evaluation algorithm:

1. **Namespace Discovery** - Find namespaces without PSA enforcement
2. **PSA Level Determination** - Predict what enforcement level would be applied
3. **Dry-Run Evaluation** - Test the namespace against the predicted PSA level
4. **Violation Classification** - Categorize any violations found for telemetry

### Namespace Discovery

The controller selects namespaces without PSA enforcement labels:

```go
selector := "!pod-security.kubernetes.io/enforce"
```
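The `!key` form of a label selector matches objects that lack the label entirely. The controller hands this selector string to the API server, but the predicate it expresses can be sketched with the standard library alone (`needsEvaluation` is an illustrative name, not the controller's):

```go
package main

import "fmt"

// enforceLabel is the PSA enforcement label that the selector excludes.
const enforceLabel = "pod-security.kubernetes.io/enforce"

// needsEvaluation mirrors the "!pod-security.kubernetes.io/enforce" selector:
// true only for namespaces that carry no enforce label at all.
func needsEvaluation(labels map[string]string) bool {
	_, enforced := labels[enforceLabel]
	return !enforced
}

func main() {
	fmt.Println(needsEvaluation(map[string]string{}))                         // true: no enforce label yet
	fmt.Println(needsEvaluation(map[string]string{enforceLabel: "baseline"})) // false: already enforced
}
```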
### PSA Level Determination

The controller determines the effective PSA enforcement level using this precedence:

1. The `security.openshift.io/MinimallySufficientPodSecurityStandard` annotation
2. The most restrictive of the existing `pod-security.kubernetes.io/warn` and `pod-security.kubernetes.io/audit` labels, if they are owned by the PSA label syncer
3. The kube-apiserver's future global default: `restricted`
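The precedence above can be sketched as a pure function. This is a minimal illustration, not the controller's code: `predictLevel`, `mostRestrictive`, and the `syncerOwnsLabels` flag are names invented for this sketch.

```go
package main

import "fmt"

// restrictiveness orders PSA levels; unknown or empty values rank lowest.
var restrictiveness = map[string]int{"privileged": 0, "baseline": 1, "restricted": 2}

// mostRestrictive picks the stricter of two PSA levels.
func mostRestrictive(a, b string) string {
	if restrictiveness[a] >= restrictiveness[b] {
		return a
	}
	return b
}

// predictLevel applies the three-step precedence described above.
func predictLevel(annotations, labels map[string]string, syncerOwnsLabels bool) string {
	// 1. An explicit minimally-sufficient-standard annotation wins.
	if lvl, ok := annotations["security.openshift.io/MinimallySufficientPodSecurityStandard"]; ok {
		return lvl
	}
	// 2. Otherwise, the stricter of the syncer-owned warn/audit labels.
	if syncerOwnsLabels {
		warn, warnOK := labels["pod-security.kubernetes.io/warn"]
		audit, auditOK := labels["pod-security.kubernetes.io/audit"]
		if warnOK || auditOK {
			return mostRestrictive(warn, audit)
		}
	}
	// 3. Fall back to the kube-apiserver's future global default.
	return "restricted"
}

func main() {
	labels := map[string]string{"pod-security.kubernetes.io/warn": "baseline"}
	fmt.Println(predictLevel(nil, labels, true)) // baseline
}
```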
### Dry-Run Evaluation

The controller performs the equivalent of this `oc` command:

```bash
oc label --dry-run=server --overwrite namespace $NAMESPACE_NAME \
    pod-security.kubernetes.io/enforce=$POD_SECURITY_STANDARD
```

PSA warnings returned during the dry-run indicate that the namespace contains violating workloads.
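The server-side dry-run does not fail; violations show up as warnings attached to the response. A stdlib-only sketch of interpreting them (the warning phrasing shown is an assumption based on upstream PSA messages, and `hasPSAViolations` is an illustrative name, not the controller's check):

```go
package main

import (
	"fmt"
	"strings"
)

// hasPSAViolations interprets warnings returned by a server-side dry-run
// label update. PSA warnings mention "violate" together with "PodSecurity";
// matching on those phrases is an assumption of this sketch.
func hasPSAViolations(warnings []string) bool {
	for _, w := range warnings {
		if strings.Contains(w, "violate") && strings.Contains(w, "PodSecurity") {
			return true
		}
	}
	return false
}

func main() {
	warnings := []string{
		`existing pods in namespace "demo" violate the new PodSecurity enforce level "restricted:latest"`,
	}
	fmt.Println(hasPSAViolations(warnings)) // true
}
```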
### Violation Classification

Violating namespaces are categorized for telemetry analysis:

| Classification   | Criteria                                                                    | Purpose                                |
|------------------|-----------------------------------------------------------------------------|----------------------------------------|
| `runLevelZero`   | Core namespaces: `default`, `kube-system`, `kube-public`, `kube-node-lease` | Platform infrastructure tracking       |
| `openshift`      | Namespaces with `openshift-` prefix                                         | OpenShift component tracking           |
| `disabledSyncer` | Label `security.openshift.io/scc.podSecurityLabelSync: "false"`             | Intentionally excluded namespaces      |
| `userSCC`        | Contains user workloads that violate PSA                                    | SCC vs PSA policy conflicts            |
| `customer`       | All other violating namespaces                                              | Customer workload compatibility issues |
| `inconclusive`   | Evaluation failed due to API errors                                         | Operational problems                   |
#### User SCC Detection

The PSA label syncer bases its evaluation exclusively on a ServiceAccount's SCCs, ignoring the user's SCCs.
When a pod's SCC assignment comes from user permissions rather than from its ServiceAccount, the syncer's predicted PSA level may be incorrect.
Therefore the controller evaluates the affected pods (if any) directly against the target PSA level.

### Inconclusive Handling

When the evaluation process fails, namespaces are marked as `inconclusive`.

Common causes for inconclusive results:

- **API server unavailable** - Network timeouts, etcd issues
- **Resource conflicts** - Concurrent namespace modifications
- **Invalid PSA levels** - Malformed enforcement level strings
- **Pod listing failures** - RBAC issues or resource pressure

High rates of inconclusive results across the fleet may indicate systematic issues that require investigation.
## Output

The controller updates `OperatorStatus` conditions for each violation type:

```go
type podSecurityOperatorConditions struct {
	violatingRunLevelZeroNamespaces   []string
	violatingOpenShiftNamespaces      []string
	violatingDisabledSyncerNamespaces []string
	violatingCustomerNamespaces       []string
	userSCCViolationNamespaces        []string
	inconclusiveNamespaces            []string
}
```

Conditions follow the pattern:

- `PodSecurity{Type}EvaluationConditionsDetected`
- Status: `True` (violations found) / `False` (no violations)
- The message includes the list of violating namespaces
## Configuration

The controller runs at a configurable interval (default: 4 hours) and uses rate limiting to avoid overwhelming the API server:

```go
kubeClientCopy.QPS = 2
kubeClientCopy.Burst = 2
```

## Integration Points

- **PSA Label Syncer**: Reads syncer-managed PSA labels to predict enforcement levels
- **Cluster Operator**: Reports status through standard operator conditions
- **Telemetry**: Violation data feeds into cluster fleet analysis systems
Lines changed: 133 additions & 0 deletions
```go
package podsecurityreadinesscontroller

import (
	"context"
	"fmt"
	"strings"

	securityv1 "github.com/openshift/api/security/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/sets"
	"k8s.io/klog/v2"
	psapi "k8s.io/pod-security-admission/api"
)

var (
	runLevelZeroNamespaces = sets.New[string](
		"default",
		"kube-system",
		"kube-public",
		"kube-node-lease",
	)
)

func (c *PodSecurityReadinessController) classifyViolatingNamespace(ctx context.Context, conditions *podSecurityOperatorConditions, ns *corev1.Namespace, enforceLevel string) error {
	if runLevelZeroNamespaces.Has(ns.Name) {
		conditions.addViolatingRunLevelZero(ns)
		return nil
	}
	if strings.HasPrefix(ns.Name, "openshift") {
		conditions.addViolatingOpenShift(ns)
		return nil
	}
	if ns.Labels[labelSyncControlLabel] == "false" {
		conditions.addViolatingDisabledSyncer(ns)
		return nil
	}

	// TODO@ibihim: increase log level
	klog.InfoS("Checking for user violations", "namespace", ns.Name, "enforceLevel", enforceLevel)
	isUserViolation, err := c.isUserViolation(ctx, ns, enforceLevel)
	if err != nil {
		klog.V(2).ErrorS(err, "Error checking user violations", "namespace", ns.Name)
		// Most likely a transient API server error or temporary resource
		// unavailability. Theoretically, psapi parsing errors could also occur;
		// retrying those has no hope of recovery.
		return err
	}

	// TODO@ibihim: increase log level
	klog.InfoS("User violation check result", "namespace", ns.Name, "isUserViolation", isUserViolation)
	if isUserViolation {
		// TODO@ibihim: increase log level
		klog.InfoS("Adding namespace to user SCC violations", "namespace", ns.Name)
		conditions.addViolatingUserSCC(ns)
		return nil
	}

	// Historically, we assume that this is a customer issue, but
	// actually it means we don't know what the root cause is.
	conditions.addViolatingCustomer(ns)

	return nil
}

func (c *PodSecurityReadinessController) isUserViolation(ctx context.Context, ns *corev1.Namespace, label string) (bool, error) {
	var enforcementLevel psapi.Level
	switch strings.ToLower(label) {
	case "restricted":
		enforcementLevel = psapi.LevelRestricted
	case "baseline":
		enforcementLevel = psapi.LevelBaseline
	case "privileged":
		// If privileged is violating, something is seriously wrong,
		// but testing against the privileged level is pointless (everything passes).
		klog.V(2).InfoS("Namespace violating privileged level - skipping user check",
			"namespace", ns.Name)
		return false, nil
	default:
		return false, fmt.Errorf("unknown level: %q", label)
	}

	allPods, err := c.kubeClient.CoreV1().Pods(ns.Name).List(ctx, metav1.ListOptions{})
	if err != nil {
		klog.V(2).ErrorS(err, "Failed to list pods in namespace", "namespace", ns.Name)
		return false, err
	}

	var userPods []corev1.Pod
	for _, pod := range allPods.Items {
		// TODO@ibihim: we should exclude pods that have restricted-v2.
		// The restricted-v2 SCC is allowed for all of system:authenticated.
		// ServiceAccounts are able to use it, but they are not part of the group,
		// so restricted-v2 will always be attributed to the user.
		if pod.Annotations[securityv1.ValidatedSCCSubjectTypeAnnotation] == "user" {
			userPods = append(userPods, pod)
		}
	}

	if len(userPods) == 0 {
		return false, nil // No user pods: the violation comes from ServiceAccounts.
	}

	enforcementVersion := psapi.LatestVersion()
	for _, pod := range userPods {
		klog.InfoS("Evaluating user pod against PSA level",
			"namespace", ns.Name, "pod", pod.Name, "level", label,
			"podSecurityContext", pod.Spec.SecurityContext)

		results := c.psaEvaluator.EvaluatePod(
			psapi.LevelVersion{Level: enforcementLevel, Version: enforcementVersion},
			&pod.ObjectMeta,
			&pod.Spec,
		)

		klog.InfoS("PSA evaluation results",
			"namespace", ns.Name, "pod", pod.Name, "level", label,
			"resultCount", len(results))

		for _, result := range results {
			klog.InfoS("PSA evaluation result",
				"namespace", ns.Name, "pod", pod.Name, "level", label,
				"allowed", result.Allowed, "reason", result.ForbiddenReason,
				"detail", result.ForbiddenDetail)
			if !result.Allowed {
				klog.InfoS("User pod violates PSA level",
					"namespace", ns.Name, "pod", pod.Name, "level", label)
				return true, nil
			}
		}
	}

	return false, nil
}
```
