
fix: verify pod ownership before operating on annotated pods#419

Open
ArmandoHerra wants to merge 8 commits into kubernetes-sigs:main from ArmandoHerra:fix/issue-265-pod-ownership-verification

Conversation

@ArmandoHerra

Summary

Fixes #265: Privilege escalation via agents.x-k8s.io/pod-name annotation.

  • Adds checkPodOwnership() helper that validates the pod's controllerRef UID against the Sandbox UID before any pod operation
  • Pods owned by a different controller are rejected
  • Pods with no controllerRef are only adopted (not deleted)
  • Only pods with a controllerRef UID matching the requesting Sandbox can be deleted
  • Warm pool adoption flow is fully preserved
  • Adds 4 new test cases covering all ownership verification scenarios
  • Updates 3 existing tests to reflect new security behavior

Follows maintainer guidance from @janetkuo: the annotation cannot be the sole source of truth; controllerRef must always be checked.

Test Plan

  • make build succeeds
  • make lint-go passes
  • make test-unit passes (12/12 TestReconcilePod subtests)
  • Manual attack reproduction on Kind cluster — victim pod survives

@netlify

netlify bot commented Mar 15, 2026

Deploy Preview for agent-sandbox canceled.

Name Link
🔨 Latest commit 985a58d
🔍 Latest deploy log https://app.netlify.com/projects/agent-sandbox/deploys/69bcbba352e7b2000851e718

@k8s-ci-robot
Contributor

Welcome @ArmandoHerra!

It looks like this is your first PR to kubernetes-sigs/agent-sandbox 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/agent-sandbox has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 15, 2026
@k8s-ci-robot
Contributor

Hi @ArmandoHerra. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 15, 2026
@vicentefb
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 16, 2026
if pod.Labels == nil {
pod.Labels = make(map[string]string)
}
pod.Labels[sandboxLabel] = nameHash
Member

nit: consider applying the label after the switch statement

Author

Comment addressed, please re-review

"Owner.Kind", controllerRef.Kind, "Owner.Name", controllerRef.Name)

if _, exists := sandbox.Annotations[SandboxPodNameAnnotation]; exists {
patch := client.MergeFrom(sandbox.DeepCopy())
Member

Please consider adding an informational log here, e.g., log.Info("Removing pod name annotation from sandbox", "Sandbox.Name", sandbox.Name). This would keep the logging behavior consistent with how the annotation is removed in the replicas=0 path above.

Author

Comment addressed, please re-review

wantSandboxAnnotations: map[string]string{"other-annotation": "keep-me"},
},
{
name: "refuses to adopt annotated pod owned by a different controller",
Member

Since this test asserts the behavior of refusing to adopt an annotated pod owned by a different controller, we should also verify that the malicious SandboxPodNameAnnotation was successfully stripped.

Do you mind please adding a wantSandboxAnnotations: map[string]string{} to this test case to enforce that the controller correctly cleans up the annotation ?

Author

Comment addressed, please re-review

ArmandoHerra added a commit to ArmandoHerra/agent-sandbox that referenced this pull request Mar 16, 2026
…ntefb

- Deduplicate label init and assignment by moving after switch block
- Add log.Info before annotation removal in podOwnedByOther for consistency
- Assert wantSandboxAnnotations in "refuses to adopt" test case
- Normalize nil/empty annotation map comparison in test harness
@ArmandoHerra
Author

Comments were addressed, @vicentefb. Please re-review and let me know if there is anything else to fix.

Thanks!


// checkPodOwnership determines whether a Pod is owned by the given Sandbox,
// has no controller, or is owned by a different controller.
func checkPodOwnership(pod *corev1.Pod, sandbox *sandboxv1alpha1.Sandbox) podOwnership {
Member

Thinking more about this method...

In handleSandboxExpiry(), the controller issues a Delete request for a pod and service named sandbox.Name without verifying ownership. Someone could name their sandbox after a "victim" pod and set ShutdownTime to trigger an immediate unauthorized deletion.

WDYT of changing the signature to func checkOwnership(obj client.Object, sandbox *sandboxv1alpha1.Sandbox) resourceOwnership so you can reuse this verification logic in handleSandboxExpiry() to safely check ownership of both the pod and service before issuing any Delete requests.

Author

Great catch, @vicentefb. I agree completely, and I think this is the right call.

The current handleSandboxExpiry() blindly deletes by name, which means someone could create a Sandbox named after a victim pod, set an expired ShutdownTime, and trigger an unauthorized deletion on the next reconcile. That's a real escalation vector in multi-tenant namespaces.

Here's what I'm planning:

  1. Generalize the ownership check

Rename podOwnership → resourceOwnership and change the signature to:

func checkOwnership(obj client.Object, sandbox *sandboxv1alpha1.Sandbox) (resourceOwnership, *metav1.OwnerReference)

Using client.Object instead of *corev1.Pod lets us reuse this for both Pods and Services. Returning the *metav1.OwnerReference as the second value also addresses your optimization comments: we can use the returned controllerRef directly in the podOwnedByOther log messages instead of calling metav1.GetControllerOf() again (covers both the replicas=0 path and the adoption path).

  2. Harden handleSandboxExpiry()

Before each r.Delete(), do an r.Get() on the live object, run checkOwnership(), and only proceed if resourceOwnedBySandbox. For unowned or foreign-owned resources, log a warning and skip the deletion.

Same pattern for the Service deletion.

  3. Tests

I'll add expiry-specific test cases covering the attack scenario (foreign-owned pod survives expiry), unowned resources, and the happy path. The existing expiry tests will need OwnerReferences added to their initialObjs so they still pass under the stricter ownership check.

I will push a follow-up commit shortly.

log.Info("Refusing to delete pod: pod has no controllerRef pointing to this sandbox",
"Pod.Name", pod.Name, "Sandbox.Name", sandbox.Name)
case podOwnedByOther:
controllerRef := metav1.GetControllerOf(pod)
Member

optimization: metav1.GetControllerOf(pod) is re-fetched here after checkPodOwnership() already extracted it. You can update checkPodOwnership() to return (podOwnership, *metav1.OwnerReference) to supply the controllerRef directly, avoiding redundant lookups.

ownership := checkPodOwnership(pod, sandbox)
switch ownership {
case podOwnedByOther:
controllerRef := metav1.GetControllerOf(pod)
Member

optimization: Similar to the comment above, metav1.GetControllerOf(pod) is invoked again. Returning the controllerRef directly from the checkPodOwnership() helper would streamline these paths.

Member

@vicentefb vicentefb left a comment

Thanks @ArmandoHerra, I just left a couple of optimization/perf comments; happy to lgtm afterwards.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 16, 2026
@ArmandoHerra
Author

Comments addressed @vicentefb, let me know if any other concerns or optimizations come to your mind.

Thanks!

},
// Delete pod only if owned by this sandbox
pod := &corev1.Pod{}
if err := r.Get(ctx, types.NamespacedName{Name: sandbox.Name, Namespace: sandbox.Namespace}, pod); err != nil {
Member

The pod lookup strictly uses sandbox.Name here; however, if the sandbox had successfully adopted a warm pool pod, its name would be tracked via the agents.x-k8s.io/pod-name annotation. Because this annotation is not evaluated here, the adopted pod will not be correctly deleted on expiry when ShutdownPolicy is Retain (leading to a compute resource leak).

Try extracting the pod name resolution logic into a shared helper function (e.g. resolvePodName(sandbox) string) and using it here.

@ArmandoHerra
Author

Hey @vicentefb — I just pushed the changes addressing your latest round of feedback. Here's a summary of everything in the last two commits.

  1. resolvePodName() helper — warm pool pod leak fix

You caught that handleSandboxExpiry() was hardcoding sandbox.Name for the pod lookup, which completely misses adopted warm-pool pods tracked via the agents.x-k8s.io/pod-name annotation.

Good catch, that's a silent compute resource leak when ShutdownPolicy is Retain.

I extracted the pod name resolution logic into a shared resolvePodName(sandbox) string helper and wired it into both reconcilePod() (replacing the inline resolution) and handleSandboxExpiry(). It's a pure function that returns the annotation value if present and non-empty, otherwise falls back to sandbox.Name.

I added TestResolvePodName covering all edge cases (nil annotations, missing key, empty value, valid warm pool name) plus a full integration test in TestReconcile that verifies an adopted pod named warmpool-abc-xyz is correctly deleted on expiry.

  2. reconcileService() ownership verification — CWE-863 gap closure

While working through your feedback, I did a security pass across all the resource reconciliation paths to make sure we weren't leaving any other confused-deputy gaps open. It turns out reconcileService() had the same pattern we fixed in reconcilePod(): it finds an existing service by sandbox.Name and uses it directly without verifying ownership.

The deletion path in handleSandboxExpiry() was already protected (we added checkOwnership() there in the previous commit), but the reconciliation path wasn't.

The attack surface is narrower than the pod case since it doesn't lead to unauthorized deletion, but it does allow a sandbox to claim a foreign service in its status (.status.service, .status.serviceFQDN), which could misdirect traffic.

Same CWE-863 class as Issue #265.

The fix mirrors reconcilePod() exactly:

  • resourceOwnedBySandbox → proceed as before
  • resourceUnowned → adopt via SetControllerReference() + Update()
  • resourceOwnedByOther → log warning with owner details, return error

Added TestReconcileService with 4 cases covering all three ownership states plus the create-new-service happy path.

All 36 tests pass, build and vet clean.

With these two commits, every resource path in the controller (reconcilePod, reconcileService, handleSandboxExpiry) now has consistent ownership verification via checkOwnership().

service.Name, controllerRef.Kind, controllerRef.Name, sandbox.Name)

case resourceUnowned:
log.Info("Adopting unowned service", "Service.Name", service.Name, "Sandbox.Name", sandbox.Name)
Member

Not sure if I'm overthinking this, but when the service adoption happens we aren't verifying the selector or ports... should we enforce the intended spec so that it doesn't lead to "hijacks" of the sandbox's traffic?

case resourceOwnedByOther:
log.Info("Refusing to delete pod: pod is owned by a different controller",
"Pod.Name", pod.Name, "Sandbox.Name", sandbox.Name,
"Owner.Kind", controllerRef.Kind, "Owner.Name", controllerRef.Name)
Member

Should we log controllerRef.UID here as well?

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 17, 2026
Member

@vicentefb vicentefb left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 17, 2026
@vicentefb
Member

/assign @janetkuo

@ArmandoHerra
Author

Hey @vicentefb, I pushed two more commits addressing the remaining feedback and a proactive security hardening.

  1. Service adoption spec enforcement + Owner.UID logging

You raised a good point about the service adoption path not verifying the selector or ports. I agree, adopting a service as-is without enforcing the intended spec could lead to traffic being routed somewhere unexpected.

The resourceUnowned adoption case now does three things before completing the adoption:

  • Validates that ClusterIP is "None" (or empty). Since ClusterIP is immutable after creation, a non-headless service simply can't be adopted; we refuse it with a clear error rather than silently accepting a service that will never behave correctly.
  • Overwrites Spec.Selector to {sandboxLabel: nameHash} so the service always points at the sandbox's pod, not whatever the pre-existing service was selecting.
  • Enforces Labels[sandboxLabel] = nameHash for consistency with the create path.

Also applied your suggestion to log controllerRef.UID: added it to all six resourceOwnedByOther log messages and both ownership error returns across reconcilePod, reconcileService, and handleSandboxExpiry. Should make debugging ownership conflicts significantly easier in production.

  2. PVC ownership verification, closing the last CWE-863 gap

While performing another security pass across the controller to ensure all resource paths had consistent ownership checks, I found that reconcilePVCs() was the last holdout. When it found an existing PVC matching the computed name (<template-name>-<sandbox-name>), it just did continue without any ownership verification.

The naming scheme is deterministic, so an attacker with namespace access could pre-create a PVC with the right name before the sandbox controller runs. The sandbox pod would then mount attacker-controlled storage, opening the door for data injection (pre-populate the volume with malicious configs or binaries the workload trusts) or data exfiltration (read back whatever the sandbox writes).

Same confused deputy class as the original Issue #265, just applied to PVCs instead of Pods. The fix follows the exact same checkOwnership() pattern:

  • resourceOwnedBySandbox → continue as before
  • resourceUnowned → adopt via SetControllerReference() + Update()
  • resourceOwnedByOther → refuse with error, log Owner.Kind, Owner.Name, Owner.UID

Added TestReconcilePVCs with four cases covering all ownership states.

@k8s-ci-robot
Contributor

@ArmandoHerra: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
presubmit-agent-sandbox-lint-api 3d2418c link true /test presubmit-agent-sandbox-lint-api

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 20, 2026
The Sandbox controller's reconcilePod() used the user-controlled agents.x-k8s.io/pod-name annotation as the sole source of truth for pod operations.

This allowed privilege escalation where a malicious Sandbox could delete or adopt any pod in the namespace.

Add checkPodOwnership() helper that validates the pod's controllerRef UID against the Sandbox UID before any pod operation. Pods owned by a different controller are rejected. Pods with no controllerRef are only adopted (not deleted).

Only pods with a controllerRef UID matching the requesting Sandbox can be deleted.
Generated during e2e test runs via `make test-e2e` and Kind cluster deployments. These directories contain test logs, pod dumps, and Python virtual environments that should not be tracked.
…ntefb

- Deduplicate label init and assignment by moving after switch block
- Add log.Info before annotation removal in podOwnedByOther for consistency
- Assert wantSandboxAnnotations in "refuses to adopt" test case
- Normalize nil/empty annotation map comparison in test harness
- Rename podOwnership → resourceOwnership and generalize
  checkPodOwnership() → checkOwnership(client.Object) returning
  (resourceOwnership, *metav1.OwnerReference) to eliminate redundant
  GetControllerOf() calls in reconcilePod()
- Harden handleSandboxExpiry() to Get + checkOwnership before every
  Delete, preventing unauthorized deletion of pods/services not owned
  by the sandbox
- Add TestCheckOwnership covering Pod and Service in all ownership
  states, 3 new expiry attack-prevention test cases, and
  wantSurvivingObjs field in TestReconcile
- Add resolvePodName() that resolves pod name from the
  agents.x-k8s.io/pod-name annotation, falling back to sandbox.Name
- Fix handleSandboxExpiry() to use resolvePodName() instead of
  hard-coded sandbox.Name, preventing adopted warm pool pods from
  leaking when ShutdownPolicy is Retain
- Refactor reconcilePod() to use the shared helper
- Add TestResolvePodName and warm pool expiry integration test
- Add checkOwnership() call in reconcileService() when an existing
  service is found, mirroring the pattern in reconcilePod()
- Reject services owned by a different controller with logged warning
- Adopt unowned services via SetControllerReference + Update
- Add TestReconcileService with 4 table-driven test cases
…logs

- Validate ClusterIP is headless before adopting unowned services,
  refuse with error if immutable ClusterIP mismatches
- Enforce sandbox label and selector on adopted services to prevent
  traffic hijack
- Add Owner.UID to all resourceOwnedByOther log.Info and fmt.Errorf
  messages across reconcilePod, reconcileService, and
  handleSandboxExpiry for forensic debuggability
- Add checkOwnership() call in reconcilePVCs() when an existing PVC
  is found, closing the last confused deputy gap
- Reject PVCs owned by a different controller with Owner.UID logging
- Adopt unowned PVCs via SetControllerReference + Update
- Add TestReconcilePVCs with 4 table-driven test cases
- All three resource types (Pod, Service, PVC) now have consistent
  ownership verification
@ArmandoHerra ArmandoHerra force-pushed the fix/issue-265-pod-ownership-verification branch from 3d2418c to 985a58d Compare March 20, 2026 03:14
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 20, 2026
@k8s-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ArmandoHerra, vicentefb
Once this PR has been reviewed and has the lgtm label, please ask for approval from janetkuo. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 20, 2026
@ArmandoHerra
Author

@vicentefb Rebased onto current main to pick up the 30 commits that landed since this branch was created, notably PR #395 (warm pool refactor) and PR #438 (kubeapilinter fix).

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 24, 2026
@k8s-ci-robot
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@codebot-robot codebot-robot left a comment

Overall, the PR looks excellent. It systematically addresses the vulnerability by introducing centralized ownership verification (checkOwnership), ensuring that pods, services, and PVCs are only deleted or adopted when appropriate. The modifications to handleSandboxExpiry perfectly prevent unauthorized deletions when an expired Sandbox points to a foreign Pod.

I've left several comments focusing primarily on observability (logging), minor edge cases (like trailing spaces, handling of empty uids, and mismatched service/PVC specs during adoption), and potential test suite enhancements. The core logic is sound and heavily fortified.

(This review was generated by Overseer)


const (
// resourceOwnedBySandbox indicates the resource's controllerRef points to this Sandbox.
resourceOwnedBySandbox resourceOwnership = iota


Consider initializing the first enum value to an explicit unknown state (e.g., resourceOwnershipUnknown = iota) or starting resourceOwnedBySandbox at 1, to prevent an uninitialized variable from defaulting to a valid state.

if controllerRef == nil {
return resourceUnowned, nil
}
if controllerRef.UID == sandbox.UID {


For robustness, consider adding a check to ensure sandbox.UID != "" to prevent mistakenly matching an empty controllerRef.UID in edge cases where the sandbox is malformed.

// If the sandbox has adopted a warm pool pod, the pod name is tracked in the
// agents.x-k8s.io/pod-name annotation and may differ from sandbox.Name.
func resolvePodName(sandbox *sandboxv1alpha1.Sandbox) string {
if name, ok := sandbox.Annotations[SandboxPodNameAnnotation]; ok && name != "" {


Using strings.TrimSpace(name) might be helpful to safeguard against trailing whitespace if the annotation was set manually.


case resourceUnowned:
// ClusterIP is immutable — refuse adoption if the service is not headless.
if service.Spec.ClusterIP != corev1.ClusterIPNone && service.Spec.ClusterIP != "" {


This is a great defensive check. An existing unowned Service might have had its IP assigned by the API server if it wasn't explicitly set to None upon creation.

log.Info("Adopting unowned service", "Service.Name", service.Name, "Sandbox.Name", sandbox.Name)

// Enforce intended labels and selector to prevent traffic hijack.
if service.Labels == nil {


If the unowned Service was created maliciously or accidentally by a user, it might contain unexpected labels or annotations. Consider explicitly clearing them or creating a fresh map containing only the sandboxLabel.

allErrors = errors.Join(allErrors, fmt.Errorf("failed to delete pod: %w", err))
}
case resourceUnowned:
log.Info("Skipping pod deletion during expiry: pod has no controllerRef pointing to this sandbox",


Skipping pod deletion here safely neutralizes the unauthorized deletion vector. As a defense-in-depth measure, consider also stripping the malicious pod annotation here so the Sandbox object state is fully sanitized before it enters the Expired phase.

// Check if there's an annotation with a non-empty value
if annotatedPod, exists := tc.sandbox.Annotations[SandboxPodNameAnnotation]; exists && annotatedPod != "" {
podName = annotatedPod
if tc.wantPodSurvives != "" {


This is a great, robust assertion. To be completely thorough, you could also assert that livePod.GetOwnerReferences() matches its initial state, proving the Sandbox controller didn't accidentally adopt it.

wantStatusServiceFQDN: sandboxName + "." + sandboxNs + ".svc.cluster.local",
},
{
name: "uses existing service owned by this sandbox",


It would be ideal to provide wantService for this test case. Even though the controller just returns the existing service, asserting its full state ensures that properties like labels and selector haven't been inadvertently clobbered.

}
}

func TestCheckOwnership(t *testing.T) {


The test coverage is solid. You might want to include an edge case where the sandbox.UID is "" or controllerRef.UID is "" to ensure checkOwnership handles malformed inputs gracefully.

}
}

func TestReconcilePVCs(t *testing.T) {


This test does a fantastic job covering all PVC ownership paths. For maximum coverage, consider adding a test case containing multiple VolumeClaimTemplates, where one PVC is already owned and the next is unowned, validating that the loop properly evaluates each PVC.


Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Privilege Escalation in agent-sandbox controller via pod-name annotation leads to unauthorized Pod deletion

5 participants