Bug: SandboxClaim Never Enters a Failure State #271

When a SandboxClaim cannot provision a Sandbox or Pod, the controller keeps reconciling the resource indefinitely. In large-scale environments, this causes several problems:

- Stale SandboxClaims accumulate in etcd and consume kube-apiserver bandwidth.
- Without a time-bound threshold, there is no clear way to distinguish a transient error from a persistent failure.
- Users must manually identify and delete stalled claims that are still being reconciled.

### Proposed Solution:

Introduce a timeout mechanism for the `SandboxClaim` lifecycle:

- Add a configurable timeout after which a claim is considered failed.
- If the Sandbox and Pod are not ready within the timeout period, the controller should stop reconciling the claim and apply a defined timeout behavior (the candidate behaviors are listed under Open Questions).

### Open Questions:

#### Possible options for how to configure a timeout:

- Controller Flag: A global timeout applied to all SandboxClaims managed by the controller. This is simpler to implement and operate, but lacks flexibility for specific workloads.
- CRD Field: A timeout field on the SandboxClaim specification. This lets users define a custom timeout per request, which may be useful for workloads with varying startup characteristics.

#### Possible options for desired behavior once a timeout is reached:

- Hard Deletion: The controller deletes the Sandbox, Pod, and the SandboxClaim to immediately free up resources.
- Failed Status: The controller retains the SandboxClaim but updates its status to Failed to provide a clear signal for debugging and observability.
- Failed Status, then Deletion: Same as "Failed Status", but the SandboxClaim is deleted after it has remained in the Failed status for a set period.





