[BUG] incomplete container couldn't be marked as failed if CRR execution time exceeds the setting of "activeDeadlineSeconds" #2353

@davidlivekoo2012

What happened:
To test the activeDeadlineSeconds setting, we designed the following test case:

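  1. We defined the CRR as follows:

    root@cloud:/usr/lib/go-1.13/src/k8s.io/kruise-test# cat crr.yaml
    apiVersion: apps.kruise.io/v1alpha1
    kind: ContainerRecreateRequest
    metadata:
      namespace: default # same as the pod's namespace
      name: crr-test
    spec:
      podName: test-client # required
      containers: # names of the containers to recreate; at least one is required
      - name: app
      - name: sidecar
      strategy:
        failurePolicy: Fail # 'Fail' or 'Ignore'; indicates that the CRR ends immediately once any container fails to stop or be recreated
        orderedRecreate: false # 'true' means wait until the previous container has been recreated before recreating the next one
        terminationGracePeriodSeconds: 30 # time to wait for the container to exit gracefully; defaults to the value defined in the Pod
        unreadyGracePeriodSeconds: 3 # set the Pod to not ready before recreating, then wait this long before starting the recreation
        minStartedSeconds: 10 # the new container must keep running at least this long after recreation to count as successfully recreated
      activeDeadlineSeconds: 10 # if execution exceeds this time, the CRR is marked as finished (unfinished containers are marked as failed)
      ttlSecondsAfterFinished: 300 # the CRR is deleted automatically this long after it finishes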
  2. We created a two-container pod (app and sidecar) from the following YAML file:

    root@cloud:/usr/lib/go-1.13/src/k8s.io/kruise-test# cat test-pod.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: test-client
      namespace: default
    spec:
      containers:
      - name: app
        image: docker.io/library/alpine:3.18
        imagePullPolicy: IfNotPresent
        command: ["/bin/sh", "-c"]
        args: ["while true; do sleep 3600; done"]
      - name: sidecar
        image: docker.io/library/alpine:3.18
        imagePullPolicy: IfNotPresent
        command: ["/bin/sh", "-c"]
        args: ["while true; do sleep 3600; done"]

  3. We created the CRR and watched the status of the target pod:
    watch -d kubectl describe po test-client

  4. At the same time, in another console, we watched the status of the CRR:
    watch -d kubectl describe containerrecreaterequests.apps.kruise.io crr-test

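What you expected to happen:
We expected the container restart to time out and the CRR to behave as described in the documentation below. In particular, containers that have not finished should be marked as failed once activeDeadlineSeconds is exceeded:
https://openkruise.io/zh/docs/user-manuals/containerrecreaterequest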
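
The expectation above can be made concrete with a small model. This is a minimal Python sketch, not Kruise's actual controller code: the `expire` helper and the `ContainerState` class are hypothetical names introduced for illustration, and the `Succeeded` and `Failed` phase names are assumptions (only `Pending`, `Recreating`, and `Completed` appear in the output pasted in this issue). It shows the transition we expected once activeDeadlineSeconds is exceeded: the CRR completes and every unfinished container is marked as failed.

```python
from dataclasses import dataclass

# "Pending" and "Recreating" appear in the CRR status in this issue;
# "Succeeded" and "Failed" are assumed names for the terminal phases.
PENDING, RECREATING, SUCCEEDED, FAILED = "Pending", "Recreating", "Succeeded", "Failed"
TERMINAL_PHASES = {SUCCEEDED, FAILED}


@dataclass
class ContainerState:
    """Hypothetical stand-in for one entry of 'Container Recreate States'."""
    name: str
    phase: str


def expire(states, elapsed_seconds, active_deadline_seconds):
    """Hypothetical helper modelling the expected deadline handling.

    Returns (crr_phase, container_states). Before the deadline nothing changes;
    at or after it, the CRR is Completed and unfinished containers become Failed.
    """
    if elapsed_seconds < active_deadline_seconds:
        return RECREATING, list(states)
    expired = [
        s if s.phase in TERMINAL_PHASES else ContainerState(s.name, FAILED)
        for s in states
    ]
    return "Completed", expired


# States taken from the test in this issue, checked at the 10-second deadline.
observed = [ContainerState("app", RECREATING), ContainerState("sidecar", PENDING)]
crr_phase, expected = expire(observed, elapsed_seconds=10, active_deadline_seconds=10)
print(crr_phase, [(s.name, s.phase) for s in expected])
```

With the states from our test (app in Recreating, sidecar in Pending) and a 10-second deadline, this model yields Completed for the CRR and Failed for both containers.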

Instead, we got the following result:
Every 2.0s: kubectl describe containerrecreaterequest.apps.kruise.io crr-test    cloud: Thu Jan 29 12:17:35 2026

Name: crr-test
Namespace: default
Labels: crr.apps.kruise.io/node-name=edge
        crr.apps.kruise.io/pod-uid=625e9074-d858-4e0c-8e4b-babfa7d1b2ef
Annotations: crr.apps.kruise.io/sync-container-statuses: []
             crr.apps.kruise.io/unready-acquired: 2026-01-29T12:15:55Z
API Version: apps.kruise.io/v1alpha1
Kind: ContainerRecreateRequest
Metadata:
  Creation Timestamp: 2026-01-29T12:15:55Z
  Generation: 1
  Resource Version: 1439466
  UID: 0dc8c27f-7745-4925-8484-507a4eb7534a
Spec:
  Active Deadline Seconds: 10
  Containers:
    Name: app
    Status Context:
      Container ID: containerd://13919cb87fc455e93ecb9aee6e3d195f9b8bae9f337b2a13d9743290549228f3
      Restart Count: 1
    Name: sidecar
    Status Context:
      Container ID: containerd://faaedee2a5ffb33fa9b02960a69d13e7e5bbcea51ea08330f501bdf5ad2a1dec
      Restart Count: 0
  Pod Name: test-client
  Strategy:
    Failure Policy: Fail
    Min Started Seconds: 10
    Termination Grace Period Seconds: 30
    Unready Grace Period Seconds: 3
  Ttl Seconds After Finished: 300
Status:
  Completion Time: 2026-01-29T12:16:05Z
  Container Recreate States:
    Is Killed: true
    Name: app
    Phase: Recreating
    Name: sidecar
    Phase: Pending
  Message: recreating has exceeded the activeDeadlineSeconds
  Phase: Completed
Events:

The pod description shows the app container's last state as Terminated with reason Error:

root@cloud:/usr/lib/go-1.13/src/k8s.io/kruise-test# watch -d kubectl describe po test-client

Every 2.0s: kubectl describe po test-client    cloud: Thu Jan 29 12:20:19 2026

Name: test-client
Namespace: default
Priority: 0
Service Account: default
Node: edge/10.0.3.17
Start Time: Thu, 29 Jan 2026 08:36:28 +0000
Labels:
Annotations:
Status: Running
IP: 10.244.1.164
IPs:
  IP: 10.244.1.164
Containers:
  app:
    Container ID: containerd://142abe729aac70806e19993976e28abedcf0763fa940f54c9ed85fda3cb3951f
    Image: docker.io/library/alpine:3.18
    Image ID: docker.io/library/alpine@sha256:de0eb0b3f2a47ba1eb89389859a9bd88b28e82f5826b6969ad604979713c2d4f
    Port:
    Host Port:
    Command:
      /bin/sh
      -c
    Args:
      while true; do sleep 3600; done
    State: Running
      Started: Thu, 29 Jan 2026 12:16:28 +0000
    Last State: Terminated
      Reason: Error
      Exit Code: 137
      Started: Thu, 29 Jan 2026 11:38:37 +0000
      Finished: Thu, 29 Jan 2026 12:16:28 +0000
    Ready: True
    Restart Count: 2
    Environment:
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gpvgv (ro)
  sidecar:
    Container ID: containerd://faaedee2a5ffb33fa9b02960a69d13e7e5bbcea51ea08330f501bdf5ad2a1dec
    Image: docker.io/library/alpine:3.18
    Image ID: docker.io/library/alpine@sha256:de0eb0b3f2a47ba1eb89389859a9bd88b28e82f5826b6969ad604979713c2d4f
    Port:
    Host Port:
    Command:
      /bin/sh
      -c
    Args:
      while true; do sleep 3600; done
    State: Running
      Started: Thu, 29 Jan 2026 08:36:29 +0000
    Ready: True
    Restart Count: 0
    Environment:
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gpvgv (ro)
Conditions:
  Type Status

In addition, the restart count of the app container increased by 1 (from 1, as recorded in the CRR's status context, to 2).
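
The timestamps in the CRR and pod outputs can be lined up with a short calculation. The Python sketch below uses only values copied from this issue. The helper name `utc` is made up for illustration, and the reading that the forced kill lands after unreadyGracePeriodSeconds plus terminationGracePeriodSeconds is an assumption that happens to match the data, not a confirmed description of how the Kruise daemon schedules the kill. Exit code 137 corresponds to 128 + 9, meaning the container was ended by SIGKILL.

```python
from datetime import datetime, timedelta, timezone


def utc(hour, minute, second):
    """Hypothetical helper: a UTC timestamp on 2026-01-29, the day of the test."""
    return datetime(2026, 1, 29, hour, minute, second, tzinfo=timezone.utc)


crr_created = utc(12, 15, 55)    # Creation Timestamp (same as unready-acquired)
crr_completed = utc(12, 16, 5)   # Status: Completion Time
app_restarted = utc(12, 16, 28)  # app: Last State Finished / State Started

active_deadline = timedelta(seconds=10)    # activeDeadlineSeconds
unready_grace = timedelta(seconds=3)       # unreadyGracePeriodSeconds
termination_grace = timedelta(seconds=30)  # terminationGracePeriodSeconds

# The CRR completed exactly when the deadline expired.
deadline_at = crr_created + active_deadline
print(deadline_at == crr_completed)  # True

# Assumption: the forced kill happens after both grace periods have elapsed.
kill_due_at = crr_created + unready_grace + termination_grace
print(kill_due_at == app_restarted)  # True

# The container was still restarted after the CRR was already Completed.
print((app_restarted - crr_completed).total_seconds())  # 23.0
```

Under that assumption, the app container was restarted 23 seconds after the CRR had reported Completed, while its recreate state stayed at Recreating rather than being marked as failed.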

First, is this the expected behavior? If it is a bug, could you provide a hotfix? Thanks.

How to reproduce it (as minimally and precisely as possible):
Use the CRR and pod specs pasted above; the issue reproduces every time.

Anything else we need to know?:
No.

Environment:

  • Kruise version: commit e0cc9f3
  • Kubernetes version (use kubectl version): v1.32.9
  • Install details (e.g. helm install args):
  • Others:

Labels

kind/bug (Something isn't working)
