
Tracking time when job fails, or pod terminates after failure #2722

@sleepyfoodie

What would you like to be added:
Thanks for taking a look at this, and apologies if this already exists. I tried all of the following metrics to find the end time of failed jobs, but none of them appears to record the time at which a job fails.

  • kube_pod_completion_time
  • kube_job_status_completion_time
  • kube_pod_container_status_last_terminated_timestamp

This is the cronjob I was running for testing:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: failing-random-half
spec:
  schedule: "*/2 * * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: failing-random-half
            image: busybox:1.28
            imagePullPolicy: IfNotPresent
            command: ["/bin/sh", "-c", "x=$((RANDOM % 2)); echo $x; echo 'this job fails if it returns 1'; echo 'sleeping for 60-120 seconds'; echo 'exit code:', $x, '🏃‍♀️‍➡️'; echo '💀'; echo 'job namespace: default'; sleep $((60 + $((RANDOM % 60)))); exit $x"]
          restartPolicy: Never

Why is this needed:
I'm working on a dashboard that tracks failed jobs.

Describe the solution you'd like
I'm looking for the AGE of the jobs with STATUS Failed, as shown by the kubectl get jobs command, which returns this:

NAME                           STATUS     COMPLETIONS   DURATION   AGE
failing-random-half-29241256   Failed     0/1           158m       158m
failing-random-half-29241408   Complete   1/1           5m14s      6m59s

or the AGE from kubectl get pods -A for the pods with STATUS Error:

NAMESPACE         NAME                                                         READY   STATUS      RESTARTS        AGE
default           failing-random-half-29241256-2n2lg                           0/1     Error       0               140m
default           failing-random-half-29241256-5ntbj                           0/1     Error       0               154m
default           failing-random-half-29241256-9sk6g                           0/1     Error       0               157m
default           failing-random-half-29241256-h6tzh                           0/1     Error       0               147m
default           failing-random-half-29241256-jgpz9                           0/1     Error       0               156m

or the Finished time, like Wed, 06 Aug 2025 06:37:46 -0400, from this command:
kubectl describe pod failing-random-half-29241256-2n2lg

which returns this:

    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
      x=$((RANDOM % 2)); echo $x; echo 'this job fails if it returns 1'; echo 'sleeping for 60-120 seconds'; echo 'exit code:', $x, '🏃‍♀️‍➡️'; echo '💀'; echo 'job namespace: default'; sleep $((60 + $((RANDOM % 60)))); exit $x
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 06 Aug 2025 06:35:48 -0400
      Finished:     Wed, 06 Aug 2025 06:37:46 -0400
    Ready:          False
    Restart Count:  0
    Environment:    <none>
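
To make the request concrete, here is a minimal sketch of the value I would hope to see exposed: the Finished time from the output above converted to a Unix timestamp, which is the form the existing *_time metrics use. It only parses the two timestamps shown above with the Python standard library; the variable names are mine and nothing here is an existing kube-state-metrics API.

```python
from email.utils import parsedate_to_datetime

# Timestamps copied from the `kubectl describe pod` output above.
started = parsedate_to_datetime("Wed, 06 Aug 2025 06:35:48 -0400")
finished = parsedate_to_datetime("Wed, 06 Aug 2025 06:37:46 -0400")

# Unix timestamp of the moment the container terminated.
finished_unix = int(finished.timestamp())

# How long the container ran before exiting with code 1.
run_seconds = int((finished - started).total_seconds())

print(finished_unix)  # 1754476666
print(run_seconds)    # 118
```

The 118 second run time is consistent with the 60-120 second sleep in the test command.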

Additional context
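
As far as I can tell, the Job object itself carries the failure time in status.conditions, on the condition with type Failed (its lastTransitionTime). Below is a rough sketch of how that could be read from the output of kubectl get job <name> -o json. The helper name job_failure_time and the sample object are hypothetical, and the timestamp in the sample is a placeholder rather than a value from my cluster.

```python
import json
from datetime import datetime
from typing import Optional


def job_failure_time(job: dict) -> Optional[datetime]:
    """Return the time the Job was marked Failed, or None if it has not failed."""
    conditions = job.get("status", {}).get("conditions") or []
    for cond in conditions:
        if cond.get("type") == "Failed" and cond.get("status") == "True":
            # Kubernetes serializes times as RFC 3339 with a trailing "Z".
            return datetime.fromisoformat(
                cond["lastTransitionTime"].replace("Z", "+00:00")
            )
    return None


# Hypothetical, trimmed-down Job object; the timestamp is a placeholder.
sample = json.loads("""
{
  "metadata": {"name": "failing-random-half-29241256"},
  "status": {
    "conditions": [
      {
        "type": "Failed",
        "status": "True",
        "reason": "BackoffLimitExceeded",
        "lastTransitionTime": "2025-01-01T00:00:00Z"
      }
    ]
  }
}
""")

failed_at = job_failure_time(sample)
print(failed_at.isoformat() if failed_at else "job has not failed")
```

A metric built on this field would be the failed-job counterpart of kube_job_status_completion_time, which is what the dashboard needs.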

Labels

kind/support: Categorizes issue or PR as a support question.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.
