Commit 5887977

clarify MigrateJobLegacyTracking
1 parent 7823232 commit 5887977

keps/sig-apps/2307-job-tracking-without-lingering-pods/README.md

Lines changed: 20 additions & 14 deletions
@@ -321,21 +321,27 @@ the owner reference.
321321

322322
## Monitoring Pods with finalizers
323323

324-
Starting in 1.26, the metric `job_pod_tracking_finalizer` is a gauge that
325-
tracks the number of pods that currently have a job tracking finalizer.
324+
Starting in 1.26, the metric `job_terminated_pod_tracking_finalizer` is a gauge
325+
that tracks the number of terminated pods (`.status.phase=(Succeeded|Failed)`)
326+
that currently have a job tracking finalizer.
326327

327-
The metric increments when the job controller observes a pod created or adopted,
328-
and decrements when the job controller observes an update that removes the
329-
finalizer or a pod deletion.
328+
The job controller tracks this metric in its event handlers.
330329

331330
## Migrating Jobs with legacy tracking
332331

333-
Starting in 1.26, when the feature gate `MigrateJobLegacyTracking` is enabled,
334-
the job controller migrates jobs with legacy tracking to tracking with finalizers.
332+
Once `JobTrackingWithFinalizers` graduates to stable, Jobs that start in a
333+
Kubernetes version where `JobTrackingWithFinalizers` is disabled need to be
334+
migrated to the new tracking. This migration mechanism will be initially guarded
335+
by the feature gate `MigrateJobLegacyTracking`, starting in 1.26,
336+
enabled by default.
337+
338+
When the feature gate `MigrateJobLegacyTracking` is enabled, the job controller
339+
migrates jobs with legacy tracking to tracking with finalizers as described
340+
below:
335341

336342
If a Job doesn't have the annotation `batch.kubernetes.io/job-completion`, it
337-
means that is not currently tracked with finalizers. The job controller starts
338-
the following migration process:
343+
means that the Job is not currently tracked with finalizers. The job controller
344+
starts the following migration process:
339345
1. Add the finalizer `batch.kubernetes.io/job-completion` to all pods with
340346
`.status.phase=(Pending/Running)`.
341347
2. Ignore pods with `.status.phase=(Succeeded/Failed)` that don't have the `batch.kubernetes.io/job-completion` finalizer.
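
The first two migration steps can be sketched as follows. This is a minimal illustration, not the job controller's code: the `Pod` class and the `migrate_pods` function are hypothetical stand-ins, and the sketch assumes that finished pods report `.status.phase` as `Succeeded` or `Failed`.

```python
from dataclasses import dataclass, field

# Finalizer name as given in the migration steps.
JOB_TRACKING_FINALIZER = "batch.kubernetes.io/job-completion"


@dataclass
class Pod:
    """Hypothetical, simplified stand-in for a Pod owned by a Job."""
    name: str
    phase: str  # value of .status.phase
    finalizers: list = field(default_factory=list)


def migrate_pods(pods):
    """Apply steps 1 and 2 to the pods of a Job with legacy tracking.

    Returns the names of the pods that received the finalizer.
    """
    patched = []
    for pod in pods:
        # Step 1: pods that have not finished yet get the finalizer.
        if pod.phase in ("Pending", "Running"):
            if JOB_TRACKING_FINALIZER not in pod.finalizers:
                pod.finalizers.append(JOB_TRACKING_FINALIZER)
                patched.append(pod.name)
        # Step 2: finished pods without the finalizer are left untouched.
    return patched
```

Calling `migrate_pods` a second time on the same pods patches nothing, which mirrors the expectation that the migration is safe to retry.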
@@ -349,7 +355,7 @@ This might lead to extra pods being created, but this is acceptable because:
349355
- For the remaining Jobs, the Job controller already accounted most of the
350356
finished Pods in the status. The controller might leave some
351357
finished Pods unaccounted, if they finish before the controller has a chance
352-
to add a finalizer. This situation is no worse that the legacy tracking
358+
to add a finalizer. This situation is no worse than the legacy tracking
353359
where the controller doesn't account for Pods removed by garbage collection or
354360
other means.
355361

@@ -427,7 +433,7 @@ for jobs with multiple sizes.
427433
#### Beta -> GA Graduation
428434

429435
- [Migrate existing Jobs to tracking with finalizers](#migrating-jobs-with-legacy-tracking)
430-
under feature gate `MigrateJobLegacyTracking`, disabled by default.
436+
under feature gate `MigrateJobLegacyTracking`, enabled by default.
431437
- Job E2E tests graduate to conformance.
432438
- Job tracking scales to 10^5 completions per Job processed within an order of
433439
minutes.
@@ -539,7 +545,7 @@ No implications to node runtime.
539545
duration than previous versions of the job controller due to the new API
540546
calls.
541547
- Stale `job_sync_total` or `job_finished_total`.
542-
- The metric `job_pod_tracking_finalizer` doesn't decrease when pods finish.
548+
- The metric `job_terminated_pod_tracking_finalizer` increases steadily.
543549

544550
#### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
545551

@@ -598,7 +604,7 @@ Yes, see [Deprecation](#deprecation) for the full plan.
598604
- Metric name: `job_sync_duration_seconds`
599605
- [Optional] Aggregation method:
600606
- Components exposing the metric: `kube-controller-manager`
601-
- Metric name: `job_pod_tracking_finalizer`
607+
- Metric name: `job_terminated_pod_tracking_finalizer`
602608
- [Optional] Aggregation method:
603609
- Components exposing the metric: `kube-controller-manager`
604610

@@ -668,7 +674,7 @@ Yes, see [Deprecation](#deprecation) for the full plan.
668674
- Terminated pods are stuck with finalizers
669675
- Detection:
670676
- Before 1.26: Observe the behavior in pods.
671-
- After 1.26: Based on metric `job_pod_tracking_finalizer`
677+
- After 1.26: Based on metric `job_terminated_pod_tracking_finalizer`
672678
- Mitigations:
673679
Before 1.26, disable `JobTrackingWithFinalizers`.
674680
- Diagnostics: