Commit 393816b

Merge pull request #27198 from alaypatel07/blog-cronjob-ga
Blog: CronJob reaches GA
2 parents 7f48e95 + 47df9fd commit 393816b

3 files changed

+107
-0
lines changed

content/en/blog/_posts/2021-04-08-cronjob-reaches-ga/controller-flowchart.svg

Lines changed: 1 addition & 0 deletions
Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
---
layout: blog
title: 'Kubernetes 1.21: CronJob Reaches GA'
date: 2021-04-09
slug: kubernetes-release-1.21-cronjob-ga
---

**Authors:** Alay Patel (Red Hat) and Maciej Szulik (Red Hat)

In Kubernetes v1.21, the
[CronJob](/docs/concepts/workloads/controllers/cron-jobs/) resource
reached general availability (GA). We've also substantially improved the
performance of CronJobs since Kubernetes v1.19 by implementing a new
controller.

In Kubernetes v1.20 we launched a revised v2 controller for CronJobs,
initially as an alpha feature. Kubernetes v1.21 uses the newer controller by
default, and the CronJob resource itself is now GA (group version: `batch/v1`).

In this article, we'll take you through the driving forces behind this new
development, give you a brief description of controller design for core
Kubernetes, and outline what you will gain from this improved controller.

The driving force behind promoting the API was Kubernetes' policy choice to
[ensure APIs move beyond beta](/blog/2020/08/21/moving-forward-from-beta/).
That policy aims to prevent APIs from being stuck in a “permanent beta” state.
Over the years the old CronJob controller implementation had received healthy
feedback from the community, with reports of several widely recognized
[issues](https://github.com/kubernetes/kubernetes/issues/82659).

If the beta API for CronJob were to be supported as GA, the existing controller
code would need substantial rework. Instead, the SIG Apps community decided
to introduce a new controller and gradually replace the old one.

## How do controllers work?

Kubernetes [controllers](/docs/concepts/architecture/controller/) are control
loops that watch the state of resource(s) in your cluster, then make or
request changes where needed. Each controller tries to move part of the
current cluster state closer to the desired state.

The v1 CronJob controller works by performing a periodic poll and sweep of all
the CronJob objects in your cluster, in order to act on them. It is a
single-worker implementation that gets all CronJobs every 10 seconds, iterates
over each one of them, and syncs them to their desired state. This was the
default way of doing things almost 5 years ago when the controller was
initially written. In hindsight, we can certainly say that such an approach can
overload the API server at scale.

These days, every core controller in Kubernetes must follow the guidelines
described in [Writing Controllers](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-api-machinery/controllers.md#readme).
Among many details, that document prescribes using
[shared informers](https://www.cncf.io/blog/2019/10/15/extend-kubernetes-via-a-shared-informer/)
to “receive notifications of adds, updates, and deletes for a particular
resource”. On any such event, the related objects are placed in a queue.
Workers pull items from the queue and process them one at a time. This
approach ensures consistency and scalability.

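The event-to-queue-to-worker pattern described above can be sketched in plain Go. This is a minimal illustration that uses only the standard library, not the client-go informer or workqueue packages; `event`, `keyQueue`, and `runWorkers` are hypothetical names chosen for this example. It assumes all events arrive before the workers start, and it omits details that a real work queue handles, such as preventing two workers from processing the same key at once.

```go
package main

import (
	"fmt"
	"sync"
)

// event is a hypothetical stand-in for an informer notification.
type event struct {
	kind string // "add", "update", or "delete"
	key  string // namespace/name of the affected object
}

// keyQueue is a simplified work queue. A key that is already waiting is not
// queued a second time, so bursts of events for one object collapse into one.
type keyQueue struct {
	mu      sync.Mutex
	pending map[string]bool
	order   []string
}

func newKeyQueue() *keyQueue {
	return &keyQueue{pending: map[string]bool{}}
}

// Add enqueues key unless it is already waiting in the queue.
func (q *keyQueue) Add(key string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get removes and returns the oldest key; ok is false when the queue is empty.
func (q *keyQueue) Get() (key string, ok bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.order) == 0 {
		return "", false
	}
	key = q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

// runWorkers drains the queue with n concurrent workers, calling reconcile
// once per key, and returns the number of reconcile calls that were made.
func runWorkers(q *keyQueue, n int, reconcile func(key string)) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	calls := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				key, ok := q.Get()
				if !ok {
					return
				}
				reconcile(key)
				mu.Lock()
				calls++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return calls
}

func main() {
	q := newKeyQueue()
	events := []event{
		{"add", "default/backup"},
		{"update", "default/backup"}, // collapses into the pending add
		{"add", "default/report"},
	}
	for _, e := range events {
		q.Add(e.key)
	}
	n := runWorkers(q, 2, func(key string) {})
	fmt.Println("reconciled keys:", n)
}
```

Queuing keys rather than whole objects means a worker always reconciles against the latest known state, however many events triggered the sync.
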
The picture below shows the flow of information from the Kubernetes API server,
through shared informers and the queue, to the main part of a controller: a
reconciliation loop that is responsible for performing the core functionality.

![Controller flowchart](controller-flowchart.svg)

The v2 CronJob controller uses a queue that implements the `DelayingInterface`
to handle the scheduling aspect. This queue allows processing an element after
a specific time interval. Every time there is a change in a CronJob or its
related Jobs, the key that represents the CronJob is pushed to the queue. The
main handler pops the key, processes the CronJob, and after completion pushes
the key back into the queue for the next scheduled time interval. This is
immediately a more performant implementation, as it no longer requires a linear
scan of all the CronJobs. On top of that, this controller can be scaled by
increasing the number of workers processing the CronJobs in parallel.

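To make the requeue-after-delay idea concrete, here is a small sketch of a delaying queue built on a min-heap from the Go standard library. It is loosely modeled on the idea behind `DelayingInterface` and is not the client-go implementation: `delayQueue`, `PopReady`, and `simulate` are hypothetical names, the clock is simulated, and `AddAfter` here takes an explicit `now` argument so that the example stays deterministic.

```go
package main

import (
	"container/heap"
	"fmt"
	"time"
)

// entry pairs a CronJob key with the time at which it should next be synced.
type entry struct {
	key     string
	readyAt time.Time
}

// entryHeap is a min-heap of entries ordered by readyAt.
type entryHeap []entry

func (h entryHeap) Len() int            { return len(h) }
func (h entryHeap) Less(i, j int) bool  { return h[i].readyAt.Before(h[j].readyAt) }
func (h entryHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *entryHeap) Push(x interface{}) { *h = append(*h, x.(entry)) }
func (h *entryHeap) Pop() interface{} {
	old := *h
	last := old[len(old)-1]
	*h = old[:len(old)-1]
	return last
}

// delayQueue is a hypothetical, single-goroutine stand-in for a delaying queue.
type delayQueue struct{ entries entryHeap }

// AddAfter makes key ready once delay has elapsed after now.
func (q *delayQueue) AddAfter(key string, now time.Time, delay time.Duration) {
	heap.Push(&q.entries, entry{key: key, readyAt: now.Add(delay)})
}

// PopReady returns the earliest key that is due at now, if there is one.
func (q *delayQueue) PopReady(now time.Time) (string, bool) {
	if len(q.entries) == 0 || q.entries[0].readyAt.After(now) {
		return "", false
	}
	return heap.Pop(&q.entries).(entry).key, true
}

// simulate advances a fake clock one minute at a time and returns the keys
// that were synced, in order. Only keys that are due are touched on each tick.
func simulate(minutes int) []string {
	periods := map[string]time.Duration{
		"default/every-minute": time.Minute,
		"default/every-20h":    20 * time.Hour,
	}
	start := time.Date(2021, 4, 9, 0, 0, 0, 0, time.UTC)
	q := &delayQueue{}
	for key, period := range periods {
		q.AddAfter(key, start, period)
	}
	var synced []string
	for i := 1; i <= minutes; i++ {
		now := start.Add(time.Duration(i) * time.Minute)
		for {
			key, ok := q.PopReady(now)
			if !ok {
				break
			}
			// A real controller would create the Job at this point.
			synced = append(synced, key)
			// Requeue the key for its next scheduled run.
			q.AddAfter(key, now, periods[key])
		}
	}
	return synced
}

func main() {
	fmt.Println(simulate(3))
}
```

In this simulation the CronJob that runs every 20 hours is never inspected during the first few minutes, which is the property that removes the linear scan: the cost of each tick depends on how many keys are due, not on how many CronJobs exist.
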
## Performance impact of the new controller {#performance-impact}

To test the performance difference between the two controllers, we set up a
single-node Kubernetes cluster on a VM instance with 128 GiB RAM and 64 vCPUs.
Initially, we created a sample workload of 20 CronJobs scheduled to run every
minute and 2100 CronJobs scheduled to run every 20 hours. Over the next few
minutes we added further CronJobs in batches of 1000, each scheduled to run
every 20 hours, until we reached a total of 5120 CronJobs.

![Visualization of performance](performance-impact-graph.svg)

We observed that for every 1000 CronJobs added, the old controller used
around 90 to 120 seconds more wall-clock time to schedule 20 Jobs every cycle.
That is, at 5120 CronJobs, the old controller took approximately 9 minutes
to create 20 Jobs. Hence, during each cycle, about 8 schedules were missed.
The new controller, implemented with the architectural change explained above,
created 20 Jobs without any delay, even when we created an additional batch
of 1000 CronJobs, reaching a total of 6120.

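The figures above can be checked with a few lines of arithmetic. The sketch below rests on one reading of the numbers, namely that a CronJob due every minute comes due about 9 times during a 9-minute pass and only one of those runs results in a Job. `missedRuns` is a hypothetical helper written for this illustration, and the three batches of 1000 are inferred from the totals rather than stated directly.

```go
package main

import "fmt"

// missedRuns estimates how many scheduled runs of one CronJob are skipped when
// a single controller pass takes passMinutes and the CronJob is due every
// periodMinutes. It assumes exactly one Job is created per pass.
func missedRuns(passMinutes, periodMinutes int) int {
	due := passMinutes / periodMinutes // runs that came due during the pass
	if due <= 1 {
		return 0
	}
	return due - 1 // one of the due runs is actually created
}

func main() {
	// 20 every-minute CronJobs, 2100 every-20-hours CronJobs, then 3 batches of 1000.
	total := 20 + 2100 + 3*1000
	fmt.Println("CronJobs in the test:", total)                     // prints 5120
	fmt.Println("missed runs per 9-minute pass:", missedRuns(9, 1)) // prints 8
}
```
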
As a closing remark, the new controller exposes a histogram metric,
`cronjob_controller_cronjob_job_creation_skew_duration_seconds`, which helps
monitor the time difference between when a CronJob is meant to run and when
the actual Job is created.

Hopefully the above description is a sufficient argument to follow the
guidelines and standards set in the Kubernetes project, even for your own
controllers. As mentioned before, the new controller is on by default starting
from Kubernetes v1.21; if you want to check it out in the previous release (1.20),
you can enable the `CronJobControllerV2`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
for the kube-controller-manager: `--feature-gates="CronJobControllerV2=true"`.

content/en/blog/_posts/2021-04-08-cronjob-reaches-ga/performance-impact-graph.svg

Lines changed: 1 addition & 0 deletions