Commit f134c4b

Merge branch 'kubernetes:main' into main
2 parents 31c28b5 + f1ddfbf commit f134c4b

File tree

310 files changed: +12043 additions, -1168 deletions


Makefile

Lines changed: 2 additions & 2 deletions
@@ -56,7 +56,7 @@ non-production-build: module-check ## Build the non-production site, which adds
 	GOMAXPROCS=1 hugo --cleanDestinationDir --enableGitInfo --environment nonprod
 
 serve: module-check ## Boot the development server.
-	hugo server --buildFuture --environment development
+	hugo server --buildDrafts --buildFuture --environment development
 
 docker-image:
 	@echo -e "$(CCRED)**** The use of docker-image is deprecated. Use container-image instead. ****$(CCEND)"
@@ -107,7 +107,7 @@ container-build: module-check
 container-serve: module-check ## Boot the development server using container.
 	$(CONTAINER_RUN) --cap-drop=ALL --cap-add=AUDIT_WRITE --read-only \
 	--mount type=tmpfs,destination=/tmp,tmpfs-mode=01777 -p 1313:1313 $(CONTAINER_IMAGE) \
-	hugo server --buildFuture --environment development --bind 0.0.0.0 --destination /tmp/public --cleanDestinationDir --noBuildLock
+	hugo server --buildDrafts --buildFuture --environment development --bind 0.0.0.0 --destination /tmp/public --cleanDestinationDir --noBuildLock
 
 test-examples:
 	scripts/test_examples.sh install

OWNERS_ALIASES

Lines changed: 2 additions & 0 deletions
@@ -135,7 +135,9 @@ aliases:
     - atoato88
     - b1gb4by
     - bells17
+    - inductor
     - kakts
+    - nasa9084
     - Okabe-Junya
     - t-inu
   sig-docs-ko-owners: # Admins for Korean content

content/bn/_index.html

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
 {{% blocks/feature image="flower" id="feature-primary" %}}
 [কুবারনেটিস]({{< relref "/docs/concepts/overview/" >}}), K8s নামেও পরিচিত, কনটেইনারাইজড অ্যাপ্লিকেশনের স্বয়ংক্রিয় ডিপ্লয়মেন্ট, স্কেলিং এবং পরিচালনার জন্য একটি ওপেন-সোর্স সিস্টেম।
 
-এটি সহজ ব্যবস্থাপনা এবং আবিষ্কারের জন্য লজিক্যাল ইউনিটে একটি অ্যাপ্লিকেশন তৈরি করে এমন কন্টেইনারগুলিকে গোষ্ঠীভুক্ত করে। কুবারনেটিস [Google-এ প্রোডাকশন ওয়ার্কলোড চালানোর 15 বছরের অভিজ্ঞতার ভিত্তিতে](http://queue.acm.org/detail.cfm?id=2898444) তৈরি করে, কমিউনিটির সেরা ধারণা এবং অনুশীলনের সাথে মিলিত ভাবে।
+এটি সহজ ব্যবস্থাপনা এবং আবিষ্কারের জন্য লজিক্যাল ইউনিটে একটি অ্যাপ্লিকেশন তৈরি করে এমন কন্টেইনারগুলিকে গোষ্ঠীভুক্ত করে। কুবারনেটিস [Google-এ প্রোডাকশন ওয়ার্কলোড চালানোর 15 বছরের অভিজ্ঞতার ভিত্তিতে](https://queue.acm.org/detail.cfm?id=2898444) তৈরি করে, কমিউনিটির সেরা ধারণা এবং অনুশীলনের সাথে মিলিত ভাবে।
 {{% /blocks/feature %}}
 
 {{% blocks/feature image="scalable" %}}

content/en/blog/_posts/2024-12-11-Kubernetes-v1-32-Release/index.md

Lines changed: 505 additions & 0 deletions
Large diffs are not rendered by default.
(binary image file, 583 KB; diff not rendered)
Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
---
layout: blog
title: "Kubernetes v1.32: QueueingHint Brings a New Possibility to Optimize Pod Scheduling"
date: 2024-12-12
slug: scheduler-queueinghint
author: >
  [Kensei Nakada](https://github.com/sanposhiho) (Tetrate.io)
---

The Kubernetes [scheduler](/docs/concepts/scheduling-eviction/kube-scheduler/) is the core
component that selects the nodes on which new Pods run. The scheduler processes
these new Pods **one by one**. Therefore, the larger your clusters, the more important
the throughput of the scheduler becomes.

Over the years, Kubernetes SIG Scheduling has improved the throughput
of the scheduler through multiple enhancements. This blog post describes a major improvement to the
scheduler in Kubernetes v1.32: a
[scheduling context element](/docs/concepts/scheduling-eviction/scheduling-framework/#extension-points)
named _QueueingHint_. This page provides background knowledge of the scheduler and explains how
QueueingHint improves scheduling throughput.

## Scheduling queue

The scheduler stores all unscheduled Pods in an internal component called the _scheduling queue_.

The scheduling queue consists of the following data structures:
- **ActiveQ**: holds newly created Pods or Pods that are ready to be retried for scheduling.
- **BackoffQ**: holds Pods that are ready to be retried but are waiting for a backoff period to end. The
  backoff period depends on the number of unsuccessful scheduling attempts performed by the scheduler on that Pod.
- **Unschedulable Pod Pool**: holds Pods that the scheduler won't attempt to schedule for one of the
  following reasons:
  - The scheduler previously attempted and was unable to schedule the Pods. Since that attempt, the cluster
    hasn't changed in a way that could make those Pods schedulable.
  - The Pods are blocked from entering the scheduling cycles by PreEnqueue plugins;
    for example, they have a [scheduling gate](/docs/concepts/scheduling-eviction/pod-scheduling-readiness/#configuring-pod-schedulinggates)
    and are blocked by the scheduling gate plugin (see the sketch after this list).
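
To make the PreEnqueue case concrete, here is a minimal sketch of a gated Pod; the Pod name, the gate name `example.com/some-gate`, and the image are placeholders chosen for this illustration, not values from the release.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gated-pod                     # placeholder name for this sketch
spec:
  # While any scheduling gate is present, the PreEnqueue check fails and the
  # scheduler keeps this Pod in the Unschedulable Pod Pool instead of letting
  # it enter a scheduling cycle.
  schedulingGates:
    - name: example.com/some-gate     # placeholder gate name
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9
```

Once a controller removes the gate, the Pod becomes eligible for the scheduling cycles again.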

## Scheduling framework and plugins

The Kubernetes scheduler is implemented following the Kubernetes
[scheduling framework](/docs/concepts/scheduling-eviction/scheduling-framework/).

All scheduling features are implemented as plugins
(for example, [Pod affinity](/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity)
is implemented in the `InterPodAffinity` plugin).
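
You can also see this plugin structure in the scheduler's own configuration. The snippet below is a hedged sketch of a `KubeSchedulerConfiguration` that disables the `InterPodAffinity` plugin in the default profile, purely to illustrate that features are wired in and toggled at the plugin level; it is not something this release changes or recommends.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    plugins:
      # Each scheduling feature (filtering, scoring, queueing behavior, ...)
      # is a plugin; disabling one removes that feature from this profile.
      # Shown only to illustrate the plugin model.
      multiPoint:
        disabled:
          - name: InterPodAffinity
```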

The scheduler processes pending Pods in phases called _cycles_ as follows:
1. **Scheduling cycle**: the scheduler takes pending Pods from the ActiveQ component of the scheduling
   queue _one by one_. For each Pod, the scheduler runs the filtering/scoring logic from every scheduling plugin. The
   scheduler then decides on the best node for the Pod, or decides that the Pod can't be scheduled at that time.

   If the scheduler decides that a Pod can't be scheduled, that Pod enters the Unschedulable Pod Pool
   component of the scheduling queue. However, if the scheduler decides to place the Pod on a node,
   the Pod goes to the binding cycle.

1. **Binding cycle**: the scheduler communicates the node placement decision to the Kubernetes API
   server. This operation binds the Pod to the selected node.

Aside from some exceptions, most unscheduled Pods enter the Unschedulable Pod Pool after each scheduling
cycle. The Unschedulable Pod Pool component is crucial because of how the scheduling cycle processes Pods one by one. If the scheduler had to constantly retry placing unschedulable Pods, instead of offloading those
Pods to the Unschedulable Pod Pool, multiple scheduling cycles would be wasted on those Pods.

## Improvements to retrying Pod scheduling with QueueingHint

Unschedulable Pods only move back into the ActiveQ or BackoffQ components of the scheduling
queue if changes in the cluster might allow the scheduler to place those Pods on nodes.

Prior to v1.32, each plugin registered the cluster changes that could resolve its failures
(an object creation, update, or deletion in the cluster, called _cluster events_)
with `EnqueueExtensions` (`EventsToRegister`),
and the scheduling queue retried a Pod when an event occurred that was registered by a plugin that rejected the Pod in a previous scheduling cycle.

Additionally, we had an internal feature called `preCheck`, which helped filter events further for efficiency, based on Kubernetes core scheduling constraints;
for example, `preCheck` could filter out node-related events when the node status is `NotReady`.

However, those approaches had two issues:
- Requeueing based on events alone was too broad and could lead to scheduling retries for no reason.
  - A newly scheduled Pod _might_ resolve an `InterPodAffinity` failure, but not all of them do.
    For example, if a new Pod is created without a label matching the `InterPodAffinity` requirement of the unschedulable Pod, that Pod still wouldn't be schedulable.
- `preCheck` relied on the logic of in-tree plugins and was not extensible to custom plugins,
  as reported in issue [#110175](https://github.com/kubernetes/kubernetes/issues/110175).

This is where QueueingHint comes into play:
a QueueingHint subscribes to a particular kind of cluster event, and makes a decision about whether each incoming event could make the Pod schedulable.

For example, consider a Pod named `pod-a` that has a required Pod affinity. `pod-a` was rejected in
the scheduling cycle by the `InterPodAffinity` plugin because no node had an existing Pod that matched
the Pod affinity specification for `pod-a`.

{{< figure src="queueinghint1.svg" alt="A diagram showing the scheduling queue and pod-a rejected by InterPodAffinity plugin" caption="A diagram showing the scheduling queue and pod-a rejected by InterPodAffinity plugin" >}}
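
For concreteness, here is a minimal sketch of what `pod-a` could look like; the `app: database` label selector and the image are assumptions made up for this example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-a
spec:
  affinity:
    podAffinity:
      # Required rule: pod-a may only be placed on a node (same hostname
      # topology domain) that already runs a Pod labeled app=database.
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: database           # assumed label for this illustration
          topologyKey: kubernetes.io/hostname
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9
```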

`pod-a` moves into the Unschedulable Pod Pool. The scheduling queue records which plugin caused
the scheduling failure for the Pod. For `pod-a`, the scheduling queue records that the `InterPodAffinity`
plugin rejected the Pod.

`pod-a` will never be schedulable until the `InterPodAffinity` failure is resolved.
There are some scenarios in which the failure could be resolved; one example is an existing running Pod receiving a label update that makes it match the Pod affinity.
For this scenario, the `InterPodAffinity` plugin's `QueueingHint` callback function checks every Pod label update that occurs in the cluster.
Then, if a Pod gets a label update that matches the Pod affinity requirement of `pod-a`, the `InterPodAffinity`
plugin's `QueueingHint` prompts the scheduling queue to move `pod-a` back into the ActiveQ or
the BackoffQ component.

{{< figure src="queueinghint2.svg" alt="A diagram showing the scheduling queue and pod-a being moved by InterPodAffinity QueueingHint" caption="A diagram showing the scheduling queue and pod-a being moved by InterPodAffinity QueueingHint" >}}
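
To tie this back to the sketch of `pod-a` above, the cluster event that the `InterPodAffinity` QueueingHint is waiting for is a Pod label update such as the one below; the Pod name and the `app: database` label are, again, assumptions for this illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: existing-pod                  # an already-running Pod; name assumed
  labels:
    # Adding this label is the update that the InterPodAffinity QueueingHint
    # evaluates; it now satisfies pod-a's required Pod affinity rule.
    app: database
spec:
  containers:
    - name: db
      image: registry.k8s.io/pause:3.9
```

When the QueueingHint sees that update and returns a positive hint, the scheduling queue requeues `pod-a` into the ActiveQ or BackoffQ.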

## QueueingHint's history and what's new in v1.32

At SIG Scheduling, we have been working on the development of QueueingHint since
Kubernetes v1.28.

While QueueingHint isn't user-facing, we implemented the `SchedulerQueueingHints` feature gate as a
safety measure when we originally added this feature. In v1.28, we implemented QueueingHints with a
few in-tree plugins experimentally, and enabled the feature gate by default.

However, users reported a memory leak, and consequently we disabled the feature gate in a
patch release of v1.28. From v1.28 until v1.31, we kept working on the QueueingHint implementation
within the rest of the in-tree plugins and fixing bugs.

In v1.32, we have enabled this feature by default again. We finished implementing QueueingHints
in all plugins and also identified the cause of the memory leak!

We thank all the contributors who participated in the development of this feature and those who reported and investigated the earlier issues.

## Getting involved

These features are managed by Kubernetes [SIG Scheduling](https://github.com/kubernetes/community/tree/master/sig-scheduling).

Please join us and share your feedback.

## How can I learn more?

- [KEP-4247: Per-plugin callback functions for efficient requeueing in the scheduling queue](https://github.com/kubernetes/enhancements/blob/master/keps/sig-scheduling/4247-queueinghint/README.md)

content/en/blog/_posts/2024-12-12-scheduler-queueinghint/queueinghint1.svg

Lines changed: 4 additions & 0 deletions

content/en/blog/_posts/2024-12-12-scheduler-queueinghint/queueinghint2.svg

Lines changed: 4 additions & 0 deletions
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
---
layout: blog
title: "Kubernetes v1.32: Memory Manager Goes GA"
date: 2024-12-13
slug: memory-manager-goes-ga
author: >
  [Talor Itzhak](https://github.com/Tal-or) (Red Hat)
---

With Kubernetes 1.32, the memory manager has officially graduated to General Availability (GA),
marking a significant milestone in the journey toward efficient and predictable memory allocation for containerized applications.
Since Kubernetes v1.22, when it graduated to beta, the memory manager has proved itself reliable, stable, and a good complementary feature for the
[CPU Manager](/docs/tasks/administer-cluster/cpu-management-policies/).

As part of the kubelet's workload admission process,
the memory manager provides topology hints
to optimize memory allocation and alignment.
This enables users to allocate exclusive
memory for Pods in the [Guaranteed](/docs/concepts/workloads/pods/pod-qos/#guaranteed) QoS class.
More details about the process can be found in the [Memory Manager moves to beta blog post](/blog/2021/08/11/kubernetes-1-22-feature-memory-manager-moves-to-beta/).
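
As a minimal sketch of the workload side (assuming the node's kubelet runs the memory manager with the `Static` policy), a Pod lands in the Guaranteed QoS class simply by setting requests equal to limits; the name and resource sizes below are illustrative only.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-example            # illustrative name
spec:
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9
      resources:
        # Requests equal to limits for every resource places the Pod in the
        # Guaranteed QoS class, making it a candidate for exclusive,
        # NUMA-aligned memory from the memory manager.
        requests:
          cpu: "2"
          memory: 2Gi
        limits:
          cpu: "2"
          memory: 2Gi
```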

Most of the changes introduced since the beta are bug fixes, internal refactoring, and
observability improvements, such as metrics and better logging.

## Observability improvements

As part of the effort
to increase the observability of the memory manager, new metrics have been added
to provide some statistics on memory allocation patterns.

* **memory_manager_pinning_requests_total** -
  tracks the number of times the pod spec required the memory manager to pin memory pages.

* **memory_manager_pinning_errors_total** -
  tracks the number of times the pod spec required the memory manager
  to pin memory pages, but the allocation failed.

## Improving memory manager reliability and consistency

The kubelet does not guarantee pod ordering
when admitting pods after a restart or reboot.

In certain edge cases, this behavior could cause
the memory manager to reject some pods,
and in more extreme cases, it may cause the kubelet to fail upon restart.

Previously, the beta implementation lacked certain checks and logic to prevent
these issues.

To stabilize the memory manager for general availability (GA) readiness,
small but critical refinements have been
made to the algorithm, improving its robustness and handling of edge cases.

## Future development

There is more to come for the future of the Topology Manager in general,
and the memory manager in particular.
Notably, ongoing efforts are underway
to extend [memory manager support to Windows](https://github.com/kubernetes/kubernetes/pull/128560),
enabling CPU and memory affinity on a Windows operating system.

## Getting involved

This feature is driven by the [SIG Node](https://github.com/Kubernetes/community/blob/master/sig-node/README.md) community.
Please join us to connect with the community
and share your ideas and feedback around the above feature and
beyond.
We look forward to hearing from you!
Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
---
layout: blog
title: 'Kubernetes v1.32 Adds A New CPU Manager Static Policy Option For Strict CPU Reservation'
date: 2024-12-16
slug: cpumanager-strict-cpu-reservation
author: >
  [Jing Zhang](https://github.com/jingczhang) (Nokia)
---

In Kubernetes v1.32, after years of community discussion, we are excited to introduce a
`strict-cpu-reservation` option for the [CPU Manager static policy](/docs/tasks/administer-cluster/cpu-management-policies/#static-policy-options).
This feature is currently in alpha, with the associated policy hidden by default. You can only use the
policy if you explicitly enable the alpha behavior in your cluster.

## Understanding the feature

The CPU Manager static policy is used to reduce latency or improve performance. The `reservedSystemCPUs` option defines an explicit CPU set for OS system daemons and Kubernetes system daemons. It is designed for Telco/NFV type use cases where uncontrolled interrupts/timers may impact workload performance. You can use this option to define an explicit cpuset for the system and Kubernetes daemons, as well as for interrupts/timers, so that the remaining CPUs on the system can be used exclusively for workloads, with less impact from uncontrolled interrupts/timers. More details of this parameter can be found on the [Explicitly Reserved CPU List](/docs/tasks/administer-cluster/reserve-compute-resources/#explicitly-reserved-cpu-list) page.

If you want to protect your system daemons and interrupt processing, the obvious way is to use the `reservedSystemCPUs` option.

However, until the Kubernetes v1.32 release, this isolation was only implemented for guaranteed
pods that made requests for a whole number of CPUs. At pod admission time, the kubelet only
compares the CPU _requests_ against the allocatable CPUs. In Kubernetes, limits can be higher than
the requests; the previous implementation allowed burstable and best-effort pods to use up
the capacity of `reservedSystemCPUs`, which could then starve host OS services of CPU - and we
know that people saw this in real life deployments.
The existing behavior also made benchmarking results (for both infrastructure and workloads) inaccurate.

When this new `strict-cpu-reservation` policy option is enabled, the CPU Manager static policy will not allow any workload to use the reserved system CPU cores.

## Enabling the feature

To enable this feature, you need to turn on both the `CPUManagerPolicyAlphaOptions` feature gate and the `strict-cpu-reservation` policy option. You also need to remove the `/var/lib/kubelet/cpu_manager_state` file if it exists, and restart the kubelet.

With the following kubelet configuration:

```yaml
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
featureGates:
  ...
  CPUManagerPolicyOptions: true
  CPUManagerPolicyAlphaOptions: true
cpuManagerPolicy: static
cpuManagerPolicyOptions:
  strict-cpu-reservation: "true"
reservedSystemCPUs: "0,32,1,33,16,48"
...
```

When `strict-cpu-reservation` is not set or set to false:
```console
# cat /var/lib/kubelet/cpu_manager_state
{"policyName":"static","defaultCpuSet":"0-63","checksum":1058907510}
```

When `strict-cpu-reservation` is set to true:
```console
# cat /var/lib/kubelet/cpu_manager_state
{"policyName":"static","defaultCpuSet":"2-15,17-31,34-47,49-63","checksum":4141502832}
```

## Monitoring the feature

You can monitor the feature impact by checking the following CPU Manager counters:
- `cpu_manager_shared_pool_size_millicores`: reports the shared pool size, in millicores (e.g. 13500m)
- `cpu_manager_exclusive_cpu_allocation_count`: reports the exclusively allocated cores, counting full cores (e.g. 16)

Your best-effort workloads may starve if the `cpu_manager_shared_pool_size_millicores` count is zero for a prolonged time.

We believe any pod that is required for operational purposes, such as a log forwarder, should not run as best-effort, but you can review and adjust the amount of CPU cores reserved as needed (see the sketch below).
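
As an illustration only (the Pod name, image, and resource values below are placeholders, not recommendations), giving such an operational Pod explicit requests and limits takes it out of the BestEffort QoS class, so it no longer relies entirely on the shared pool:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: log-forwarder                    # hypothetical operational pod
spec:
  containers:
    - name: forwarder
      image: registry.k8s.io/pause:3.9   # placeholder image for this sketch
      resources:
        # With requests (and limits) set, the Pod is classified as Burstable or
        # Guaranteed rather than BestEffort, so it is admitted with explicit
        # CPU and memory guarantees.
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 200m
          memory: 256Mi
```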

## Conclusion

Strict CPU reservation is critical for Telco/NFV use cases. It is also a prerequisite for enabling the all-in-one type of deployments where workloads are placed on nodes serving combined control+worker+storage roles.

We encourage you to start using the feature and look forward to your feedback.

## Further reading

Please check out the [Control CPU Management Policies on the Node](/docs/tasks/administer-cluster/cpu-management-policies/)
task page to learn more about the CPU Manager and how it relates to the other node-level resource managers.

## Getting involved

This feature is driven by [SIG Node](https://github.com/Kubernetes/community/blob/master/sig-node/README.md). If you are interested in helping develop this feature, sharing feedback, or participating in any other ongoing SIG Node projects, please attend the SIG Node meeting for more details.
