Commit ea4444a

Merge pull request #43214 from shannonxtreme/apparmor-seccomp
Add new page for kernel-level constraints
2 parents 23b054d + 7416c9c commit ea4444a

3 files changed

+322
-40
lines changed

Lines changed: 290 additions & 0 deletions
@@ -0,0 +1,290 @@
1+
---
2+
title: Linux kernel security constraints for Pods and containers
3+
description: >
4+
Overview of Linux kernel security modules and constraints that you can use to
5+
harden your Pods and containers.
6+
content_type: concept
7+
weight: 100
8+
---
9+
10+
<!-- overview -->
11+
12+
This page describes some of the security features that are built into the Linux
13+
kernel that you can use in your Kubernetes workloads. To learn how to apply
14+
these features to your Pods and containers, refer to
15+
[Configure a SecurityContext for a Pod or Container](/docs/tasks/configure-pod-container/security-context/).
16+
You should already be familiar with Linux and with the basics of Kubernetes
17+
workloads.
18+
19+
<!-- body -->
20+
21+
## Run workloads without root privileges {#run-without-root}
22+
23+
When you deploy a workload in Kubernetes, use the Pod specification to restrict
24+
that workload from running as the root user on the node. You can use the Pod
25+
`securityContext` to define the specific Linux user and group for the processes in
26+
the Pod, and explicitly restrict containers from running as root users. Setting
27+
these values in the Pod manifest takes precedence over similar values in the
28+
container image, which is especially useful if you're running images that you
29+
don't own.
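
As a minimal sketch of the settings described above, the following Pod manifest sets a user and group for all containers and refuses to run them as root. The Pod name, container name, image, and the ID value `1000` are placeholders chosen for illustration and are not taken from this page.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: non-root-example              # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true                # containers that would run as UID 0 fail to start
    runAsUser: 1000                   # placeholder UID; overrides the user set in the image
    runAsGroup: 1000                  # placeholder GID for the primary group of each process
  containers:
  - name: app                         # hypothetical container
    image: registry.example/app:1.0   # placeholder image
```

Values set at the Pod level apply to every container in the Pod unless a container sets its own `securityContext`.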
30+
31+
{{< caution >}}
32+
Ensure that the user or group that you assign to the workload has the permissions
33+
required for the application to function correctly. Changing the user or group
34+
to one that doesn't have the correct permissions could lead to file access
35+
issues or failed operations.
36+
{{< /caution >}}
37+
38+
Configuring the kernel security features on this page provides fine-grained
39+
control over the actions that processes in your cluster can take, but managing
40+
these configurations can be challenging at scale. Running containers as
41+
non-root, or in user namespaces if you need root privileges, helps to reduce the
42+
chance that you'll need to enforce your configured kernel security capabilities.
43+
44+
## Security features in the Linux kernel {#linux-security-features}
45+
46+
Kubernetes lets you configure and use Linux kernel features to improve isolation
47+
and harden your containerized workloads. Common features include the following:
48+
49+
* **Secure computing mode (seccomp)**: Filter which system calls a process can
50+
make
51+
* **AppArmor**: Restrict the access privileges of individual programs
52+
* **Security Enhanced Linux (SELinux)**: Assign security labels to objects for
53+
more manageable security policy enforcement
54+
55+
To configure settings for one of these features, the operating system that you
56+
choose for your nodes must enable the feature in the kernel. For example,
57+
Ubuntu 7.10 and later enable AppArmor by default. To learn whether your OS
58+
enables a specific feature, consult the OS documentation.
59+
60+
You use the `securityContext` field in your Pod specification to define the
61+
constraints that apply to those processes. The `securityContext` field also
62+
supports other security settings, such as specific Linux capabilities or file
63+
access permissions using UIDs and GIDs. To learn more, refer to
64+
[Configure a SecurityContext for a Pod or Container](/docs/tasks/configure-pod-container/security-context/).
65+
66+
### seccomp
67+
68+
Some of your workloads might need privileges to perform specific actions as the
69+
root user on your node's host machine. Linux uses *capabilities* to divide the
70+
available privileges into categories, so that processes can get the privileges
71+
required to perform specific actions without being granted all privileges. Each
72+
capability has a set of system calls (syscalls) that a process can make. seccomp
73+
lets you restrict these individual syscalls. <!--Copied from seccomp tutorial-->
74+
It can be used to sandbox the privileges of a process, restricting the calls it
75+
is able to make from userspace into the kernel.<!--End copy-->
76+
77+
In Kubernetes, you use a *container runtime* on each node to run your
78+
containers. Example runtimes include CRI-O, Docker, or containerd. Each runtime
79+
allows only a subset of Linux capabilities by default. You can further limit the
80+
allowed syscalls individually by using a seccomp profile. Container runtimes
81+
usually include a default seccomp profile. <!--Copied from seccomp tutorial-->
82+
Kubernetes lets you automatically
83+
apply seccomp profiles loaded onto a node to your Pods and containers.<!--End copy-->
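
The following sketch shows one way to reference seccomp profiles from a manifest. It assumes a hypothetical profile file named `profiles/audit.json` that already exists in the kubelet's seccomp directory on the node. The Pod name, container names, and image are also placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-example               # hypothetical name
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault            # default profile of the container runtime, for all containers
  containers:
  - name: app
    image: registry.example/app:1.0   # placeholder image
  - name: custom-profile
    image: registry.example/app:1.0   # placeholder image
    securityContext:
      seccompProfile:
        type: Localhost               # container-level setting overrides the Pod-level one
        localhostProfile: profiles/audit.json   # hypothetical path, relative to the kubelet's seccomp directory
```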
84+
85+
{{<note>}}
86+
Kubernetes also has the `allowPrivilegeEscalation` setting for Pods and
87+
containers. When set to `false`, this prevents processes from gaining new
88+
capabilities and restricts unprivileged users from changing the applied seccomp
89+
profile to a more permissive profile.
90+
{{</note>}}
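
For illustration, the setting in the preceding note might look like the following in a manifest. This is a sketch with placeholder names. The field is shown in the `securityContext` of an individual container.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-escalation-example         # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example/app:1.0   # placeholder image
    securityContext:
      allowPrivilegeEscalation: false # processes cannot gain more privileges than their parent
```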
91+
92+
To learn how to implement seccomp in Kubernetes, refer to
93+
[Restrict a Container's Syscalls with seccomp](/docs/tutorials/security/seccomp/).
94+
95+
To learn more about seccomp, see
96+
[Seccomp BPF](https://www.kernel.org/doc/html/latest/userspace-api/seccomp_filter.html)
97+
in the Linux kernel documentation.
98+
99+
#### Considerations for seccomp {#seccomp-considerations}
100+
101+
seccomp is a low-level security configuration that you should only configure
102+
yourself if you require fine-grained control over Linux syscalls. Using
103+
seccomp, especially at scale, has the following risks:
104+
105+
* Configurations might break during application updates
106+
* Attackers can still use allowed syscalls to exploit vulnerabilities
107+
* Profile management for individual applications becomes challenging at scale
108+
109+
**Recommendation**: Use the default seccomp profile that's bundled with your
110+
container runtime. If you need a more isolated environment, consider using a
111+
sandbox, such as gVisor. Sandboxes solve the preceding risks with custom
112+
seccomp profiles, but require more compute resources on your nodes and might
113+
have compatibility issues with GPUs and other specialized hardware.
114+
115+
### AppArmor and SELinux: policy-based mandatory access control {#policy-based-mac}
116+
117+
You can use Linux policy-based mandatory access control (MAC) mechanisms, such
118+
as AppArmor and SELinux, to harden your Kubernetes workloads.
119+
120+
#### AppArmor
121+
122+
<!-- Original text from https://kubernetes.io/docs/tutorials/security/apparmor/ -->
123+
124+
[AppArmor](https://apparmor.net/) is a Linux kernel security module that
125+
supplements the standard Linux user and group based permissions to confine
126+
programs to a limited set of resources. AppArmor can be configured for any
127+
application to reduce its potential attack surface and provide stronger defense
128+
in depth. It is configured through profiles tuned to allow the access needed by a
129+
specific program or container, such as Linux capabilities, network access, and
130+
file permissions. Each profile can be run in either enforcing mode, which blocks
131+
access to disallowed resources, or complain mode, which only reports violations.
132+
133+
AppArmor can help you to run a more secure deployment by restricting what
134+
containers are allowed to do, and can provide better auditing through system
135+
logs. The container runtime that you use might ship with a default AppArmor
136+
profile, or you can use a custom profile.
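
As a hedged sketch, a custom AppArmor profile can be referenced from a container like this. The example assumes Kubernetes v1.30 or later, where the `appArmorProfile` field is available (earlier versions use a `container.apparmor.security.beta.kubernetes.io/<container_name>` annotation instead). It also assumes that a profile with the illustrative name `k8s-apparmor-example-deny-write` is already loaded on the node.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: apparmor-example              # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example/app:1.0   # placeholder image
    securityContext:
      appArmorProfile:
        type: Localhost               # use a profile loaded on the node
        localhostProfile: k8s-apparmor-example-deny-write   # illustrative profile name
```

To use the default profile of the container runtime instead, set `type: RuntimeDefault` and omit `localhostProfile`.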
137+
138+
To learn how to use AppArmor in Kubernetes, refer to
139+
[Restrict a Container's Access to Resources with AppArmor](/docs/tutorials/security/apparmor/).
140+
141+
#### SELinux
142+
143+
SELinux is a Linux kernel security module that lets you restrict the access
144+
that a specific *subject*, such as a process, has to the files on your system.
145+
You define security policies that apply to subjects that have specific SELinux
146+
labels. When a process that has an SELinux label attempts to access a file, the
147+
SELinux security server checks whether the security policy for that process allows the access
148+
and makes an authorization decision.
149+
150+
In Kubernetes, you can set an SELinux label in the `securityContext` field of
151+
your manifest. The specified labels are assigned to those processes. If you
152+
have configured security policies that affect those labels, the host OS kernel
153+
enforces these policies.
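
A minimal sketch of assigning an SELinux label follows. The level value `s0:c123,c456` and all names are placeholders. The example assumes that the node runs SELinux and that a policy covering this label exists on the host.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: selinux-example               # hypothetical name
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456"           # placeholder MCS level applied to all containers in the Pod
  containers:
  - name: app
    image: registry.example/app:1.0   # placeholder image
```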
154+
155+
To learn how to use SELinux in Kubernetes, refer to
156+
[Assign SELinux labels to a container](/docs/tasks/configure-pod-container/security-context/#assign-selinux-labels-to-a-container).
157+
158+
#### Differences between AppArmor and SELinux {#apparmor-selinux-diff}
159+
160+
The operating system on your Linux nodes usually includes either
161+
AppArmor or SELinux. Both mechanisms provide similar types of protection, but
162+
have differences such as the following:
163+
164+
* **Configuration**: AppArmor uses profiles to define access to resources.
165+
SELinux uses policies that apply to specific labels.
166+
* **Policy application**: In AppArmor, you define resources using file paths.
167+
SELinux uses the index node (inode) of a resource to identify the resource.
168+
169+
### Summary of features {#summary}
170+
171+
The following table describes the use cases and scope of each security control.
172+
You can use all of these controls together to build a more hardened system.
173+
174+
<table>
175+
<caption>Summary of Linux kernel security features</caption>
176+
<thead>
177+
<tr>
178+
<th>Security feature</th>
179+
<th>Description</th>
180+
<th>How to use</th>
181+
<th>Example</th>
182+
</tr>
183+
</thead>
184+
<tbody>
185+
<tr>
186+
<td>seccomp</td>
187+
<td>Restrict the individual system calls that userspace processes can make. Reduces the
188+
likelihood that a vulnerability that uses a restricted syscall would
189+
compromise the system.</td>
190+
<td>Specify a loaded seccomp profile in the Pod or container specification
191+
to apply its constraints to the processes in the Pod.</td>
192+
<td>Reject the <code>unshare</code> syscall, which was used in
193+
<a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-0185">CVE-2022-0185</a>.</td>
194+
</tr>
195+
<tr>
196+
<td>AppArmor</td>
197+
<td>Restrict program access to specific resources. Reduces the attack
198+
surface of the program. Improves audit logging.</td>
199+
<td>Specify a loaded AppArmor profile in the container specification.</td>
200+
<td>Restrict a read-only program from writing to any file path
201+
in the system.</td>
202+
</tr>
203+
<tr>
204+
<td>SELinux</td>
205+
<td>Restrict access to resources such as files, applications, ports, and
206+
processes using labels and security policies.</td>
207+
<td>Specify access restrictions for specific labels. Tag processes with
208+
those labels to enforce the access restrictions related to the label.</td>
209+
<td>Restrict a container from accessing files outside its own filesystem.</td>
210+
</tr>
211+
</tbody>
212+
</table>
213+
214+
{{< note >}}
215+
Mechanisms like AppArmor and SELinux can provide protection that extends beyond
216+
the container. For example, you can use SELinux to help mitigate
217+
[CVE-2019-5736](https://access.redhat.com/security/cve/cve-2019-5736).
218+
{{< /note >}}
219+
220+
### Considerations for managing custom configurations {#considerations-custom-configurations}
221+
222+
seccomp, AppArmor, and SELinux usually have a default configuration that offers
223+
basic protections. You can also create custom profiles and policies that meet
224+
the requirements of your workloads. Managing and distributing these custom
225+
configurations at scale might be challenging, especially if you use all three
226+
features together. To help you to manage these configurations at scale, use a
227+
tool like the
228+
[Kubernetes Security Profiles Operator](https://github.com/kubernetes-sigs/security-profiles-operator).
229+
230+
## Kernel-level security features and privileged containers {#kernel-security-features-privileged-containers}
231+
232+
Kubernetes lets you specify that some trusted containers can run in
233+
*privileged* mode. Any container in a Pod can run in privileged mode to use
234+
operating system administrative capabilities that would otherwise be
235+
inaccessible. This is available for both Windows and Linux.
236+
237+
Privileged containers explicitly override some of the Linux kernel constraints
238+
that you might use in your workloads, as follows:
239+
240+
* **seccomp**: Privileged containers run with the `Unconfined` seccomp profile,
241+
overriding any seccomp profile that you specified in your manifest.
242+
* **AppArmor**: Privileged containers ignore any applied AppArmor profiles.
243+
* **SELinux**: Privileged containers run in the `unconfined_t` domain.
244+
245+
### Privileged containers {#privileged-containers}
246+
247+
<!-- Content from https://kubernetes.io/docs/concepts/workloads/pods/#privileged-mode-for-containers -->
248+
249+
Any container in a Pod can enable *Privileged mode* if you set the
250+
`privileged: true` field in the
251+
[`securityContext`](/docs/tasks/configure-pod-container/security-context/)
252+
field for the container. Privileged containers override or undo many other hardening settings such as the applied seccomp profile, AppArmor profile, or
253+
SELinux constraints. Privileged containers are given all Linux capabilities,
254+
including capabilities that they don't require. For example, a root user in a
255+
privileged container might be able to use the `CAP_SYS_ADMIN` and
256+
`CAP_NET_ADMIN` capabilities on the node, bypassing the runtime seccomp
257+
configuration and other restrictions.
258+
259+
In most cases, you should avoid using privileged containers, and instead grant
260+
the specific capabilities required by your container using the `capabilities`
261+
field in the `securityContext` field. Only use privileged mode if you have a
262+
capability that you can't grant with the `securityContext`. Privileged mode is useful for
263+
containers that want to use operating system administrative capabilities such
264+
as manipulating the network stack or accessing hardware devices.
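
As an illustration of granting a specific capability instead of using privileged mode, the following sketch drops all capabilities and adds back only `NET_ADMIN`. The choice of `NET_ADMIN` and all names are assumptions for the example. Kubernetes capability names omit the `CAP_` prefix.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: capabilities-example          # hypothetical name
spec:
  containers:
  - name: network-tool                # hypothetical container
    image: registry.example/app:1.0   # placeholder image
    securityContext:
      privileged: false               # keep seccomp, AppArmor, and SELinux constraints in effect
      capabilities:
        drop: ["ALL"]                 # start from no capabilities
        add: ["NET_ADMIN"]            # add only what the workload needs (CAP_NET_ADMIN)
```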
265+
266+
In Kubernetes version 1.26 and later, you can also run Windows containers in a
267+
similarly privileged mode by setting the `windowsOptions.hostProcess` flag on
268+
the security context of the Pod spec. For details and instructions, see
269+
[Create a Windows HostProcess Pod](/docs/tasks/configure-pod-container/create-hostprocess-pod/).
270+
271+
## Recommendations and best practices {#recommendations-best-practices}
272+
273+
* Before configuring kernel-level security capabilities, you should consider
274+
implementing network-level isolation. For more information, read the
275+
[Security Checklist](/docs/concepts/security/security-checklist/#network-security).
276+
* Unless necessary, run Linux workloads as non-root by setting specific user and
277+
group IDs in your Pod manifest and by specifying `runAsNonRoot: true`.
278+
279+
Additionally, you can run workloads in user namespaces by setting
280+
`hostUsers: false` in your Pod manifest. This lets you run containers as root
281+
users in the user namespace, but as non-root users in the host namespace on the
282+
node. User namespace support is still in early stages of development and might not have the level
283+
of support that you need. For instructions, refer to
284+
[Use a User Namespace With a Pod](/docs/tasks/configure-pod-container/user-namespaces/).
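
A minimal sketch of a Pod that runs in a user namespace follows. It assumes that your cluster version, container runtime, and node kernel support user namespaces. All names are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-example                # hypothetical name
spec:
  hostUsers: false                    # run the Pod in its own user namespace
  containers:
  - name: app
    image: registry.example/app:1.0   # placeholder image
    securityContext:
      runAsUser: 0                    # root inside the namespace, mapped to a non-root UID on the node
```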
285+
286+
## {{% heading "whatsnext" %}}
287+
288+
* [Learn how to use AppArmor](/docs/tutorials/security/apparmor/)
289+
* [Learn how to use seccomp](/docs/tutorials/security/seccomp/)
290+
* [Learn how to use SELinux](/docs/tasks/configure-pod-container/security-context/#assign-selinux-labels-to-a-container)

content/en/docs/concepts/workloads/pods/_index.md

Lines changed: 28 additions & 24 deletions
@@ -276,30 +276,34 @@ Containers within the Pod see the system hostname as being the same as the confi
276276
`name` for the Pod. There's more about this in the [networking](/docs/concepts/cluster-administration/networking/)
277277
section.
278278

279-
## Privileged mode for containers
280-
281-
{{< note >}}
282-
Your {{< glossary_tooltip text="container runtime" term_id="container-runtime" >}} must support the concept of a privileged container for this setting to be relevant.
283-
{{< /note >}}
284-
285-
Any container in a pod can run in privileged mode to use operating system administrative capabilities
286-
that would otherwise be inaccessible. This is available for both Windows and Linux.
287-
288-
### Linux privileged containers
289-
290-
In Linux, any container in a Pod can enable privileged mode using the `privileged` (Linux) flag
291-
on the [security context](/docs/tasks/configure-pod-container/security-context/) of the
292-
container spec. This is useful for containers that want to use operating system administrative
293-
capabilities such as manipulating the network stack or accessing hardware devices.
294-
295-
### Windows privileged containers
296-
297-
{{< feature-state for_k8s_version="v1.26" state="stable" >}}
298-
299-
In Windows, you can create a [Windows HostProcess pod](/docs/tasks/configure-pod-container/create-hostprocess-pod) by setting the
300-
`windowsOptions.hostProcess` flag on the security context of the pod spec. All containers in these
301-
pods must run as Windows HostProcess containers. HostProcess pods run directly on the host and can also be used
302-
to perform administrative tasks as is done with Linux privileged containers.
279+
## Pod security settings {#pod-security}
280+
281+
To set security constraints on Pods and containers, you use the
282+
`securityContext` field in the Pod specification. This field gives you
283+
granular control over what a Pod or individual containers can do. For example:
284+
285+
* Drop specific Linux capabilities to avoid the impact of a CVE.
286+
* Force all processes in the Pod to run as a non-root user or as a specific
287+
user or group ID.
288+
* Set a specific seccomp profile.
289+
* Set Windows security options, such as whether containers run as HostProcess.
290+
291+
{{< caution >}}
292+
You can also use the Pod securityContext to enable
293+
[_privileged mode_](/docs/concepts/security/linux-kernel-security-constraints/#privileged-containers)
294+
in Linux containers. Privileged mode overrides many of the other security
295+
settings in the securityContext. Avoid using this setting unless you can't grant
296+
the equivalent permissions by using other fields in the securityContext.
297+
In Kubernetes 1.26 and later, you can run Windows containers in a similarly
298+
privileged mode by setting the `windowsOptions.hostProcess` flag on the
299+
security context of the Pod spec. For details and instructions, see
300+
[Create a Windows HostProcess Pod](/docs/tasks/configure-pod-container/create-hostprocess-pod/).
301+
{{< /caution >}}
302+
303+
* To learn about kernel-level security constraints that you can use,
304+
see [Linux kernel security constraints for Pods and containers](/docs/concepts/security/linux-kernel-security-constraints).
305+
* To learn more about the Pod security context, see
306+
[Configure a Security Context for a Pod or Container](/docs/tasks/configure-pod-container/security-context/).
303307

304308
## Static Pods
305309

content/en/docs/tutorials/security/apparmor.md

Lines changed: 4 additions & 16 deletions
@@ -10,22 +10,10 @@ weight: 30
1010

1111
{{< feature-state feature_gate_name="AppArmor" >}}
1212

13-
14-
[AppArmor](https://apparmor.net/) is a Linux kernel security module that supplements the standard Linux user and group based
15-
permissions to confine programs to a limited set of resources. AppArmor can be configured for any
16-
application to reduce its potential attack surface and provide greater in-depth defense. It is
17-
configured through profiles tuned to allow the access needed by a specific program or container,
18-
such as Linux capabilities, network access, file permissions, etc. Each profile can be run in either
19-
*enforcing* mode, which blocks access to disallowed resources, or *complain* mode, which only reports
20-
violations.
21-
22-
On Kubernetes, AppArmor can help you to run a more secure deployment by restricting what containers are allowed to
23-
do, and/or provide better auditing through system logs. However, it is important to keep in mind
24-
that AppArmor is not a silver bullet and can only do so much to protect against exploits in your
25-
application code. It is important to provide good, restrictive profiles, and harden your
26-
applications and cluster from other angles as well.
27-
28-
13+
This page shows you how to load AppArmor profiles on your nodes and enforce
14+
those profiles in Pods. To learn more about how Kubernetes can confine Pods using
15+
AppArmor, see
16+
[Linux kernel security constraints for Pods and containers](/docs/concepts/security/linux-kernel-security-constraints/#apparmor).
2917

3018
## {{% heading "objectives" %}}
3119
