Commit 3f8105e

runtime-config-linux: Separate mknod from cgroups
With mknod entries in linux.devices and cgroups entries in linux.resources.devices. Background discussion in [1].

For specifying device cgroups independent of device creation. This makes it easy to distinguish configs that call for cgroup adjustments (which have linux.resources entries) from those that don't. Without this split, folks interested in making that distinction would have to parse the device section to determine if it included cgroup changes.

This will also make it easy to drop either portion (mknod [2] or cgroups [3]) independently of the other if the project decides to do so.

Using separate sections for mknod and cgroups also allows us to avoid the complicated validation rules needed for the combined mknod/cgroup format [4].

Now that there is a section specific to supplying devices, I shifted the default device listing over from config-linux [5]. The /dev/ptmx entry is a bit awkward, since it's not a device, but it seemed to fit better over here. But I would also be fine leaving it with the other mounts in config-linux.

The reference links are sorted into two blocks, with kernel-doc links sorted alphabetically followed by man pages sorted alphabetically by section.

[1]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/y_Fsa2_jJaM
     Subject: Separate config entries for device mknod and cgroups?
     Date: Mon, 5 Oct 2015 12:46:55 -0700
     Message-ID: <[email protected]>
[2]: #98
[3]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/qWHoKs8Fsrk
     Subject: removal of cgroups from the OCI Linux spec
     Date: Wed, 28 Oct 2015 17:01:59 +0000
     Message-ID: <CAD2oYtO1RMCcUp52w-xXemzDTs+J6t4hS5Mm4mX+uBnVONGDfA@mail.gmail.com>
[4]: #101
[5]: #171 (comment)

Signed-off-by: W. Trevor King <[email protected]>
1 parent 6aa53ed commit 3f8105e

3 files changed

+112
-86
lines changed

config-linux.md

Lines changed: 9 additions & 17 deletions
@@ -16,24 +16,16 @@ Valid values are the strings for capabilities defined in [the man page](http://m
 ]
 ```
 
-## Default Devices and File Systems
+## Default File Systems
 
 The Linux ABI includes both syscalls and several special file paths.
 Applications expecting a Linux environment will very likely expect these files paths to be setup correctly.
 
-The following devices and filesystems MUST be made available in each application's filesystem
-
-| Path | Type | Notes |
-| ------------ | ------ | ------- |
-| /proc | [procfs](https://www.kernel.org/doc/Documentation/filesystems/proc.txt) | |
-| /sys | [sysfs](https://www.kernel.org/doc/Documentation/filesystems/sysfs.txt) | |
-| /dev/null | [device](http://man7.org/linux/man-pages/man4/null.4.html) | |
-| /dev/zero | [device](http://man7.org/linux/man-pages/man4/zero.4.html) | |
-| /dev/full | [device](http://man7.org/linux/man-pages/man4/full.4.html) | |
-| /dev/random | [device](http://man7.org/linux/man-pages/man4/random.4.html) | |
-| /dev/urandom | [device](http://man7.org/linux/man-pages/man4/random.4.html) | |
-| /dev/tty | [device](http://man7.org/linux/man-pages/man4/tty.4.html) | |
-| /dev/console | [device](http://man7.org/linux/man-pages/man4/console.4.html) | |
-| /dev/pts | [devpts](https://www.kernel.org/doc/Documentation/filesystems/devpts.txt) | |
-| /dev/ptmx | [device](https://www.kernel.org/doc/Documentation/filesystems/devpts.txt) | Bind-mount or symlink of /dev/pts/ptmx |
-| /dev/shm | [tmpfs](https://www.kernel.org/doc/Documentation/filesystems/tmpfs.txt) | |
+The following filesystems MUST be made available in each application's filesystem
+
+| Path | Type |
+| -------- | ------ |
+| /proc | [procfs](https://www.kernel.org/doc/Documentation/filesystems/proc.txt) |
+| /sys | [sysfs](https://www.kernel.org/doc/Documentation/filesystems/sysfs.txt) |
+| /dev/pts | [devpts](https://www.kernel.org/doc/Documentation/filesystems/devpts.txt) |
+| /dev/shm | [tmpfs](https://www.kernel.org/doc/Documentation/filesystems/tmpfs.txt) |

runtime-config-linux.md

Lines changed: 85 additions & 65 deletions
@@ -77,93 +77,59 @@ There is a limit of 5 mappings which is the Linux kernel hard limit.
 
 ## Devices
 
-`devices` is an array specifying the list of devices to be created in the container.
+`devices` is an array specifying the list of devices that MUST be available in the container.
+The runtime may supply them however it likes (with [mknod][mknod.2], by bind mounting from the runtime mount namespace, etc.).
 
 The following parameters can be specified:
 
-* **`type`** *(char, required)* - type of device: `c`, `b`, `u` or `p`. More info in `man mknod`.
-
-* **`path`** *(string, optional)* - full path to device inside container
-
-* **`major, minor`** *(int64, required)* - major, minor numbers for device. More info in `man mknod`. There is a special value: `-1`, which means `*` for `device` cgroup setup.
-
-* **`permissions`** *(string, optional)* - cgroup permissions for device. A composition of `r` (*read*), `w` (*write*), and `m` (*mknod*).
-
-* **`fileMode`** *(uint32, optional)* - file mode for device file
-
-* **`uid`** *(uint32, optional)* - uid of device owner
-
-* **`gid`** *(uint32, optional)* - gid of device owner
-
-**`fileMode`**, **`uid`** and **`gid`** are required if **`path`** is given and are otherwise not allowed.
+* **`type`** *(char, required)* - type of device: `c`, `b`, `u` or `p`.
+More info in [mknod(1)][mknod.1].
+* **`path`** *(string, required)* - full path to device inside container.
+* **`major, minor`** *(int64, required)* - [major, minor numbers][devices] for the device.
+* **`fileMode`** *(uint32, required)* - file mode for the device.
+You can also control access to devices [with cgroups](#device-whitelist).
+* **`uid`** *(uint32, required)* - id of device owner.
+* **`gid`** *(uint32, required)* - id of device group.
 
 ###### Example
 
 ```json
     "devices": [
         {
-            "path": "/dev/random",
+            "path": "/dev/fuse",
             "type": "c",
-            "major": 1,
-            "minor": 8,
-            "permissions": "rwm",
+            "major": 10,
+            "minor": 229,
             "fileMode": 0666,
             "uid": 0,
             "gid": 0
         },
         {
-            "path": "/dev/urandom",
-            "type": "c",
-            "major": 1,
-            "minor": 9,
-            "permissions": "rwm",
-            "fileMode": 0666,
-            "uid": 0,
-            "gid": 0
-        },
-        {
-            "path": "/dev/null",
-            "type": "c",
-            "major": 1,
-            "minor": 3,
-            "permissions": "rwm",
-            "fileMode": 0666,
-            "uid": 0,
-            "gid": 0
-        },
-        {
-            "path": "/dev/zero",
-            "type": "c",
-            "major": 1,
-            "minor": 5,
-            "permissions": "rwm",
-            "fileMode": 0666,
-            "uid": 0,
-            "gid": 0
-        },
-        {
-            "path": "/dev/tty",
-            "type": "c",
-            "major": 5,
+            "path": "/dev/sda",
+            "type": "b",
+            "major": 8,
             "minor": 0,
-            "permissions": "rwm",
-            "fileMode": 0666,
-            "uid": 0,
-            "gid": 0
-        },
-        {
-            "path": "/dev/full",
-            "type": "c",
-            "major": 1,
-            "minor": 7,
-            "permissions": "rwm",
-            "fileMode": 0666,
+            "fileMode": 0660,
             "uid": 0,
             "gid": 0
         }
     ]
 ```
 
+###### Default Devices
+
+In addition to any devices configured with this setting, the runtime MUST also supply:
+
+* [`/dev/null`][null.4]
+* [`/dev/zero`][zero.4]
+* [`/dev/full`][full.4]
+* [`/dev/random`][random.4]
+* [`/dev/urandom`][random.4]
+* [`/dev/tty`][tty.4]
+* [`/dev/console`][console.4]
+* [`/dev/ptmx`][pts.4].
+A [bind-mount or symlink of the container's `/dev/pts/ptmx`][devpts].
+
 ## Control groups
 
 Also known as cgroups, they are used to restrict resource usage for a container and handle device access.
@@ -190,6 +156,46 @@ You can configure a container's cgroups via the `resources` field of the Linux c
 Do not specify `resources` unless limits have to be updated.
 For example, to run a new process in an existing container without updating limits, `resources` need not be specified.
 
+#### Device whitelist
+
+`devices` is an array of entries to control the [device whitelist][cgroups-devices].
+The runtime MUST apply entries in the listed order.
+
+The following parameters can be specified:
+
+* **`allow`** *(boolean, required)* - whether the entry is allowed or denied.
+* **`type`** *(char, optional)* - type of device: `a` (all), `c` (char), or `b` (block).
+`null` or unset values mean "all", mapping to `a`.
+* **`major, minor`** *(int64, optional)* - [major, minor numbers][devices] for the device.
+`null` or unset values mean "all", mapping to [`*` in the filesystem API][cgroups-devices].
+* **`access`** *(string, required)* - cgroup permissions for device.
+A composition of `r` (read), `w` (write), and `m` (mknod).
+
+###### Example
+
+```json
+    "devices": [
+        {
+            "allow": false,
+            "access": "rwm"
+        },
+        {
+            "allow": true,
+            "type": "c",
+            "major": 10,
+            "minor": 229,
+            "access": "rw"
+        },
+        {
+            "allow": true,
+            "type": "b",
+            "major": 8,
+            "minor": 0,
+            "access": "r"
+        }
+    ]
+```
+
 #### Disable out-of-memory killer
 
 `disableOOMKiller` contains a boolean (`true` or `false`) that enables or disables the Out of Memory killer for a cgroup.
@@ -540,3 +546,17 @@ Its value is either slave, private, or shared.
 ```json
     "rootfsPropagation": "slave",
 ```
+
+[cgroups-devices]: https://www.kernel.org/doc/Documentation/cgroups/devices.txt
+[devices]: https://www.kernel.org/doc/Documentation/devices.txt
+[devpts]: https://www.kernel.org/doc/Documentation/filesystems/devpts.txt
+
+[mknod.1]: http://man7.org/linux/man-pages/man1/mknod.1.html
+[mknod.2]: http://man7.org/linux/man-pages/man2/mknod.2.html
+[console.4]: http://man7.org/linux/man-pages/man4/console.4.html
+[full.4]: http://man7.org/linux/man-pages/man4/full.4.html
+[null.4]: http://man7.org/linux/man-pages/man4/null.4.html
+[pts.4]: http://man7.org/linux/man-pages/man4/pts.4.html
+[random.4]: http://man7.org/linux/man-pages/man4/random.4.html
+[tty.4]: http://man7.org/linux/man-pages/man4/tty.4.html
+[zero.4]: http://man7.org/linux/man-pages/man4/zero.4.html
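
Review note on the "Device whitelist" section: the kernel's cgroup v1 devices controller takes entries of the form `type major:minor access`, written to `devices.allow` or `devices.deny`. A minimal sketch of how a runtime might translate the new entries, in listed order, is below. The `DeviceCgroup` struct here is a simplified copy that stores `Type` as `*string` (an assumption made for this sketch), and `rule` and `wildcard` are hypothetical helper names; the sketch only builds the strings and does not touch the cgroup filesystem.

```go
package main

import (
	"fmt"
	"strconv"
)

// DeviceCgroup is a simplified copy of the struct added in this commit.
// Type is a *string here (an assumption for this sketch).
type DeviceCgroup struct {
	Allow  bool
	Type   *string
	Major  *int64
	Minor  *int64
	Access string
}

// wildcard renders an unset major or minor number as "*".
func wildcard(n *int64) string {
	if n == nil {
		return "*"
	}
	return strconv.FormatInt(*n, 10)
}

// rule returns the cgroup control file to write to and the entry to write.
func rule(d DeviceCgroup) (file, line string) {
	file = "devices.deny"
	if d.Allow {
		file = "devices.allow"
	}
	typ := "a" // unset type means "all"
	if d.Type != nil {
		typ = *d.Type
	}
	line = fmt.Sprintf("%s %s:%s %s", typ, wildcard(d.Major), wildcard(d.Minor), d.Access)
	return file, line
}

func main() {
	c := "c"
	major, minor := int64(10), int64(229)
	rules := []DeviceCgroup{
		{Allow: false, Access: "rwm"},
		{Allow: true, Type: &c, Major: &major, Minor: &minor, Access: "rw"},
	}
	// Entries are processed in the listed order, as the spec text requires.
	for _, r := range rules {
		file, line := rule(r)
		fmt.Printf("%s <- %q\n", file, line)
	}
}
```

For the first two entries of the spec example this produces a deny-all entry (`a *:* rwm`) followed by an allow entry for the character device 10:229.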

runtime_config_linux.go

Lines changed: 18 additions & 4 deletions
@@ -31,7 +31,7 @@ type LinuxRuntime struct {
 	CgroupsPath *string `json:"cgroupsPath,omitempty"`
 	// Namespaces contains the namespaces that are created and/or joined by the container
 	Namespaces []Namespace `json:"namespaces"`
-	// Devices are a list of device nodes that are created and enabled for the container
+	// Devices are a list of device nodes that are created for the container
 	Devices []Device `json:"devices"`
 	// ApparmorProfile specified the apparmor profile for the container.
 	ApparmorProfile string `json:"apparmorProfile"`
@@ -200,6 +200,8 @@ type Network struct {
 
 // Resources has container runtime resource constraints
 type Resources struct {
+	// Devices are a list of device rules for the whitelist controller
+	Devices []DeviceCgroup `json:"devices"`
 	// DisableOOMKiller disables the OOM killer for out of memory conditions
 	DisableOOMKiller *bool `json:"disableOOMKiller,omitempty"`
 	// Specify an oom_score_adj for the container.
@@ -218,7 +220,7 @@ type Resources struct {
 	Network *Network `json:"network,omitempty"`
 }
 
-// Device represents the information on a Linux special device file
+// Device represents the mknod information for a Linux special device file
 type Device struct {
 	// Path to the device.
 	Path string `json:"path"`
@@ -228,8 +230,6 @@ type Device struct {
 	Major int64 `json:"major"`
 	// Minor is the device's minor number.
 	Minor int64 `json:"minor"`
-	// Cgroup permissions format, rwm.
-	Permissions string `json:"permissions"`
 	// FileMode permission bits for the device.
 	FileMode os.FileMode `json:"fileMode"`
 	// UID of the device.
@@ -238,6 +238,20 @@ type Device struct {
 	GID uint32 `json:"gid"`
 }
 
+// DeviceCgroup represents a device rule for the whitelist controller
+type DeviceCgroup struct {
+	// Allow or deny
+	Allow bool `json:"allow"`
+	// Device type, block, char, etc.
+	Type *rune `json:"type,omitempty"`
+	// Major is the device's major number.
+	Major *int64 `json:"major,omitempty"`
+	// Minor is the device's minor number.
+	Minor *int64 `json:"minor,omitempty"`
+	// Cgroup access permissions format, rwm.
+	Access string `json:"access"`
+}
+
 // Seccomp represents syscall restrictions
 type Seccomp struct {
 	DefaultAction Action `json:"defaultAction"`
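
Review note on `DeviceCgroup.Type`: `rune` is an alias for `int32`, so Go's `encoding/json` expects a JSON number for a `*rune` field, while the spec example writes the type as a string (`"type": "c"`). The sketch below compares the field as declared in this commit with a hypothetical `*string` variant. The names `runeRule`, `stringRule`, `decodeRune`, and `decodeString` are invented for illustration, and the whole block is an observation for reviewers rather than a proposed patch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// runeRule declares Type the way this commit does.
type runeRule struct {
	Allow  bool   `json:"allow"`
	Type   *rune  `json:"type,omitempty"`
	Access string `json:"access"`
}

// stringRule is a hypothetical variant that stores the type as a string.
type stringRule struct {
	Allow  bool    `json:"allow"`
	Type   *string `json:"type,omitempty"`
	Major  *int64  `json:"major,omitempty"`
	Minor  *int64  `json:"minor,omitempty"`
	Access string  `json:"access"`
}

// entry is the second whitelist entry from the spec example.
const entry = `{"allow": true, "type": "c", "major": 10, "minor": 229, "access": "rw"}`

// decodeRune tries to decode the entry into the rune-typed struct.
func decodeRune() error {
	var r runeRule
	return json.Unmarshal([]byte(entry), &r)
}

// decodeString decodes the entry into the string-typed struct.
func decodeString() (stringRule, error) {
	var r stringRule
	err := json.Unmarshal([]byte(entry), &r)
	return r, err
}

func main() {
	// A JSON string cannot be stored in an int32, so this reports an error.
	fmt.Println("rune field:", decodeRune())

	r, err := decodeString()
	if err != nil {
		fmt.Println("string field:", err)
		return
	}
	fmt.Printf("string field: type=%s major=%d minor=%d access=%s\n",
		*r.Type, *r.Major, *r.Minor, r.Access)
}
```

With the string variant, an omitted `type`, `major`, or `minor` leaves the pointer `nil`, which lines up with the "null or unset means all" wording in the spec text.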
