diff --git a/.travis.yml b/.travis.yml
index 82e03a210..ff332f328 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -6,6 +6,8 @@ go:

 sudo: required

+go_import_path: github.com/opencontainers/runtime-spec
+
 services:
   - docker

diff --git a/config-linux.md b/config-linux.md
index 99de39c2f..c552dd9c3 100644
--- a/config-linux.md
+++ b/config-linux.md
@@ -633,7 +633,7 @@ The following parameters can be specified to set up seccomp:
     * **`names`** *(array of strings, REQUIRED)* - the names of the syscalls.
         `names` MUST contain at least one entry.
     * **`action`** *(string, REQUIRED)* - the action for seccomp rules.
-        A valid list of constants as of libseccomp v2.4.0 is shown below.
+        A valid list of constants as of libseccomp v2.5.0 is shown below.

         * `SCMP_ACT_KILL`
         * `SCMP_ACT_KILL_PROCESS`
@@ -642,6 +642,7 @@ The following parameters can be specified to set up seccomp:
         * `SCMP_ACT_TRACE`
         * `SCMP_ACT_ALLOW`
         * `SCMP_ACT_LOG`
+        * `SCMP_ACT_NOTIFY`

     * **`errnoRet`** *(uint, OPTIONAL)* - the errno return code to use.
         Some actions like `SCMP_ACT_ERRNO` and `SCMP_ACT_TRACE` allow to specify the errno
diff --git a/config.md b/config.md
index 48ff0d729..3c587e688 100644
--- a/config.md
+++ b/config.md
@@ -400,6 +400,11 @@ For POSIX platforms, the configuration structure supports `hooks` for configurin
         * Entries in the array have the same schema as `createRuntime` entries.
         * The value of `path` MUST resolve in the [runtime namespace](glossary.md#runtime-namespace).
         * The `createContainer` hooks MUST be executed in the [container namespace](glossary.md#container-namespace).
+    * **`sendSeccompFd`** (array of objects, OPTIONAL) is an array of [`sendSeccompFd` hooks](#sendseccompfd).
+        * Entries in the array have the same schema as `createRuntime` entries.
+        * The value of `path` MUST resolve in the [runtime namespace](glossary.md#runtime-namespace).
+        * The `sendSeccompFd` hooks MUST be executed in the [runtime namespace](glossary.md#runtime-namespace).
+        * The data passed over stdin is the [seccomp state](#seccompstate).
     * **`startContainer`** (array of objects, OPTIONAL) is an array of [`startContainer` hooks](#startContainer-hooks).
         * Entries in the array have the same schema as `createRuntime` entries.
         * The value of `path` MUST resolve in the [container namespace](glossary.md#container-namespace).
@@ -415,7 +420,8 @@ For POSIX platforms, the configuration structure supports `hooks` for configurin

 Hooks allow users to specify programs to run before or after various lifecycle events.
 Hooks MUST be called in the listed order.
-The [state](runtime.md#state) of the container MUST be passed to hooks over stdin so that they may do work appropriate to the current state of the container.
+All hooks MUST be passed a data structure over stdin so that they may do work appropriately.
+Except when specified otherwise above, the data structure is the [state](runtime.md#state) of the container.

 ### Prestart

@@ -452,6 +458,53 @@ For example, on Linux this would happen before the `pivot_root` operation is exe
 The definition of `createContainer` hooks is currently underspecified and hooks authors, should only expect from the runtime that the mount namespace and different mounts will be setup.
 Other operations such as cgroups and SELinux/AppArmor labels might not have been performed by the runtime.

+### SendSeccompFd Hooks
+
+The `sendSeccompFd` hooks MUST only be called if the seccomp policy contains `SCMP_ACT_NOTIFY` actions.
+
+The `sendSeccompFd` hooks MUST be called after the [`start`](runtime.md#start) operation is called and after the seccomp policy is installed but [before the user-specified program command is executed](runtime.md#lifecycle).
+The `sendSeccompFd` hooks MAY additionally be called while the container is in the [`running` state](runtime.md#runtimeState), for example during an `exec` operation.
+The goal of this hook is to pass the seccomp file descriptor to a seccomp agent.
+
+The `sendSeccompFd` hooks' path MUST resolve in the [runtime namespace](glossary.md#runtime-namespace).
+The `sendSeccompFd` hooks MUST be executed in the [runtime namespace](glossary.md#runtime-namespace).
+
+### The Seccomp State
+
+The seccomp state is a data structure passed via stdin to the SendSeccompFd hooks.
+It includes the following properties:
+
+* **`ociVersion`** (string, REQUIRED) is version of the Open Container Initiative Runtime Specification with which the seccomp state complies.
+* **`seccompFd`** (int, REQUIRED) is the file descriptor for Seccomp User Notification passed via process inheritance to the SendSeccompFd hooks. The value MUST NOT be zero: zero is reserved for stdin.
+* **`pid`** (int, REQUIRED) is the process ID on which the seccomp filter is applied.
+* **`pidFd`** (int, OPTIONAL) is a pid file descriptor for the process on which the seccomp filter is applied. This file descriptor is also passed via process inheritance to the SendSeccompFd hooks. As the field is optional, the value MAY be zero, meaning `pidFd` is not passed to the hook. If passed, the file descriptor MUST NOT be zero: zero is reserved for stdin.
+* **`state`** (map, REQUIRED) is the [state](runtime.md#state) of the container.
+
+When serialized in JSON, the format MUST adhere to the following pattern:
+
+```json
+{
+    "ociVersion": "0.2.0",
+    "seccompFd": 3,
+    "pid": 4422,
+    "pidFd": 4,
+    "state": {
+        "ociVersion": "0.2.0",
+        "id": "oci-container1",
+        "status": "creating",
+        "pid": 4422,
+        "bundle": "/containers/redis",
+        "annotations": {
+            "myKey": "myValue"
+        }
+    }
+}
+```
+
+Note that if `state.status` is `creating`, the seccomp filter is created following the [`start`](runtime.md#start) command and `.pid` has the same value as `.state.pid`.
+And if `state.status` is `running`, the seccomp filter is created following an `exec` command and `.pid` has a different value than `.state.pid`.
+
+
 ### StartContainer Hooks

 The `startContainer` hooks MUST be called [before the user-specified process is executed](runtime.md#lifecycle) as part of the [`start`](runtime.md#start) operation.
@@ -485,6 +538,7 @@ See the below table for a summary of hooks and when they are called:
 | `prestart` (Deprecated) | runtime | After the start operation is called but before the user-specified program command is executed. |
 | `createRuntime` | runtime | During the create operation, after the runtime environment has been created and before the pivot root or any equivalent operation. |
 | `createContainer` | container | During the create operation, after the runtime environment has been created and before the pivot root or any equivalent operation. |
+| `sendSeccompFd` | runtime | After the start operation is called but before the user-specified program command is executed. |
 | `startContainer` | container | After the start operation is called but before the user-specified program command is executed. |
 | `poststart` | runtime | After the user-specified process is executed but before the start operation returns. |
 | `poststop` | runtime | After the container is deleted but before the delete operation returns.
|
@@ -520,6 +574,13 @@ See the below table for a summary of hooks and when they are called:
                 "env": [ "key1=value1"]
             }
         ],
+        "sendSeccompFd": [
+            {
+                "path": "/usr/bin/seccomp-agent",
+                "args": ["seccomp-agent", "--allow-mknods=/dev/null,/dev/net/tun"],
+                "env": [ "key1=value1"]
+            }
+        ],
         "startContainer": [
             {
                 "path": "/usr/bin/refresh-ldcache"
diff --git a/schema/config-schema.json b/schema/config-schema.json
index 94923b35a..278cd06aa 100644
--- a/schema/config-schema.json
+++ b/schema/config-schema.json
@@ -18,6 +18,9 @@
                 "createContainer": {
                     "$ref": "defs.json#/definitions/ArrayOfHooks"
                 },
+                "sendSeccompFd": {
+                    "$ref": "defs.json#/definitions/ArrayOfHooks"
+                },
                 "startContainer": {
                     "$ref": "defs.json#/definitions/ArrayOfHooks"
                 },
diff --git a/schema/defs-linux.json b/schema/defs-linux.json
index 73a14fc53..01b4e3ebf 100644
--- a/schema/defs-linux.json
+++ b/schema/defs-linux.json
@@ -60,7 +60,8 @@
                 "SCMP_ACT_ERRNO",
                 "SCMP_ACT_TRACE",
                 "SCMP_ACT_ALLOW",
-                "SCMP_ACT_LOG"
+                "SCMP_ACT_LOG",
+                "SCMP_ACT_NOTIFY"
             ]
         },
         "SeccompFlag": {
diff --git a/schema/test/config/good/spec-example.json b/schema/test/config/good/spec-example.json
index a784d1d74..e12f68ca7 100644
--- a/schema/test/config/good/spec-example.json
+++ b/schema/test/config/good/spec-example.json
@@ -172,6 +172,13 @@
                 "env": [ "key1=value1"]
             }
         ],
+        "sendSeccompFd": [
+            {
+                "path": "/usr/bin/seccomp-agent",
+                "args": ["seccomp-agent", "--allow-mknods=/dev/null,/dev/net/tun"],
+                "env": [ "key1=value1"]
+            }
+        ],
         "startContainer": [
             {
                 "path": "/usr/bin/refresh-ldcache"
diff --git a/specs-go/config.go b/specs-go/config.go
index 5fceeb635..91cc95346 100644
--- a/specs-go/config.go
+++ b/specs-go/config.go
@@ -137,6 +137,9 @@ type Hooks struct {
 	// CreateContainer is a list of hooks to be run after the container has been created but before pivot_root or any equivalent operation has been called
 	// It is called in the Container Namespace
 	CreateContainer []Hook `json:"createContainer,omitempty"`
+	// SendSeccompFd is a
list of hooks to be run after a new seccomp fd is created
+	// It is called in the Runtime Namespace
+	SendSeccompFd []Hook `json:"sendSeccompFd,omitempty"`
 	// StartContainer is a list of hooks to be run after the start operation is called but before the container process is started
 	// It is called in the Container Namespace
 	StartContainer []Hook `json:"startContainer,omitempty"`
@@ -646,6 +649,7 @@ const (
 	ActTrace LinuxSeccompAction = "SCMP_ACT_TRACE"
 	ActAllow LinuxSeccompAction = "SCMP_ACT_ALLOW"
 	ActLog LinuxSeccompAction = "SCMP_ACT_LOG"
+	ActNotify LinuxSeccompAction = "SCMP_ACT_NOTIFY"
 )

 // LinuxSeccompOperator used to match syscall arguments in Seccomp
diff --git a/specs-go/state.go b/specs-go/state.go
index e2e64c663..d7c8aefc2 100644
--- a/specs-go/state.go
+++ b/specs-go/state.go
@@ -5,20 +5,22 @@ type ContainerState string

 const (
 	// StateCreating indicates that the container is being created
-	StateCreating ContainerState = "creating"
+	StateCreating ContainerState = "creating"

 	// StateCreated indicates that the runtime has finished the create operation
-	StateCreated ContainerState = "created"
+	StateCreated ContainerState = "created"

 	// StateRunning indicates that the container process has executed the
 	// user-specified program but has not exited
-	StateRunning ContainerState = "running"
+	StateRunning ContainerState = "running"

 	// StateStopped indicates that the container process has exited
-	StateStopped ContainerState = "stopped"
+	StateStopped ContainerState = "stopped"
 )

-// State holds information about the runtime state of the container.
+// State holds information about the runtime state of the container. The State
+// can be displayed when requested (query state operation); it is also passed
+// via stdin to many hooks.
 type State struct {
 	// Version is the version of the specification that is supported.
 	Version string `json:"ociVersion"`
@@ -33,3 +35,17 @@ type State struct {
 	// Annotations are key values associated with the container.
 	Annotations map[string]string `json:"annotations,omitempty"`
 }
+
+type SeccompState struct {
+	// Version is the version of the specification that is supported.
+	Version string `json:"ociVersion"`
+	// SeccompFd is the file descriptor for Seccomp User Notification
+	SeccompFd int `json:"seccompFd"`
+	// Pid is the process ID on which the seccomp filter is applied
+	Pid int `json:"pid"`
+	// PidFd is a pidfd for the process on which the seccomp filter is
+	// applied
+	PidFd int `json:"pidFd,omitempty"`
+	// State of the container
+	State State `json:"state"`
+}