@@ -135,52 +135,6 @@ entirely read-only. To close this gap it would be great if such
135 135 propagated mounts could implicitly gain ` MS_RDONLY ` as they are
136 136 propagated.
137 137
138- ### Disabling reception of ` SCM_RIGHTS ` for ` AF_UNIX ` sockets
139-
140- [ x] Ability to turn off ` SCM_RIGHTS ` reception for ` AF_UNIX `
141- sockets.
142-
143- ** 🙇 ` 77cbe1a6d8730a07f99f9263c2d5f2304cf5e830 ("af_unix: Introduce SO_PASSRIGHTS") ` 🙇**
144-
145- Right now reception of file descriptors is always on when
146- a process makes the mistake of invoking ` recvmsg() ` on such a
147- socket. This is problematic since ` SCM_RIGHTS ` installs file
148- descriptors in the recipient process' file descriptor
149- table. Getting rid of these file descriptors is not necessarily
150- easy, as they could refer to "slow-to-close" files (think: dirty
151- file descriptor referring to a file on an unresponsive NFS server,
152- or some device file descriptor), that might cause the recipient to
153- block for a longer time when it tries to them. Programs reading
154- from an ` AF_UNIX ` socket currently have three options:
155-
156- 1 . Never use ` recvmsg() ` , and stick to ` read() ` , ` recv() ` and
157- similar which do not install file descriptors in the recipients
158- file descriptor table.
159-
160- 2 . Ignore the problem, and simply ` close() ` the received file descriptors
161- it didn't expect, thus possibly locking up for a longer time.
162-
163- 3 . Fork off a thread that invokes ` close() ` , which mitigates the
164- risk of blocking, but still means a sender can cause resource
165- exhaustion in a recipient by flooding it with file descriptors,
166- as for each of them a thread needs to be spawned and a file
167- descriptor is taken while it is in the process of being closed.
168-
169- (Another option of course is to never talk ` AF_UNIX ` to peers that
170- are not trusted to not send unexpected file descriptors.)
171-
172- A simple knob that allows turning off ` SCM_RIGHTS ` right reception
173- would be useful to close this weakness, and would allow
174- ` recvmsg() ` to be called without risking file descriptors to be
175- installed in the file descriptor table, and thus risking a
176- blocking ` close() ` or a form of potential resource exhaustion.
177-
178- ** Use-Case:** any program that uses ` AF_UNIX ` sockets and uses (or
179- would like to use) ` recvmsg() ` on it (which is useful to acquire
180- other metadata). Example: logging daemons that want to collect
181- timestamp or ` SCM_CREDS ` auxiliary data, or the D-Bus message
182- broker and suchlike.
183-
184 138 ### Filtering on received file descriptors
185 139
186 140 An alternative to the previous item could be if some form of filtering
@@ -191,27 +145,6 @@ received" may be expressed. (BPF?).
191 145
192 146 ** Use-Case:** as above.
193 147
194- ### A reliable way to check for PID namespacing
195-
196- [ x] A reliable (non-heuristic) way to detect from userspace if the
197- current process is running in a PID namespace that is not the main
198- PID namespace. PID namespaces are probably the primary type of
199- namespace that identify a container environment. While many
200- heuristics exist to determine generically whether one is executed
201- inside a container, it would be good to have a correct,
202- well-defined way to determine this.
203-
204- ** 🙇 The inode number of the root PID namespace is fixed (0xEFFFFFFC)
205- and now considered API. It can be used to distinguish the root PID
206- namespace from all others. 🙇**
207-
208- ** Use-Case:** tools such as ` systemd-detect-virt ` exist to determine
209- container execution, but typically resolve to checking for
210- specific implementations. It would be much nicer and universally
211- applicable if such a check could be done generically. It would
212- probably suffice to provide an ` ioctl() ` call on the ` pidns ` file
213- descriptor that reveals this kind of information in some form.
214-
215 148 ### Excluding processes watched via ` pidfd ` from ` waitid(P_ALL, …) `
216 149
217 150 ** Use-Case:** various programs use ` waitid(P_ALL, …) ` to collect exit
@@ -1007,3 +940,70 @@ handlers.
1007 940 ** 🙇 ` bc70682a497c ("ovl: support idmapped layers") ` 🙇**
1008 941
1009 942 ** Use-Case:** Allow containers to use ` overlayfs ` with idmapped mounts.
943+
944+ ### Disabling reception of ` SCM_RIGHTS ` for ` AF_UNIX ` sockets
945+
946+ [ x] Ability to turn off ` SCM_RIGHTS ` reception for ` AF_UNIX `
947+ sockets.
948+
949+ ** 🙇 ` 77cbe1a6d8730a07f99f9263c2d5f2304cf5e830 ("af_unix: Introduce SO_PASSRIGHTS") ` 🙇**
950+
951+ Right now reception of file descriptors is always on when
952+ a process makes the mistake of invoking ` recvmsg() ` on such a
953+ socket. This is problematic since ` SCM_RIGHTS ` installs file
954+ descriptors in the recipient process' file descriptor
955+ table. Getting rid of these file descriptors is not necessarily
956+ easy, as they could refer to "slow-to-close" files (think: dirty
957+ file descriptor referring to a file on an unresponsive NFS server,
958+ or some device file descriptor), that might cause the recipient to
959+ block for a longer time when it tries to close them. Programs reading
960+ from an ` AF_UNIX ` socket currently have three options:
961+
962+ 1 . Never use ` recvmsg() ` , and stick to ` read() ` , ` recv() ` and
963+ similar calls, which do not install file descriptors in the recipient's
964+ file descriptor table.
965+
966+ 2 . Ignore the problem, and simply ` close() ` the received file descriptors
967+ it didn't expect, thus possibly locking up for a longer time.
968+
969+ 3 . Fork off a thread that invokes ` close() ` , which mitigates the
970+ risk of blocking, but still means a sender can cause resource
971+ exhaustion in a recipient by flooding it with file descriptors,
972+ as for each of them a thread needs to be spawned and a file
973+ descriptor is taken while it is in the process of being closed.
974+
975+ (Another option of course is to never talk ` AF_UNIX ` to peers that
976+ are not trusted to not send unexpected file descriptors.)
977+
978+ A simple knob that allows turning off ` SCM_RIGHTS ` reception
979+ would close this weakness. It would allow
980+ ` recvmsg() ` to be called without the risk of file descriptors being
981+ installed in the file descriptor table, and thus without the risk of a
982+ blocking ` close() ` or of resource exhaustion.
983+
984+ ** Use-Case:** any program that uses ` AF_UNIX ` sockets and uses (or
985+ would like to use) ` recvmsg() ` on it (which is useful to acquire
986+ other metadata). Example: logging daemons that want to collect
987+ timestamp or ` SCM_CREDENTIALS ` auxiliary data, or the D-Bus message
988+ broker and suchlike.
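
The knob referenced above can be exercised from userspace roughly as follows. This is a minimal sketch, not taken from the commit itself: the helper name `disable_fd_reception` is hypothetical, and the numeric fallback `83` for `SO_PASSRIGHTS` is an assumption based on the generic Linux socket option numbering (some architectures number socket options differently, so check your kernel headers). Kernels that predate the option are expected to reject the call, which the helper reports as `False`.

```python
import socket

# Assumption: 83 is the asm-generic value of SO_PASSRIGHTS; some
# architectures number socket options differently, so verify against
# the kernel headers. Prefer the socket module's constant if it has one.
SO_PASSRIGHTS = getattr(socket, "SO_PASSRIGHTS", 83)


def disable_fd_reception(sock):
    """Ask the kernel to stop passing SCM_RIGHTS fds to this socket.

    Returns True if the option was accepted, False if the running
    kernel rejected it (for example because it predates SO_PASSRIGHTS).
    """
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_PASSRIGHTS, 0)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    receiver, sender = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with receiver, sender:
        if disable_fd_reception(receiver):
            print("SCM_RIGHTS reception disabled on receiver")
        else:
            print("SO_PASSRIGHTS not supported; fall back to the options above")
```

A receiver that gets `True` back can then call `recvmsg()` for other ancillary data; one that gets `False` still has to pick one of the three workarounds listed earlier.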
989+
990+ ### A reliable way to check for PID namespacing
991+
992+ [ x] A reliable (non-heuristic) way to detect from userspace if the
993+ current process is running in a PID namespace that is not the main
994+ PID namespace. PID namespaces are probably the primary type of
995+ namespace that identifies a container environment. While many
996+ heuristics exist to determine generically whether one is executed
997+ inside a container, it would be good to have a correct,
998+ well-defined way to determine this.
999+
1000+ ** 🙇 The inode number of the root PID namespace is fixed (0xEFFFFFFC)
1001+ and now considered API. It can be used to distinguish the root PID
1002+ namespace from all others. 🙇**
1003+
1004+ ** Use-Case:** tools such as ` systemd-detect-virt ` exist to determine
1005+ container execution, but typically resort to checking for
1006+ specific implementations. It would be much nicer and universally
1007+ applicable if such a check could be done generically. It would
1008+ probably suffice to provide an ` ioctl() ` call on the ` pidns ` file
1009+ descriptor that reveals this kind of information in some form.
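
The fixed inode number mentioned above allows a check along these lines. This is a sketch under stated assumptions: the helper name `in_root_pid_namespace` is hypothetical, and it assumes procfs is mounted at `/proc` so that `/proc/self/ns/pid` refers to the caller's PID namespace. When the path cannot be inspected, the helper returns `None` rather than guessing.

```python
import os

# Inode number of the root PID namespace, as quoted in the text above.
ROOT_PIDNS_INO = 0xEFFFFFFC


def in_root_pid_namespace(path="/proc/self/ns/pid"):
    """Return True if `path` refers to the root PID namespace.

    Returns False for any other PID namespace, and None if the path
    cannot be inspected (for example because procfs is not mounted).
    """
    try:
        # os.stat() follows the namespace link, so st_ino is the inode
        # number of the namespace object itself.
        ino = os.stat(path).st_ino
    except OSError:
        return None
    return ino == ROOT_PIDNS_INO


if __name__ == "__main__":
    answer = in_root_pid_namespace()
    if answer is None:
        print("cannot determine PID namespace")
    elif answer:
        print("running in the root PID namespace")
    else:
        print("running in a non-root PID namespace")
```

Returning a three-way result keeps "not in the root namespace" separate from "could not tell", which matters for callers that would otherwise fall back to heuristics.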