Commit 8deca02

poettering authored and bluca committed
doc: add a markdown doc giving an overview over the fdstore
And link it up everywhere. (cherry picked from commit 0959847)
1 parent d5c180b commit 8deca02

4 files changed

+213
-9
lines changed

docs/FILE_DESCRIPTOR_STORE.md

Lines changed: 193 additions & 0 deletions
@@ -0,0 +1,193 @@
1+
---
2+
title: The File Descriptor Store
3+
category: Interfaces
4+
layout: default
5+
SPDX-License-Identifier: LGPL-2.1-or-later
6+
---
7+
8+
# The File Descriptor Store
9+
10+
*TL;DR: The systemd service manager may optionally maintain a set of file
11+
descriptors for each service, that are under control of the service and that
12+
help making service restarts without losing connectivity or context easier to
13+
implement.*
14+
15+
Since its inception `systemd` has supported the *socket* *activation*
16+
mechanism: the service manager creates and listens on some sockets (and similar
17+
UNIX file descriptors) on behalf of a service, and then passes them to the
18+
service during activation of the service via UNIX file descriptor (short: *fd*)
19+
passing over `execve()`. This is primarily exposed in the
20+
[.socket](https://www.freedesktop.org/software/systemd/man/systemd.socket.html)
21+
unit type.
22+
23+
The *file* *descriptor* *store* (short: *fdstore*) extends this concept, and
24+
allows services to *upload* during runtime additional fds to the service
25+
manager that it shall keep on its behalf. File descriptors are passed back to
26+
the service on subsequent activations, the same way as any socket activation
27+
fds are passed.
28+
29+
If a service fd is passed to the fdstore logic of the service manager it only
30+
maintains a duplicate of it (in the sense of UNIX
31+
[`dup(2)`](https://man7.org/linux/man-pages/man2/dup.2.html)); the fd also
32+
remains in possession of the service itself, and it may (and is expected to)
33+
invoke any operations on it that it likes.
34+
35+
The primary use case of this logic is to permit services to restart seamlessly
36+
(for example to update them to a newer version), without losing execution
37+
context, dropping pinned resources, terminating established connections or even
38+
just momentarily losing connectivity. In fact, as the file descriptors can be
39+
uploaded freely at any time during the service runtime, this can even be used to
40+
implement services that robustly handle abnormal termination and can recover
41+
from that without losing pinned resources.
42+
43+
Note that Linux supports the
44+
[`memfd`](https://man7.org/linux/man-pages/man2/memfd_create.2.html) concept
45+
that allows associating a memory-backed fd with arbitrary data. A service may
46+
conveniently serialize its state into a memfd and then place that memfd in the
47+
fdstore, in order to implement service restarts with full service state being
48+
passed over.
49+
50+
# Basic Mechanism
51+
52+
The fdstore is enabled per-service via the
53+
[`FileDescriptorStoreMax=`](https://www.freedesktop.org/software/systemd/man/systemd.service.html#FileDescriptorStoreMax=)
54+
service setting. It defaults to zero (which means the fdstore logic is turned
55+
off), but can take an unsigned integer value that controls how many fds to
56+
permit the service to upload to the service manager to keep simultaneously.
57+
58+
If set to values > 0, the fdstore is enabled. When invoked the service may now
59+
(asynchronously) upload file descriptors to the fdstore via the
60+
[`sd_pid_notify_with_fds()`](https://www.freedesktop.org/software/systemd/man/sd_pid_notify_with_fds.html)
61+
API call (or an equivalent reimplementation). When uploading the fds it is
62+
necessary to set the `FDSTORE=1` field in the message, to indicate what the fd
63+
is intended for. It's recommended to also set the `FDNAME=…` field to any
64+
string of choice, which may be used to identify the fd later.
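
As a rough sketch of what an "equivalent reimplementation" of the upload might look like, the following Python function sends a single datagram carrying `FDSTORE=1` and `FDNAME=…` with the fd attached as `SCM_RIGHTS` ancillary data. The function name `fdstore_upload()` and its `notify_socket` override parameter are assumptions of this example. The rejection of `:` in names follows from `$LISTEN_FDNAMES` being a colon-separated list. Real services should prefer `sd_pid_notify_with_fds()` where it is available.

```python
import array
import os
import socket
from typing import Optional


def fdstore_upload(fd: int, name: str, notify_socket: Optional[str] = None) -> None:
    """Upload one fd to the fdstore (hypothetical helper, not a systemd API)."""
    path = notify_socket or os.environ.get("NOTIFY_SOCKET")
    if not path:
        raise RuntimeError("NOTIFY_SOCKET is not set")
    if path.startswith("@"):
        # A leading '@' denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    if ":" in name or "\n" in name:
        raise ValueError("FDNAME must not contain ':' or newlines")
    message = f"FDSTORE=1\nFDNAME={name}".encode("ascii")
    # The fd travels as SCM_RIGHTS ancillary data next to the text message.
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(path)
        sock.sendmsg([message], ancillary)
```

A service following this scheme might call `fdstore_upload(conn.fileno(), "conn")` right after accepting a connection. The upload only takes effect if `FileDescriptorStoreMax=` is set to a value above zero for the unit.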
65+
66+
Whenever the service is restarted the fds in its fdstore will be passed to the
67+
new instance following the same protocol as for socket activation fds, i.e. the
68+
`$LISTEN_FDS`, `$LISTEN_PID`, `$LISTEN_FDNAMES` environment variables will be
69+
set (the latter will be populated from the `FDNAME=…` field mentioned
70+
above). See
71+
[`sd_listen_fds()`](https://www.freedesktop.org/software/systemd/man/sd_listen_fds.html)
72+
for details on receiving such fds in a service. (Note that the name set in
73+
`FDNAME=…` does not need to be unique, which is useful when operating with
74+
multiple fully equivalent sockets or similar, for example for a service that
75+
both operates on IPv4 and IPv6 and treats both more or less the same.)
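
The receiving side can be sketched as below, assuming the environment variable protocol described above. The function name `listen_fds_with_names()` is hypothetical. The sketch deliberately does less than `sd_listen_fds()`: it neither sets `FD_CLOEXEC` on the fds nor unsets the environment variables. Using the placeholder name `unknown` for fds without a name, and padding a short name list with it, is a simplification of this example.

```python
import os
from typing import List, Mapping, Optional, Tuple

SD_LISTEN_FDS_START = 3  # number of the first passed fd


def listen_fds_with_names(
    environ: Optional[Mapping[str, str]] = None,
    pid: Optional[int] = None,
) -> List[Tuple[int, str]]:
    """Return (fd, name) pairs passed by the service manager, or []."""
    env = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        if int(env.get("LISTEN_PID", "")) != pid:
            return []  # the fds were meant for a different process
        count = int(env.get("LISTEN_FDS", ""))
    except ValueError:
        return []  # variables missing or malformed
    if count <= 0:
        return []
    raw_names = env.get("LISTEN_FDNAMES")
    names = raw_names.split(":") if raw_names else []
    # Simplification of this sketch: pad missing names with "unknown".
    names += ["unknown"] * (count - len(names))
    return [(SD_LISTEN_FDS_START + i, names[i]) for i in range(count)]
```

Since names need not be unique, a caller would typically group the result by name, for example collecting all fds named `conn` into one list of restored connections.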
76+
77+
And that's already the gist of it.
78+
79+
# Seamless Service Restarts
80+
81+
A system service that provides a client-facing interface that shall be able to
82+
seamlessly restart can make use of this in a scheme like the following:
83+
whenever a new connection comes in it uploads its fd immediately into its
84+
fdstore. At appropriate times it also serializes its state into a memfd it
85+
uploads to the service manager — either whenever the state changed
86+
sufficiently, or simply right before it terminates. (The latter of course means
87+
that state only survives on *clean* restarts and abnormal termination implies the
88+
state is lost completely — while the former would mean there's a good chance the
89+
next restart after an abnormal termination could continue where it left off
90+
with only some context lost.)
91+
92+
Using the fdstore for such seamless service restarts is generally recommended
93+
over implementations that attempt to leave a process from the old service
94+
instance around until after the new instance already started, so that the old
95+
then communicates with the new service instance, and passes the fds over
96+
directly. Typically service restarts are a mechanism for implementing *code*
97+
updates, hence leaving two versions of the service running at the same time is
98+
generally problematic. It also collides with the systemd service manager's
99+
general principle of guaranteeing a pristine execution environment, a pristine
100+
security context, and a pristine resource management context for freshly
101+
started services, without uncontrolled "left-overs" from previous runs. For
102+
example: leaving processes from previous runs generally negatively affects
103+
lifecycle management (i.e. `KillMode=none` must be set), which disables large
104+
parts of the service manager's state tracking, resource management (as resource
105+
counters cannot start at zero during service activation anymore, since the old
106+
processes remaining skew them), security policies (as processes with possibly
107+
out-of-date security policies (SELinux, AppArmor, any LSM, seccomp, BPF) in
108+
effect remain), and similar.
109+
110+
# File Descriptor Store Lifecycle
111+
112+
By default any file descriptor stored in the fdstore for which a `POLLHUP` or
113+
`POLLERR` is seen is automatically closed and removed from the fdstore. This
114+
behaviour can be turned off, by setting the `FDPOLL=0` field when uploading the
115+
fd via `sd_pid_notify_with_fds()`.
116+
117+
The fdstore is automatically closed whenever the service is fully deactivated
118+
and no jobs are queued for it anymore. This means that a restart job for a
119+
service will leave the fdstore intact, but a separate stop and start job for
120+
it — executed synchronously one after the other — will likely not.
121+
122+
This behaviour can be modified via the
123+
[`FileDescriptorStorePreserve=`](https://www.freedesktop.org/software/systemd/man/systemd.service.html#FileDescriptorStorePreserve=)
124+
setting in service unit files. If set to `yes` the fdstore will be kept as long
125+
as the service definition is loaded into memory by the service manager, i.e. as
126+
long as at least one other loaded unit has a reference to it.
127+
128+
The `systemctl clean --what=fdstore …` command may be used to explicitly clear
129+
the fdstore of a service. This is only allowed when the service is fully
130+
deactivated, and is hence primarily useful in case
131+
`FileDescriptorStorePreserve=yes` is set (because the fdstore is otherwise
132+
fully closed anyway in this state).
133+
134+
Individual file descriptors may be removed from the fdstore via the
135+
`sd_notify()` mechanism, by sending an `FDSTOREREMOVE=1` message, accompanied
136+
by an `FDNAME=…` string identifying the fds to remove. (The name does not have
137+
to be unique, as mentioned, in which case *all* matching fds are
138+
closed). Generally it's a good idea to send such messages to the service
139+
manager during initialization of the service whenever an unrecognized fd is
140+
received, to make the service robust for code updates: if an old version
141+
uploaded an fd that the new version doesn't recognize anymore it's a good idea to
142+
close it both in the service and in the fdstore.
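
The cleanup recommended above could be planned with a small helper such as the following. It is a sketch under assumptions: the name `plan_fdstore_cleanup()` is hypothetical, and the input is taken to be a list of `(fd, name)` pairs as received at startup. The helper only decides what to do. The caller is expected to `close()` each returned fd and to send each returned message as one datagram to `$NOTIFY_SOCKET`.

```python
from typing import Iterable, List, Set, Tuple


def plan_fdstore_cleanup(
    received: Iterable[Tuple[int, str]],
    known_names: Set[str],
) -> Tuple[List[int], List[bytes]]:
    """Return (fds_to_close, notify_messages) for unrecognized fd names."""
    to_close: List[int] = []
    stale_names: List[str] = []
    for fd, name in received:
        if name in known_names:
            continue
        to_close.append(fd)
        # One FDSTOREREMOVE=1 message removes all fds sharing the name,
        # so each stale name only needs to be reported once.
        if name not in stale_names:
            stale_names.append(name)
    messages = [
        f"FDSTOREREMOVE=1\nFDNAME={name}".encode("ascii") for name in stale_names
    ]
    return to_close, messages
```

Separating the decision from the side effects keeps the logic easy to test without a running service manager.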
143+
144+
Note that storing a duplicate of an fd in the fdstore means the fd remains
145+
pinned even if the service closes it. This in particular means that peers on a
146+
connection socket uploaded this way will not receive an automatic `POLLHUP`
147+
event anymore if the service code issues `close()` on the socket. The service must
148+
accompany it with an `FDSTOREREMOVE=1` notification to the service manager, so
149+
that the fd is comprehensively closed.
150+
151+
# Access Control
152+
153+
Access to the fds in the file descriptor store is generally restricted to the
154+
service code itself. Pushing fds into or removing fds from the fdstore is
155+
subject to the access control restrictions of any other `sd_notify()` message,
156+
which is controlled via
157+
[`NotifyAccess=`](https://www.freedesktop.org/software/systemd/man/systemd.service.html#NotifyAccess=).
158+
159+
By default only the main service process hence can push/remove fds, but by
160+
setting `NotifyAccess=cgroup` this may be relaxed to allow arbitrary service
161+
child processes to do the same.
162+
163+
# Soft Reboot
164+
165+
The fdstore is particularly interesting in [soft
166+
reboot](https://www.freedesktop.org/software/systemd/man/systemd-soft-reboot.service.html)
167+
scenarios, as per `systemctl soft-reboot` (which restarts userspace like in a
168+
real reboot, but leaves the kernel running). File descriptor stores that remain
169+
loaded at the very end of the system cycle (just before the soft-reboot) are
170+
passed over to the next system cycle, and propagated to services they originate
171+
from there. This enables updating the full userspace of a system during
172+
runtime, fully replacing all processes without losing pinned resources,
173+
interrupting connectivity or established connections and similar.
174+
175+
This mechanism can be enabled either by making sure the service survives until
176+
the very end (i.e. by setting `DefaultDependencies=no` so that it keeps running
177+
for the whole system lifetime without being regularly deactivated at shutdown)
178+
or by setting `FileDescriptorStorePreserve=yes` (and referencing the unit
179+
continuously).
180+
181+
# Debugging
182+
183+
The
184+
[`systemd-analyze`](https://www.freedesktop.org/software/systemd/man/systemd-analyze.html#systemd-analyze%20fdstore%20%5BUNIT...%5D)
185+
tool may be used to list the current contents of the fdstore of any running
186+
service.
187+
188+
The
189+
[`systemd-run`](https://www.freedesktop.org/software/systemd/man/systemd-run.html)
190+
tool may be used to quickly start a testing binary or similar as a service. Use
191+
`-p FileDescriptorStoreMax=4711` to enable the fdstore from `systemd-run`'s
192+
command line. By using the `-t` switch you can even interactively communicate
193+
with processes spawned that way, via the TTY.

man/sd_listen_fds.xml

Lines changed: 10 additions & 7 deletions
@@ -46,10 +46,11 @@
4646
<title>Description</title>
4747

4848
<para><function>sd_listen_fds()</function> may be invoked by a daemon to check for file descriptors
49-
passed by the service manager as part of the socket-based activation logic. It returns the number of
50-
received file descriptors. If no file descriptors have been received, zero is returned. The first file
51-
descriptor may be found at file descriptor number 3 (i.e. <constant>SD_LISTEN_FDS_START</constant>), the
52-
remaining descriptors follow at 4, 5, 6, …, if any.</para>
49+
passed by the service manager as part of the socket-based activation and file descriptor store logic. It
50+
returns the number of received file descriptors. If no file descriptors have been received, zero is
51+
returned. The first file descriptor may be found at file descriptor number 3
52+
(i.e. <constant>SD_LISTEN_FDS_START</constant>), the remaining descriptors follow at 4, 5, 6, …, if
53+
any.</para>
5354

5455
<para>The file descriptors passed this way may be closed at will by the processes receiving them: it's up
5556
to the processes themselves to close them after use or whether to leave them open until the process exits
@@ -78,9 +79,8 @@
7879
for the service to work, hence it should not be verified. On the other hand, whether a socket is a
7980
datagram or stream socket matters a lot for the most common program logics and should be checked.</para>
8081

81-
<para>This function call will set the FD_CLOEXEC flag for all
82-
passed file descriptors to avoid further inheritance to children
83-
of the calling process.</para>
82+
<para>This function call will set the <constant>FD_CLOEXEC</constant> flag for all passed file
83+
descriptors to avoid further inheritance to children of the calling process.</para>
8484

8585
<para>If multiple socket units activate the same service, the order
8686
of the file descriptors passed to its main process is undefined.
@@ -164,6 +164,9 @@
164164
</tgroup>
165165
</table>
166166
</para>
167+
168+
<para>For further information on the file descriptor store see the <ulink
169+
url="https://systemd.io/FILE_DESCRIPTOR_STORE">File Descriptor Store</ulink> overview.</para>
167170
</refsect1>
168171

169172
<refsect1>

man/sd_notify.xml

Lines changed: 5 additions & 1 deletion
@@ -274,7 +274,11 @@
274274
stopped, its file descriptor store is discarded and all file descriptors in it are closed. Use
275275
<function>sd_pid_notify_with_fds()</function> to send messages with <literal>FDSTORE=1</literal>, see
276276
below. The service manager will set the <varname>$FDSTORE</varname> environment variable for services
277-
that have the file descriptor store enabled.</para></listitem>
277+
that have the file descriptor store enabled.</para>
278+
279+
<para>For further information on the file descriptor store see the <ulink
280+
url="https://systemd.io/FILE_DESCRIPTOR_STORE">File Descriptor Store</ulink> overview.</para>
281+
</listitem>
278282
</varlistentry>
279283

280284
<varlistentry>

man/systemd.service.xml

Lines changed: 5 additions & 1 deletion
@@ -1142,7 +1142,11 @@
11421142
<para>If this option is set to a non-zero value the <varname>$FDSTORE</varname> environment variable
11431143
will be set for processes invoked for this service. See
11441144
<citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry> for
1145-
details.</para></listitem>
1145+
details.</para>
1146+
1147+
<para>For further information on the file descriptor store see the <ulink
1148+
url="https://systemd.io/FILE_DESCRIPTOR_STORE">File Descriptor Store</ulink> overview.</para>
1149+
</listitem>
11461150
</varlistentry>
11471151

11481152
<varlistentry>
