Commit 2619a6d

Merge tag 'fuse-update-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:

 - Remove tmp page copying in writeback path (Joanne).

   This removes ~300 lines and with that a lot of complexity related to
   avoiding reclaim-related deadlock. The old mechanism is replaced with
   a mapping flag that tells the MM not to block reclaim waiting for
   writeback to complete. The MM parts have been reviewed/acked by
   respective maintainers.

 - Convert more code to handle large folios (Joanne). This still just
   adds the code to deal with large folios and does not enable them yet.

 - Allow invalidating all cached lookups atomically (Luis Henriques).
   This feature is useful for CernVMFS, which currently does this
   iteratively.

 - Align write prefaulting in fuse with generic one (Dave Hansen)

 - Fix race causing invalid data to be cached when setting attributes on
   different nodes of a distributed fs (Guang Yuan Wu)

 - Update documentation for passthrough (Chen Linxuan)

 - Add fdinfo about the device number associated with an opened
   /dev/fuse instance (Chen Linxuan)

 - Increase readdir buffer size (Miklos). This depends on a patch to VFS
   readdir code that was already merged through Christian's tree.

 - Optimize io-uring request expiration (Joanne)

 - Misc cleanups

* tag 'fuse-update-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (25 commits)
  fuse: increase readdir buffer size
  readdir: supply dir_context.count as readdir buffer size hint
  fuse: don't allow signals to interrupt getdents copying
  fuse: support large folios for writeback
  fuse: support large folios for readahead
  fuse: support large folios for queued writes
  fuse: support large folios for stores
  fuse: support large folios for symlinks
  fuse: support large folios for folio reads
  fuse: support large folios for writethrough writes
  fuse: refactor fuse_fill_write_pages()
  fuse: support large folios for retrieves
  fuse: support copying large folios
  fs: fuse: add dev id to /dev/fuse fdinfo
  docs: filesystems: add fuse-passthrough.rst
  MAINTAINERS: update filter of FUSE documentation
  fuse: fix race between concurrent setattrs from multiple nodes
  fuse: remove tmp folio for writebacks and internal rb tree
  mm: skip folio reclaim in legacy memcg contexts for deadlockable mappings
  fuse: optimize over-io-uring request expiration check
  ...
2 parents 0fb3442 + dabb903 commit 2619a6d

14 files changed, 466 insertions(+), 501 deletions(-)

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
.. SPDX-License-Identifier: GPL-2.0

================
FUSE Passthrough
================

Introduction
============

FUSE (Filesystem in Userspace) passthrough is a feature designed to improve the
performance of FUSE filesystems for I/O operations. Typically, FUSE operations
involve communication between the kernel and a userspace FUSE daemon, which can
incur overhead. Passthrough allows certain operations on a FUSE file to bypass
the userspace daemon and be executed directly by the kernel on an underlying
"backing file".

This is achieved by the FUSE daemon registering a file descriptor (pointing to
the backing file on a lower filesystem) with the FUSE kernel module. The kernel
then returns an identifier (``backing_id``) for this registered backing file.
When a FUSE file is subsequently opened, the FUSE daemon can, in its response to
the ``OPEN`` request, include this ``backing_id`` and set the
``FOPEN_PASSTHROUGH`` flag. This establishes a direct link for specific
operations.

Currently, passthrough is supported for operations like ``read(2)``/``write(2)``
(via ``read_iter``/``write_iter``), ``splice(2)``, and ``mmap(2)``.

Enabling Passthrough
====================

To use FUSE passthrough:

1. The kernel's FUSE support must be built with ``CONFIG_FUSE_PASSTHROUGH``
   enabled.
2. The FUSE daemon, during the ``FUSE_INIT`` handshake, must negotiate the
   ``FUSE_PASSTHROUGH`` capability and specify its desired
   ``max_stack_depth``.
3. The (privileged) FUSE daemon uses the ``FUSE_DEV_IOC_BACKING_OPEN`` ioctl
   on its connection file descriptor (e.g., ``/dev/fuse``) to register a
   backing file descriptor and obtain a ``backing_id``.
4. When handling an ``OPEN`` or ``CREATE`` request for a FUSE file, the daemon
   replies with the ``FOPEN_PASSTHROUGH`` flag set in
   ``fuse_open_out::open_flags`` and provides the corresponding ``backing_id``
   in ``fuse_open_out::backing_id``.
5. The FUSE daemon should eventually call ``FUSE_DEV_IOC_BACKING_CLOSE`` with
   the ``backing_id`` to release the kernel's reference to the backing file
   when it's no longer needed for passthrough setups.
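
Steps 3 to 5 above can be sketched in C roughly as follows. This is a minimal
sketch under stated assumptions, not a complete daemon: the helper names
(``register_backing``, ``fill_open_reply``, ``unregister_backing``) are
hypothetical, and the ``sketch_``/``SKETCH_`` definitions are local mirrors of
what ``include/uapi/linux/fuse.h`` is assumed to provide, declared here only so
the example compiles against older headers. A real daemon would include
``<linux/fuse.h>``, use the real names, and verify the layouts against its
kernel headers.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Assumed mirror of struct fuse_backing_map from the uapi header. */
struct sketch_backing_map {
    int32_t  fd;      /* backing file descriptor to register */
    uint32_t flags;   /* left zero in this sketch */
    uint64_t padding; /* left zero in this sketch */
};

/* Assumed mirror of struct fuse_open_out from the uapi header. */
struct sketch_open_out {
    uint64_t fh;
    uint32_t open_flags;
    int32_t  backing_id;
};

/* Assumed mirrors of the uapi ioctl numbers and open flag. */
#define SKETCH_DEV_IOC_MAGIC 229
#define SKETCH_DEV_IOC_BACKING_OPEN \
    _IOW(SKETCH_DEV_IOC_MAGIC, 1, struct sketch_backing_map)
#define SKETCH_DEV_IOC_BACKING_CLOSE \
    _IOW(SKETCH_DEV_IOC_MAGIC, 2, uint32_t)
#define SKETCH_FOPEN_PASSTHROUGH (1u << 7)

/*
 * Step 3: register backing_fd on the FUSE connection fd.
 * Returns the ioctl result: a backing_id on success, -1 with errno set
 * on failure (for example when the caller lacks privilege).
 */
int register_backing(int fuse_dev_fd, int backing_fd)
{
    struct sketch_backing_map map;

    memset(&map, 0, sizeof(map));
    map.fd = backing_fd;
    return ioctl(fuse_dev_fd, SKETCH_DEV_IOC_BACKING_OPEN, &map);
}

/* Step 4: fill the OPEN/CREATE reply so the file uses passthrough. */
void fill_open_reply(struct sketch_open_out *out, uint64_t fh, int backing_id)
{
    memset(out, 0, sizeof(*out));
    out->fh = fh;
    out->open_flags |= SKETCH_FOPEN_PASSTHROUGH;
    out->backing_id = backing_id;
}

/* Step 5: drop the kernel's reference once the id is no longer needed. */
int unregister_backing(int fuse_dev_fd, int backing_id)
{
    uint32_t id = (uint32_t)backing_id;

    return ioctl(fuse_dev_fd, SKETCH_DEV_IOC_BACKING_CLOSE, &id);
}
```

In this sketch a daemon would call ``register_backing()`` before replying to
``OPEN``, pass the returned id to ``fill_open_reply()``, and call
``unregister_backing()`` once no further opens should use that id. The
``FUSE_INIT`` negotiation from step 2 is not shown.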

Privilege Requirements
======================

Setting up passthrough functionality currently requires the FUSE daemon to
possess the ``CAP_SYS_ADMIN`` capability. This requirement stems from several
security and resource management considerations that are actively being
discussed and worked on. The primary reasons for this restriction are detailed
below.

Resource Accounting and Visibility
----------------------------------

The core mechanism for passthrough involves the FUSE daemon opening a file
descriptor to a backing file and registering it with the FUSE kernel module via
the ``FUSE_DEV_IOC_BACKING_OPEN`` ioctl. This ioctl returns a ``backing_id``
associated with a kernel-internal ``struct fuse_backing`` object, which holds a
reference to the backing ``struct file``.

A significant concern arises because the FUSE daemon can close its own file
descriptor to the backing file after registration. The kernel, however, will
still hold a reference to the ``struct file`` via the ``struct fuse_backing``
object as long as it's associated with a ``backing_id`` (or subsequently, with
an open FUSE file in passthrough mode).

This behavior leads to two main issues for unprivileged FUSE daemons:

1. **Invisibility to lsof and other inspection tools**: Once the FUSE
   daemon closes its file descriptor, the open backing file held by the kernel
   becomes "hidden." Standard tools like ``lsof``, which typically inspect
   process file descriptor tables, would not be able to identify that this
   file is still open by the system on behalf of the FUSE filesystem. This
   makes it difficult for system administrators to track resource usage or
   debug issues related to open files (e.g., preventing unmounts).

2. **Bypassing RLIMIT_NOFILE**: The FUSE daemon process is subject to
   resource limits, including the maximum number of open file descriptors
   (``RLIMIT_NOFILE``). If an unprivileged daemon could register backing files
   and then close its own FDs, it could potentially cause the kernel to hold
   an unlimited number of open ``struct file`` references without these being
   accounted against the daemon's ``RLIMIT_NOFILE``. This could lead to a
   denial-of-service (DoS) by exhausting system-wide file resources.
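
To make the two points above concrete, the following sketch shows what
per-process inspection and accounting normally rely on: it counts the entries
in ``/proc/self/fd`` (the per-process descriptor table) and reads the soft
``RLIMIT_NOFILE`` limit. It assumes a Linux system with ``/proc`` mounted, the
helper names are hypothetical, and it does not exercise passthrough itself. It
only illustrates that a descriptor vanishes from the table as soon as the
process closes it, which is why a kernel-held reference would go unnoticed.

```c
#include <dirent.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/* Count entries in /proc/self/fd; returns -1 if /proc is unavailable. */
int count_open_fds(void)
{
    DIR *dir = opendir("/proc/self/fd");
    struct dirent *ent;
    int count = 0;

    if (!dir)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        /* Skip the "." and ".." entries. */
        if (ent->d_name[0] != '.')
            count++;
    }
    closedir(dir);
    return count;
}

/* Store the soft RLIMIT_NOFILE limit in *out; returns 0 on success. */
int get_soft_nofile(rlim_t *out)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *out = rl.rlim_cur;
    return 0;
}

/*
 * Open path, then report how many descriptor-table entries disappear
 * when it is closed again. Returns -1 if the file or /proc is unavailable.
 */
int fds_released_by_close(const char *path)
{
    int fd = open(path, O_RDONLY);
    int before, after;

    if (fd < 0)
        return -1;
    before = count_open_fds();
    close(fd);
    after = count_open_fds();
    if (before < 0 || after < 0)
        return -1;
    return before - after;
}
```

On a typical system ``fds_released_by_close("/dev/null")`` is expected to
report a single released entry, since the temporary descriptor used to read
the directory is present in both counts.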

The ``CAP_SYS_ADMIN`` requirement acts as a safeguard against these issues,
restricting this powerful capability to trusted processes.

**NOTE**: ``io_uring`` addresses a similar issue by exposing its "fixed files",
which are visible via ``fdinfo`` and accounted under the registering user's
``RLIMIT_NOFILE``.

Filesystem Stacking and Shutdown Loops
--------------------------------------

Another concern relates to the potential for creating complex and problematic
filesystem stacking scenarios if unprivileged users could set up passthrough.
A FUSE passthrough filesystem might use a backing file that resides:

* On the *same* FUSE filesystem.
* On another filesystem (like OverlayFS) which itself might have an upper or
  lower layer that is a FUSE filesystem.

These configurations could create dependency loops, particularly during
filesystem shutdown or unmount sequences, leading to deadlocks or system
instability. This is conceptually similar to the risks associated with the
``LOOP_SET_FD`` ioctl, which also requires ``CAP_SYS_ADMIN``.

To mitigate this, FUSE passthrough already incorporates checks based on
filesystem stacking depth (``sb->s_stack_depth`` and ``fc->max_stack_depth``).
For example, during the ``FUSE_INIT`` handshake, the FUSE daemon can negotiate
the ``max_stack_depth`` it supports. When a backing file is registered via
``FUSE_DEV_IOC_BACKING_OPEN``, the kernel checks if the backing file's
filesystem stack depth is within the allowed limit.
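
The depth check described above can be modeled in userspace as a single
comparison. This is a simplified sketch, not the kernel's code: the ``model_``
types and the function name are hypothetical, and both the boundary condition
(rejecting a backing filesystem whose depth has reached the negotiated maximum)
and the use of ``ELOOP`` as the error are assumptions that should be checked
against ``fs/fuse/passthrough.c``.

```c
#include <errno.h>

/* Hypothetical stand-ins for the fields named in the text. */
struct model_sb {
    int s_stack_depth;   /* stacking depth of the backing filesystem */
};

struct model_conn {
    int max_stack_depth; /* limit negotiated during FUSE_INIT */
};

/*
 * Returns 0 if the backing filesystem may be used for passthrough,
 * or -ELOOP if it is stacked too deeply (assumed error code).
 */
int model_backing_depth_check(const struct model_sb *backing_sb,
                              const struct model_conn *fc)
{
    if (backing_sb->s_stack_depth >= fc->max_stack_depth)
        return -ELOOP;
    return 0;
}
```

Under these assumptions, a connection that negotiated a ``max_stack_depth`` of
1 would accept a backing file on a non-stacked filesystem (depth 0) and reject
one that lives on a stacked filesystem (depth 1 or more).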

The ``CAP_SYS_ADMIN`` requirement provides an additional layer of security,
ensuring that only privileged users can create these potentially complex
stacking arrangements.

General Security Posture
------------------------

As a general principle for new kernel features that allow userspace to instruct
the kernel to perform direct operations on its behalf based on user-provided
file descriptors, starting with a higher privilege requirement (like
``CAP_SYS_ADMIN``) is a conservative and common security practice. This allows
the feature to be used and tested while further security implications are
evaluated and addressed.

Documentation/filesystems/index.rst

Lines changed: 1 addition & 0 deletions
@@ -99,6 +99,7 @@ Documentation for filesystem implementations.
 fuse
 fuse-io
 fuse-io-uring
+fuse-passthrough
 inotify
 isofs
 nilfs2

MAINTAINERS

Lines changed: 1 addition & 1 deletion
@@ -9846,7 +9846,7 @@ L: [email protected]
 S: Maintained
 W: https://github.com/libfuse/
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git
-F: Documentation/filesystems/fuse.rst
+F: Documentation/filesystems/fuse*
 F: fs/fuse/
 F: include/uapi/linux/fuse.h
