Commit 1432f06

storage/filesystem: Handle FUSE deadlock
1 parent 8ed4524 commit 1432f06

1 file changed, +84 -2 lines changed


docs/ops/storage/filesystem.md

Lines changed: 84 additions & 2 deletions
@@ -1092,8 +1092,10 @@ Hello, world!

Once you are done, remember to unmount:

```diff
-umount mountpoint/
-# or: fusermount -u mountpoint/
+# fusermount does not require root privileges
+fusermount -u mountpoint/
+# alternatively umount; older versions may require root privileges
+# umount mountpoint/
```

!!! tip "The cost of FUSE"
@@ -1102,6 +1104,86 @@ umount mountpoint/
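
In addition, when FUSE allows arbitrary users to access the mount point and the application has to define its own ACLs, there are [potential security issues](https://github.com/libfuse/libfuse?tab=readme-ov-file#security-implications), so be careful when using it in production.
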
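!!! tip "Resolving FUSE deadlocks"

Some poorly implemented FUSE filesystems can deadlock. When that happens, both the filesystem process and the processes accessing the filesystem get stuck in the kernel, and the usual SIGKILL signal cannot terminate them cleanly.

??? note "A deadlock example: a FUSE filesystem that accesses its own path"
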
```c
#define FUSE_USE_VERSION 31

#include <assert.h>
#include <dirent.h>
#include <fuse.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Build (assuming libfuse 3 and its pkg-config file are installed):
//   gcc deadlock.c -o deadlock $(pkg-config fuse3 --cflags --libs)

// Absolute path of the mount point, resolved before mounting
static char *real_path = NULL;

static int hello_getattr(const char *path, struct stat *stbuf,
                         struct fuse_file_info *fi) {
    (void)path;
    (void)fi;
    memset(stbuf, 0, sizeof(struct stat));
    // DEADLOCK!
    // Accessing our own mount point inside getattr sends new requests to this
    // same filesystem; they can re-enter this handler, which blocks again, so
    // no request ever completes.
    DIR *d = opendir(real_path);
    struct dirent *entry = readdir(d);
    (void)entry;
    // Never reached: the calls above block forever
    assert(0);
    return 0;
}

static const struct fuse_operations hello_oper = {
    .getattr = hello_getattr,
};

int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <mountpoint>\n", argv[0]);
        return 1;
    }
    // Resolve the path before fuse_main() mounts over it
    // (assumes argv[1] is the mount point, i.e. no options come first)
    real_path = realpath(argv[1], NULL);
    if (real_path == NULL) {
        perror("realpath");
        return 1;
    }
    return fuse_main(argc, argv, &hello_oper, NULL);
}
```
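
Once the deadlock example is running, any access to the mount point hangs. Even after `kill -9`, the process lingers as a zombie (`<defunct>`) that cannot be reaped normally with wait, and the mount point cannot be unmounted:

```shell
$ ./deadlock mountpoint/
$ ls mountpoint/
(hangs)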
```

```shell
$ ps aux | grep deadlock
username 150911 0.0 0.0 748452 1584 ? Ssl 16:09 0:00 ./deadlock mountpoint/
username 151414 0.0 0.0 9556 5960 pts/3 S+ 16:10 0:00 grep --color=auto deadlock
$ kill -9 150911
$ ps aux | grep deadlock
username 150911 0.0 0.0 0 0 ? Zsl 16:09 0:00 [deadlock] <defunct>
username 151555 0.0 0.0 9556 5804 pts/3 S+ 16:12 0:00 grep --color=auto deadlock
$ fusermount -u mountpoint/
fusermount: failed to unmount /path/to/mountpoint: Device or resource busy
```

In this situation, forcibly abort the connection through the control interface that the kernel exposes for FUSE (the fusectl filesystem, mounted at `/sys/fs/fuse/connections`); see the [FUSE documentation in the kernel docs](https://docs.kernel.org/filesystems/fuse.html) for details:

```shell
$ ls /sys/fs/fuse/connections/
1292/ 1327/ 1381/
$ cat /sys/fs/fuse/connections/1381/waiting
21
$ # A non-zero waiting value means requests on connection 1381 are still pending
$ # With no filesystem activity, waiting should be 0; if it stays non-zero, the connection is probably hung
$ echo 1 | sudo tee /sys/fs/fuse/connections/1381/abort
1
$ fusermount -u mountpoint/
```
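
[^sector]: Of course, the notion of a "sector" is no longer accurate for modern disks, especially SSDs, but the customary term is still used here.
[^sector-size]: The sector size (especially the actual physical sector size of modern disks) is not necessarily 512 bytes, but partitions are generally created in units of 512 bytes.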
[^xfs_growfs]: [xfs_growfs(8)][xfs_growfs.8]: A filesystem with only 1 AG cannot be shrunk further, and a filesystem cannot be shrunk to the point where it would only have 1 AG.
