Commit 4f30a60

Merge tag 'close-range-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull close_range() implementation from Christian Brauner:

"This adds the close_range() syscall. It allows efficiently closing a
range of file descriptors, up to all file descriptors of the calling
task.

This is coordinated with the FreeBSD folks, who have copied our version
of this syscall and in the meantime have already merged it in April
2019:

    https://reviews.freebsd.org/D21627
    https://svnweb.freebsd.org/base?view=revision&revision=359836

The syscall originally came up in a discussion around the new mount API
and making new file descriptor types cloexec by default. During this
discussion, Al suggested the close_range() syscall.

First, it helps to close all file descriptors of an exec()ing task.
This can be done safely via (quoting Al's example from [1] verbatim):

    /* that exec is sensitive */
    unshare(CLONE_FILES);
    /* we don't want anything past stderr here */
    close_range(3, ~0U);
    execve(....);

The code snippet above is one way of working around the problem that
file descriptors are not cloexec by default. This is aggravated by the
fact that we can't just switch them over without massively regressing
userspace. For a whole class of programs having an in-kernel method of
closing all file descriptors is very helpful (e.g. daemons, service
managers, programming language standard libraries, container managers
etc.).

Second, it allows userspace to avoid implementing closing all file
descriptors by parsing through /proc/<pid>/fd/* and calling close() on
each file descriptor and other hacks. From looking at various
large(ish) userspace code bases this or similar patterns are very
common in service managers, container runtimes, and programming
language runtimes/standard libraries such as Python or Rust.

In addition, the syscall will also work for tasks that do not have
procfs mounted and on kernels that do not have procfs support compiled
in. In such situations the only way to make sure that all file
descriptors are closed is to call close() on each file descriptor up to
UINT_MAX or RLIMIT_NOFILE, OPEN_MAX trickery.

Based on Linus' suggestion close_range() also comes with a new flag
CLOSE_RANGE_UNSHARE to more elegantly handle file descriptor dropping
right before exec. This would usually be expressed in the sequence:

    unshare(CLONE_FILES);
    close_range(3, ~0U);

As pointed out by Linus, it might be desirable to have this be a part
of close_range() itself under a new flag CLOSE_RANGE_UNSHARE, which
gets especially handy when we're closing all file descriptors above a
certain threshold.

Test-suite as always included"

* tag 'close-range-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  tests: add CLOSE_RANGE_UNSHARE tests
  close_range: add CLOSE_RANGE_UNSHARE
  tests: add close_range() tests
  arch: wire-up close_range()
  open: add close_range()
2 parents 74858ab + a5161ee commit 4f30a60

28 files changed, 405 insertions(+), 17 deletions(-)

arch/alpha/kernel/syscalls/syscall.tbl

Lines changed: 1 addition & 0 deletions
@@ -475,6 +475,7 @@
 543 common fspick sys_fspick
 544 common pidfd_open sys_pidfd_open
 # 545 reserved for clone3
+546 common close_range sys_close_range
 547 common openat2 sys_openat2
 548 common pidfd_getfd sys_pidfd_getfd
 549 common faccessat2 sys_faccessat2

arch/arm/tools/syscall.tbl

Lines changed: 1 addition & 0 deletions
@@ -449,6 +449,7 @@
 433 common fspick sys_fspick
 434 common pidfd_open sys_pidfd_open
 435 common clone3 sys_clone3
+436 common close_range sys_close_range
 437 common openat2 sys_openat2
 438 common pidfd_getfd sys_pidfd_getfd
 439 common faccessat2 sys_faccessat2

arch/arm64/include/asm/unistd32.h

Lines changed: 2 additions & 0 deletions
@@ -879,6 +879,8 @@ __SYSCALL(__NR_fspick, sys_fspick)
 __SYSCALL(__NR_pidfd_open, sys_pidfd_open)
 #define __NR_clone3 435
 __SYSCALL(__NR_clone3, sys_clone3)
+#define __NR_close_range 436
+__SYSCALL(__NR_close_range, sys_close_range)
 #define __NR_openat2 437
 __SYSCALL(__NR_openat2, sys_openat2)
 #define __NR_pidfd_getfd 438

arch/ia64/kernel/syscalls/syscall.tbl

Lines changed: 1 addition & 0 deletions
@@ -356,6 +356,7 @@
 433 common fspick sys_fspick
 434 common pidfd_open sys_pidfd_open
 # 435 reserved for clone3
+436 common close_range sys_close_range
 437 common openat2 sys_openat2
 438 common pidfd_getfd sys_pidfd_getfd
 439 common faccessat2 sys_faccessat2

arch/m68k/kernel/syscalls/syscall.tbl

Lines changed: 1 addition & 0 deletions
@@ -435,6 +435,7 @@
 433 common fspick sys_fspick
 434 common pidfd_open sys_pidfd_open
 435 common clone3 __sys_clone3
+436 common close_range sys_close_range
 437 common openat2 sys_openat2
 438 common pidfd_getfd sys_pidfd_getfd
 439 common faccessat2 sys_faccessat2

arch/microblaze/kernel/syscalls/syscall.tbl

Lines changed: 1 addition & 0 deletions
@@ -441,6 +441,7 @@
 433 common fspick sys_fspick
 434 common pidfd_open sys_pidfd_open
 435 common clone3 sys_clone3
+436 common close_range sys_close_range
 437 common openat2 sys_openat2
 438 common pidfd_getfd sys_pidfd_getfd
 439 common faccessat2 sys_faccessat2

arch/mips/kernel/syscalls/syscall_n32.tbl

Lines changed: 1 addition & 0 deletions
@@ -374,6 +374,7 @@
 433 n32 fspick sys_fspick
 434 n32 pidfd_open sys_pidfd_open
 435 n32 clone3 __sys_clone3
+436 n32 close_range sys_close_range
 437 n32 openat2 sys_openat2
 438 n32 pidfd_getfd sys_pidfd_getfd
 439 n32 faccessat2 sys_faccessat2

arch/mips/kernel/syscalls/syscall_n64.tbl

Lines changed: 1 addition & 0 deletions
@@ -350,6 +350,7 @@
 433 n64 fspick sys_fspick
 434 n64 pidfd_open sys_pidfd_open
 435 n64 clone3 __sys_clone3
+436 n64 close_range sys_close_range
 437 n64 openat2 sys_openat2
 438 n64 pidfd_getfd sys_pidfd_getfd
 439 n64 faccessat2 sys_faccessat2

arch/mips/kernel/syscalls/syscall_o32.tbl

Lines changed: 1 addition & 0 deletions
@@ -423,6 +423,7 @@
 433 o32 fspick sys_fspick
 434 o32 pidfd_open sys_pidfd_open
 435 o32 clone3 __sys_clone3
+436 o32 close_range sys_close_range
 437 o32 openat2 sys_openat2
 438 o32 pidfd_getfd sys_pidfd_getfd
 439 o32 faccessat2 sys_faccessat2

arch/parisc/kernel/syscalls/syscall.tbl

Lines changed: 1 addition & 0 deletions
@@ -433,6 +433,7 @@
 433 common fspick sys_fspick
 434 common pidfd_open sys_pidfd_open
 435 common clone3 sys_clone3_wrapper
+436 common close_range sys_close_range
 437 common openat2 sys_openat2
 438 common pidfd_getfd sys_pidfd_getfd
 439 common faccessat2 sys_faccessat2
