Commit a9194f8

coredump: add coredump socket
Coredumping currently supports two modes:

(1) Dumping directly into a file somewhere on the filesystem.
(2) Dumping into a pipe connected to a usermode helper process spawned as a child of the system_unbound_wq or kthreadd.

For simplicity I'm mostly ignoring (1). There are probably still some users of (1) out there, but processing coredumps in this way can be considered adventurous, especially in the face of set*id binaries.

The most common option should be (2) by now. It works by allowing userspace to put a string into /proc/sys/kernel/core_pattern like:

        |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h

The "|" at the beginning indicates to the kernel that a pipe must be used. The path following the pipe indicator is a path to a binary that will be spawned as a usermode helper process. Any additional parameters pass information about the task that is generating the coredump to the binary that processes the coredump.

In the example core_pattern shown above systemd-coredump is spawned as a usermode helper. There are various conceptual consequences of this (non-exhaustive list):

- systemd-coredump is spawned with file descriptor number 0 (stdin) connected to the read-end of the pipe. All other file descriptors are closed. That specifically includes 1 (stdout) and 2 (stderr). This has already caused bugs because userspace assumed that this cannot happen (whether or not this is a sane assumption is irrelevant).

- systemd-coredump will be spawned as a child of system_unbound_wq. So it is not a child of any userspace process and specifically not a child of PID 1. It cannot be waited upon and is in a weird hybrid upcall, which is difficult for userspace to control correctly.

- systemd-coredump is spawned with full kernel privileges. This necessitates all kinds of weird privilege dropping exercises in userspace to make this safe.

- A new usermode helper has to be spawned for each crashing process.

This series adds a new mode:

(3) Dumping into an AF_UNIX socket.

Userspace can set /proc/sys/kernel/core_pattern to:

        @/path/to/coredump.socket

The "@" at the beginning indicates to the kernel that an AF_UNIX coredump socket will be used to process coredumps.

The coredump socket must be located in the initial mount namespace. When a task coredumps it opens a client socket in the initial network namespace and connects to the coredump socket.

- The coredump server uses SO_PEERPIDFD to get a stable handle on the connected crashing task. The retrieved pidfd will provide a stable reference even if the crashing task gets SIGKILLed while generating the coredump.

- By setting core_pipe_limit non-zero userspace can guarantee that the crashing task cannot be reaped behind its back and thus process all necessary information in /proc/<pid>. The SO_PEERPIDFD can be used to detect whether /proc/<pid> still refers to the same process.

  The core_pipe_limit isn't used to rate-limit connections to the socket. This can simply be done via AF_UNIX sockets directly.

- The pidfd for the crashing task will grow new information about how the task coredumps.

- The coredump server should mark itself as non-dumpable.

- A container coredump server in a separate network namespace can simply bind to another well-known address and systemd-coredump forwards coredumps to the container.

- Coredumps could in the future also be handled via per-user/session coredump servers that run only with that user's privileges.

  The coredump server listens on the coredump socket and accepts a new coredump connection. It then retrieves SO_PEERPIDFD for the client, inspects uid/gid and hands the accepted client to the user's own coredump handler, which runs with the user's privileges only (it must of course pay close attention to not forward crashing suid binaries).

The new coredump socket will allow userspace to not have to rely on usermode helpers for processing coredumps and provides a safer way to handle them instead of relying on super privileged coredumping helpers that have caused and continue to cause significant CVEs.

This will also be significantly more lightweight since no fork()+exec() for the usermodehelper is required for each crashing process. The coredump server in userspace can e.g., just keep a worker pool.

Link: https://lore.kernel.org/[email protected]
Acked-by: Luca Boccassi <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Alexander Mikhalitsyn <[email protected]>
Reviewed-by: Jann Horn <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
1 parent 1c587ee commit a9194f8

3 files changed

+177
-17
lines changed

fs/coredump.c

Lines changed: 135 additions & 4 deletions
@@ -44,7 +44,11 @@
 #include <linux/sysctl.h>
 #include <linux/elf.h>
 #include <linux/pidfs.h>
+#include <linux/net.h>
+#include <linux/socket.h>
+#include <net/net_namespace.h>
 #include <uapi/linux/pidfd.h>
+#include <uapi/linux/un.h>
 
 #include <linux/uaccess.h>
 #include <asm/mmu_context.h>
@@ -79,6 +83,7 @@ unsigned int core_file_note_size_limit = CORE_FILE_NOTE_SIZE_DEFAULT;
 enum coredump_type_t {
 	COREDUMP_FILE = 1,
 	COREDUMP_PIPE = 2,
+	COREDUMP_SOCK = 3,
 };
 
 struct core_name {
@@ -232,13 +237,16 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
 	cn->corename = NULL;
 	if (*pat_ptr == '|')
 		cn->core_type = COREDUMP_PIPE;
+	else if (*pat_ptr == '@')
+		cn->core_type = COREDUMP_SOCK;
 	else
 		cn->core_type = COREDUMP_FILE;
 	if (expand_corename(cn, core_name_size))
 		return -ENOMEM;
 	cn->corename[0] = '\0';
 
-	if (cn->core_type == COREDUMP_PIPE) {
+	switch (cn->core_type) {
+	case COREDUMP_PIPE: {
 		int argvs = sizeof(core_pattern) / 2;
 		(*argv) = kmalloc_array(argvs, sizeof(**argv), GFP_KERNEL);
 		if (!(*argv))
@@ -247,6 +255,45 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
 		++pat_ptr;
 		if (!(*pat_ptr))
 			return -ENOMEM;
+		break;
+	}
+	case COREDUMP_SOCK: {
+		/* skip the @ */
+		pat_ptr++;
+		if (!(*pat_ptr))
+			return -ENOMEM;
+
+		err = cn_printf(cn, "%s", pat_ptr);
+		if (err)
+			return err;
+
+		/* Require absolute paths. */
+		if (cn->corename[0] != '/')
+			return -EINVAL;
+
+		/*
+		 * Ensure we can uses spaces to indicate additional
+		 * parameters in the future.
+		 */
+		if (strchr(cn->corename, ' ')) {
+			coredump_report_failure("Coredump socket may not %s contain spaces", cn->corename);
+			return -EINVAL;
+		}
+
+		/*
+		 * Currently no need to parse any other options.
+		 * Relevant information can be retrieved from the peer
+		 * pidfd retrievable via SO_PEERPIDFD by the receiver or
+		 * via /proc/<pid>, using the SO_PEERPIDFD to guard
+		 * against pid recycling when opening /proc/<pid>.
+		 */
+		return 0;
+	}
+	case COREDUMP_FILE:
+		break;
+	default:
+		WARN_ON_ONCE(true);
+		return -EINVAL;
 	}
 
 	/* Repeat as long as we have more pattern to process and more output
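
The pattern handling in the hunk above, together with the address length that do_coredump() later passes to kernel_connect(), can be mirrored in a small userspace sketch. This is an illustration under assumptions and not kernel code: the enum and the function names classify_core_pattern, validate_sock_pattern and sock_pattern_addr are hypothetical, and the kernel's distinct error codes (-ENOMEM for an empty path, -EINVAL otherwise) are collapsed into -1.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

enum core_type { CORE_FILE = 1, CORE_PIPE = 2, CORE_SOCK = 3 };

/* The first character of core_pattern selects the mode. */
static enum core_type classify_core_pattern(const char *pattern)
{
	switch (pattern[0]) {
	case '|':
		return CORE_PIPE;
	case '@':
		return CORE_SOCK;
	default:
		return CORE_FILE;
	}
}

/*
 * Accept only "@" followed by an absolute path without spaces that
 * fits into sun_path. Returns 0 if acceptable, -1 otherwise.
 */
static int validate_sock_pattern(const char *pattern)
{
	const char *path;

	if (pattern[0] != '@')
		return -1;
	path = pattern + 1;
	if (path[0] != '/')	/* also rejects an empty path */
		return -1;
	if (strchr(path, ' '))
		return -1;
	if (strlen(path) >= sizeof(((struct sockaddr_un *)0)->sun_path))
		return -1;
	return 0;
}

/*
 * Fill addr for an already validated path and return the length to
 * pass to connect(): header offset + path length + terminating NUL.
 */
static socklen_t sock_pattern_addr(const char *path, struct sockaddr_un *addr)
{
	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	strcpy(addr->sun_path, path);
	return (socklen_t)(offsetof(struct sockaddr_un, sun_path) +
			   strlen(path) + 1);
}
```

For example, "@/run/coredump.socket" classifies as the socket mode and validates, while "@relative.sock" and "@/run/core dump.sock" are rejected.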
@@ -395,6 +442,7 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
 	 * the filename. Do not do this for piped commands. */
 	if (cn->core_type == COREDUMP_FILE && !pid_in_pattern && core_uses_pid)
 		return cn_printf(cn, ".%d", task_tgid_vnr(current));
+
 	return 0;
 }
 
@@ -798,6 +846,53 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 		}
 		break;
 	}
+	case COREDUMP_SOCK: {
+#ifdef CONFIG_UNIX
+		struct file *file __free(fput) = NULL;
+		struct sockaddr_un addr = {
+			.sun_family = AF_UNIX,
+		};
+		ssize_t addr_len;
+		struct socket *socket;
+
+		addr_len = strscpy(addr.sun_path, cn.corename);
+		if (addr_len < 0)
+			goto close_fail;
+		addr_len += offsetof(struct sockaddr_un, sun_path) + 1;
+
+		/*
+		 * It is possible that the userspace process which is
+		 * supposed to handle the coredump and is listening on
+		 * the AF_UNIX socket coredumps. Userspace should just
+		 * mark itself non dumpable.
+		 */
+
+		retval = sock_create_kern(&init_net, AF_UNIX, SOCK_STREAM, 0, &socket);
+		if (retval < 0)
+			goto close_fail;
+
+		file = sock_alloc_file(socket, 0, NULL);
+		if (IS_ERR(file))
+			goto close_fail;
+
+		retval = kernel_connect(socket, (struct sockaddr *)(&addr),
+					addr_len, O_NONBLOCK | SOCK_COREDUMP);
+		if (retval) {
+			if (retval == -EAGAIN)
+				coredump_report_failure("Coredump socket %s receive queue full", addr.sun_path);
+			else
+				coredump_report_failure("Coredump socket connection %s failed %d", addr.sun_path, retval);
+			goto close_fail;
+		}
+
+		cprm.limit = RLIM_INFINITY;
+		cprm.file = no_free_ptr(file);
+#else
+		coredump_report_failure("Core dump socket support %s disabled", cn.corename);
+		goto close_fail;
+#endif
+		break;
+	}
 	default:
 		WARN_ON_ONCE(true);
 		goto close_fail;
@@ -835,8 +930,44 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 		file_end_write(cprm.file);
 		free_vma_snapshot(&cprm);
 	}
-	if ((cn.core_type == COREDUMP_PIPE) && core_pipe_limit)
-		wait_for_dump_helpers(cprm.file);
+
+#ifdef CONFIG_UNIX
+	/* Let userspace know we're done processing the coredump. */
+	if (sock_from_file(cprm.file))
+		kernel_sock_shutdown(sock_from_file(cprm.file), SHUT_WR);
+#endif
+
+	/*
+	 * When core_pipe_limit is set we wait for the coredump server
+	 * or usermodehelper to finish before exiting so it can e.g.,
+	 * inspect /proc/<pid>.
+	 */
+	if (core_pipe_limit) {
+		switch (cn.core_type) {
+		case COREDUMP_PIPE:
+			wait_for_dump_helpers(cprm.file);
+			break;
+#ifdef CONFIG_UNIX
+		case COREDUMP_SOCK: {
+			ssize_t n;
+
+			/*
+			 * We use a simple read to wait for the coredump
+			 * processing to finish. Either the socket is
+			 * closed or we get sent unexpected data. In
+			 * both cases, we're done.
+			 */
+			n = __kernel_read(cprm.file, &(char){ 0 }, 1, NULL);
+			if (n != 0)
+				coredump_report_failure("Unexpected data on coredump socket");
+			break;
+		}
+#endif
+		default:
+			break;
+		}
+	}
+
 close_fail:
 	if (cprm.file)
 		filp_close(cprm.file, NULL);
@@ -1066,7 +1197,7 @@ EXPORT_SYMBOL(dump_align);
 void validate_coredump_safety(void)
 {
 	if (suid_dumpable == SUID_DUMP_ROOT &&
-	    core_pattern[0] != '/' && core_pattern[0] != '|') {
+	    core_pattern[0] != '/' && core_pattern[0] != '|' && core_pattern[0] != '@') {
 
 		coredump_report_failure("Unsafe core_pattern used with fs.suid_dumpable=2: "
 			"pipe handler or fully qualified core dump path required. "

include/linux/net.h

Lines changed: 1 addition & 0 deletions
@@ -81,6 +81,7 @@ enum sock_type {
 #ifndef SOCK_NONBLOCK
 #define SOCK_NONBLOCK	O_NONBLOCK
 #endif
+#define SOCK_COREDUMP	O_NOCTTY
 
 #endif /* ARCH_HAS_SOCKET_TYPES */
 
net/unix/af_unix.c

Lines changed: 41 additions & 13 deletions
@@ -85,10 +85,13 @@
 #include <linux/file.h>
 #include <linux/filter.h>
 #include <linux/fs.h>
+#include <linux/fs_struct.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/mount.h>
 #include <linux/namei.h>
+#include <linux/net.h>
+#include <linux/pidfs.h>
 #include <linux/poll.h>
 #include <linux/proc_fs.h>
 #include <linux/sched/signal.h>
@@ -100,7 +103,6 @@
 #include <linux/splice.h>
 #include <linux/string.h>
 #include <linux/uaccess.h>
-#include <linux/pidfs.h>
 #include <net/af_unix.h>
 #include <net/net_namespace.h>
 #include <net/scm.h>
@@ -1146,21 +1148,47 @@ static int unix_release(struct socket *sock)
 }
 
 static struct sock *unix_find_bsd(struct sockaddr_un *sunaddr, int addr_len,
-				  int type)
+				  int type, int flags)
 {
 	struct inode *inode;
 	struct path path;
 	struct sock *sk;
 	int err;
 
 	unix_mkname_bsd(sunaddr, addr_len);
-	err = kern_path(sunaddr->sun_path, LOOKUP_FOLLOW, &path);
-	if (err)
-		goto fail;
 
-	err = path_permission(&path, MAY_WRITE);
-	if (err)
-		goto path_put;
+	if (flags & SOCK_COREDUMP) {
+		const struct cred *cred;
+		struct cred *kcred;
+		struct path root;
+
+		kcred = prepare_kernel_cred(&init_task);
+		if (!kcred) {
+			err = -ENOMEM;
+			goto fail;
+		}
+
+		task_lock(&init_task);
+		get_fs_root(init_task.fs, &root);
+		task_unlock(&init_task);
+
+		cred = override_creds(kcred);
+		err = vfs_path_lookup(root.dentry, root.mnt, sunaddr->sun_path,
+				      LOOKUP_BENEATH | LOOKUP_NO_SYMLINKS |
+				      LOOKUP_NO_MAGICLINKS, &path);
+		put_cred(revert_creds(cred));
+		path_put(&root);
+		if (err)
+			goto fail;
+	} else {
+		err = kern_path(sunaddr->sun_path, LOOKUP_FOLLOW, &path);
+		if (err)
+			goto fail;
+
+		err = path_permission(&path, MAY_WRITE);
+		if (err)
+			goto path_put;
+	}
 
 	err = -ECONNREFUSED;
 	inode = d_backing_inode(path.dentry);
@@ -1210,12 +1238,12 @@ static struct sock *unix_find_abstract(struct net *net,
 
 static struct sock *unix_find_other(struct net *net,
 				    struct sockaddr_un *sunaddr,
-				    int addr_len, int type)
+				    int addr_len, int type, int flags)
 {
 	struct sock *sk;
 
 	if (sunaddr->sun_path[0])
-		sk = unix_find_bsd(sunaddr, addr_len, type);
+		sk = unix_find_bsd(sunaddr, addr_len, type, flags);
 	else
 		sk = unix_find_abstract(net, sunaddr, addr_len, type);
 
@@ -1473,7 +1501,7 @@ static int unix_dgram_connect(struct socket *sock, struct sockaddr *addr,
 	}
 
 restart:
-	other = unix_find_other(sock_net(sk), sunaddr, alen, sock->type);
+	other = unix_find_other(sock_net(sk), sunaddr, alen, sock->type, 0);
 	if (IS_ERR(other)) {
 		err = PTR_ERR(other);
 		goto out;
@@ -1620,7 +1648,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 
 restart:
 	/* Find listening sock. */
-	other = unix_find_other(net, sunaddr, addr_len, sk->sk_type);
+	other = unix_find_other(net, sunaddr, addr_len, sk->sk_type, flags);
 	if (IS_ERR(other)) {
 		err = PTR_ERR(other);
 		goto out_free_skb;
@@ -2089,7 +2117,7 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	if (msg->msg_namelen) {
 lookup:
 		other = unix_find_other(sock_net(sk), msg->msg_name,
-					msg->msg_namelen, sk->sk_type);
+					msg->msg_namelen, sk->sk_type, 0);
 		if (IS_ERR(other)) {
 			err = PTR_ERR(other);
 			goto out_free;
