
Commit a3b4ca6

Merge patch series "coredump: add coredump socket"
Christian Brauner <[email protected]> says:

Coredumping currently supports two modes:

(1) Dumping directly into a file somewhere on the filesystem.
(2) Dumping into a pipe connected to a usermode helper process spawned as a child of the system_unbound_wq or kthreadd.

For simplicity I'm mostly ignoring (1). There are probably still some users of (1) out there, but processing coredumps this way can be considered adventurous, especially in the face of set*id binaries.

The most common option should be (2) by now. It works by allowing userspace to put a string into /proc/sys/kernel/core_pattern like:

        |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h

The "|" at the beginning indicates to the kernel that a pipe must be used. The path following the pipe indicator is a path to a binary that will be spawned as a usermode helper process. Any additional parameters pass information about the task that is generating the coredump to the binary that processes the coredump.

In the example core_pattern shown above, systemd-coredump is spawned as a usermode helper. There are various conceptual consequences of this (non-exhaustive list):

- systemd-coredump is spawned with file descriptor number 0 (stdin) connected to the read end of the pipe. All other file descriptors are closed. That specifically includes 1 (stdout) and 2 (stderr). This has already caused bugs because userspace assumed that this cannot happen (whether or not this is a sane assumption is irrelevant).

- systemd-coredump will be spawned as a child of system_unbound_wq. So it is not a child of any userspace process and specifically not a child of PID 1. It cannot be waited upon and lives in a weird hybrid upcall state that is difficult for userspace to control correctly.

- systemd-coredump is spawned with full kernel privileges. This necessitates all kinds of weird privilege-dropping exercises in userspace to make this safe.

- A new usermode helper has to be spawned for each crashing process.
This series adds a new mode:

(3) Dumping into an AF_UNIX socket.

Userspace can set /proc/sys/kernel/core_pattern to:

        @/path/to/coredump.socket

The "@" at the beginning indicates to the kernel that an AF_UNIX coredump socket will be used to process coredumps.

The coredump socket must be located in the initial mount namespace. When a task coredumps, it opens a client socket in the initial network namespace and connects to the coredump socket.

- The coredump server should use SO_PEERPIDFD to get a stable handle on the connected crashing task. The retrieved pidfd will provide a stable reference even if the crashing task gets SIGKILLed while generating the coredump.

- By setting core_pipe_limit to a non-zero value, userspace can guarantee that the crashing task cannot be reaped behind its back and can thus process all necessary information in /proc/<pid>. The SO_PEERPIDFD can be used to detect whether /proc/<pid> still refers to the same process.

  The core_pipe_limit isn't used to rate-limit connections to the socket. This can simply be done via the AF_UNIX socket directly.

- The pidfd for the crashing task will contain information about how the task coredumps. The PIDFD_GET_INFO ioctl gained a new flag, PIDFD_INFO_COREDUMP, which can be used to retrieve the coredump information.

  If the coredump server gets a new coredump client connection, the kernel guarantees that PIDFD_INFO_COREDUMP information is available. Currently the following information is provided in the new @coredump_mask extension to struct pidfd_info:

  * PIDFD_COREDUMPED is raised if the task did actually coredump.
  * PIDFD_COREDUMP_SKIP is raised if the task skipped coredumping (e.g., undumpable).
  * PIDFD_COREDUMP_USER is raised if this is a regular coredump and doesn't need special care by the coredump server.
  * PIDFD_COREDUMP_ROOT is raised if the generated coredump should be treated as sensitive and the coredump server should restrict access to the generated coredump to sufficiently privileged users.
- The coredump server should mark itself as non-dumpable.

- A container coredump server in a separate network namespace can simply bind to another well-known address, and systemd-coredump forwards coredumps to the container.

- Coredumps could in the future also be handled via per-user/session coredump servers that run only with that user's privileges.

  The coredump server listens on the coredump socket and accepts a new coredump connection. It then retrieves SO_PEERPIDFD for the client, inspects uid/gid, and hands the accepted client to the user's own coredump handler, which runs with the user's privileges only (it must of course pay close attention not to forward crashing suid binaries).

The new coredump socket frees userspace from relying on usermode helpers for processing coredumps and provides a safer way to handle them than super-privileged coredump helpers.

This will also be significantly more lightweight, since no fork()+exec() of a usermode helper is required for each crashing process. The coredump server in userspace can just keep a worker pool.

* patches from https://lore.kernel.org/[email protected]:
  selftests/coredump: add tests for AF_UNIX coredumps
  selftests/pidfd: add PIDFD_INFO_COREDUMP infrastructure
  coredump: validate socket name as it is written
  coredump: show supported coredump modes
  pidfs, coredump: add PIDFD_INFO_COREDUMP
  coredump: add coredump socket
  coredump: reflow dump helpers a little
  coredump: massage do_coredump()
  coredump: massage format_corename()

Link: https://lore.kernel.org/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
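The server-side steps described in the cover letter (mark the server non-dumpable, listen on the socket named after "@" in core_pattern, accept a crashing task, and take a pidfd for it via SO_PEERPIDFD) could look roughly like the sketch below. It assumes a glibc userspace and Linux 6.5 or later for SO_PEERPIDFD; the fallback value 77 is the asm-generic one and differs on a few architectures. The function names and the socket path used with them are hypothetical, and a real socket must live in the initial mount namespace for the kernel to find it.

```c
#include <errno.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#ifndef SO_PEERPIDFD
/* asm-generic value (Linux 6.5+); a few architectures use a different number. */
#define SO_PEERPIDFD 77
#endif

/*
 * Hypothetical helper: mark this process non-dumpable, then listen on an
 * AF_UNIX stream socket at @path. Returns the listening fd, or -1 with errno set.
 */
static int coredump_server_listen(const char *path, int backlog)
{
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	int fd, saved;

	/* The coredump server itself should never be dumped. */
	if (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) < 0)
		return -1;

	if (strlen(path) >= sizeof(addr.sun_path)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	strcpy(addr.sun_path, path);

	fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
	if (fd < 0)
		return -1;

	unlink(path);	/* drop a stale socket left by a previous instance */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
	    listen(fd, backlog) == 0)
		return fd;

	saved = errno;
	close(fd);
	errno = saved;
	return -1;
}

/*
 * Hypothetical helper: accept one crashing task and take a pidfd for it.
 * Returns the connected fd and stores the pidfd in *pidfd, or -1 on error
 * (e.g. ENOPROTOOPT on kernels without SO_PEERPIDFD).
 */
static int coredump_server_accept(int listen_fd, int *pidfd)
{
	socklen_t len = sizeof(*pidfd);
	int conn, saved;

	conn = accept(listen_fd, NULL, NULL);
	if (conn < 0)
		return -1;

	/* Stable reference to the peer task, even if it is SIGKILLed meanwhile. */
	if (getsockopt(conn, SOL_SOCKET, SO_PEERPIDFD, pidfd, &len) == 0)
		return conn;

	saved = errno;
	close(conn);
	errno = saved;
	return -1;
}
```

A real server would then query the pidfd with PIDFD_GET_INFO and PIDFD_INFO_COREDUMP before reading the dump from the connection, and would typically hand accepted connections to a worker pool rather than process them inline.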
2 parents 4dd6566 + 7b6724f commit a3b4ca6


8 files changed (+916, -106 lines)

fs/coredump.c

Lines changed: 310 additions & 92 deletions
Large diffs are not rendered by default.

fs/pidfs.c

Lines changed: 55 additions & 0 deletions
@@ -20,6 +20,7 @@
 #include <linux/time_namespace.h>
 #include <linux/utsname.h>
 #include <net/net_namespace.h>
+#include <linux/coredump.h>
 
 #include "internal.h"
 #include "mount.h"
@@ -33,6 +34,7 @@ static struct kmem_cache *pidfs_cachep __ro_after_init;
 struct pidfs_exit_info {
 	__u64 cgroupid;
 	__s32 exit_code;
+	__u32 coredump_mask;
 };
 
 struct pidfs_inode {
@@ -240,6 +242,22 @@ static inline bool pid_in_current_pidns(const struct pid *pid)
 	return false;
 }
 
+static __u32 pidfs_coredump_mask(unsigned long mm_flags)
+{
+	switch (__get_dumpable(mm_flags)) {
+	case SUID_DUMP_USER:
+		return PIDFD_COREDUMP_USER;
+	case SUID_DUMP_ROOT:
+		return PIDFD_COREDUMP_ROOT;
+	case SUID_DUMP_DISABLE:
+		return PIDFD_COREDUMP_SKIP;
+	default:
+		WARN_ON_ONCE(true);
+	}
+
+	return 0;
+}
+
 static long pidfd_info(struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct pidfd_info __user *uinfo = (struct pidfd_info __user *)arg;
@@ -280,6 +298,11 @@ static long pidfd_info(struct file *file, unsigned int cmd, unsigned long arg)
 		}
 	}
 
+	if (mask & PIDFD_INFO_COREDUMP) {
+		kinfo.mask |= PIDFD_INFO_COREDUMP;
+		kinfo.coredump_mask = READ_ONCE(pidfs_i(inode)->__pei.coredump_mask);
+	}
+
 	task = get_pid_task(pid, PIDTYPE_PID);
 	if (!task) {
 		/*
@@ -296,6 +319,13 @@ static long pidfd_info(struct file *file, unsigned int cmd, unsigned long arg)
 	if (!c)
 		return -ESRCH;
 
+	if (!(kinfo.mask & PIDFD_INFO_COREDUMP)) {
+		task_lock(task);
+		if (task->mm)
+			kinfo.coredump_mask = pidfs_coredump_mask(task->mm->flags);
+		task_unlock(task);
+	}
+
 	/* Unconditionally return identifiers and credentials, the rest only on request */
 
 	user_ns = current_user_ns();
@@ -559,6 +589,31 @@ void pidfs_exit(struct task_struct *tsk)
 	}
 }
 
+#ifdef CONFIG_COREDUMP
+void pidfs_coredump(const struct coredump_params *cprm)
+{
+	struct pid *pid = cprm->pid;
+	struct pidfs_exit_info *exit_info;
+	struct dentry *dentry;
+	struct inode *inode;
+	__u32 coredump_mask = 0;
+
+	dentry = pid->stashed;
+	if (WARN_ON_ONCE(!dentry))
+		return;
+
+	inode = d_inode(dentry);
+	exit_info = &pidfs_i(inode)->__pei;
+	/* Note how we were coredumped. */
+	coredump_mask = pidfs_coredump_mask(cprm->mm_flags);
+	/* Note that we actually did coredump. */
+	coredump_mask |= PIDFD_COREDUMPED;
+	/* If coredumping is set to skip we should never end up here. */
+	VFS_WARN_ON_ONCE(coredump_mask & PIDFD_COREDUMP_SKIP);
+	smp_store_release(&exit_info->coredump_mask, coredump_mask);
+}
+#endif
+
 static struct vfsmount *pidfs_mnt __ro_after_init;
 
 /*

include/linux/net.h

Lines changed: 1 addition & 0 deletions
@@ -81,6 +81,7 @@ enum sock_type {
 #ifndef SOCK_NONBLOCK
 #define SOCK_NONBLOCK O_NONBLOCK
 #endif
+#define SOCK_COREDUMP O_NOCTTY
 
 #endif /* ARCH_HAS_SOCKET_TYPES */

include/linux/pidfs.h

Lines changed: 5 additions & 0 deletions
@@ -2,11 +2,16 @@
 #ifndef _LINUX_PID_FS_H
 #define _LINUX_PID_FS_H
 
+struct coredump_params;
+
 struct file *pidfs_alloc_file(struct pid *pid, unsigned int flags);
 void __init pidfs_init(void);
 void pidfs_add_pid(struct pid *pid);
 void pidfs_remove_pid(struct pid *pid);
 void pidfs_exit(struct task_struct *tsk);
+#ifdef CONFIG_COREDUMP
+void pidfs_coredump(const struct coredump_params *cprm);
+#endif
 extern const struct dentry_operations pidfs_dentry_operations;
 int pidfs_register_pid(struct pid *pid);
 void pidfs_get_pid(struct pid *pid);

include/uapi/linux/pidfd.h

Lines changed: 16 additions & 0 deletions
@@ -25,9 +25,23 @@
 #define PIDFD_INFO_CREDS (1UL << 1) /* Always returned, even if not requested */
 #define PIDFD_INFO_CGROUPID (1UL << 2) /* Always returned if available, even if not requested */
 #define PIDFD_INFO_EXIT (1UL << 3) /* Only returned if requested. */
+#define PIDFD_INFO_COREDUMP (1UL << 4) /* Only returned if requested. */
 
 #define PIDFD_INFO_SIZE_VER0 64 /* sizeof first published struct */
 
+/*
+ * Values for @coredump_mask in pidfd_info.
+ * Only valid if PIDFD_INFO_COREDUMP is set in @mask.
+ *
+ * Note, the @PIDFD_COREDUMP_ROOT flag indicates that the generated
+ * coredump should be treated as sensitive and access should only be
+ * granted to privileged users.
+ */
+#define PIDFD_COREDUMPED (1U << 0) /* Did crash and... */
+#define PIDFD_COREDUMP_SKIP (1U << 1) /* coredumping generation was skipped. */
+#define PIDFD_COREDUMP_USER (1U << 2) /* coredump was done as the user. */
+#define PIDFD_COREDUMP_ROOT (1U << 3) /* coredump was done as root. */
+
 /*
  * The concept of process and threads in userland and the kernel is a confusing
  * one - within the kernel every thread is a 'task' with its own individual PID,
@@ -92,6 +106,8 @@ struct pidfd_info {
 	__u32 fsuid;
 	__u32 fsgid;
 	__s32 exit_code;
+	__u32 coredump_mask;
+	__u32 __spare1;
 };
 
 #define PIDFS_IOCTL_MAGIC 0xFF
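A coredump server could map the new @coredump_mask bits onto a storage policy. The sketch below is hypothetical and not code from this series: the enum and coredump_policy() are illustrative names, and the flag values are copied from the uapi header in this diff, guarded so they do not clash with a <linux/pidfd.h> that already defines them.

```c
#include <stdint.h>

/* Flag values as added to include/uapi/linux/pidfd.h by this series. */
#ifndef PIDFD_COREDUMPED
#define PIDFD_COREDUMPED	(1U << 0)
#endif
#ifndef PIDFD_COREDUMP_SKIP
#define PIDFD_COREDUMP_SKIP	(1U << 1)
#endif
#ifndef PIDFD_COREDUMP_USER
#define PIDFD_COREDUMP_USER	(1U << 2)
#endif
#ifndef PIDFD_COREDUMP_ROOT
#define PIDFD_COREDUMP_ROOT	(1U << 3)
#endif

/* Hypothetical storage policy chosen by a coredump server. */
enum dump_policy {
	DUMP_POLICY_NONE,	/* nothing to store: no dump, or dumping was skipped */
	DUMP_POLICY_USER,	/* regular dump: no special care needed */
	DUMP_POLICY_RESTRICTED,	/* sensitive dump: privileged users only */
};

static enum dump_policy coredump_policy(uint32_t coredump_mask)
{
	if (coredump_mask & PIDFD_COREDUMP_SKIP)
		return DUMP_POLICY_NONE;
	if (!(coredump_mask & PIDFD_COREDUMPED))
		return DUMP_POLICY_NONE;
	if (coredump_mask & PIDFD_COREDUMP_ROOT)
		return DUMP_POLICY_RESTRICTED;
	if (coredump_mask & PIDFD_COREDUMP_USER)
		return DUMP_POLICY_USER;
	/* Unexpected combination: fail closed. */
	return DUMP_POLICY_RESTRICTED;
}
```

Treating an unexpected combination as restricted errs on the side of not exposing a potentially sensitive dump.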

net/unix/af_unix.c

Lines changed: 41 additions & 13 deletions
@@ -85,10 +85,13 @@
 #include <linux/file.h>
 #include <linux/filter.h>
 #include <linux/fs.h>
+#include <linux/fs_struct.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/mount.h>
 #include <linux/namei.h>
+#include <linux/net.h>
+#include <linux/pidfs.h>
 #include <linux/poll.h>
 #include <linux/proc_fs.h>
 #include <linux/sched/signal.h>
@@ -100,7 +103,6 @@
 #include <linux/splice.h>
 #include <linux/string.h>
 #include <linux/uaccess.h>
-#include <linux/pidfs.h>
 #include <net/af_unix.h>
 #include <net/net_namespace.h>
 #include <net/scm.h>
@@ -1146,21 +1148,47 @@ static int unix_release(struct socket *sock)
 }
 
 static struct sock *unix_find_bsd(struct sockaddr_un *sunaddr, int addr_len,
-				  int type)
+				  int type, int flags)
 {
 	struct inode *inode;
 	struct path path;
 	struct sock *sk;
 	int err;
 
 	unix_mkname_bsd(sunaddr, addr_len);
-	err = kern_path(sunaddr->sun_path, LOOKUP_FOLLOW, &path);
-	if (err)
-		goto fail;
 
-	err = path_permission(&path, MAY_WRITE);
-	if (err)
-		goto path_put;
+	if (flags & SOCK_COREDUMP) {
+		const struct cred *cred;
+		struct cred *kcred;
+		struct path root;
+
+		kcred = prepare_kernel_cred(&init_task);
+		if (!kcred) {
+			err = -ENOMEM;
+			goto fail;
+		}
+
+		task_lock(&init_task);
+		get_fs_root(init_task.fs, &root);
+		task_unlock(&init_task);
+
+		cred = override_creds(kcred);
+		err = vfs_path_lookup(root.dentry, root.mnt, sunaddr->sun_path,
+				      LOOKUP_BENEATH | LOOKUP_NO_SYMLINKS |
+				      LOOKUP_NO_MAGICLINKS, &path);
+		put_cred(revert_creds(cred));
+		path_put(&root);
+		if (err)
+			goto fail;
+	} else {
+		err = kern_path(sunaddr->sun_path, LOOKUP_FOLLOW, &path);
+		if (err)
+			goto fail;
+
+		err = path_permission(&path, MAY_WRITE);
+		if (err)
+			goto path_put;
+	}
 
 	err = -ECONNREFUSED;
 	inode = d_backing_inode(path.dentry);
@@ -1210,12 +1238,12 @@ static struct sock *unix_find_abstract(struct net *net,
 
 static struct sock *unix_find_other(struct net *net,
 				    struct sockaddr_un *sunaddr,
-				    int addr_len, int type)
+				    int addr_len, int type, int flags)
 {
 	struct sock *sk;
 
 	if (sunaddr->sun_path[0])
-		sk = unix_find_bsd(sunaddr, addr_len, type);
+		sk = unix_find_bsd(sunaddr, addr_len, type, flags);
 	else
 		sk = unix_find_abstract(net, sunaddr, addr_len, type);
 
@@ -1473,7 +1501,7 @@ static int unix_dgram_connect(struct socket *sock, struct sockaddr *addr,
 		}
 
 restart:
-		other = unix_find_other(sock_net(sk), sunaddr, alen, sock->type);
+		other = unix_find_other(sock_net(sk), sunaddr, alen, sock->type, 0);
 		if (IS_ERR(other)) {
 			err = PTR_ERR(other);
 			goto out;
@@ -1620,7 +1648,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 
 restart:
 	/* Find listening sock. */
-	other = unix_find_other(net, sunaddr, addr_len, sk->sk_type);
+	other = unix_find_other(net, sunaddr, addr_len, sk->sk_type, flags);
 	if (IS_ERR(other)) {
 		err = PTR_ERR(other);
 		goto out_free_skb;
@@ -2089,7 +2117,7 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	if (msg->msg_namelen) {
 lookup:
 		other = unix_find_other(sock_net(sk), msg->msg_name,
-					msg->msg_namelen, sk->sk_type);
+					msg->msg_namelen, sk->sk_type, 0);
 		if (IS_ERR(other)) {
 			err = PTR_ERR(other);
 			goto out_free;
