Commit dd1a756

anakryiko authored and Peter Zijlstra committed
uprobes: SRCU-protect uretprobe lifetime (with timeout)
Avoid taking a refcount on the uprobe in prepare_uretprobe(); instead, take a uretprobe-specific SRCU lock and keep it active as the kernel transfers control back to user space.

Given we can't rely on user space returning from the traced function within a reasonable time period, we need to make sure not to keep the SRCU lock active for too long. To that effect, we employ a timer callback which is meant to terminate the SRCU lock region after a predefined timeout (currently set to 100ms) and instead transfer the underlying struct uprobe's lifetime protection to refcounting. This fallback to less scalable refcounting after 100ms is a fine tradeoff from uretprobe's scalability and performance perspective, because uretprobing *long running* user functions inherently doesn't run into scalability issues (there is just not enough frequency to cause noticeable issues with either performance or scalability).

The overall trick is in ensuring synchronization between the current thread and the timer's callback fired on some other thread. To cope with that with minimal logic complications, we add an hprobe wrapper which is used to contain all the synchronization-related issues behind a small number of basic helpers: hprobe_expire() for "downgrading" a uprobe from SRCU-protected state to refcounted state, and the hprobe_consume() and hprobe_finalize() pair of single-use consuming helpers. Other than that, the current thread's logic stays the same, as the timer thread cannot modify return_instance state (or add new/remove old return_instances). It only takes care of SRCU unlock and uprobe refcounting, which is hidden from the higher-level uretprobe handling logic.

We use atomic xchg() in hprobe_consume(), which is called from the performance-critical handle_uretprobe_chain() function run in the current context. When uncontended, this xchg() doesn't seem to hurt performance, as there are no other competing CPUs fighting for the same cache line. We also mark struct return_instance as ____cacheline_aligned to ensure no false sharing can happen.

One more technical detail: we need to make sure that the list of return instances can be safely traversed under RCU from the timer callback, so we delay return_instance freeing with kfree_rcu() and make sure that list modifications use RCU-aware operations. Also, given the SRCU lock survives the transition from kernel to user space and back, we need to use the lower-level __srcu_read_lock() and __srcu_read_unlock() to avoid lockdep complaining.
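To make the xchg()-based hand-off concrete, here is a minimal sketch of what the single-use consume/finalize pair could look like, given the struct hprobe layout added by this patch. The function bodies are illustrative assumptions, not the code added in kernel/events/uprobes.c; the uretprobes_srcu SRCU domain and the put_uprobe() helper are assumed to be file-local names there:

	#include <linux/srcu.h>
	#include <linux/uprobes.h>

	/* Sketch only: single-use consumption of a hybrid-lifetime uprobe,
	 * called on the performance-critical uretprobe-firing path. */
	static struct uprobe *hprobe_consume(struct hprobe *hprobe,
					     enum hprobe_state *hstate)
	{
		/* xchg() guarantees exactly one of {current task, timer}
		 * observes and acts on each pre-CONSUMED state */
		*hstate = xchg(&hprobe->state, HPROBE_CONSUMED);
		switch (*hstate) {
		case HPROBE_LEASED:	/* still under uretprobes_srcu protection */
		case HPROBE_STABLE:	/* timer already downgraded it to a refcount */
			return hprobe->uprobe;
		case HPROBE_GONE:	/* uprobe went away; stale pointer, don't use */
		default:
			return NULL;
		}
	}

	/* Sketch only: after handlers ran, drop whatever protection was held. */
	static void hprobe_finalize(struct hprobe *hprobe, enum hprobe_state hstate)
	{
		switch (hstate) {
		case HPROBE_LEASED:
			/* we consumed it while still leased: end the SRCU read section */
			__srcu_read_unlock(&uretprobes_srcu, hprobe->srcu_idx);
			break;
		case HPROBE_STABLE:
			/* timer took a refcount on our behalf: drop it */
			put_uprobe(hprobe->uprobe);
			break;
		default:
			break;
		}
	}

The point of the single atomic state transition is that exactly one of the two contexts (current task or timer callback) ends up responsible for closing the SRCU read section, so the uprobe is never unprotected early and never left leaked.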
Just to give an impression of the kind of performance improvements this change brings, below are benchmarking results with and without these SRCU changes, assuming other uprobe optimizations (mainly RCU Tasks Trace for entry uprobes, lockless RB-tree lookup, and lockless VMA-to-uprobe lookup) are left intact:

WITHOUT SRCU for uretprobes
===========================
uretprobe-nop      ( 1 cpus):    2.197 ± 0.002M/s  (  2.197M/s/cpu)
uretprobe-nop      ( 2 cpus):    3.325 ± 0.001M/s  (  1.662M/s/cpu)
uretprobe-nop      ( 3 cpus):    4.129 ± 0.002M/s  (  1.376M/s/cpu)
uretprobe-nop      ( 4 cpus):    6.180 ± 0.003M/s  (  1.545M/s/cpu)
uretprobe-nop      ( 8 cpus):    7.323 ± 0.005M/s  (  0.915M/s/cpu)
uretprobe-nop      (16 cpus):    6.943 ± 0.005M/s  (  0.434M/s/cpu)
uretprobe-nop      (32 cpus):    5.931 ± 0.014M/s  (  0.185M/s/cpu)
uretprobe-nop      (64 cpus):    5.145 ± 0.003M/s  (  0.080M/s/cpu)
uretprobe-nop      (80 cpus):    4.925 ± 0.005M/s  (  0.062M/s/cpu)

WITH SRCU for uretprobes
========================
uretprobe-nop      ( 1 cpus):    1.968 ± 0.001M/s  (  1.968M/s/cpu)
uretprobe-nop      ( 2 cpus):    3.739 ± 0.003M/s  (  1.869M/s/cpu)
uretprobe-nop      ( 3 cpus):    5.616 ± 0.003M/s  (  1.872M/s/cpu)
uretprobe-nop      ( 4 cpus):    7.286 ± 0.002M/s  (  1.822M/s/cpu)
uretprobe-nop      ( 8 cpus):   13.657 ± 0.007M/s  (  1.707M/s/cpu)
uretprobe-nop      (32 cpus):   45.305 ± 0.066M/s  (  1.416M/s/cpu)
uretprobe-nop      (64 cpus):   42.390 ± 0.922M/s  (  0.662M/s/cpu)
uretprobe-nop      (80 cpus):   47.554 ± 2.411M/s  (  0.594M/s/cpu)

Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
1 parent: 2bf8e5a

2 files changed: 304 additions, 37 deletions

include/linux/uprobes.h

Lines changed: 52 additions & 2 deletions
@@ -15,6 +15,7 @@
 #include <linux/rbtree.h>
 #include <linux/types.h>
 #include <linux/wait.h>
+#include <linux/timer.h>
 
 struct uprobe;
 struct vm_area_struct;
@@ -67,6 +68,53 @@ enum uprobe_task_state {
 	UTASK_SSTEP_TRAPPED,
 };
 
+/* The state of hybrid-lifetime uprobe inside struct return_instance */
+enum hprobe_state {
+	HPROBE_LEASED,		/* uretprobes_srcu-protected uprobe */
+	HPROBE_STABLE,		/* refcounted uprobe */
+	HPROBE_GONE,		/* NULL uprobe, SRCU expired, refcount failed */
+	HPROBE_CONSUMED,	/* uprobe "consumed" by uretprobe handler */
+};
+
+/*
+ * Hybrid lifetime uprobe. Represents a uprobe instance that could be either
+ * SRCU protected (with SRCU protection eventually potentially timing out),
+ * refcounted using uprobe->ref, or there could be no valid uprobe (NULL).
+ *
+ * hprobe's internal state is setup such that background timer thread can
+ * atomically "downgrade" temporarily RCU-protected uprobe into refcounted one
+ * (or no uprobe, if refcounting failed).
+ *
+ * *stable* pointer always points to the uprobe (or could be NULL if there
+ * was no valid underlying uprobe to begin with).
+ *
+ * *leased* pointer is the key to achieving race-free atomic lifetime state
+ * transition and can have three possible states:
+ *   - either the same non-NULL value as *stable*, in which case uprobe is
+ *     SRCU-protected;
+ *   - NULL, in which case uprobe (if there is any) is refcounted;
+ *   - special __UPROBE_DEAD value, which represents an uprobe that was SRCU
+ *     protected initially, but SRCU period timed out and we attempted to
+ *     convert it to refcounted, but refcount_inc_not_zero() failed, because
+ *     uprobe effectively went away (the last consumer unsubscribed). In this
+ *     case it's important to know that *stable* pointer (which still has
+ *     non-NULL uprobe pointer) shouldn't be used, because lifetime of
+ *     underlying uprobe is not guaranteed anymore. __UPROBE_DEAD is just an
+ *     internal marker and is handled transparently by hprobe_fetch() helper.
+ *
+ * When uprobe is SRCU-protected, we also record srcu_idx value, necessary for
+ * SRCU unlocking.
+ *
+ * See hprobe_expire() and hprobe_fetch() for details of race-free uprobe
+ * state transitioning. It all hinges on atomic xchg() over *leased*
+ * pointer. *stable* pointer, once initially set, is not modified concurrently.
+ */
+struct hprobe {
+	enum hprobe_state	state;
+	int			srcu_idx;
+	struct uprobe		*uprobe;
+};
+
 /*
  * uprobe_task: Metadata of a task while it singlesteps.
  */
@@ -86,6 +134,7 @@ struct uprobe_task {
 	};
 
 	struct uprobe			*active_uprobe;
+	struct timer_list		ri_timer;
 	unsigned long			xol_vaddr;
 
 	struct arch_uprobe		*auprobe;
@@ -100,17 +149,18 @@ struct return_consumer {
 };
 
 struct return_instance {
-	struct uprobe	*uprobe;
+	struct hprobe		hprobe;
 	unsigned long	func;
 	unsigned long	stack;		/* stack pointer */
 	unsigned long	orig_ret_vaddr; /* original return address */
 	bool		chained;	/* true, if instance is nested */
 	int		consumers_cnt;
 
 	struct return_instance	*next;	/* keep as stack */
+	struct rcu_head		rcu;
 
 	struct return_consumer	consumers[] __counted_by(consumers_cnt);
-};
+} ____cacheline_aligned;
 
 enum rp_check {
 	RP_CHECK_CALL,
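For completeness, here is a hedged sketch of the other half of the hand-off: the callback behind the ri_timer added to struct uprobe_task above, walking the task's return_instance chain and downgrading still-leased uprobes to refcounts. This is not the commit's implementation; utask->return_instances, try_get_uprobe(), put_uprobe() and uretprobes_srcu are assumed names from kernel/events/uprobes.c, and the real hprobe_expire() may differ in detail:

	#include <linux/rcupdate.h>
	#include <linux/srcu.h>
	#include <linux/timer.h>
	#include <linux/uprobes.h>

	/* Sketch only: timer-driven LEASED -> STABLE/GONE downgrade. */
	static void hprobe_expire(struct hprobe *hprobe)
	{
		enum hprobe_state hstate = READ_ONCE(hprobe->state);

		if (hstate != HPROBE_LEASED)	/* already consumed or downgraded */
			return;

		if (try_get_uprobe(hprobe->uprobe)) {	/* refcount_inc_not_zero() inside */
			/* on success, ending the SRCU lease is now our job */
			if (try_cmpxchg(&hprobe->state, &hstate, HPROBE_STABLE))
				__srcu_read_unlock(&uretprobes_srcu, hprobe->srcu_idx);
			else
				put_uprobe(hprobe->uprobe); /* lost the race to the task */
		} else {
			/* uprobe is going away; mark it GONE, then end the lease */
			if (try_cmpxchg(&hprobe->state, &hstate, HPROBE_GONE))
				__srcu_read_unlock(&uretprobes_srcu, hprobe->srcu_idx);
		}
	}

	/* Timer callback, armed for ~100ms once a leased return_instance exists. */
	static void ri_timer(struct timer_list *timer)
	{
		struct uprobe_task *utask = container_of(timer, struct uprobe_task, ri_timer);
		struct return_instance *ri;
		int srcu_idx;

		/* our own short SRCU read section keeps the uprobes alive ... */
		srcu_idx = srcu_read_lock(&uretprobes_srcu);
		/* ... while plain RCU protects the return_instance chain, which is
		 * updated with RCU-aware stores and freed via kfree_rcu() */
		rcu_read_lock();

		for (ri = rcu_dereference(utask->return_instances); ri;
		     ri = rcu_dereference(ri->next))
			hprobe_expire(&ri->hprobe);

		rcu_read_unlock();
		srcu_read_unlock(&uretprobes_srcu, srcu_idx);
	}

The idea is that the timer's own SRCU read section keeps any uprobe it is about to refcount from being freed under it, while the rcu_head added to struct return_instance (freed via kfree_rcu()) keeps the chain traversal safe under plain RCU, exactly as described in the commit message.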
