
Commit ed65df6
tracing: Have all levels of checks prevent recursion
While writing an email explaining the "bit = 0" logic for a discussion on
making ftrace_test_recursion_trylock() disable preemption, I discovered a
path that makes the "do not do the logic if bit is zero" trick unsafe.

The recursion logic is done in hot paths like the function tracer. Thus,
any code executed causes noticeable overhead, and tricks are used to
limit the amount of code executed. This included the recursion testing
logic.

Having recursion testing is important, as there are many paths that can
end up in an infinite recursion cycle when tracing every function in the
kernel. Thus protection is needed to prevent that from happening.

Because it is OK to recurse due to different running context levels (e.g.
an interrupt preempts a trace, and then a trace occurs in the interrupt
handler), a set of bits are used to know which context one is in (normal,
softirq, irq and NMI). If a recursion occurs in the same level, it is
prevented[*].

Then there are infrastructure levels of recursion as well. When more than
one callback is attached to the same function to trace, it calls a loop
function to iterate over all the callbacks. Both the callbacks and the
loop function have recursion protection. The callbacks use
ftrace_test_recursion_trylock(), which has a "function" set of context
bits to test, and the loop function calls the internal
trace_test_and_set_recursion() directly, with an "internal" set of bits.

If an architecture does not implement all the features supported by
ftrace then the callbacks are never called directly, and the loop
function is called instead, which will implement the features of ftrace.

Since both the loop function and the callbacks do recursion protection,
it seemed unnecessary to do it in both locations. Thus, a trick was made
to have the internal set of recursion bits at a more significant bit
location than the function bits. Then, if any of the higher bits were
set, the logic of the function bits could be skipped, as any new
recursion would first have to go through the loop function.

This is true for architectures that do not support all the ftrace
features, because all functions being traced must first go through the
loop function before going to the callbacks. But it is not true for
architectures that support all the ftrace features. That's because the
loop function could be called due to two callbacks attached to the same
function, but then a function traced inside the callback could be traced
by only one of those callbacks, and it will be called directly. i.e.

  traced_function_1: [ more than one callback tracing it ]
    call loop_func

  loop_func:
    trace_recursion set internal bit
    call callback

  callback:
    trace_recursion [ skipped because internal bit is set, return 0 ]
    call traced_function_2

  traced_function_2: [ only traced by above callback ]
    call callback

  callback:
    trace_recursion [ skipped because internal bit is set, return 0 ]
    call traced_function_2

  [ wash, rinse, repeat, BOOM! out of shampoo! ]

Thus, the "bit == 0 skip" trick is not safe, unless the loop function is
called for all functions.

Since we want to encourage architectures to implement all ftrace
features, having them slow down due to this extra logic may encourage the
maintainers to update to the latest ftrace features. And because this
logic is only safe for them, remove it completely.

[*] There is one layer of recursion that is allowed, and that is to allow
    for the transition between interrupt contexts (normal -> softirq ->
    irq -> NMI), because a trace may occur before the context update is
    visible to the trace recursion logic.

Link: https://lore.kernel.org/all/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]

Cc: Linus Torvalds <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Miroslav Benes <[email protected]>
Cc: Joe Lawrence <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Jisheng Zhang <[email protected]>
Cc: 王贇 <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: [email protected]
Fixes: edc15ca ("tracing: Avoid unnecessary multiple recursion checks")
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
1 parent be358af commit ed65df6

2 files changed, 11 insertions(+), 42 deletions(-)

include/linux/trace_recursion.h

Lines changed: 9 additions & 40 deletions

@@ -16,36 +16,23 @@
  * When function tracing occurs, the following steps are made:
  *   If arch does not support a ftrace feature:
  *    call internal function (uses INTERNAL bits) which calls...
- *   If callback is registered to the "global" list, the list
- *    function is called and recursion checks the GLOBAL bits.
- *    then this function calls...
  *   The function callback, which can use the FTRACE bits to
  *    check for recursion.
- *
- * Now if the arch does not support a feature, and it calls
- * the global list function which calls the ftrace callback
- * all three of these steps will do a recursion protection.
- * There's no reason to do one if the previous caller already
- * did. The recursion that we are protecting against will
- * go through the same steps again.
- *
- * To prevent the multiple recursion checks, if a recursion
- * bit is set that is higher than the MAX bit of the current
- * check, then we know that the check was made by the previous
- * caller, and we can skip the current check.
  */
 enum {
 	/* Function recursion bits */
 	TRACE_FTRACE_BIT,
 	TRACE_FTRACE_NMI_BIT,
 	TRACE_FTRACE_IRQ_BIT,
 	TRACE_FTRACE_SIRQ_BIT,
+	TRACE_FTRACE_TRANSITION_BIT,
 
-	/* INTERNAL_BITs must be greater than FTRACE_BITs */
+	/* Internal use recursion bits */
 	TRACE_INTERNAL_BIT,
 	TRACE_INTERNAL_NMI_BIT,
 	TRACE_INTERNAL_IRQ_BIT,
 	TRACE_INTERNAL_SIRQ_BIT,
+	TRACE_INTERNAL_TRANSITION_BIT,
 
 	TRACE_BRANCH_BIT,
 /*
@@ -86,12 +73,6 @@ enum {
 	 */
 	TRACE_GRAPH_NOTRACE_BIT,
 
-	/*
-	 * When transitioning between context, the preempt_count() may
-	 * not be correct. Allow for a single recursion to cover this case.
-	 */
-	TRACE_TRANSITION_BIT,
-
 	/* Used to prevent recursion recording from recursing. */
 	TRACE_RECORD_RECURSION_BIT,
 };
@@ -113,12 +94,10 @@ enum {
 #define TRACE_CONTEXT_BITS	4
 
 #define TRACE_FTRACE_START	TRACE_FTRACE_BIT
-#define TRACE_FTRACE_MAX	((1 << (TRACE_FTRACE_START + TRACE_CONTEXT_BITS)) - 1)
 
 #define TRACE_LIST_START	TRACE_INTERNAL_BIT
-#define TRACE_LIST_MAX		((1 << (TRACE_LIST_START + TRACE_CONTEXT_BITS)) - 1)
 
-#define TRACE_CONTEXT_MASK	TRACE_LIST_MAX
+#define TRACE_CONTEXT_MASK	((1 << (TRACE_LIST_START + TRACE_CONTEXT_BITS)) - 1)
 
 /*
  * Used for setting context
@@ -132,6 +111,7 @@ enum {
 	TRACE_CTX_IRQ,
 	TRACE_CTX_SOFTIRQ,
 	TRACE_CTX_NORMAL,
+	TRACE_CTX_TRANSITION,
 };
 
 static __always_inline int trace_get_context_bit(void)
@@ -160,45 +140,34 @@ extern void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip);
 #endif
 
 static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
-							int start, int max)
+							int start)
 {
 	unsigned int val = READ_ONCE(current->trace_recursion);
 	int bit;
 
-	/* A previous recursion check was made */
-	if ((val & TRACE_CONTEXT_MASK) > max)
-		return 0;
-
 	bit = trace_get_context_bit() + start;
 	if (unlikely(val & (1 << bit))) {
 		/*
 		 * It could be that preempt_count has not been updated during
 		 * a switch between contexts. Allow for a single recursion.
		 */
-		bit = TRACE_TRANSITION_BIT;
+		bit = TRACE_CTX_TRANSITION + start;
 		if (val & (1 << bit)) {
 			do_ftrace_record_recursion(ip, pip);
 			return -1;
 		}
-	} else {
-		/* Normal check passed, clear the transition to allow it again */
-		val &= ~(1 << TRACE_TRANSITION_BIT);
 	}
 
 	val |= 1 << bit;
 	current->trace_recursion = val;
 	barrier();
 
-	return bit + 1;
+	return bit;
 }
 
 static __always_inline void trace_clear_recursion(int bit)
 {
-	if (!bit)
-		return;
-
 	barrier();
-	bit--;
 	trace_recursion_clear(bit);
 }
 
@@ -214,7 +183,7 @@ static __always_inline void trace_clear_recursion(int bit)
 static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
 							 unsigned long parent_ip)
 {
-	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
+	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START);
 }
 
 /**

kernel/trace/ftrace.c

Lines changed: 2 additions & 2 deletions

@@ -6977,7 +6977,7 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
 	struct ftrace_ops *op;
 	int bit;
 
-	bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_LIST_START, TRACE_LIST_MAX);
+	bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_LIST_START);
 	if (bit < 0)
 		return;
 
@@ -7052,7 +7052,7 @@ static void ftrace_ops_assist_func(unsigned long ip, unsigned long parent_ip,
 {
 	int bit;
 
-	bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_LIST_START, TRACE_LIST_MAX);
+	bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_LIST_START);
 	if (bit < 0)
 		return;
