Commit 0173275

Merge tag 'probes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:
 "x86 kprobes:

   - Use boolean for some function return instead of 0 and 1

   - Prohibit probing on INT/UD. This prevents user to put kprobe on
     INTn/INT1/INT3/INTO and UD0/UD1/UD2 because these are used for a
     special purpose in the kernel

   - Boost Grp instructions. Because a few percent of kernel
     instructions are Grp 2/3/4/5 and those are safe to be executed
     without ip register fixup, allow those to be boosted (direct
     execution on the trampoline buffer with a JMP)

  tracing:

   - Add function argument access from return events (kretprobe and
     fprobe). This allows user to compare how a data structure field is
     changed after executing a function. With BTF, return event also
     accepts function argument access by name.

   - Fix a wrong comment (using "Kretprobe" in fprobe)

   - Cleanup a big probe argument parser function into three parts,
     type parser, post-processing function, and main parser

   - Cleanup to set nr_args field when initializing trace_probe instead
     of counting up it while parsing

   - Cleanup a redundant #else block from tracefs/README source code

   - Update selftests to check entry argument access from return probes

   - Documentation update about entry argument access from return
     probes"

* tag 'probes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  Documentation: tracing: Add entry argument access at function exit
  selftests/ftrace: Add test cases for entry args at function exit
  tracing/probes: Support $argN in return probe (kprobe and fprobe)
  tracing: Remove redundant #else block for BTF args from README
  tracing/probes: cleanup: Set trace_probe::nr_args at trace_probe_init
  tracing/probes: Cleanup probe argument parser
  tracing/fprobe-event: cleanup: Fix a wrong comment in fprobe event
  x86/kprobes: Boost more instructions from grp2/3/4/5
  x86/kprobes: Prohibit kprobing on INT and UD
  x86/kprobes: Refactor can_{probe,boost} return type to bool
2 parents c0a614e + e8c32f2 commit 0173275

16 files changed

+584
-199
lines changed

Documentation/trace/fprobetrace.rst

Lines changed: 31 additions & 0 deletions
@@ -70,6 +70,14 @@ Synopsis of fprobe-events

 For the details of TYPE, see :ref:`kprobetrace documentation <kprobetrace_types>`.

+Function arguments at exit
+--------------------------
+Function arguments can be accessed at exit probe using $arg<N> fetcharg. This
+is useful to record the function parameter and return value at once, and
+trace the difference of structure fields (for debugging a function whether it
+correctly updates the given data structure or not).
+See the :ref:`sample<fprobetrace_exit_args_sample>` below for how it works.
+
 BTF arguments
 -------------
 BTF (BPF Type Format) argument allows user to trace function and tracepoint
@@ -218,3 +226,26 @@ traceprobe event, you can trace that field as below.
  <idle>-0 [000] d..3. 5606.690317: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000
  kworker/0:1-14 [000] d..3. 5606.690339: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0
  <idle>-0 [000] d..3. 5606.692368: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000
+
+.. _fprobetrace_exit_args_sample:
+
+The return probe allows us to access the results of some functions, which return
+an error code and pass their results via function parameters, such as a
+structure-initialization function.
+
+For example, vfs_open() will link the file structure to the inode and update
+mode. You can trace those changes with a return probe.
+::
+
+ # echo 'f vfs_open mode=file->f_mode:x32 inode=file->f_inode:x64' >> dynamic_events
+ # echo 'f vfs_open%%return mode=file->f_mode:x32 inode=file->f_inode:x64' >> dynamic_events
+ # echo 1 > events/fprobes/enable
+ # cat trace
+ sh-131 [006] ...1. 1945.714346: vfs_open__entry: (vfs_open+0x4/0x40) mode=0x2 inode=0x0
+ sh-131 [006] ...1. 1945.714358: vfs_open__exit: (do_open+0x274/0x3d0 <- vfs_open) mode=0x4d801e inode=0xffff888008470168
+ cat-143 [007] ...1. 1945.717949: vfs_open__entry: (vfs_open+0x4/0x40) mode=0x1 inode=0x0
+ cat-143 [007] ...1. 1945.717956: vfs_open__exit: (do_open+0x274/0x3d0 <- vfs_open) mode=0x4a801d inode=0xffff888005f78d28
+ cat-143 [007] ...1. 1945.720616: vfs_open__entry: (vfs_open+0x4/0x40) mode=0x1 inode=0x0
+ cat-143 [007] ...1. 1945.728263: vfs_open__exit: (do_open+0x274/0x3d0 <- vfs_open) mode=0xa800d inode=0xffff888004ada8d8
+
+You can see the `file::f_mode` and `file::f_inode` are updated in `vfs_open()`.
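
To illustrate what "trace the difference of structure fields" can mean in practice, the entry and exit values of `mode` from the sample trace can be compared bit by bit. The following is only a sketch: `changed_bits()` is a hypothetical helper for post-processing trace output, not part of the kernel or of this commit.

```c
#include <stdint.h>

/*
 * Hypothetical post-processing helper (not kernel code): return the
 * bits that differ between the value recorded at function entry and
 * the value recorded at function exit.
 */
uint32_t changed_bits(uint32_t at_entry, uint32_t at_exit)
{
	return at_entry ^ at_exit;
}
```

For the first `sh-131` pair above, `changed_bits(0x2, 0x4d801e)` yields `0x4d801c`: the bit that was already set at entry is unchanged, and every other set bit was added while `vfs_open()` ran.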

Documentation/trace/kprobetrace.rst

Lines changed: 9 additions & 0 deletions
@@ -70,6 +70,15 @@ Synopsis of kprobe_events
 (\*3) this is useful for fetching a field of data structures.
 (\*4) "u" means user-space dereference. See :ref:`user_mem_access`.

+Function arguments at kretprobe
+-------------------------------
+Function arguments can be accessed at kretprobe using $arg<N> fetcharg. This
+is useful to record the function parameter and return value at once, and
+trace the difference of structure fields (for debugging a function whether it
+correctly updates the given data structure or not).
+See the :ref:`sample<fprobetrace_exit_args_sample>` in fprobe event for how
+it works.
+
 .. _kprobetrace_types:

 Types

arch/x86/kernel/kprobes/common.h

Lines changed: 1 addition & 1 deletion
@@ -78,7 +78,7 @@
 #endif

 /* Ensure if the instruction can be boostable */
-extern int can_boost(struct insn *insn, void *orig_addr);
+extern bool can_boost(struct insn *insn, void *orig_addr);
 /* Recover instruction if given address is probed */
 extern unsigned long recover_probed_instruction(kprobe_opcode_t *buf,
 					 unsigned long addr);

arch/x86/kernel/kprobes/core.c

Lines changed: 68 additions & 30 deletions
@@ -137,30 +137,30 @@ NOKPROBE_SYMBOL(synthesize_relcall);
  * Returns non-zero if INSN is boostable.
  * RIP relative instructions are adjusted at copying time in 64 bits mode
  */
-int can_boost(struct insn *insn, void *addr)
+bool can_boost(struct insn *insn, void *addr)
 {
 	kprobe_opcode_t opcode;
 	insn_byte_t prefix;
 	int i;

 	if (search_exception_tables((unsigned long)addr))
-		return 0; /* Page fault may occur on this address. */
+		return false; /* Page fault may occur on this address. */

 	/* 2nd-byte opcode */
 	if (insn->opcode.nbytes == 2)
 		return test_bit(insn->opcode.bytes[1],
 				(unsigned long *)twobyte_is_boostable);

 	if (insn->opcode.nbytes != 1)
-		return 0;
+		return false;

 	for_each_insn_prefix(insn, i, prefix) {
 		insn_attr_t attr;

 		attr = inat_get_opcode_attribute(prefix);
 		/* Can't boost Address-size override prefix and CS override prefix */
 		if (prefix == 0x2e || inat_is_address_size_prefix(attr))
-			return 0;
+			return false;
 	}

 	opcode = insn->opcode.bytes[0];
@@ -169,24 +169,35 @@ int can_boost(struct insn *insn, void *addr)
 	case 0x62: /* bound */
 	case 0x70 ... 0x7f: /* Conditional jumps */
 	case 0x9a: /* Call far */
-	case 0xc0 ... 0xc1: /* Grp2 */
 	case 0xcc ... 0xce: /* software exceptions */
-	case 0xd0 ... 0xd3: /* Grp2 */
 	case 0xd6: /* (UD) */
 	case 0xd8 ... 0xdf: /* ESC */
 	case 0xe0 ... 0xe3: /* LOOP*, JCXZ */
 	case 0xe8 ... 0xe9: /* near Call, JMP */
 	case 0xeb: /* Short JMP */
 	case 0xf0 ... 0xf4: /* LOCK/REP, HLT */
+		/* ... are not boostable */
+		return false;
+	case 0xc0 ... 0xc1: /* Grp2 */
+	case 0xd0 ... 0xd3: /* Grp2 */
+		/*
+		 * AMD uses nnn == 110 as SHL/SAL, but Intel makes it reserved.
+		 */
+		return X86_MODRM_REG(insn->modrm.bytes[0]) != 0b110;
 	case 0xf6 ... 0xf7: /* Grp3 */
+		/* AMD uses nnn == 001 as TEST, but Intel makes it reserved. */
+		return X86_MODRM_REG(insn->modrm.bytes[0]) != 0b001;
 	case 0xfe: /* Grp4 */
-		/* ... are not boostable */
-		return 0;
+		/* Only INC and DEC are boostable */
+		return X86_MODRM_REG(insn->modrm.bytes[0]) == 0b000 ||
+		       X86_MODRM_REG(insn->modrm.bytes[0]) == 0b001;
 	case 0xff: /* Grp5 */
-		/* Only indirect jmp is boostable */
-		return X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
+		/* Only INC, DEC, and indirect JMP are boostable */
+		return X86_MODRM_REG(insn->modrm.bytes[0]) == 0b000 ||
+		       X86_MODRM_REG(insn->modrm.bytes[0]) == 0b001 ||
+		       X86_MODRM_REG(insn->modrm.bytes[0]) == 0b100;
 	default:
-		return 1;
+		return true;
 	}
 }
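
The Grp2/3/4/5 cases above all decide on the 3-bit reg ("nnn") field of the ModRM byte, which occupies bits 5:3. As a rough user-space sketch of that decision, the helpers below reproduce only those four cases. `modrm_reg()` and `grp_is_boostable()` are hypothetical names invented for this illustration; the kernel itself uses `X86_MODRM_REG()` on a decoded `struct insn`, and every other opcode is out of scope here and simply reported as not boostable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract the 3-bit reg ("nnn") field, bits 5:3 of a ModRM byte. */
unsigned int modrm_reg(uint8_t modrm)
{
	return (modrm >> 3) & 0x7;
}

/*
 * Hypothetical helper mirroring only the Grp2/3/4/5 cases of the new
 * can_boost() switch. All other opcodes are outside this sketch and
 * return false here, which is NOT what the kernel does for them.
 */
bool grp_is_boostable(uint8_t opcode, uint8_t modrm)
{
	unsigned int reg = modrm_reg(modrm);

	switch (opcode) {
	case 0xc0: case 0xc1:
	case 0xd0: case 0xd1: case 0xd2: case 0xd3:
		/* Grp2: everything except nnn == 110 */
		return reg != 6;
	case 0xf6: case 0xf7:
		/* Grp3: everything except nnn == 001 */
		return reg != 1;
	case 0xfe:
		/* Grp4: only INC (000) and DEC (001) */
		return reg == 0 || reg == 1;
	case 0xff:
		/* Grp5: INC (000), DEC (001), indirect JMP (100) */
		return reg == 0 || reg == 1 || reg == 4;
	default:
		return false;
	}
}
```

For instance, the byte pair `ff 20` has reg field 100 (indirect JMP) and is accepted, while `ff 10` has reg field 010 (indirect CALL) and is rejected, matching the Grp5 case in the diff.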

@@ -252,21 +263,40 @@ unsigned long recover_probed_instruction(kprobe_opcode_t *buf, unsigned long add
 	return __recover_probed_insn(buf, addr);
 }

-/* Check if paddr is at an instruction boundary */
-static int can_probe(unsigned long paddr)
+/* Check if insn is INT or UD */
+static inline bool is_exception_insn(struct insn *insn)
+{
+	/* UD uses 0f escape */
+	if (insn->opcode.bytes[0] == 0x0f) {
+		/* UD0 / UD1 / UD2 */
+		return insn->opcode.bytes[1] == 0xff ||
+		       insn->opcode.bytes[1] == 0xb9 ||
+		       insn->opcode.bytes[1] == 0x0b;
+	}
+
+	/* INT3 / INT n / INTO / INT1 */
+	return insn->opcode.bytes[0] == 0xcc ||
+	       insn->opcode.bytes[0] == 0xcd ||
+	       insn->opcode.bytes[0] == 0xce ||
+	       insn->opcode.bytes[0] == 0xf1;
+}
+
+/*
+ * Check if paddr is at an instruction boundary and that instruction can
+ * be probed
+ */
+static bool can_probe(unsigned long paddr)
 {
 	unsigned long addr, __addr, offset = 0;
 	struct insn insn;
 	kprobe_opcode_t buf[MAX_INSN_SIZE];

 	if (!kallsyms_lookup_size_offset(paddr, NULL, &offset))
-		return 0;
+		return false;

 	/* Decode instructions */
 	addr = paddr - offset;
 	while (addr < paddr) {
-		int ret;
-
 		/*
 		 * Check if the instruction has been modified by another
 		 * kprobe, in which case we replace the breakpoint by the
@@ -277,11 +307,10 @@ static int can_probe(unsigned long paddr)
 		 */
 		__addr = recover_probed_instruction(buf, addr);
 		if (!__addr)
-			return 0;
+			return false;

-		ret = insn_decode_kernel(&insn, (void *)__addr);
-		if (ret < 0)
-			return 0;
+		if (insn_decode_kernel(&insn, (void *)__addr) < 0)
+			return false;

 #ifdef CONFIG_KGDB
 		/*
@@ -290,10 +319,26 @@ static int can_probe(unsigned long paddr)
 		 */
 		if (insn.opcode.bytes[0] == INT3_INSN_OPCODE &&
 		    kgdb_has_hit_break(addr))
-			return 0;
+			return false;
 #endif
 		addr += insn.length;
 	}
+
+	/* Check if paddr is at an instruction boundary */
+	if (addr != paddr)
+		return false;
+
+	__addr = recover_probed_instruction(buf, addr);
+	if (!__addr)
+		return false;
+
+	if (insn_decode_kernel(&insn, (void *)__addr) < 0)
+		return false;
+
+	/* INT and UD are special and should not be kprobed */
+	if (is_exception_insn(&insn))
+		return false;
+
 	if (IS_ENABLED(CONFIG_CFI_CLANG)) {
 		/*
 		 * The compiler generates the following instruction sequence
@@ -308,13 +353,6 @@ static int can_probe(unsigned long paddr)
 		 * Also, these movl and addl are used for showing expected
 		 * type. So those must not be touched.
 		 */
-		__addr = recover_probed_instruction(buf, addr);
-		if (!__addr)
-			return 0;
-
-		if (insn_decode_kernel(&insn, (void *)__addr) < 0)
-			return 0;
-
 		if (insn.opcode.value == 0xBA)
 			offset = 12;
 		else if (insn.opcode.value == 0x3)
@@ -324,11 +362,11 @@ static int can_probe(unsigned long paddr)

 		/* This movl/addl is used for decoding CFI. */
 		if (is_cfi_trap(addr + offset))
-			return 0;
+			return false;
 	}

 out:
-	return (addr == paddr);
+	return true;
 }

 /* If x86 supports IBT (ENDBR) it must be skipped. */
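
The boundary check in `can_probe()` works by decoding forward from the start of the function and seeing whether the running address lands exactly on `paddr`. A simplified model of that walk is sketched below. It assumes the instruction lengths are already known and passed in as an array, which stands in for what `insn_decode_kernel()` reports; `offset_is_insn_boundary()` and its parameters are hypothetical names for this illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model, not kernel code: insn_len[i] is the length in bytes of
 * the i-th instruction of a function. Returns true only if an
 * instruction starts exactly at byte offset `target`.
 */
bool offset_is_insn_boundary(const unsigned char *insn_len, size_t n_insns,
			     size_t target)
{
	size_t off = 0;
	size_t i = 0;

	/* Walk forward from the function start until we reach or pass target. */
	while (off < target && i < n_insns)
		off += insn_len[i++];

	/* Exact hit, and there is still an instruction left to start there. */
	return off == target && i < n_insns;
}
```

With instruction lengths `{1, 3, 5}`, offsets 0, 1 and 4 are boundaries, while offset 2 falls inside the second instruction and offset 9 is already past the end of the function. In the kernel, passing this check is now necessary but no longer sufficient, since the instruction found at the boundary is additionally rejected if it is an INT or UD instruction.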

kernel/trace/trace.c

Lines changed: 2 additions & 3 deletions
@@ -5747,16 +5747,15 @@ static const char readme_msg[] =
 	"\t args: <name>=fetcharg[:type]\n"
 	"\t fetcharg: (%<register>|$<efield>), @<address>, @<symbol>[+|-<offset>],\n"
 #ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
-#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
 	"\t $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
+#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
 	"\t <argname>[->field[->field|.field...]],\n"
-#else
-	"\t $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
 #endif
 #else
 	"\t $stack<index>, $stack, $retval, $comm,\n"
 #endif
 	"\t +|-[u]<offset>(<fetcharg>), \\imm-value, \\\"imm-string\"\n"
+	"\t kernel return probes support: $retval, $arg<N>, $comm\n"
 	"\t type: s8/16/32/64, u8/16/32/64, x8/16/32/64, char, string, symbol,\n"
 	"\t b<bit-width>@<bit-offset>/<container-size>, ustring,\n"
 	"\t symstr, <type>\\[<array-size>\\]\n"

kernel/trace/trace_eprobe.c

Lines changed: 4 additions & 4 deletions
@@ -220,7 +220,7 @@ static struct trace_eprobe *alloc_event_probe(const char *group,
 	if (!ep->event_system)
 		goto error;

-	ret = trace_probe_init(&ep->tp, this_event, group, false);
+	ret = trace_probe_init(&ep->tp, this_event, group, false, nargs);
 	if (ret < 0)
 		goto error;

@@ -390,8 +390,8 @@ static int get_eprobe_size(struct trace_probe *tp, void *rec)

 /* Note that we don't verify it, since the code does not come from user space */
 static int
-process_fetch_insn(struct fetch_insn *code, void *rec, void *dest,
-		   void *base)
+process_fetch_insn(struct fetch_insn *code, void *rec, void *edata,
+		   void *dest, void *base)
 {
 	unsigned long val;
 	int ret;
@@ -438,7 +438,7 @@ __eprobe_trace_func(struct eprobe_data *edata, void *rec)
 		return;

 	entry = fbuffer.entry = ring_buffer_event_data(fbuffer.event);
-	store_trace_args(&entry[1], &edata->ep->tp, rec, sizeof(*entry), dsize);
+	store_trace_args(&entry[1], &edata->ep->tp, rec, NULL, sizeof(*entry), dsize);

 	trace_event_buffer_commit(&fbuffer);
 }
