
Commit 881dadf

Alexandre Ghiti authored and palmer-dabbelt committed
Merge patch series "riscv: ftrace: atomic patching and preempt improvements"
Andy Chiu <[email protected]> says:

This series makes atomic code patching in ftrace possible and eliminates the need for the stop_machine() dance. The major difference in this version is that we merge the CALL_OPS support from Puranjay [1] and make direct calls available for practical uses such as BPF. Thanks for the time spent reviewing the series and for the suggestions; we hope this version gets a step closer to landing upstream. Please see the link to v3 below for a more introductory view of the implementation [2].

Added patches: 2, 4, 10, 11, 12
Modified patches: 5, 6
Unchanged patches: 1, 3, 7, 8, 9 (1 and 8 have modified commit messages)

Special thanks to Björn for his efforts on testing and guiding the series!

[1]: https://lore.kernel.org/lkml/[email protected]/
[2]: https://lore.kernel.org/linux-riscv/[email protected]/

* patches from https://lore.kernel.org/r/[email protected]:
  riscv: Documentation: add a description about dynamic ftrace
  riscv: ftrace: support direct call using call_ops
  riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
  riscv: ftrace: support PREEMPT
  riscv: add a data fence for CMODX in the kernel mode
  riscv: vector: Support calling schedule() for preemptible Vector
  riscv: ftrace: do not use stop_machine to update code
  riscv: ftrace: prepare ftrace for atomic code patching
  kernel: ftrace: export ftrace_sync_ipi
  riscv: ftrace: align patchable functions to 4 Byte boundary
  riscv: ftrace factor out code defined by !WITH_ARG
  riscv: ftrace: support fastcc in Clang for WITH_ARGS

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexandre Ghiti <[email protected]>
2 parents c3cc2a4 + d8ac85d commit 881dadf

File tree

12 files changed: +331 −207 lines changed

Documentation/arch/riscv/cmodx.rst

Lines changed: 39 additions & 7 deletions
@@ -10,13 +10,45 @@ modified by the program itself. Instruction storage and the instruction cache
 program must enforce its own synchronization with the unprivileged fence.i
 instruction.
 
-However, the default Linux ABI prohibits the use of fence.i in userspace
-applications. At any point the scheduler may migrate a task onto a new hart. If
-migration occurs after the userspace synchronized the icache and instruction
-storage with fence.i, the icache on the new hart will no longer be clean. This
-is due to the behavior of fence.i only affecting the hart that it is called on.
-Thus, the hart that the task has been migrated to may not have synchronized
-instruction storage and icache.
+CMODX in the Kernel Space
+-------------------------
+
+Dynamic ftrace
+--------------
+
+Essentially, dynamic ftrace directs the control flow by inserting a function
+call at each patchable function entry, and patches it dynamically at runtime to
+enable or disable the redirection. In the case of RISC-V, two instructions,
+AUIPC + JALR, are required to compose a function call. However, it is impossible
+to patch two instructions and expect that a concurrent reader executes them
+without a race condition. This series makes atomic code patching possible in
+RISC-V ftrace. Kernel preemption makes things even worse, as it allows the old
+state to persist across the patching process with stop_machine().
+
+In order to get rid of stop_machine() and run dynamic ftrace with full kernel
+preemption, we partially initialize each patchable function entry at boot time,
+setting the first instruction to AUIPC and the second to NOP. Now, atomic
+patching is possible because the kernel only has to update one instruction.
+According to Ziccif, as long as an instruction is naturally aligned, the ISA
+guarantees an atomic update.
+
+By fixing down the first instruction, AUIPC, the range of the ftrace trampoline
+is limited to +-2K from the predetermined target, ftrace_caller, due to the lack
+of immediate encoding space in RISC-V. To address the issue, we introduce
+CALL_OPS, where an 8B naturally aligned metadata is added in front of each
+patchable function. The metadata is resolved at the first trampoline, and the
+execution can then be redirected to another custom trampoline.
+
+CMODX in the User Space
+-----------------------
+
+Though fence.i is an unprivileged instruction, the default Linux ABI prohibits
+the use of fence.i in userspace applications. At any point the scheduler may
+migrate a task onto a new hart. If migration occurs after the userspace
+synchronized the icache and instruction storage with fence.i, the icache on the
+new hart will no longer be clean. This is due to the behavior of fence.i only
+affecting the hart that it is called on. Thus, the hart that the task has been
+migrated to may not have synchronized instruction storage and icache.
 
 There are two ways to solve this problem: use the riscv_flush_icache() syscall,
 or use the ``PR_RISCV_SET_ICACHE_FLUSH_CTX`` prctl() and emit fence.i in
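The patching scheme described in the new documentation can be sketched in a few lines of C. This is a hypothetical model for illustration, not code from the series: the patch_site layout and the function names are invented here, and the real implementation must additionally synchronize the icache on every hart (see the userspace discussion above for why a local fence.i is not enough).

#include <stdint.h>

/* Hypothetical model of one patchable function entry: two 4-byte slots.
 * insn[0] holds the fixed AUIPC written once at boot; only insn[1] is
 * ever repatched afterwards.
 */
struct patch_site {
	uint32_t insn[2];
};

#define RISCV_NOP 0x00000013u	/* addi x0, x0, 0 */

/* Enable tracing: swap the NOP for a JALR. The store touches a single
 * naturally aligned 32-bit word, which Ziccif guarantees concurrent
 * instruction fetches observe atomically, so no stop_machine() dance
 * is required.
 */
static void ftrace_enable_site(struct patch_site *site, uint32_t jalr_insn)
{
	__atomic_store_n(&site->insn[1], jalr_insn, __ATOMIC_RELEASE);
	/* The kernel would now emit fence.i / IPI the remote harts. */
}

static void ftrace_disable_site(struct patch_site *site)
{
	__atomic_store_n(&site->insn[1], RISCV_NOP, __ATOMIC_RELEASE);
}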

arch/riscv/Kconfig

Lines changed: 6 additions & 2 deletions
@@ -99,6 +99,7 @@ config RISCV
 	select EDAC_SUPPORT
 	select FRAME_POINTER if PERF_EVENTS || (FUNCTION_TRACER && !DYNAMIC_FTRACE)
 	select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY if DYNAMIC_FTRACE
+	select FUNCTION_ALIGNMENT_8B if DYNAMIC_FTRACE_WITH_CALL_OPS
 	select GENERIC_ARCH_TOPOLOGY
 	select GENERIC_ATOMIC64 if !64BIT
 	select GENERIC_CLOCKEVENTS_BROADCAST if SMP
@@ -151,13 +152,15 @@ config RISCV
 	select HAVE_DEBUG_KMEMLEAK
 	select HAVE_DMA_CONTIGUOUS if MMU
 	select HAVE_DYNAMIC_FTRACE if !XIP_KERNEL && MMU && (CLANG_SUPPORTS_DYNAMIC_FTRACE || GCC_SUPPORTS_DYNAMIC_FTRACE)
-	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	select FUNCTION_ALIGNMENT_4B if HAVE_DYNAMIC_FTRACE && RISCV_ISA_C
+	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS if HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
+	select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if (DYNAMIC_FTRACE_WITH_ARGS && !CFI_CLANG)
 	select HAVE_DYNAMIC_FTRACE_WITH_ARGS if HAVE_DYNAMIC_FTRACE
 	select HAVE_FTRACE_GRAPH_FUNC
 	select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
 	select HAVE_FUNCTION_GRAPH_TRACER if HAVE_DYNAMIC_FTRACE_WITH_ARGS
 	select HAVE_FUNCTION_GRAPH_FREGS
-	select HAVE_FUNCTION_TRACER if !XIP_KERNEL && !PREEMPTION
+	select HAVE_FUNCTION_TRACER if !XIP_KERNEL
 	select HAVE_EBPF_JIT if MMU
 	select HAVE_GUP_FAST if MMU
 	select HAVE_FUNCTION_ARG_ACCESS_API
@@ -237,6 +240,7 @@ config CLANG_SUPPORTS_DYNAMIC_FTRACE
 config GCC_SUPPORTS_DYNAMIC_FTRACE
 	def_bool CC_IS_GCC
 	depends on $(cc-option,-fpatchable-function-entry=8)
+	depends on CC_HAS_MIN_FUNCTION_ALIGNMENT || !RISCV_ISA_C
 
 config HAVE_SHADOW_CALL_STACK
 	def_bool $(cc-option,-fsanitize=shadow-call-stack)

arch/riscv/Makefile

Lines changed: 2 additions & 2 deletions
@@ -15,9 +15,9 @@ ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
 	LDFLAGS_vmlinux += --no-relax
 	KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
 ifeq ($(CONFIG_RISCV_ISA_C),y)
-	CC_FLAGS_FTRACE := -fpatchable-function-entry=4
+	CC_FLAGS_FTRACE := -fpatchable-function-entry=8,4
 else
-	CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+	CC_FLAGS_FTRACE := -fpatchable-function-entry=4,2
 endif
 endif
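For context, -fpatchable-function-entry=M,N asks the compiler to emit M NOPs per function, N of them before the function label. The snippet below uses the documented per-function GCC/Clang attribute equivalent of the new flag; the byte counts in the comments assume CONFIG_RISCV_ISA_C (2-byte c.nop slots), matching the 8,4 case above. The function name is hypothetical.

/* A sketch of what the new CC_FLAGS_FTRACE requests, expressed as the
 * equivalent attribute. With the C extension, 8 NOP slots of 2 bytes
 * each are reserved: 4 slots (8 bytes) before the symbol for the
 * CALL_OPS metadata, and 4 slots (8 bytes) at the entry for the
 * AUIPC + JALR pair.
 */
__attribute__((patchable_function_entry(8, 4)))
void traced_function(void)
{
	/* function body */
}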

arch/riscv/include/asm/ftrace.h

Lines changed: 35 additions & 27 deletions
@@ -20,10 +20,9 @@ extern void *return_address(unsigned int level);
 #define ftrace_return_address(n) return_address(n)
 
 void _mcount(void);
-static inline unsigned long ftrace_call_adjust(unsigned long addr)
-{
-	return addr;
-}
+unsigned long ftrace_call_adjust(unsigned long addr);
+unsigned long arch_ftrace_get_symaddr(unsigned long fentry_ip);
+#define ftrace_get_symaddr(fentry_ip) arch_ftrace_get_symaddr(fentry_ip)
 
 /*
  * Let's do like x86/arm64 and ignore the compat syscalls.
@@ -57,12 +56,21 @@ struct dyn_arch_ftrace {
  * 2) jalr: setting low-12 offset to ra, jump to ra, and set ra to
  *    return address (original pc + 4)
  *
+ * The first 2 instructions of each traceable function are compiled to 2 nop
+ * instructions. Then, the kernel initializes the first instruction to auipc at
+ * boot time (<ftrace disable>). The second instruction is patched to jalr to
+ * start the trace.
+ *
+ * <Image>:
+ * 0: nop
+ * 4: nop
+ *
  * <ftrace enable>:
- * 0: auipc  t0/ra, 0x?
- * 4: jalr   t0/ra, ?(t0/ra)
+ * 0: auipc  t0, 0x?
+ * 4: jalr   t0, ?(t0)
  *
  * <ftrace disable>:
- * 0: nop
+ * 0: auipc  t0, 0x?
  * 4: nop
  *
  * Dynamic ftrace generates probes to call sites, so we must deal with
@@ -75,10 +83,9 @@ struct dyn_arch_ftrace {
 #define AUIPC_OFFSET_MASK	(0xfffff000)
 #define AUIPC_PAD		(0x00001000)
 #define JALR_SHIFT		20
-#define JALR_RA			(0x000080e7)
-#define AUIPC_RA		(0x00000097)
 #define JALR_T0			(0x000282e7)
 #define AUIPC_T0		(0x00000297)
+#define JALR_RANGE		(JALR_SIGN_MASK - 1)
 
 #define to_jalr_t0(offset) \
 	(((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_T0)
@@ -96,26 +103,14 @@ do {									\
 	call[1] = to_jalr_t0(offset);					\
 } while (0)
 
-#define to_jalr_ra(offset) \
-	(((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_RA)
-
-#define to_auipc_ra(offset) \
-	((offset & JALR_SIGN_MASK) ?					\
-	(((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_RA) :	\
-	((offset & AUIPC_OFFSET_MASK) | AUIPC_RA))
-
-#define make_call_ra(caller, callee, call)				\
-do {									\
-	unsigned int offset =						\
-		(unsigned long) (callee) - (unsigned long) (caller);	\
-	call[0] = to_auipc_ra(offset);					\
-	call[1] = to_jalr_ra(offset);					\
-} while (0)
-
 /*
- * Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes here.
+ * Only the jalr insn in the auipc+jalr is patched, so we make it 4
+ * bytes here.
  */
-#define MCOUNT_INSN_SIZE 8
+#define MCOUNT_INSN_SIZE 4
+#define MCOUNT_AUIPC_SIZE 4
+#define MCOUNT_JALR_SIZE 4
+#define MCOUNT_NOP4_SIZE 4
 
 #ifndef __ASSEMBLY__
 struct dyn_ftrace;
@@ -135,6 +130,9 @@ struct __arch_ftrace_regs {
 	unsigned long sp;
 	unsigned long s0;
 	unsigned long t1;
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	unsigned long direct_tramp;
+#endif
 	union {
 		unsigned long args[8];
 		struct {
@@ -146,6 +144,13 @@ struct __arch_ftrace_regs {
 			unsigned long a5;
 			unsigned long a6;
 			unsigned long a7;
+#ifdef CONFIG_CC_IS_CLANG
+			unsigned long t2;
+			unsigned long t3;
+			unsigned long t4;
+			unsigned long t5;
+			unsigned long t6;
+#endif
 		};
 	};
 };
@@ -221,10 +226,13 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
 		       struct ftrace_ops *op, struct ftrace_regs *fregs);
 #define ftrace_graph_func ftrace_graph_func
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 static inline void arch_ftrace_set_direct_caller(struct ftrace_regs *fregs, unsigned long addr)
 {
 	arch_ftrace_regs(fregs)->t1 = addr;
 }
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
+
 #endif /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
 
 #endif /* __ASSEMBLY__ */
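A worked example clarifies the AUIPC_PAD term in the header above: AUIPC adds its immediate shifted left by 12 to the PC, while JALR sign-extends its 12-bit immediate, so whenever bit 11 of the branch offset is set the AUIPC part must be biased upward by 0x1000 to cancel the negative JALR term. The standalone sketch below is hedged: the two mask values are assumptions copied from the unchanged part of the header, and make_call() mirrors the header's make_call_t0().

#include <stdint.h>
#include <stdio.h>

/* Assumed values of the header's unchanged masks. */
#define JALR_SIGN_MASK		0x00000800u
#define JALR_OFFSET_MASK	0x00000fffu
#define AUIPC_OFFSET_MASK	0xfffff000u
#define AUIPC_PAD		0x00001000u
#define JALR_SHIFT		20
#define JALR_T0			0x000282e7u
#define AUIPC_T0		0x00000297u

/* Compose the auipc+jalr pair for a caller->callee offset, mirroring
 * make_call_t0(). When bit 11 is set, jalr's immediate sign-extends to
 * a negative value, so auipc gets the extra AUIPC_PAD.
 */
static void make_call(uint32_t offset, uint32_t call[2])
{
	call[0] = (offset & JALR_SIGN_MASK) ?
		  (((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_T0) :
		  ((offset & AUIPC_OFFSET_MASK) | AUIPC_T0);
	call[1] = ((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_T0;
}

int main(void)
{
	uint32_t call[2];

	/* Offset 0x804 has bit 11 set: auipc contributes +0x1000 and
	 * jalr contributes -2044, landing on +0x804 overall.
	 */
	make_call(0x804, call);
	printf("auipc: 0x%08x  jalr: 0x%08x\n", call[0], call[1]);
	return 0;
}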
arch/riscv/include/asm/processor.h

Lines changed: 5 additions & 0 deletions
@@ -79,6 +79,10 @@ struct pt_regs;
  *	  Thus, the task does not own preempt_v. Any use of Vector will have to
  *	  save preempt_v, if dirty, and fallback to non-preemptible kernel-mode
  *	  Vector.
+ *	- bit 29: The thread voluntarily calls schedule() while holding an active
+ *	  preempt_v. All preempt_v context should be dropped in such case because
+ *	  V-regs are caller-saved. Only sstatus.VS=ON is persisted across a
+ *	  schedule() call.
  *	- bit 30: The in-kernel preempt_v context is saved, and requires to be
  *	  restored when returning to the context that owns the preempt_v.
  *	- bit 31: The in-kernel preempt_v context is dirty, as signaled by the
@@ -93,6 +97,7 @@ struct pt_regs;
 #define RISCV_PREEMPT_V			0x00000100
 #define RISCV_PREEMPT_V_DIRTY		0x80000000
 #define RISCV_PREEMPT_V_NEED_RESTORE	0x40000000
+#define RISCV_PREEMPT_V_IN_SCHEDULE	0x20000000
 
 /* CPU-specific state of a task */
 struct thread_struct {

arch/riscv/include/asm/vector.h

Lines changed: 19 additions & 3 deletions
@@ -120,6 +120,11 @@ static __always_inline void riscv_v_disable(void)
 	csr_clear(CSR_SSTATUS, SR_VS);
 }
 
+static __always_inline bool riscv_v_is_on(void)
+{
+	return !!(csr_read(CSR_SSTATUS) & SR_VS);
+}
+
 static __always_inline void __vstate_csr_save(struct __riscv_v_ext_state *dest)
 {
 	asm volatile (
@@ -366,6 +371,11 @@ static inline void __switch_to_vector(struct task_struct *prev,
 	struct pt_regs *regs;
 
 	if (riscv_preempt_v_started(prev)) {
+		if (riscv_v_is_on()) {
+			WARN_ON(prev->thread.riscv_v_flags & RISCV_V_CTX_DEPTH_MASK);
+			riscv_v_disable();
+			prev->thread.riscv_v_flags |= RISCV_PREEMPT_V_IN_SCHEDULE;
+		}
 		if (riscv_preempt_v_dirty(prev)) {
 			__riscv_v_vstate_save(&prev->thread.kernel_vstate,
 					      prev->thread.kernel_vstate.datap);
@@ -376,10 +386,16 @@ static inline void __switch_to_vector(struct task_struct *prev,
 		riscv_v_vstate_save(&prev->thread.vstate, regs);
 	}
 
-	if (riscv_preempt_v_started(next))
-		riscv_preempt_v_set_restore(next);
-	else
+	if (riscv_preempt_v_started(next)) {
+		if (next->thread.riscv_v_flags & RISCV_PREEMPT_V_IN_SCHEDULE) {
+			next->thread.riscv_v_flags &= ~RISCV_PREEMPT_V_IN_SCHEDULE;
+			riscv_v_enable();
+		} else {
+			riscv_preempt_v_set_restore(next);
+		}
+	} else {
 		riscv_v_vstate_set_restore(next, task_pt_regs(next));
+	}
 }
 
 void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
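To see what the new branches handle, consider a preemptible kernel-mode Vector user that voluntarily sleeps. The sketch below is an assumed usage pattern, not code from the series: kernel_vector_begin()/kernel_vector_end() are the existing kernel-mode Vector entry points, while the function, its body, and the cond_resched() placement are hypothetical.

/* Hedged sketch, assuming preemptible kernel-mode Vector is enabled.
 * If the thread reschedules inside the region, __switch_to_vector()
 * sees sstatus.VS still ON, disables V, and marks the task with
 * RISCV_PREEMPT_V_IN_SCHEDULE. V-registers are caller-saved, so only
 * VS=ON must survive the switch; on switch-in, riscv_v_enable() is
 * enough and no full vstate restore is performed.
 */
#include <linux/sched.h>
#include <asm/vector.h>

static void vector_copy_yielding(void *dst, const void *src, size_t len)
{
	kernel_vector_begin();
	/* ... vector loads/stores over the first chunk (hypothetical) ... */

	cond_resched();		/* voluntary schedule() inside the region */

	/* ... re-initialize V state before touching the next chunk ... */
	kernel_vector_end();
}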

arch/riscv/kernel/asm-offsets.c

Lines changed: 13 additions & 0 deletions
@@ -493,6 +493,12 @@ void asm_offsets(void)
 	DEFINE(STACKFRAME_SIZE_ON_STACK, ALIGN(sizeof(struct stackframe), STACK_ALIGN));
 	OFFSET(STACKFRAME_FP, stackframe, fp);
 	OFFSET(STACKFRAME_RA, stackframe, ra);
+#ifdef CONFIG_FUNCTION_TRACER
+	DEFINE(FTRACE_OPS_FUNC, offsetof(struct ftrace_ops, func));
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	DEFINE(FTRACE_OPS_DIRECT_CALL, offsetof(struct ftrace_ops, direct_call));
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
+#endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
 	DEFINE(FREGS_SIZE_ON_STACK, ALIGN(sizeof(struct __arch_ftrace_regs), STACK_ALIGN));
@@ -501,6 +507,13 @@ void asm_offsets(void)
 	DEFINE(FREGS_SP, offsetof(struct __arch_ftrace_regs, sp));
 	DEFINE(FREGS_S0, offsetof(struct __arch_ftrace_regs, s0));
 	DEFINE(FREGS_T1, offsetof(struct __arch_ftrace_regs, t1));
+#ifdef CONFIG_CC_IS_CLANG
+	DEFINE(FREGS_T2, offsetof(struct __arch_ftrace_regs, t2));
+	DEFINE(FREGS_T3, offsetof(struct __arch_ftrace_regs, t3));
+	DEFINE(FREGS_T4, offsetof(struct __arch_ftrace_regs, t4));
+	DEFINE(FREGS_T5, offsetof(struct __arch_ftrace_regs, t5));
+	DEFINE(FREGS_T6, offsetof(struct __arch_ftrace_regs, t6));
+#endif
 	DEFINE(FREGS_A0, offsetof(struct __arch_ftrace_regs, a0));
 	DEFINE(FREGS_A1, offsetof(struct __arch_ftrace_regs, a1));
 	DEFINE(FREGS_A2, offsetof(struct __arch_ftrace_regs, a2));
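For readers unfamiliar with asm-offsets.c: DEFINE() produces no runnable C. It plants a marker in the compiler's assembly output that kbuild turns into a generated header of #define constants, which is how the assembly ftrace trampoline addresses struct __arch_ftrace_regs fields such as t2-t6 by name. Below is a minimal standalone model of that trick; the kernel's real macro lives in include/linux/kbuild.h, and the struct and symbol names here are made up.

#include <stddef.h>

/* Embed "->NAME value" strings in the generated assembly; a sed pass
 * over the .s file then rewrites each into "#define NAME value".
 * Try: gcc -S offsets_demo.c && grep -- '->' offsets_demo.s
 */
#define DEFINE(sym, val) \
	asm volatile("\n.ascii \"->" #sym " %0\"" : : "i" (val))

struct demo_regs {
	unsigned long epc;
	unsigned long t2;	/* stand-in for the Clang-only slots */
};

void asm_offsets(void)
{
	DEFINE(DEMO_T2, offsetof(struct demo_regs, t2));
}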
