Commit 5fcf896

Merge branch 'bpf-mitigate-spectre-v1-using-barriers'

Author: Alexei Starovoitov
Luis Gerhorst says:

====================
This improves the expressiveness of unprivileged BPF by inserting
speculation barriers instead of rejecting the programs. The approach was
previously presented at LPC'24 [1] and RAID'24 [2].

To mitigate the Spectre v1 (PHT) vulnerability, the kernel rejects
potentially-dangerous unprivileged BPF programs as of commit
9183671af6db ("bpf: Fix leakage under speculation on mispredicted
branches"). In [2], we have analyzed 364 object files from open source
projects (Linux Samples and Selftests, BCC, Loxilb, Cilium, libbpf
Examples, Parca, and Prevail) and found that this affects 31% to 54% of
programs.

To resolve this in the majority of cases, this patchset adds a
fall-back for mitigating Spectre v1 using speculation barriers. The
kernel still optimistically attempts to verify all speculative paths
but uses speculation barriers against v1 when unsafe behavior is
detected. This allows more programs to be accepted without disabling
the BPF Spectre mitigations (e.g., by setting cpu_mitigations_off()).

For this, it relies on the fact that speculation barriers generally
prevent all later instructions from executing if the speculation was
not correct (not only loads). See patch 7 ("bpf: Fall back to nospec
for Spectre v1") for a detailed description and references to the
relevant vendor documentation (AMD and Intel x86-64, ARM64, and
PowerPC).

In [1] we have measured the overhead of this approach relative to
having mitigations off and including the upstream Spectre v4
mitigations. For event tracing and stack-sampling profilers, we found
that mitigations increase BPF program execution time by 0% to 62%. For
the Loxilb network load balancer, we have measured a 14% slowdown in
SCTP performance but no significant slowdown for TCP. This overhead
only applies to programs that were previously rejected.

I reran the expressiveness evaluation with v6.14 and made sure the main
results still match those from [1] and [2] (which used v6.5).

Main design decisions are:

* Do not use separate bytecode insns for v1 and v4 barriers (inspired
  by Daniel Borkmann's question at LPC). This simplifies the verifier
  significantly and has the only downside that performance on PowerPC
  is not as high as it could be.

* Allow archs to still disable v1/v4 mitigations separately by setting
  bpf_jit_bypass_spec_v1/v4(). This has the benefit that archs can
  benefit from improved BPF expressiveness/performance if they are not
  vulnerable (e.g., ARM64 for v4 in the kernel).

* Do not remove the empty BPF_NOSPEC implementation for backends for
  which it is unknown whether they are vulnerable to Spectre v1.

[1] https://lpc.events/event/18/contributions/1954/ ("Mitigating
    Spectre-PHT using Speculation Barriers in Linux eBPF")
[2] https://arxiv.org/pdf/2405.00078 ("VeriFence: Lightweight and
    Precise Spectre Defenses for Untrusted Linux Kernel Extensions")

Changes:

* v3 -> v4:
  - Remove insn parameter from do_check_insn() and extract
    process_bpf_exit_full as a function as requested by Eduard
  - Investigate apparent sanitize_check_bounds() bug reported by
    Kartikeya (appears not to be a bug, only confusing code); sent a
    separate patch to document it and add an assert
  - Remove already-merged commit 1 ("selftests/bpf: Fix caps for
    __xlated/jited_unpriv")
  - Drop former commit 10 ("bpf: Allow nospec-protected var-offset
    stack access") as it did not include a test and there are other
    places where var-off is rejected. Also, none of the tested
    real-world programs used var-off in the paper.
    Therefore keep the old behavior for now and potentially prepare a
    patch that converts all cases later if required.
  - Add link to AMD lfence and PowerPC speculation barrier
    (ori 31,31,0) documentation
  - Move detailed barrier documentation to commit 7 ("bpf: Fall back
    to nospec for Spectre v1")
  - Link to v3: https://lore.kernel.org/all/[email protected]/

* v2 -> v3:
  - Fix https://lore.kernel.org/oe-kbuild-all/[email protected]/ and
    similar by moving the bpf_jit_bypass_spec_v1/v4() prototypes out of
    the #ifdef CONFIG_BPF_SYSCALL. Decided not to move them to filter.h
    (where similar bpf_jit_*() prototypes live) as they would still
    have to be duplicated in bpf.h to be usable by
    bpf_bypass_spec_v1/v4() (unless including filter.h in bpf.h is an
    option).
  - Fix https://lore.kernel.org/oe-kbuild-all/[email protected]/ by
    moving the variable declarations out of the switch-case.
  - Build touched C files with W=2 and bpf config on x86 to check that
    no other warnings are introduced.
  - Found 3 more checkpatch warnings that can be fixed without
    degrading readability.
  - Rebase to bpf-next 2025-05-01
  - Link to v2: https://lore.kernel.org/bpf/[email protected]/

* v1 -> v2:
  - Drop former commits 9 ("bpf: Return PTR_ERR from push_stack()") and
    11 ("bpf: Fall back to nospec for spec path verification") as
    suggested by Alexei. This series therefore no longer changes
    push_stack() to return PTR_ERR.
  - Add detailed explanation of how lfence works internally and how it
    affects the algorithm.
  - Add tests checking that nospec instructions are inserted in
    expected locations using __xlated_unpriv as suggested by Eduard
    (also, include a fix for __xlated_unpriv)
  - Add a test for the mitigations from the description of commit
    9183671af6db ("bpf: Fix leakage under speculation on mispredicted
    branches")
  - Remove unused variables from do_check[_insn]() as suggested by
    Eduard.
  - Remove INSN_IDX_MODIFIED to improve readability as suggested by
    Eduard. This also causes the nospec_result-check to run (and fail)
    for jumping-ops. Add a warning to assert that this check must never
    succeed in that case.
  - Add details on the safety of patch 10 ("bpf: Allow nospec-protected
    var-offset stack access") based on the feedback on v1.
  - Rebase to bpf-next-250420
  - Link to v1: https://lore.kernel.org/all/[email protected]/

* RFC -> v1:
  - Rebase to bpf-next-250313
  - tests: mark expected successes/new errors
  - Add bpf_jit_bypass_spec_v1/v4() to avoid #ifdef in
    bpf_bypass_spec_v1/v4()
  - Ensure that nospec with v1-support is implemented for archs for
    which GCC supports speculation barriers, except for MIPS
  - arm64: emit speculation barrier
  - powerpc: change nospec to include v1 barrier
  - Discuss potential security (archs that do not implement BPF nospec)
    and performance (only PowerPC) regressions
  - Link to RFC: https://lore.kernel.org/bpf/[email protected]/
====================

Acked-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
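
Editorial note: the fall-back described in the commit message boils down
to marking an instruction rather than rejecting the program. A minimal
sketch in kernel-style C, assuming a hypothetical
do_check_speculative_path() helper and simplified error handling; the
real logic lives in kernel/bpf/verifier.c (patch 7 of the series):

	/* Editorial sketch, not the verifier's actual control flow.
	 * The 'nospec' aux flag is the one added by this series in
	 * include/linux/bpf_verifier.h (see the hunk below).
	 */
	static int check_speculative_path_sketch(struct bpf_verifier_env *env,
						 int insn_idx)
	{
		int err = do_check_speculative_path(env, insn_idx); /* hypothetical */

		if (err) {
			/* Unsafe only under misspeculation: request a
			 * barrier so this insn can never execute
			 * speculatively, instead of rejecting the
			 * whole program.
			 */
			env->insn_aux_data[insn_idx].nospec = true;
			return 0; /* path is dead behind the barrier; accept */
		}
		return 0;
	}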
2 parents: 2bc0575 + 4a8765d

File tree: 17 files changed (+601, -320 lines)

17 files changed

+601
-320
lines changed

arch/arm64/net/bpf_jit.h

Lines changed: 5 additions & 0 deletions

@@ -325,4 +325,9 @@
 #define A64_MRS_SP_EL0(Rt) \
 	aarch64_insn_gen_mrs(Rt, AARCH64_INSN_SYSREG_SP_EL0)
 
+/* Barriers */
+#define A64_SB aarch64_insn_get_sb_value()
+#define A64_DSB_NSH (aarch64_insn_get_dsb_base_value() | 0x7 << 8)
+#define A64_ISB aarch64_insn_get_isb_value()
+
 #endif /* _BPF_JIT_H */

arch/arm64/net/bpf_jit_comp.c

Lines changed: 18 additions & 10 deletions

@@ -1630,17 +1630,14 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		return ret;
 		break;
 
-	/* speculation barrier */
+	/* speculation barrier against v1 and v4 */
 	case BPF_ST | BPF_NOSPEC:
-		/*
-		 * Nothing required here.
-		 *
-		 * In case of arm64, we rely on the firmware mitigation of
-		 * Speculative Store Bypass as controlled via the ssbd kernel
-		 * parameter. Whenever the mitigation is enabled, it works
-		 * for all of the kernel code with no need to provide any
-		 * additional instructions.
-		 */
+		if (alternative_has_cap_likely(ARM64_HAS_SB)) {
+			emit(A64_SB, ctx);
+		} else {
+			emit(A64_DSB_NSH, ctx);
+			emit(A64_ISB, ctx);
+		}
 		break;
 
 	/* ST: *(size *)(dst + off) = imm */
@@ -2911,6 +2908,17 @@ bool bpf_jit_supports_percpu_insn(void)
 	return true;
 }
 
+bool bpf_jit_bypass_spec_v4(void)
+{
+	/* In case of arm64, we rely on the firmware mitigation of Speculative
+	 * Store Bypass as controlled via the ssbd kernel parameter. Whenever
+	 * the mitigation is enabled, it works for all of the kernel code with
+	 * no need to provide any additional instructions. Therefore, skip
+	 * inserting nospec insns against Spectre v4.
+	 */
+	return true;
+}
+
 bool bpf_jit_inlines_helper_call(s32 imm)
 {
 	switch (imm) {
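
Editorial note: the A64_DSB_NSH macro from the header hunk above
composes an encoding by OR-ing an option field into a base opcode. A
small standalone check of that composition; the opcode constants here
are assumptions taken from the ARM ARM (DSB with CRm = 0 as the base,
the barrier option in the 4-bit CRm field at bits [11:8], and 0x7
selecting NSH), and aarch64_insn_get_dsb_base_value() is assumed to
return the same base:

	#include <assert.h>
	#include <stdint.h>

	int main(void)
	{
		const uint32_t dsb_base = 0xd503309f; /* dsb, CRm = 0 (assumed base) */
		uint32_t dsb_nsh = dsb_base | 0x7 << 8; /* CRm = 0x7 -> NSH */

		assert(dsb_nsh == 0xd503379f); /* encoding of "dsb nsh" */
		return 0;
	}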

arch/powerpc/net/bpf_jit_comp64.c

Lines changed: 60 additions & 20 deletions

@@ -370,6 +370,23 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
 	return 0;
 }
 
+bool bpf_jit_bypass_spec_v1(void)
+{
+#if defined(CONFIG_PPC_E500) || defined(CONFIG_PPC_BOOK3S_64)
+	return !(security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) &&
+		 security_ftr_enabled(SEC_FTR_BNDS_CHK_SPEC_BAR));
+#else
+	return true;
+#endif
+}
+
+bool bpf_jit_bypass_spec_v4(void)
+{
+	return !(security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) &&
+		 security_ftr_enabled(SEC_FTR_STF_BARRIER) &&
+		 stf_barrier_type_get() != STF_BARRIER_NONE);
+}
+
 /*
  * We spill into the redzone always, even if the bpf program has its own stackframe.
  * Offsets hardcoded based on BPF_PPC_STACK_SAVE -- see bpf_jit_stack_local()
@@ -397,6 +414,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code
 		       u32 *addrs, int pass, bool extra_pass)
 {
 	enum stf_barrier_type stf_barrier = stf_barrier_type_get();
+	bool sync_emitted, ori31_emitted;
 	const struct bpf_insn *insn = fp->insnsi;
 	int flen = fp->len;
 	int i, ret;
@@ -789,30 +807,52 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code
 
 		/*
 		 * BPF_ST NOSPEC (speculation barrier)
+		 *
+		 * The following must act as a barrier against both Spectre v1
+		 * and v4 if we requested both mitigations. Therefore, also emit
+		 * 'isync; sync' on E500 or 'ori31' on BOOK3S_64 in addition to
+		 * the insns needed for a Spectre v4 barrier.
+		 *
+		 * If we requested only !bypass_spec_v1 OR only !bypass_spec_v4,
+		 * we can skip the respective other barrier type as an
+		 * optimization.
 		 */
 		case BPF_ST | BPF_NOSPEC:
-			if (!security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) ||
-			    !security_ftr_enabled(SEC_FTR_STF_BARRIER))
-				break;
-
-			switch (stf_barrier) {
-			case STF_BARRIER_EIEIO:
-				EMIT(PPC_RAW_EIEIO() | 0x02000000);
-				break;
-			case STF_BARRIER_SYNC_ORI:
+			sync_emitted = false;
+			ori31_emitted = false;
+#ifdef CONFIG_PPC_E500
+			if (!bpf_jit_bypass_spec_v1()) {
+				EMIT(PPC_RAW_ISYNC());
 				EMIT(PPC_RAW_SYNC());
-				EMIT(PPC_RAW_LD(tmp1_reg, _R13, 0));
-				EMIT(PPC_RAW_ORI(_R31, _R31, 0));
-				break;
-			case STF_BARRIER_FALLBACK:
-				ctx->seen |= SEEN_FUNC;
-				PPC_LI64(_R12, dereference_kernel_function_descriptor(bpf_stf_barrier));
-				EMIT(PPC_RAW_MTCTR(_R12));
-				EMIT(PPC_RAW_BCTRL());
-				break;
-			case STF_BARRIER_NONE:
-				break;
+				sync_emitted = true;
+			}
+#endif
+			if (!bpf_jit_bypass_spec_v4()) {
+				switch (stf_barrier) {
+				case STF_BARRIER_EIEIO:
+					EMIT(PPC_RAW_EIEIO() | 0x02000000);
+					break;
+				case STF_BARRIER_SYNC_ORI:
+					if (!sync_emitted)
+						EMIT(PPC_RAW_SYNC());
+					EMIT(PPC_RAW_LD(tmp1_reg, _R13, 0));
+					EMIT(PPC_RAW_ORI(_R31, _R31, 0));
+					ori31_emitted = true;
+					break;
+				case STF_BARRIER_FALLBACK:
+					ctx->seen |= SEEN_FUNC;
+					PPC_LI64(_R12, dereference_kernel_function_descriptor(bpf_stf_barrier));
+					EMIT(PPC_RAW_MTCTR(_R12));
+					EMIT(PPC_RAW_BCTRL());
+					break;
+				case STF_BARRIER_NONE:
+					break;
+				}
 			}
+#ifdef CONFIG_PPC_BOOK3S_64
+			if (!bpf_jit_bypass_spec_v1() && !ori31_emitted)
+				EMIT(PPC_RAW_ORI(_R31, _R31, 0));
+#endif
 			break;
 
 		/*
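
Editorial note: the control flow above interleaves the v1 and v4
barrier sequences so that shared instructions are emitted only once.
A condensed sketch of the BOOK3S_64 case, with emit_*() helpers as
hypothetical stand-ins for the EMIT(PPC_RAW_*()) macros (illustration
only; the real code also handles the EIEIO and FALLBACK barrier types):

	static void emit_nospec_book3s64_sketch(struct codegen_context *ctx)
	{
		bool ori31_emitted = false;

		if (!bpf_jit_bypass_spec_v4() &&
		    stf_barrier_type_get() == STF_BARRIER_SYNC_ORI) {
			emit_sync(ctx);		/* sync */
			emit_ld_r13(ctx);	/* load via r13 */
			emit_ori31(ctx);	/* ori 31,31,0 */
			ori31_emitted = true;
		}
		/* The trailing 'ori 31,31,0' of the v4 sequence already is
		 * the BOOK3S_64 Spectre v1 barrier, so only emit it again
		 * if the v4 path did not.
		 */
		if (!bpf_jit_bypass_spec_v1() && !ori31_emitted)
			emit_ori31(ctx);
	}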

include/linux/bpf.h

Lines changed: 9 additions & 2 deletions

@@ -2288,6 +2288,9 @@ bpf_prog_run_array_uprobe(const struct bpf_prog_array *array,
 	return ret;
 }
 
+bool bpf_jit_bypass_spec_v1(void);
+bool bpf_jit_bypass_spec_v4(void);
+
 #ifdef CONFIG_BPF_SYSCALL
 DECLARE_PER_CPU(int, bpf_prog_active);
 extern struct mutex bpf_stats_enabled_mutex;
@@ -2475,12 +2478,16 @@ static inline bool bpf_allow_uninit_stack(const struct bpf_token *token)
 
 static inline bool bpf_bypass_spec_v1(const struct bpf_token *token)
 {
-	return cpu_mitigations_off() || bpf_token_capable(token, CAP_PERFMON);
+	return bpf_jit_bypass_spec_v1() ||
+	       cpu_mitigations_off() ||
+	       bpf_token_capable(token, CAP_PERFMON);
 }
 
 static inline bool bpf_bypass_spec_v4(const struct bpf_token *token)
 {
-	return cpu_mitigations_off() || bpf_token_capable(token, CAP_PERFMON);
+	return bpf_jit_bypass_spec_v4() ||
+	       cpu_mitigations_off() ||
+	       bpf_token_capable(token, CAP_PERFMON);
 }
 
 int bpf_map_new_fd(struct bpf_map *map, int flags);

include/linux/bpf_verifier.h

Lines changed: 2 additions & 1 deletion

@@ -580,7 +580,8 @@ struct bpf_insn_aux_data {
 	u64 map_key_state; /* constant (32 bit) key tracking for maps */
 	int ctx_field_size; /* the ctx field size for load insn, maybe 0 */
 	u32 seen; /* this insn was processed by the verifier at env->pass_cnt */
-	bool sanitize_stack_spill; /* subject to Spectre v4 sanitation */
+	bool nospec; /* do not execute this instruction speculatively */
+	bool nospec_result; /* result is unsafe under speculation, nospec must follow */
 	bool zext_dst; /* this insn zero extends dst reg */
 	bool needs_zext; /* alu op needs to clear upper bits */
 	bool storage_get_func_atomic; /* bpf_*_storage_get() with atomic memory alloc */
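
Editorial note: a sketch of how the two new flags could drive the
verifier's instruction-rewrite pass. patch_insn_before() and
patch_insn_after() are hypothetical stand-ins for the real patching
helpers, not functions from the series:

	static void patch_nospec_sketch(struct bpf_verifier_env *env, int insn_idx)
	{
		const struct bpf_insn barrier = { .code = BPF_ST | BPF_NOSPEC };
		struct bpf_insn_aux_data *aux = &env->insn_aux_data[insn_idx];

		if (aux->nospec)	/* insn itself must not run speculatively */
			patch_insn_before(env, insn_idx, &barrier); /* hypothetical */
		if (aux->nospec_result)	/* insn's result is unsafe under speculation */
			patch_insn_after(env, insn_idx, &barrier);  /* hypothetical */
	}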

include/linux/filter.h

Lines changed: 1 addition & 1 deletion

@@ -82,7 +82,7 @@ struct ctl_table_header;
 #define BPF_CALL_ARGS	0xe0
 
 /* unused opcode to mark speculation barrier for mitigating
- * Speculative Store Bypass
+ * Spectre v1 and v4
  */
 #define BPF_NOSPEC	0xc0
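
Editorial note: a tiny standalone program showing how the barrier
pseudo-instruction's opcode byte is composed. BPF_ST (0x02) is the
standard store instruction class from the BPF uapi; BPF_NOSPEC is the
otherwise-unused value defined above:

	#include <stdio.h>

	#define BPF_ST		0x02	/* insn class */
	#define BPF_NOSPEC	0xc0	/* unused bits repurposed as barrier marker */

	int main(void)
	{
		/* prints 0xc2, the opcode matched by the JIT cases above */
		printf("BPF_ST | BPF_NOSPEC = 0x%02x\n", BPF_ST | BPF_NOSPEC);
		return 0;
	}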

kernel/bpf/core.c

Lines changed: 24 additions & 8 deletions

@@ -2102,14 +2102,15 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn)
 #undef COND_JMP
 	/* ST, STX and LDX*/
 	ST_NOSPEC:
-		/* Speculation barrier for mitigating Speculative Store Bypass.
-		 * In case of arm64, we rely on the firmware mitigation as
-		 * controlled via the ssbd kernel parameter. Whenever the
-		 * mitigation is enabled, it works for all of the kernel code
-		 * with no need to provide any additional instructions here.
-		 * In case of x86, we use 'lfence' insn for mitigation. We
-		 * reuse preexisting logic from Spectre v1 mitigation that
-		 * happens to produce the required code on x86 for v4 as well.
+		/* Speculation barrier for mitigating Speculative Store Bypass,
+		 * Bounds-Check Bypass and Type Confusion. In case of arm64, we
+		 * rely on the firmware mitigation as controlled via the ssbd
+		 * kernel parameter. Whenever the mitigation is enabled, it
+		 * works for all of the kernel code with no need to provide any
+		 * additional instructions here. In case of x86, we use 'lfence'
+		 * insn for mitigation. We reuse preexisting logic from Spectre
+		 * v1 mitigation that happens to produce the required code on
+		 * x86 for v4 as well.
 		 */
 		barrier_nospec();
 		CONT;
@@ -3034,6 +3035,21 @@ bool __weak bpf_jit_needs_zext(void)
 	return false;
 }
 
+/* By default, enable the verifier's mitigations against Spectre v1 and v4 for
+ * all archs. The value returned must not change at runtime as there is
+ * currently no support for reloading programs that were loaded without
+ * mitigations.
+ */
+bool __weak bpf_jit_bypass_spec_v1(void)
+{
+	return false;
+}
+
+bool __weak bpf_jit_bypass_spec_v4(void)
+{
+	return false;
+}
+
 /* Return true if the JIT inlines the call to the helper corresponding to
  * the imm.
 *
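
Usage note: because the defaults above are __weak, an arch opts out
simply by providing a strong definition in its JIT, as the arm64 and
powerpc hunks earlier in this commit do. A minimal hypothetical example
for an arch whose kernel-wide mitigation already covers Spectre v4:

	/* Hypothetical arch JIT file: a strong definition overrides the
	 * __weak default in kernel/bpf/core.c. Returning true tells the
	 * verifier this arch needs no BPF-inserted Spectre v4 barriers.
	 */
	bool bpf_jit_bypass_spec_v4(void)
	{
		return true;
	}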
