[BPF] Support Jump Table #149715
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter.
Thanks! (Building this && rebasing my branch.)
Everything looks to be compiling properly; I should be able to make this work altogether. I have two questions & nits below, and will split them into two comments. Question 1: relocations look a bit different for switches and for computed gotos. Switch: ... computed gotos: ... The latter two point to properly defined symbols, but it is another step to find those. Can the relocation point to the symbol directly, as with the switch?
Question 2: currently jump tables contain 8 bytes per entry; is this intentional? (The offsets will never be greater than 4 bytes.) One related bug: the size of BPF.JT.0.0 is computed as if it points to 4-byte entries. Here the first two have size 2 (as in your example above), while BPF.JT.0.0 actually has size 5 (a switch from my test); however, its symbol size is ...
Signed-off-by: Anton Protopopov <[email protected]>
Ok, my ...
This part is generated by the compiler directly, and the BPF backend is not involved. I need to do some investigation to find out why, and whether we could make a change to relocate to the symbol directly or not.
Thanks. I will fix the bug. Regarding why we have an 8-byte jump table entry: I guess this is probably because the address is calculated from the start of the section.
Great. I will try to address the bug you found and the relocation difference between switch statements and computed gotos ASAP.
Just pushed a fix for the bug (BPF.JT.0.0/1 size) discovered by @aspsk above.
Currently JT offsets are calculated in bytes, but I think it still would be simpler for libbpf/kernel if offsets were calculated in instructions. Also, there would be no need to track offsets as 8 bytes; 4 bytes would suffice. The following part was responsible for this in the old PR:

void BPFAsmPrinter::emitJumpTableInfo()
...
  SmallPtrSet<const MachineBasicBlock *, 16> EmittedSets;
  const MCSymbolRefExpr *Base =
      MCSymbolRefExpr::create(getJXAnchorSymbol(JTI), OutContext);
  for (const MachineBasicBlock *MBB : JTBBs) {
    if (!EmittedSets.insert(MBB).second)
      continue;
    // Offset from gotox to target basic block expressed in number
    // of instructions, e.g.:
    //
    //   .L0_0_set_4 = ((LBB0_4 - .LBPF.JX.0.0) >> 3) - 1
    const MCExpr *LHS = MCSymbolRefExpr::create(MBB->getSymbol(), OutContext);
    OutStreamer->emitAssignment(
        GetJTSetSymbol(JTI, MBB->getNumber()),
        MCBinaryExpr::createSub(
            MCBinaryExpr::createAShr(
                MCBinaryExpr::createSub(LHS, Base, OutContext),
                MCConstantExpr::create(3, OutContext), OutContext),
            MCConstantExpr::create(1, OutContext), OutContext));
  }
  // BPF.JT.0.0:
  //         .long .L0_0_set_4
  //         .long .L0_0_set_2
  //         ...
  //         .size BPF.JT.0.0, 128
  MCSymbol *JTStart = getJTPublicSymbol(JTI);
  OutStreamer->emitLabel(JTStart);
  for (const MachineBasicBlock *MBB : JTBBs) {
    MCSymbol *SetSymbol = GetJTSetSymbol(JTI, MBB->getNumber());
    const MCExpr *V = MCSymbolRefExpr::create(SetSymbol, OutContext);
    OutStreamer->emitValue(V, EntrySize);
  }
  const MCExpr *JTSize = MCConstantExpr::create(JTBBs.size() * 4, OutContext);
  OutStreamer->emitELFSize(JTStart, JTSize);
}
...

The expression for ...
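To make the arithmetic concrete, here is a small standalone sketch of that encoding (my own illustration; the byte offsets are made up, not taken from a real object file). An entry stores the distance from the gotox anchor to the target, counted in 8-byte BPF instructions and off by one, which is also why 4-byte entries would suffice:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Encode: entry = ((target - anchor) >> 3) - 1, i.e. the number of
 * 8-byte BPF instructions between the gotox anchor and the target,
 * off by one. Small enough to fit in 4 bytes for any BPF program. */
static int32_t jt_encode(uint64_t anchor, uint64_t target)
{
    return (int32_t)(((int64_t)(target - anchor) >> 3) - 1);
}

/* Decode: invert the expression to recover the target byte offset. */
static uint64_t jt_decode(uint64_t anchor, int32_t entry)
{
    return anchor + (((uint64_t)entry + 1) << 3);
}

int main(void)
{
    uint64_t anchor = 0x28; /* hypothetical .LBPF.JX.0.0 position */
    uint64_t target = 0x58; /* hypothetical LBB0_4 position */
    int32_t entry = jt_encode(anchor, target);

    printf("entry = %d instructions\n", (int)entry); /* prints 5 */
    assert(jt_decode(anchor, entry) == target);
    return 0;
}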
The following example is not handled:

int bar(int a) {
__label__ l1, l2;
void * volatile tgt;
int ret = 0;
if (a)
tgt = &&l1;
else
tgt = &&l2;
goto *tgt;
l1: ret += 1;
l2: ret += 2;
return ret;
}

Currently the following code is produced: ... As discussed previously, ...
For computed goto, with 'const MachineJumpTableInfo *MJTI = MF->getJumpTableInfo();' the MJTI will be nullptr. I didn't use the above in order to be consistent with the computed-goto jump table.
Yes, makes sense. Missed this one.
One more thing to check: all labels, e.g. BPF.JT.0.0, are global. What if two BPF programs both have a BPF.JT.0.0 and they need to be linked together? Can libbpf handle this properly? We need to double-check libbpf for this.
For the new version, computed goto has the same relocation mechanism, based on symbols.
Good point, I think the libbpf linker should be modified to take care of this.
Well, arrays of labels declared in ".jumptables" sections can be lowered as jump tables; this way ...
Let me take a look.
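To make the concern concrete, here is a hypothetical pair of translation units (my illustration, not from the PR). Jump table numbering restarts in each module, so with a low enough -bpf-min-jump-table-entries both objects would define a global BPF.JT.0.0, and statically linking the two .o files would hit a duplicate-symbol conflict unless the linker renames or localizes the tables:

/* a.c -- compiled separately: the switch is lowered to a jump table
 * whose symbol would be named BPF.JT.0.0 in a.o. */
int pick_a(int x)
{
    switch (x) {
    case 0: return 10;
    case 1: return 11;
    case 2: return 12;
    case 3: return 13;
    case 4: return 14;
    case 5: return 15;
    default: return -1;
    }
}

/* b.c -- a different file with its own first jump table, which would
 * also be named BPF.JT.0.0 in b.o: same global symbol, two objects. */
int pick_b(int x)
{
    switch (x) {
    case 0: return 20;
    case 1: return 21;
    case 2: return 22;
    case 3: return 23;
    case 4: return 24;
    case 5: return 25;
    default: return -1;
    }
}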
Ok. With the current patch (and my dev kernel branch) something like this compiles into two map loads and one gotox (and it verifies and runs properly). So far all looks good for me to start working on cleaning things up & more examples of computed gotos.
Uh-oh, that's because I forgot to pass ...
Below is a slight modification of what Yonghong tried; it handles removal of the leftover globals:

--- a/llvm/lib/Target/BPF/BPFMIPeephole.cpp
+++ b/llvm/lib/Target/BPF/BPFMIPeephole.cpp
@@ -30,6 +30,9 @@
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Module.h"
#include "llvm/Support/Debug.h"
#include <set>
@@ -321,6 +324,7 @@ private:
bool insertMissingCallerSavedSpills();
bool removeMayGotoZero();
bool addExitAfterUnreachable();
+ bool removeUnusedGV();
public:
@@ -338,6 +342,7 @@ public:
Changed |= insertMissingCallerSavedSpills();
Changed |= removeMayGotoZero();
Changed |= addExitAfterUnreachable();
+ Changed |= removeUnusedGV();
return Changed;
}
};
@@ -750,6 +755,29 @@ bool BPFMIPreEmitPeephole::addExitAfterUnreachable() {
return true;
}
+bool BPFMIPreEmitPeephole::removeUnusedGV() {
+ Module *M = MF->getFunction().getParent();
+ std::vector<GlobalVariable *> Targets;
+ for (GlobalVariable &Global : M->globals()) {
+ if (Global.getLinkage() != GlobalValue::PrivateLinkage)
+ continue;
+ if (!Global.isConstant() || !Global.hasInitializer())
+ continue;
+ Constant *CV = dyn_cast<Constant>(Global.getInitializer());
+ if (!CV)
+ continue;
+ ConstantArray *CA = dyn_cast<ConstantArray>(CV);
+ if (!CA)
+ continue;
+ Targets.push_back(&Global);
+ }
+ for (auto *G: Targets) {
+ G->replaceAllUsesWith(PoisonValue::get(G->getType())); // <----- Key change
+ G->eraseFromParent();
+ }
+ return true;
+}
+
} // end default namespace
INITIALIZE_PASS(BPFMIPreEmitPeephole, "bpf-mi-pemit-peephole", ...
Just updated the pull request to address the unused global variable issue. I moved the above code to doFinalization(), which operates at the module level.
Thanks, tested it with my current branch. |
@aspsk I just rebased on top of the latest llvm-project main branch. With this llvm version, I tried the latest bpf-next with your current patch set. To build the bpf selftests, I got libbpf warnings like ... I did a hack like the one below to allow jump tables for switches with 3 or more cases. When running, I hit a crash ... I think maybe it is time to send another revision?
Yes, makes sense, plus there is potentially yet another instruction to be added for static keys.
Thanks! The current version is here: https://github.com/aspsk/bpf-next/tree/wip/indirect-jumps
Do you think this is related to the patch or just to the latest llvm?
Thanks a lot! I will run all tests with ...
Planning to send it this week. (WIP, but I still need to address one comment from Eduard and one from Andrii.) |
... and I checked the libbpf source code. Looks like libbpf needs to handle this.
The above is just a hack to add ..., but change to cpuv4 for progs/bpf_goto_x.c.
Thanks!
; CHECK-NEXT: LBB0_5: # %sw.epilog
; CHECK-NEXT: w0 = 0
; CHECK-NEXT: exit
You can use

; UTC_ARGS: --disable
; CHECK: .section ...
...
; UTC_ARGS: --enable

and then use the UTC script for the rest.
Sounds good. Will do.
; return ret;
; }
;
; Compilation Flags:
You may search for ;--- gen within llvm/test. Those files utilize https://llvm.org/docs/TestingGuide.html#extra-files to make updates easier.
I tried the following:
$ cat test/CodeGen/BPF/jump_table_blockaddr.ll
; RUN: rm -rf %t && split-file %s %t && cd %t
; RUN: llc -march=bpf -mcpu=v4 < %s | FileCheck %s
; CHECK: bar
;--- test.c
int bar(int a) {
__label__ l1, l2;
void * volatile tgt;
int ret = 0;
if (a)
tgt = &&l1; // synthetic jump table generated here
else
tgt = &&l2; // another synthetic jump table
goto *tgt;
l1: ret += 1;
l2: ret += 2;
return ret;
}
;--- gen
clang --target=bpf -mcpu=v4 -O2 -emit-llvm -S test.c -o -
;--- test.ll
Then run utils/update_test_body.py test/CodeGen/BPF/jump_table_blockaddr.ll. It generates the .ll file properly.
...
;--- test.ll
; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "bpf"
; Function Attrs: nofree norecurse nounwind memory(inaccessiblemem: readwrite)
define dso_local range(i32 2, 4) i32 @bar(i32 noundef %0) local_unnamed_addr #0 {
%2 = alloca ptr, align 8
%3 = icmp eq i32 %0, 0
%4 = select i1 %3, ptr blockaddress(@bar, %7), ptr blockaddress(@bar, %6)
store volatile ptr %4, ptr %2, align 8, !tbaa !2
%5 = load volatile ptr, ptr %2, align 8, !tbaa !2
indirectbr ptr %5, [label %6, label %7]
...
But with the above generated .ll file, update_llc_test_checks.py does not work any more.
$ utils/update_llc_test_checks.py test/CodeGen/BPF/jump_table_blockaddr.ll
WARNING: Skipping unparsable RUN line: rm -rf %t && split-file %s %t && cd %t
llc: error: llc: <stdin>:7:1: error: expected top-level entity
int bar(int a) {
...
But we do want to have IR in the test and also want to use update_llc_test_checks.py, so for now I will stick with update_llc_test_checks.py.
@yonghong-song I've run all the ...
Hi @yonghong-song,
Did a second pass for this pull request, all looks good.
Left two nits.
SDValue Addr = DAG.getTargetGlobalAddress(GVal, DL, MVT::i64);
...
// Emit pseudo instruction
return SDValue(DAG.getMachineNode(BPF::LDIMM64, DL, MVT::i64, Addr), 0);
Question regarding direct BPF::LDIMM64 injection, like here, and BPFISD::Wrapper injection, like in getAddr.
As far as I understand, BPFISD::Wrapper ends up lowered to BPF::LDIMM64 because of the Pat rules in the BPFInstrInfo.td, hence these two techniques are effectively identical.
After landing this pull request:
- LowerJumpTable and LowerConstantPool will use BPFISD::Wrapper through getAddr;
- LowerGlobalAddress and LowerBlockAddress will use BPF::LDIMM64 directly.
Would it make sense to do a small refactoring first, removing BPFISD::Wrapper and replacing it with DAG.getMachineNode(BPF::LDIMM64, DL, MVT::i64, Addr)?
We could do this (removing BPFISD::Wrapper and handling all global addresses with DAG.getMachineNode()). This would add more cases in CustomInserter. I still prefer to keep LowerJumpTable and LowerConstantPool as in the current implementation, as they are easier to understand at the DAG level and follow similar patterns in other architectures.
cl::desc("Expand memcpy into load/store pairs in order"));
...
static cl::opt<unsigned> BPFMinimumJumpTableEntries(
    "bpf-min-jump-table-entries", cl::init(13), cl::Hidden,
Just out of curiosity, why 13?
I see that setMinimumJumpTableEntries is called only for a couple of archs: AVR effectively disables jump tables by using UINT_MAX, WebAssembly uses the value of 2 (always introduce jump tables), PPC uses 64, and everything else uses the default: 4.
Just out of curiosity, why 13?
This was suggested by Alexei in that private thread spawned from my RFC series, citing:
For now I would pick 13 to align with arm64 to make it slightly less
random and hopefully won't regress performance.
; CHECK: .cfi_startproc
; CHECK: # %bb.0: # %entry
; CHECK: r2 = BPF.JT.0.0 ll
; CHECK: r2 = *(u64 *)(r2 + 0)
Nit: I'd mask the register numbers in the tests using awk, to keep the tests a bit more stable.
I think we are fine here. If anything changes, the CHECKs can easily be regenerated with the update_test_body.py script. Specific register numbers are helpful to ensure asm code correctness.
NOTE 1: We probably need cpu v5 or another flag to enable this feature. We can add it later when necessary; let us use cpu v4 for now.
NOTE 2: An option -bpf-min-jump-table-entries is implemented to control the minimum
number of entries required to use a jump table on BPF. The default value is 5, to
make testing easy. Eventually we will increase the minimum jump table entries to 13.
This patch adds jump table support. A new insn 'gotox <reg>' is
added to allow a goto through a register. The register holds the
target address in the current section.
Example 1 (switch statement):
=============================
Code:
struct simple_ctx {
int x;
int y;
int z;
};
int ret_user, ret_user2;
void bar(void);
int foo(struct simple_ctx *ctx, struct simple_ctx *ctx2)
{
switch (ctx->x) {
case 1: ret_user = 18; break;
case 20: ret_user = 6; break;
case 16: ret_user = 9; break;
case 6: ret_user = 16; break;
case 8: ret_user = 14; break;
case 30: ret_user = 2; break;
default: ret_user = 1; break;
}
bar();
switch (ctx2->x) {
case 0: ret_user2 = 8; break;
case 31: ret_user2 = 5; break;
case 13: ret_user2 = 8; break;
case 1: ret_user2 = 3; break;
case 11: ret_user2 = 4; break;
default: ret_user2 = 29; break;
}
return 0;
}
Run: clang --target=bpf -mcpu=v4 -O2 -S test.c
The assembly code:
...
# %bb.1: # %entry
r1 <<= 3
r2 = .LJTI0_0 ll
r2 += r1
r1 = *(u64 *)(r2 + 0)
gotox r1
LBB0_2:
w1 = 18
goto LBB0_9
...
# %bb.10: # %sw.epilog
r1 <<= 3
r2 = .LJTI0_1 ll
r2 += r1
r1 = *(u64 *)(r2 + 0)
gotox r1
LBB0_11:
w1 = 8
goto LBB0_16
...
.section .rodata,"a",@progbits
.p2align 3, 0x0
.LJTI0_0:
.quad LBB0_2
.quad LBB0_8
...
.quad LBB0_7
.LJTI0_1:
.quad LBB0_11
.quad LBB0_13
...
Although we do have labels .LJTI0_0 and .LJTI0_1, they have the '.L'
prefix, so they won't appear in the .o file like other symbols do.
Run: llvm-objdump -Sr test.o
...
4: 67 01 00 00 03 00 00 00 r1 <<= 0x3
5: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
0000000000000028: R_BPF_64_64 .rodata
7: 0f 12 00 00 00 00 00 00 r2 += r1
...
29: 67 01 00 00 03 00 00 00 r1 <<= 0x3
30: 18 02 00 00 f0 00 00 00 00 00 00 00 00 00 00 00 r2 = 0xf0 ll
00000000000000f0: R_BPF_64_64 .rodata
32: 0f 12 00 00 00 00 00 00 r2 += r1
The size of the jump table is not obvious. libbpf needs to check all relocations
against the .rodata section in order to get the precise size needed to construct
BPF maps.
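A standalone sketch of the bookkeeping this implies (my illustration, not libbpf code): the only hints are the R_BPF_64_64 relocation addends pointing into .rodata (0x0 and 0xf0 in the dump above), so a loader has to sort the distinct targets and treat each table as running to the next target, or to the section end. This is only an upper bound if .rodata also holds unrelated constants, which is part of the motivation for a dedicated section later:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    /* Hypothetical input: byte offsets into .rodata collected from
     * R_BPF_64_64 relocations, and the size of .rodata itself. */
    uint64_t targets[] = { 0xf0, 0x0 };
    size_t n = sizeof(targets) / sizeof(targets[0]);
    uint64_t rodata_size = 0x1f0;

    qsort(targets, n, sizeof(targets[0]), cmp_u64);
    for (size_t i = 0; i < n; i++) {
        /* Each table extends to the next referenced table or to the
         * end of the section; entries are 8 bytes each. */
        uint64_t end = (i + 1 < n) ? targets[i + 1] : rodata_size;
        printf("table at 0x%llx: %llu bytes, %llu entries\n",
               (unsigned long long)targets[i],
               (unsigned long long)(end - targets[i]),
               (unsigned long long)((end - targets[i]) / 8));
    }
    return 0;
}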
Example 2 (Simple computed goto):
=================================
Code:
int bar(int a) {
__label__ l1, l2;
void * volatile tgt;
int ret = 0;
if (a)
tgt = &&l1; // synthetic jump table generated here
else
tgt = &&l2; // another synthetic jump table
goto *tgt;
l1: ret += 1;
l2: ret += 2;
return ret;
}
Compile: clang --target=bpf -mcpu=v4 -O2 -c test1.c
Objdump: llvm-objdump -Sr test1.o
0: 18 02 00 00 50 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x50 ll
0000000000000000: R_BPF_64_64 .text
2: 16 01 02 00 00 00 00 00 if w1 == 0x0 goto +0x2 <bar+0x28>
3: 18 02 00 00 40 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x40 ll
0000000000000018: R_BPF_64_64 .text
5: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 0x8) = r2
6: 79 a1 f8 ff 00 00 00 00 r1 = *(u64 *)(r10 - 0x8)
7: 0d 01 00 00 00 00 00 00 gotox r1
8: b4 00 00 00 03 00 00 00 w0 = 0x3
9: 05 00 01 00 00 00 00 00 goto +0x1 <bar+0x58>
10: b4 00 00 00 02 00 00 00 w0 = 0x2
11: 95 00 00 00 00 00 00 00 exit
For this case there is no jump table, so it would be hard to track the offset
during verification, especially when the offset needs adjustment. So practically
we need to create two jump tables, for '&&l1' and '&&l2' respectively.
Example 3 (More complicated computed goto):
===========================================
Code:
int foo(int a, int b) {
__label__ l1, l2, l3, l4;
void *jt1[] = {[0]=&&l1, [1]=&&l2};
void *jt2[] = {[0]=&&l3, [1]=&&l4};
int ret = 0;
goto *jt1[a % 2];
l1: ret += 1;
l2: ret += 3;
goto *jt2[b % 2];
l3: ret += 5;
l4: ret += 7;
return ret;
}
Compile: clang --target=bpf -mcpu=v4 -O2 -S test2.c
Asm code:
...
r3 = (s32)r2
r3 <<= 3
r2 = .L__const.foo.jt2 ll
r2 += r3
r1 = (s32)r1
r1 <<= 3
r3 = .L__const.foo.jt1 ll
r3 += r1
w0 = 0
r1 = *(u64 *)(r3 + 0)
gotox r1
.Ltmp0: # Block address taken
LBB0_1: # %l1
# =>This Inner Loop Header: Depth=1
w0 += 1
w0 += 3
r1 = *(u64 *)(r2 + 0)
gotox r1
.Ltmp1: # Block address taken
LBB0_2: # %l2
...
.type .L__const.foo.jt1,@object # @__const.foo.jt1
.section .rodata,"a",@progbits
.p2align 3, 0x0
.L__const.foo.jt1:
.quad .Ltmp0
.quad .Ltmp1
.size .L__const.foo.jt1, 16
.type .L__const.foo.jt2,@object # @__const.foo.jt2
.p2align 3, 0x0
.L__const.foo.jt2:
.quad .Ltmp2
.quad .Ltmp3
.size .L__const.foo.jt2, 16
Similar to the switch statement case, in the binary the symbols
.L__const.foo.jt* will not show up in the symbol table, and the jump tables
will be in the .rodata section.
We need to resolve the Example 2 case.
Also, with more libbpf work (dealing with .rodata sections etc.),
everything should work fine for Examples 1 and 3. But we could do
better by
- replacing symbols like .L<...> with symbols that appear in the
symbol table, and
- adding jump tables to a .jumptables section instead of the .rodata section.
This should make things easier for libbpf. Users also benefit,
since relocations/sections become easy to check.
The next two patches will fix Example 2 and improve all of the above
as described.
Example 2, Asm code:
...
# %bb.0: # %entry
r2 = .LJTI0_0 ll
r2 = *(u64 *)(r2 + 0)
r3 = .LJTI0_1 ll
r3 = *(u64 *)(r3 + 0)
if w1 == 0 goto LBB0_2
# %bb.1: # %entry
r3 = r2
LBB0_2: # %entry
*(u64 *)(r10 - 8) = r3
r1 = *(u64 *)(r10 - 8)
gotox r1
.Ltmp0: # Block address taken
LBB0_3: # %l1
w0 = 3
goto LBB0_5
.Ltmp1: # Block address taken
LBB0_4: # %l2
w0 = 2
LBB0_5: # %.split
exit
...
.section .rodata,"a",@progbits
.p2align 3, 0x0
.LJTI0_0:
.quad LBB0_3
.LJTI0_1:
.quad LBB0_4
Example 3, Asm Code:
r3 = (s32)r2
r3 <<= 3
r2 = .LJTI0_0 ll
r2 += r3
r1 = (s32)r1
r1 <<= 3
r3 = .LJTI0_1 ll
r3 += r1
w0 = 0
r1 = *(u64 *)(r3 + 0)
gotox r1
.Ltmp0: # Block address taken
LBB0_1: # %l1
# =>This Inner Loop Header: Depth=1
w0 += 1
w0 += 3
r1 = *(u64 *)(r2 + 0)
gotox r1
.Ltmp1: # Block address taken
LBB0_2: # %l2
# =>This Inner Loop Header: Depth=1
w0 += 3
r1 = *(u64 *)(r2 + 0)
gotox r1
.Ltmp2: # Block address taken
LBB0_3: # %l3
w0 += 5
goto LBB0_5
.Ltmp3: # Block address taken
LBB0_4: # %l4
LBB0_5: # %.split17
w0 += 7
exit
...
.section .rodata,"a",@progbits
.p2align 3, 0x0
.LJTI0_0:
.quad LBB0_3
.quad LBB0_4
.LJTI0_1:
.quad LBB0_1
.quad LBB0_2
# -- End function
.type .L__const.foo.jt1,@object # @__const.foo.jt1
.p2align 3, 0x0
.L__const.foo.jt1:
.quad .Ltmp0
.quad .Ltmp1
.size .L__const.foo.jt1, 16
.type .L__const.foo.jt2,@object # @__const.foo.jt2
.p2align 3, 0x0
.L__const.foo.jt2:
.quad .Ltmp2
.quad .Ltmp3
.size .L__const.foo.jt2, 16
Note that for both examples above, the jump table section is '.rodata'
and the labels have the '.L' prefix, which means the labels won't show up
in the symbol table. As mentioned in the previous patch, we want to
- move jump tables to a '.jumptables' section, and
- rename '.L*' labels to proper labels which are visible in the symbol table.
Note that for Example 3, there are extra globals like
.L__const.foo.jt1 and .L__const.foo.jt2
which we are not able to remove. But they won't show up in the symbol
table either.
For jump tables from switch statements, generate symbols visible to
'llvm-readelf -s' and put the jump tables into a dedicated section.
Most of this work is from Eduard.
For the previous example 1,
Compile: clang --target=bpf -mcpu=v4 -O2 -S test.c
Asm code:
...
# %bb.1: # %entry
r1 <<= 3
r2 = BPF.JT.0.0 ll
r2 += r1
r1 = *(u64 *)(r2 + 0)
gotox r1
LBB0_2:
w1 = 18
goto LBB0_9
...
# %bb.10: # %sw.epilog
r1 <<= 3
r2 = BPF.JT.0.1 ll
r2 += r1
r1 = *(u64 *)(r2 + 0)
gotox r1
LBB0_11:
w1 = 8
goto LBB0_16
...
.section .jumptables,"",@progbits
BPF.JT.0.0:
.quad LBB0_2
.quad LBB0_8
...
.quad LBB0_7
.size BPF.JT.0.0, 240
BPF.JT.0.1:
.quad LBB0_11
.quad LBB0_13
...
.quad LBB0_12
.size BPF.JT.0.1, 256
And the symbols BPF.JT.0.{0,1} will be in the symbol table.
The final binary:
4: 67 01 00 00 03 00 00 00 r1 <<= 0x3
5: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
0000000000000028: R_BPF_64_64 BPF.JT.0.0
7: 0f 12 00 00 00 00 00 00 r2 += r1
...
29: 67 01 00 00 03 00 00 00 r1 <<= 0x3
30: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
00000000000000f0: R_BPF_64_64 BPF.JT.0.1
32: 0f 12 00 00 00 00 00 00 r2 += r1
...
Symbol table:
4: 0000000000000000 240 OBJECT GLOBAL DEFAULT 4 BPF.JT.0.0
5: 0000000000000000 4 OBJECT GLOBAL DEFAULT 6 ret_user
6: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND bar
7: 00000000000000f0 256 OBJECT GLOBAL DEFAULT 4 BPF.JT.0.1
and
[ 4] .jumptables PROGBITS 0000000000000000 0001c8 0001f0 00 0 0 1
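With the tables now carried by real GLOBAL OBJECT symbols, the guessing described earlier goes away: a loader can take each table's offset and byte size straight from st_value/st_size in the symbol table. A minimal sketch (my illustration, using the numbers from the dump above):

#include <stdint.h>
#include <stdio.h>

struct jt_sym {
    const char *name;
    uint64_t st_value; /* offset within .jumptables */
    uint64_t st_size;  /* byte size, from the .size directive */
};

int main(void)
{
    /* Values copied from the symbol table dump above. */
    struct jt_sym syms[] = {
        { "BPF.JT.0.0", 0x00, 240 },
        { "BPF.JT.0.1", 0xf0, 256 },
    };

    for (unsigned i = 0; i < 2; i++)
        printf("%s: offset 0x%llx, %llu entries of 8 bytes\n",
               syms[i].name,
               (unsigned long long)syms[i].st_value,
               (unsigned long long)(syms[i].st_size / 8));
    return 0;
}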
For the previous example 2,
Compile: clang --target=bpf -mcpu=v4 -O2 -S test1.c
Asm code:
...
# %bb.0: # %entry
r2 = BPF.JT.0.0 ll
r2 = *(u64 *)(r2 + 0)
r3 = BPF.JT.0.1 ll
r3 = *(u64 *)(r3 + 0)
if w1 == 0 goto LBB0_2
# %bb.1: # %entry
r3 = r2
LBB0_2: # %entry
*(u64 *)(r10 - 8) = r3
r1 = *(u64 *)(r10 - 8)
gotox r1
...
.section .jumptables,"",@progbits
BPF.JT.0.0:
.quad LBB0_3
.size BPF.JT.0.0, 8
BPF.JT.0.1:
.quad LBB0_4
.size BPF.JT.0.1, 8
The binary: clang --target=bpf -mcpu=v4 -O2 -c test1.c
0: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
0000000000000000: R_BPF_64_64 BPF.JT.0.0
2: 79 22 00 00 00 00 00 00 r2 = *(u64 *)(r2 + 0x0)
3: 18 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r3 = 0x0 ll
0000000000000018: R_BPF_64_64 BPF.JT.0.1
5: 79 33 00 00 00 00 00 00 r3 = *(u64 *)(r3 + 0x0)
6: 16 01 01 00 00 00 00 00 if w1 == 0x0 goto +0x1 <bar+0x40>
7: bf 23 00 00 00 00 00 00 r3 = r2
8: 7b 3a f8 ff 00 00 00 00 *(u64 *)(r10 - 0x8) = r3
9: 79 a1 f8 ff 00 00 00 00 r1 = *(u64 *)(r10 - 0x8)
10: 0d 01 00 00 00 00 00 00 gotox r1
4: 0000000000000000 8 OBJECT GLOBAL DEFAULT 4 BPF.JT.0.0
5: 0000000000000008 8 OBJECT GLOBAL DEFAULT 4 BPF.JT.0.1
[ 4] .jumptables PROGBITS 0000000000000000 0000b8 000010 00 0 0 1
For the previous example 3,
Compile: clang --target=bpf -mcpu=v4 -O2 -S test.c
Asm code:
...
r3 = (s32)r2
r3 <<= 3
r2 = BPF.JT.0.0 ll
r2 += r3
r1 = (s32)r1
r1 <<= 3
r3 = BPF.JT.0.1 ll
r3 += r1
w0 = 0
r1 = *(u64 *)(r3 + 0)
gotox r1
.Ltmp0: # Block address taken
LBB0_1: # %l1
# =>This Inner Loop Header: Depth=1
w0 += 1
...
.section .jumptables,"",@progbits
BPF.JT.0.0:
.quad LBB0_3
.quad LBB0_4
.size BPF.JT.0.0, 16
BPF.JT.0.1:
.quad LBB0_1
.quad LBB0_2
.size BPF.JT.0.1, 16
The binary: clang --target=bpf -mcpu=v4 -O2 -c test2.c
12: bf 23 20 00 00 00 00 00 r3 = (s32)r2
13: 67 03 00 00 03 00 00 00 r3 <<= 0x3
14: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
0000000000000070: R_BPF_64_64 BPF.JT.0.0
16: 0f 32 00 00 00 00 00 00 r2 += r3
17: bf 11 20 00 00 00 00 00 r1 = (s32)r1
18: 67 01 00 00 03 00 00 00 r1 <<= 0x3
19: 18 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r3 = 0x0 ll
0000000000000098: R_BPF_64_64 BPF.JT.0.1
21: 0f 13 00 00 00 00 00 00 r3 += r1
4: 0000000000000000 16 OBJECT GLOBAL DEFAULT 4 BPF.JT.0.0
5: 0000000000000010 16 OBJECT GLOBAL DEFAULT 4 BPF.JT.0.1
[ 4] .jumptables PROGBITS 0000000000000000 000160 000020 00 0 0 1
This is temporary, and it makes it easy to run BPF selftests. Once the kernel side is ready, we will implement cpu v5, which will support jump tables.
To adjust the threshold, add '-mllvm -bpf-min-jump-table-entries=<n>' to the compilation flags.
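For example, a hypothetical invocation that raises the threshold back to 13 (the file name is arbitrary):

clang --target=bpf -mcpu=v4 -mllvm -bpf-min-jump-table-entries=13 -O2 -c test.c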
LLVM Buildbot has detected a new failure on builder ... Full details are available at: https://lab.llvm.org/buildbot/#/builders/27/builds/16135. Here is the relevant piece of the build log for reference: ...
Anton Protopopov says:
====================
BPF indirect jumps
This patchset implements a new type of map, instruction set, and uses
it to build support for indirect branches in BPF (on x86). (The same
map will be later used to provide support for indirect calls and static
keys.) See [1], [2] for more context.
Short table of contents:
* Patches 1-6 implement the new map of type
BPF_MAP_TYPE_INSN_SET and corresponding selftests. This map can
be used to track the "original -> xlated -> jitted mapping" for
a given program.
* Patches 7-12 implement the support for indirect jumps on x86 and add libbpf
support for LLVM-compiled programs containing indirect jumps, and selftests.
The jump table support was merged to LLVM and now can be
enabled with -mcpu=v4, see [3]. The __BPF_FEATURE_GOTOX
macro can be used to check whether the compiler supports the
feature or not.
See individual patches for more implementation details.
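As a sketch of how a program might gate on that macro (hypothetical code, not taken from the selftests):

/* Use the __BPF_FEATURE_GOTOX feature macro to compile the
 * computed-goto path only when the compiler supports gotox. */
#ifdef __BPF_FEATURE_GOTOX
static int dispatch(int x)
{
	__label__ a, b;
	void * volatile tgt = x ? &&a : &&b;

	goto *tgt;	/* lowered to a gotox through a jump table */
a:	return 1;
b:	return 2;
}
#else
/* Older compiler: fall back (or skip the test). */
static int dispatch(int x)
{
	return x ? 1 : 2;
}
#endif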
v10 -> v11 (this series):
* rearranged patches and split libbpf patch such that first 6 patches
implementing instruction arrays can be applied independently
* instruction arrays:
* move [fake] aux->used_maps assignment in this patch
* indirect jumps:
* call clear_insn_aux_data before bpf_remove_insns (AI)
* libbpf:
* remove the relocations check after the new LLVM is released (Eduard, Yonghong)
* libbpf: fix an index printed in pr_warn (AI)
* selftests:
* protect programs triggered by nanosleep from fake runs (Eduard)
* patch verifier_gotox to not emit .rel.jumptables
v9 -> v10 (https://lore.kernel.org/bpf/[email protected]/T/#t):
* Three bugs were noticed by AI in v9 (two old, one introduced by v9):
* [new] insn_array_alloc_size could overflow u32, switched to u64 (AI)
* map_ptr should be compared in regsafe for PTR_TO_INSN (AI)
* duplicate elements were copied in jt_from_map (AI)
* added a selftest in verifier_gotox with a jump table containing non-unique entries
v8 -> v9 (https://lore.kernel.org/bpf/[email protected]/T/#t):
* instruction arrays:
* remove the size restriction of 256 elements
* add a comments about addrs usage, old and new (Alexei)
* libbpf:
* properly prefix warnings (Andrii)
* cast j[t] to long long for printf and some other minor cleanups (Andrii)
* selftests:
* use __BPF_FEATURE_GOTOX in selftests and skip tests if it's not set (Eduard)
* fix a typo in a selftest assembly (AI)
v7 -> v8 (https://lore.kernel.org/bpf/[email protected]/T/#u):
* instruction arrays:
* simplify the bpf_prog_update_insn_ptrs function (Eduard)
* remove a semicolon after a function definition (AI)
* libbpf:
* add a proper error path in libbpf patch (AI)
* re-re-factor the create_jt_map & find_subprog_idx (Eduard)
* selftests:
* verifier_gotox: add a test for a jump table pointing to outside of a subprog (Eduard)
* used test__skip instead of just running an empty test
* split tests in bpf_gotox into subtests for convenience
* random:
* drop the docs commit for now
v6 -> v7 (https://lore.kernel.org/bpf/[email protected]/T/#t):
* rebased and dropped already merged commits
* instruction arrays
* use jit_data to find mappings from insn to jit (Alexei)
* alloc `ips` as part of the main allocation (Eduard)
* the `jitted_ip` member wasn't actually used (Eduard)
* remove the bpf_insn_ptr structure, which is not needed for this patch
* indirect jumps, kernel:
* fix a memory leak in `create_jt` (AI)
* use proper reg+8*ereg in `its_static_thunk` (AI)
* some minor cleanups (Eduard)
* indirect jumps, libbpf:
* refactor the `jt_adjust_off()` piece (Eduard)
* move "JUMPTABLES_SEC" into libbpf_internal.h (Eduard)
* remove an unnecessary if (Eduard)
* verifier_gotox: add tests to verify that `gotox rX` works with all registers
v5 -> v6 (https://lore.kernel.org/bpf/[email protected]/T/#u):
* instruction arrays:
* better document `struct bpf_insn_array_value` (Eduard)
* remove a condition in `bpf_insn_array_adjust_after_remove` (Eduard)
* make userspace see original, xlated, and jitted indexes (+original) (Eduard)
* indirect jumps, kernel:
* reject writes to the map
* reject unaligned ops
* add a check that `w` is not outside the program in check_cfg for `gotox` (Eduard)
* do not introduce unneeded `bpf_find_containing_subprog_idx`
* simplify error processing for `bpf_find_containing_subprog` (Eduard)
* add `insn_state |= DISCOVERED` when it's discovered (Eduard)
* support SUB operations on PTR_TO_INSN (Eduard)
* make `gotox_tmp_buf` a bpf_iarray and use helper to relocate it (Eduard)
* rename fields of `bpf_iarray` to more generic (Eduard)
* re-implement `visit_gotox_insn` in a loop (Eduard)
* some minor cleanups (Eduard)
* libbpf:
* `struct reloc_desc`: add a comment about `union` (Eduard)
* rename parameters of (and one other place in code) `{create,add}_jt_map` to `sym_off` (Eduard)
* `create_jt_map`: check that size/off are 8-byte aligned (Eduard)
* Selftests:
* instruction array selftests:
* only run tests on x86_64
* write a more generic function to test things to reduce code (Eduard)
* errno wasn't used in checks, so don't reset it (Eduard)
* print `i`, `xlated_off` and `map_out[i]` here (Eduard)
* added `verifier_gotox` selftests which do not depend on LLVM:
* disabled `bpf_gotox` tests by default
* other changes:
* remove an extra function in bpf disasm (Eduard)
* some minor cleanups in the insn_successors patch (Eduard)
* update documentation in `Documentation/bpf/linux-notes.html` about jumps, now it is supported :)
v3 -> v4 -> v5 (https://lore.kernel.org/bpf/[email protected]/):
* [v4 -> v5] rebased on top of the last bpf-next/master
* instruction arrays:
* add copyright (Alexei)
* remove mutexes, add frozen back (Alexei)
* setup 1:1 prog-map correspondence using atomic_xchg
* do not copy/paste array_map_get_next_key, add a common helper (Alexei)
* misc minor code cleanups (Alexei)
* indirect jumps, kernel side:
* remove jt_allocated, just check if insn is gotox (Eduard)
* use copy_register_state instead of individual copies (Eduard)
* in push_stack is_speculative should be inherited (Eduard)
* a few cleanups for insn_successors, including omitting error path (Eduard)
* check if reserved fields are used when considering `gotox` instruction (Eduard)
* read size and alignment of read from insn_array should be 8 (Eduard)
* put buffer for sorting in subfun info and realloc to grow as needed (Eduard)
* properly do `jump_point` / `prune_point` from `push_gotox_edge` (Eduard)
* use range_within to check states (Eduard)
* some minor cleanups and fix commit message (Eduard)
* indirect jumps, libbpf side:
* close map_fd in some error paths in create_jt_map (Andrii)
* maps for jump tables are actually not closed at all, fix this (Andrii)
* rename map from `jt` to `.jumptables` (Andrii)
* use `errstr` in an error message (Andrii)
* rephrase error message to look more standard (Andrii)
* misc other minor renames and cleanups (Andrii)
* selftests:
* add the frozen selftest back
* add a selftest for two jumps loading same table
* some other changes:
* rebase and split insn_successor changes into separate patch
* use PTR_ERR_OR_ZERO in the push stack patch (Eduard)
* indirect jumps on x86: properly re-read *pprog (Eduard)
v2 -> v3 (https://lore.kernel.org/bpf/[email protected]/):
* fix build failure when CONFIG_BPF_SYSCALL is not set (kbuild-bot)
* reformat bpftool help messages (Quentin)
v1 -> v2 (https://lore.kernel.org/bpf/[email protected]/):
* push_stack changes:
* sanitize_speculative_path should just return int (Eduard)
* return code from sanitize_speculative_path, not EFAULT (Eduard)
* when BPF_COMPLEXITY_LIMIT_JMP_SEQ is reached, return E2BIG (Eduard)
* indirect jumps:
* omit support for .imm=fd in gotox, as we're not using it for now (Eduard)
* struct jt -> struct bpf_iarray (Eduard)
* insn_successors: rewrite the interface to just return a pointer (Eduard)
* remove min_index/max_index, use umin_value/umax_value instead (Alexei, Eduard)
* move emit_indirect_jump args change to the previous patch (Eduard)
* add a comment to map_mem_size() (Eduard)
* use verifier_bug for some error cases in check_indirect_jump (Eduard)
* clear_insn_aux_data: use start,len instead of start,end (Eduard)
* make regs[insn->dst_reg].type = PTR_TO_INSN part of check_mem_access (Eduard)
* constant blinding changes:
* make subprog_start adjustment better readable (Eduard)
* do not set subprog len, it is already set (Eduard)
* libbpf:
* remove check that relocations from .rodata are ok (Anton)
* do not freeze the map, it is not necessary anymore (Anton)
* rename the goto_x -> gotox everywhere (Anton)
* use u64 when parsing LLVM jump tables (Eduard)
* split patch in two due to spaces->tabs change (Eduard)
* split bpftool changes to bpftool patch (Andrii)
* make sym_size it a union with ext_idx (Andrii)
* properly copy/free the jumptables_data section from elf (Andrii)
* a few cosmetic changes around create_jt_map (Andrii)
* fix some comments + rewrite patch description (Andrii)
* inline bpf_prog__append_subprog_offsets (Andrii)
* subprog_sec_offst -> subprog_sec_off (Andrii)
* !strcmp -> strcmp() == 0 (Andrii)
* make some function names more readable (Andrii)
* allocate table of subfunc offsets via libbpf_reallocarray (Andrii)
* selftests:
* squash insn_array* tests together (Anton)
* fixed build warnings (kernel test robot)
RFC -> v1 (https://lore.kernel.org/bpf/[email protected]/):
* I've tried to address all the comments provided by Alexei and
Eduard in RFC. Will try to list the most important of them below.
* One big change: move from older LLVM version [5] to newer [4].
Now LLVM generates jump tables as symbols in the new special
section ".jumptables". Another part of this change is that
libbpf now doesn't try to link map load and goto *rX, as
1) this is absolutely not reliable 2) for some use cases this
is impossible (namely, when more than one jump table can be used
in the same gotox instruction).
* Added insn_successors() support (Alexei, Eduard). This includes
getting rid of the ugly bpf_insn_set_iter_xlated_offset()
interface (Eduard).
* Removed hack for the unreachable instruction, as the new LLVM (thanks to
Eduard) doesn't generate it.
* Set mem_size for direct map access properly instead of hacking.
Remove off>0 check. (Alexei)
* Do not allocate new memory for min_index/max_index (Alexei, Eduard)
* Information required during check_cfg is now cached to be reused
later (Alexei + general logic for supporting multiple JT per jump)
* Properly compare registers in regsafe (Alexei, Eduard)
* Remove support for JMP32 (Eduard)
* Better checks in adjust_ptr_min_max_vals (Eduard)
* More selftests were added (but still there's room for more) which
directly use gotox (Alexei)
* More checks and verbose messages added
* "unique pointers" are no more in the map
Links:
1. https://lpc.events/event/18/contributions/1941/
2. https://lwn.net/Articles/1017439/
3. llvm/llvm-project#149715
4. llvm/llvm-project#149715 (comment)
6. rfc: https://lore.kernel.org/bpf/[email protected]/
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Add jump table (switch statement and computed goto) support for the BPF backend.
A 'gotox <reg>' insn is implemented, and the <reg> holds the target insn where the gotox will go. For a switch statement like ... the final binary is ...
Note that for the above example, '-mllvm -bpf-min-jump-table-entries=5' should be in the compilation flags, as the current default bpf-min-jump-table-entries is 13.
For a computed goto like ... the final binary is ...
A more complicated test with both a switch-statement-triggered jump table and computed gotos: compile with ... The binary: ...
You can see that the jump table symbols are all different.
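As a closing illustration, here is a representative source sketch (my own, not the exact test from this description) that produces both kinds of jump tables in one object file; compiling with something like clang --target=bpf -mcpu=v4 -mllvm -bpf-min-jump-table-entries=5 -O2 -c should yield distinct BPF.JT.<func>.<n> symbols:

int ret_user;

/* One switch big enough to be lowered as a jump table, plus a
 * computed goto, which gets its own synthetic jump tables. */
int combined(int a, int b)
{
    __label__ l1, l2;
    void * volatile tgt;

    switch (a) {
    case 0: ret_user = 1; break;
    case 1: ret_user = 2; break;
    case 2: ret_user = 3; break;
    case 3: ret_user = 4; break;
    case 4: ret_user = 5; break;
    case 5: ret_user = 6; break;
    default: ret_user = 0; break;
    }

    tgt = b ? &&l1 : &&l2;
    goto *tgt;
l1: ret_user += 1;
l2: ret_user += 2;
    return ret_user;
}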