[BPF] Support Jump Table #149715

Conversation
✅ With the latest revision this PR passed the C/C++ code formatter.
Thanks! (Building this && rebasing my branch.)
All looks to be compiling properly; I should be able to make this work altogether. I have two questions & nits below and will split them into two comments. Question 1: the relocations for switches and for computed gotos look a bit different. Switch:
computed gotos:
The latter two point to properly defined symbols:
but this is an extra step to find those. Can the relocation point to the symbol directly, as with the switch?
Question 2: currently, jump tables contain 8 bytes per entry; is this intentional? (The offsets will never be greater than 4 bytes.)
One related bug: the size of BPF.JT.0.0 is computed as if it points to 4-byte entries:
^ here the first two have size 2 (as in your example above), while BPF.JT.0.0 actually has 5 entries (a switch from my test). However, its symbol size is emitted as 20 (5 entries * 4 bytes), even though 40 bytes of .quad entries are actually emitted.
Signed-off-by: Anton Protopopov <[email protected]>
Ok, my
This part is generated by the compiler directly, and the BPF backend is not involved. I need to do some investigation to find out why, and whether we could make a change to relocate against the symbol directly or not.
Thanks. I will fix the bug. Regarding why we have 8-byte jump table entries: I guess this is probably because the address is calculated from the start of the section.
Great. I will try to address the bug you found and the relocation difference between switch statements and computed gotos ASAP.
Just pushed a fix for the BPF.JT.0.0/1 size bug discovered by @aspsk above.
Currently JT offsets are calculated in bytes, but I think it would still be simpler for libbpf/kernel if offsets were calculated in instructions. There would also be no need to track offsets as 8 bytes; 4 bytes would suffice. The following part was responsible for this in the old PR:

void BPFAsmPrinter::emitJumpTableInfo()
...
  SmallPtrSet<const MachineBasicBlock *, 16> EmittedSets;
  const MCSymbolRefExpr *Base =
      MCSymbolRefExpr::create(getJXAnchorSymbol(JTI), OutContext);
  for (const MachineBasicBlock *MBB : JTBBs) {
    if (!EmittedSets.insert(MBB).second)
      continue;
    // Offset from gotox to target basic block expressed in number
    // of instructions, e.g.:
    //
    //   .L0_0_set_4 = ((LBB0_4 - .LBPF.JX.0.0) >> 3) - 1
    const MCExpr *LHS = MCSymbolRefExpr::create(MBB->getSymbol(), OutContext);
    OutStreamer->emitAssignment(
        GetJTSetSymbol(JTI, MBB->getNumber()),
        MCBinaryExpr::createSub(
            MCBinaryExpr::createAShr(
                MCBinaryExpr::createSub(LHS, Base, OutContext),
                MCConstantExpr::create(3, OutContext), OutContext),
            MCConstantExpr::create(1, OutContext), OutContext));
  }
  // BPF.JT.0.0:
  //   .long .L0_0_set_4
  //   .long .L0_0_set_2
  //   ...
  //   .size BPF.JT.0.0, 128
  MCSymbol *JTStart = getJTPublicSymbol(JTI);
  OutStreamer->emitLabel(JTStart);
  for (const MachineBasicBlock *MBB : JTBBs) {
    MCSymbol *SetSymbol = GetJTSetSymbol(JTI, MBB->getNumber());
    const MCExpr *V = MCSymbolRefExpr::create(SetSymbol, OutContext);
    OutStreamer->emitValue(V, EntrySize);
  }
  const MCExpr *JTSize = MCConstantExpr::create(JTBBs.size() * 4, OutContext);
  OutStreamer->emitELFSize(JTStart, JTSize);
}
...

The expression for the .L0_0_set_* symbols above is the part that computes the offset in instructions.
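As a worked instance of the commented expression: if LBB0_4 sits five instructions (40 bytes) past the .LBPF.JX.0.0 anchor, the emitted entry is (40 >> 3) - 1 = 4. Below is a minimal loader-side sketch of consuming such instruction-count entries; this is hypothetical illustration, not kernel or libbpf code, and it assumes the anchor label sits on the gotox insn itself, as the comment above suggests.

  #include <stdint.h>

  /* 8-byte BPF instruction layout (mirrors struct bpf_insn from linux/bpf.h,
   * with the two 4-bit register fields folded into one byte). */
  struct insn {
      uint8_t  code;
      uint8_t  regs;
      int16_t  off;
      int32_t  imm;
  };

  /* entry = ((target - anchor) >> 3) - 1 in insn units, so the target
   * is anchor + entry + 1 instructions. */
  static inline struct insn *jt_target(struct insn *anchor, int32_t entry)
  {
      return anchor + entry + 1;
  }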
The following example is not handled:

int bar(int a) {
    __label__ l1, l2;
    void * volatile tgt;
    int ret = 0;
    if (a)
        tgt = &&l1;
    else
        tgt = &&l2;
    goto *tgt;
l1: ret += 1;
l2: ret += 2;
    return ret;
}

Currently the following code is produced:
As discussed previously,
For computed goto, 'const MachineJumpTableInfo *MJTI = MF->getJumpTableInfo();' yields a null MJTI. I didn't use the above, to be consistent with the computed-goto jump tables.
Yes, makes sense. Missed this one.
One more thing to check: all labels, e.g. BPF.JT.0.0, are global. What if there are two bpf progs both having BPF.JT.0.0 and they need to be linked together? Can libbpf handle this properly? We need to double-check with libbpf on this.
For the new version, computed goto has the same relocation mechanism based on symbols.
Good point, I think the libbpf linker should be modified to take care of this.
Well, arrays of labels declared in ".jumptables" sections can be lowered as jump tables; this way
Let me take a look.
Ok. With the current patch (and my dev kernel branch) something like this
compiles into two map loads and one
(and it verifies and runs properly). So far all looks good for me to start working on cleaning things up & more examples of computed gotos.
All looks ok now from my side; now fixing warnings triggered by CI, then sending out v1.
Kernel v1 is here: https://lore.kernel.org/bpf/[email protected]
Thanks @aspsk for the update! I will try to remove unneeded globals and add some tests on the llvm side to make the pull request complete.
Signed-off-by: Anton Protopopov <[email protected]>
ret i32 %ret.0
}

; CHECK: w1 &= 1
Use CHECK-NEXT: whenever feasible so that FileCheck diagnostics will be better when an instruction is changed.
Consider using update_llc_test_checks.py
Also consider https://llvm.org/docs/TestingGuide.html#extra-files
Use CHECK-NEXT: whenever feasible so that FileCheck diagnostics will be better when an instruction is changed. Consider using update_llc_test_checks.py
Thanks for the suggestion! I also want to dump the '.jumptables' section, and that is why I didn't use update_llc_test_checks.py.
@aspsk @eddyz87 Just updated the patch with the following:
I also rebased on top of the current main branch and fixed one issue during the update.
I also tried to remove those global variables (with blockaddresses) once gotox is available, at pretty late optimization stages:
Unfortunately, I tried a few examples, e.g.,
and it does not work.
When I try the example from a previous comment, I get the following error:
I'm at
Uh-oh, that's because I forgot to pass
Below is a slight modification of what Yonghong tried; it handles removal of the leftover globals:

--- a/llvm/lib/Target/BPF/BPFMIPeephole.cpp
+++ b/llvm/lib/Target/BPF/BPFMIPeephole.cpp
@@ -30,6 +30,9 @@
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Module.h"
#include "llvm/Support/Debug.h"
#include <set>
@@ -321,6 +324,7 @@ private:
bool insertMissingCallerSavedSpills();
bool removeMayGotoZero();
bool addExitAfterUnreachable();
+ bool removeUnusedGV();
public:
@@ -338,6 +342,7 @@ public:
Changed |= insertMissingCallerSavedSpills();
Changed |= removeMayGotoZero();
Changed |= addExitAfterUnreachable();
+ Changed |= removeUnusedGV();
return Changed;
}
};
@@ -750,6 +755,29 @@ bool BPFMIPreEmitPeephole::addExitAfterUnreachable() {
return true;
}
+bool BPFMIPreEmitPeephole::removeUnusedGV() {
+ Module *M = MF->getFunction().getParent();
+ std::vector<GlobalVariable *> Targets;
+ for (GlobalVariable &Global : M->globals()) {
+ if (Global.getLinkage() != GlobalValue::PrivateLinkage)
+ continue;
+ if (!Global.isConstant() || !Global.hasInitializer())
+ continue;
+ Constant *CV = dyn_cast<Constant>(Global.getInitializer());
+ if (!CV)
+ continue;
+ ConstantArray *CA = dyn_cast<ConstantArray>(CV);
+ if (!CA)
+ continue;
+ Targets.push_back(&Global);
+ }
+ for (auto *G: Targets) {
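+ // A global cannot be erased while it still has uses, so detach its
+ // uses first by replacing them with poison.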
+ G->replaceAllUsesWith(PoisonValue::get(G->getType())); // <----- Key change
+ G->eraseFromParent();
+ }
+ return true;
+}
+
} // end default namespace
INITIALIZE_PASS(BPFMIPreEmitPeephole, "bpf-mi-pemit-peephole",
Just updated the pull request to address the unused-global-variable issue. I moved the above code into doFinalization(), which works at module level.
Thanks, tested it with my current branch.
NOTE 1: We probably need cpu v5 or other flags to enable this feature. We can add it later when necessary. Let us use cpu v4 for now.

NOTE 2: An option -bpf-min-jump-table-entries is implemented to control the minimum number of entries needed to use a jump table on BPF. The default value is 5, to make testing easy. Eventually we will increase the minimum number of jump table entries to 13.

This patch adds jump table support. A new insn 'gotox <reg>' is added to allow a goto through a register. The register holds the target address in the current section.

Example 1 (switch statement):
=============================
Code:

  struct simple_ctx {
    int x;
    int y;
    int z;
  };

  int ret_user, ret_user2;
  void bar(void);
  int foo(struct simple_ctx *ctx, struct simple_ctx *ctx2)
  {
    switch (ctx->x) {
    case 1: ret_user = 18; break;
    case 20: ret_user = 6; break;
    case 16: ret_user = 9; break;
    case 6: ret_user = 16; break;
    case 8: ret_user = 14; break;
    case 30: ret_user = 2; break;
    default: ret_user = 1; break;
    }

    bar();

    switch (ctx2->x) {
    case 0: ret_user2 = 8; break;
    case 31: ret_user2 = 5; break;
    case 13: ret_user2 = 8; break;
    case 1: ret_user2 = 3; break;
    case 11: ret_user2 = 4; break;
    default: ret_user2 = 29; break;
    }

    return 0;
  }

Run: clang --target=bpf -mcpu=v4 -O2 -S test.c
The assembly code:

  ...
  # %bb.1:                # %entry
    r1 <<= 3
    r2 = .LJTI0_0 ll
    r2 += r1
    r1 = *(u64 *)(r2 + 0)
    gotox r1
  LBB0_2:
    w1 = 18
    goto LBB0_9
  ...
  # %bb.10:               # %sw.epilog
    r1 <<= 3
    r2 = .LJTI0_1 ll
    r2 += r1
    r1 = *(u64 *)(r2 + 0)
    gotox r1
  LBB0_11:
    w1 = 8
    goto LBB0_16
  ...
    .section .rodata,"a",@progbits
    .p2align 3, 0x0
  .LJTI0_0:
    .quad LBB0_2
    .quad LBB0_8
    ...
    .quad LBB0_7
  .LJTI0_1:
    .quad LBB0_11
    .quad LBB0_13
    ...

Although we do have labels .LJTI0_0 and .LJTI0_1, they carry the '.L' prefix, so they won't appear in the .o file like other symbols.

Run: llvm-objdump -Sr test.o

  ...
     4: 67 01 00 00 03 00 00 00  r1 <<= 0x3
     5: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00  r2 = 0x0 ll
        0000000000000028: R_BPF_64_64 .rodata
     7: 0f 12 00 00 00 00 00 00  r2 += r1
  ...
    29: 67 01 00 00 03 00 00 00  r1 <<= 0x3
    30: 18 02 00 00 f0 00 00 00 00 00 00 00 00 00 00 00  r2 = 0xf0 ll
        00000000000000f0: R_BPF_64_64 .rodata
    32: 0f 12 00 00 00 00 00 00  r2 += r1

The size of the jump table is not obvious. libbpf needs to check all relocations against the .rodata section in order to get the precise size needed to construct bpf maps.

Example 2 (Simple computed goto):
=================================
Code:

  int bar(int a)
  {
    __label__ l1, l2;
    void * volatile tgt;
    int ret = 0;

    if (a)
      tgt = &&l1;   // synthetic jump table generated here
    else
      tgt = &&l2;   // another synthetic jump table
    goto *tgt;

  l1: ret += 1;
  l2: ret += 2;
    return ret;
  }

Compile: clang --target=bpf -mcpu=v4 -O2 -c test1.c
Objdump: llvm-objdump -Sr test1.o

     0: 18 02 00 00 50 00 00 00 00 00 00 00 00 00 00 00  r2 = 0x50 ll
        0000000000000000: R_BPF_64_64 .text
     2: 16 01 02 00 00 00 00 00  if w1 == 0x0 goto +0x2 <bar+0x28>
     3: 18 02 00 00 40 00 00 00 00 00 00 00 00 00 00 00  r2 = 0x40 ll
        0000000000000018: R_BPF_64_64 .text
     5: 7b 2a f8 ff 00 00 00 00  *(u64 *)(r10 - 0x8) = r2
     6: 79 a1 f8 ff 00 00 00 00  r1 = *(u64 *)(r10 - 0x8)
     7: 0d 01 00 00 00 00 00 00  gotox r1
     8: b4 00 00 00 03 00 00 00  w0 = 0x3
     9: 05 00 01 00 00 00 00 00  goto +0x1 <bar+0x58>
    10: b4 00 00 00 02 00 00 00  w0 = 0x2
    11: 95 00 00 00 00 00 00 00  exit

For this case, there is no jump table, so it would be hard to track the offset during verification, especially when the offset needs adjustment. So practically we need to create two jump tables, for '&&l1' and '&&l2' respectively.
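To make the relocation-scanning point concrete, here is a minimal sketch (hypothetical helper names, not libbpf code) of inferring table sizes from the relocation targets into .rodata: each distinct target starts a table, which extends to the next distinct target or the section end. Since .rodata can also hold unrelated constants, this inference is inherently fragile, which is exactly the problem described above.

  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>

  static int cmp_u64(const void *a, const void *b)
  {
      uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
      return x < y ? -1 : x > y;
  }

  /* starts: relocation targets into .rodata (candidate table starts),
   * possibly unsorted and with duplicates; rodata_size: section size. */
  static void infer_table_sizes(uint64_t *starts, size_t n, uint64_t rodata_size)
  {
      qsort(starts, n, sizeof(*starts), cmp_u64);
      for (size_t i = 0; i < n; i++) {
          if (i + 1 < n && starts[i] == starts[i + 1])
              continue; /* skip duplicate references to the same table */
          uint64_t end = rodata_size;
          for (size_t j = i + 1; j < n; j++)
              if (starts[j] != starts[i]) { end = starts[j]; break; }
          printf("table at 0x%llx, size %llu bytes\n",
                 (unsigned long long)starts[i],
                 (unsigned long long)(end - starts[i]));
      }
  }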
Example 3 (More complicated computed goto):
===========================================
Code:

  int foo(int a, int b)
  {
    __label__ l1, l2, l3, l4;
    void *jt1[] = {[0]=&&l1, [1]=&&l2};
    void *jt2[] = {[0]=&&l3, [1]=&&l4};
    int ret = 0;

    goto *jt1[a % 2];
  l1: ret += 1;
  l2: ret += 3;
    goto *jt2[b % 2];
  l3: ret += 5;
  l4: ret += 7;
    return ret;
  }

Compile: clang --target=bpf -mcpu=v4 -O2 -S test2.c
Asm code:

  ...
    r3 = (s32)r2
    r3 <<= 3
    r2 = .L__const.foo.jt2 ll
    r2 += r3
    r1 = (s32)r1
    r1 <<= 3
    r3 = .L__const.foo.jt1 ll
    r3 += r1
    w0 = 0
    r1 = *(u64 *)(r3 + 0)
    gotox r1
  .Ltmp0:                 # Block address taken
  LBB0_1:                 # %l1
                          # =>This Inner Loop Header: Depth=1
    w0 += 1
    w0 += 3
    r1 = *(u64 *)(r2 + 0)
    gotox r1
  .Ltmp1:                 # Block address taken
  LBB0_2:                 # %l2
  ...
    .type .L__const.foo.jt1,@object   # @__const.foo.jt1
    .section .rodata,"a",@progbits
    .p2align 3, 0x0
  .L__const.foo.jt1:
    .quad .Ltmp0
    .quad .Ltmp1
    .size .L__const.foo.jt1, 16
    .type .L__const.foo.jt2,@object   # @__const.foo.jt2
    .p2align 3, 0x0
  .L__const.foo.jt2:
    .quad .Ltmp2
    .quad .Ltmp3
    .size .L__const.foo.jt2, 16

Similar to the switch statement case, in the binary the symbols .L__const.foo.jt* will not show up in the symbol table, and the jump tables will be in the .rodata section.

We need to resolve the Example 2 case. With more libbpf work (dealing with .rodata sections etc.), everything should work fine for Examples 1 and 3. But we could do better by:
- replacing symbols like .L<...> with symbols appearing in the symbol table;
- adding jump tables to a .jumptables section instead of the .rodata section.

This should make things easier for libbpf. Users can also benefit, as relocations/sections will be easy to check. The next two patches will fix Example 2 and improve all of the above as mentioned.
Example 2, Asm code:

  ...
  # %bb.0:                # %entry
    r2 = .LJTI0_0 ll
    r2 = *(u64 *)(r2 + 0)
    r3 = .LJTI0_1 ll
    r3 = *(u64 *)(r3 + 0)
    if w1 == 0 goto LBB0_2
  # %bb.1:                # %entry
    r3 = r2
  LBB0_2:                 # %entry
    *(u64 *)(r10 - 8) = r3
    r1 = *(u64 *)(r10 - 8)
    gotox r1
  .Ltmp0:                 # Block address taken
  LBB0_3:                 # %l1
    w0 = 3
    goto LBB0_5
  .Ltmp1:                 # Block address taken
  LBB0_4:                 # %l2
    w0 = 2
  LBB0_5:                 # %.split
    exit
  ...
    .section .rodata,"a",@progbits
    .p2align 3, 0x0
  .LJTI0_0:
    .quad LBB0_3
  .LJTI0_1:
    .quad LBB0_4

Example 3, Asm code:

    r3 = (s32)r2
    r3 <<= 3
    r2 = .LJTI0_0 ll
    r2 += r3
    r1 = (s32)r1
    r1 <<= 3
    r3 = .LJTI0_1 ll
    r3 += r1
    w0 = 0
    r1 = *(u64 *)(r3 + 0)
    gotox r1
  .Ltmp0:                 # Block address taken
  LBB0_1:                 # %l1
                          # =>This Inner Loop Header: Depth=1
    w0 += 1
    w0 += 3
    r1 = *(u64 *)(r2 + 0)
    gotox r1
  .Ltmp1:                 # Block address taken
  LBB0_2:                 # %l2
                          # =>This Inner Loop Header: Depth=1
    w0 += 3
    r1 = *(u64 *)(r2 + 0)
    gotox r1
  .Ltmp2:                 # Block address taken
  LBB0_3:                 # %l3
    w0 += 5
    goto LBB0_5
  .Ltmp3:                 # Block address taken
  LBB0_4:                 # %l4
  LBB0_5:                 # %.split17
    w0 += 7
    exit
  ...
    .section .rodata,"a",@progbits
    .p2align 3, 0x0
  .LJTI0_0:
    .quad LBB0_3
    .quad LBB0_4
  .LJTI0_1:
    .quad LBB0_1
    .quad LBB0_2
                          # -- End function
    .type .L__const.foo.jt1,@object   # @__const.foo.jt1
    .p2align 3, 0x0
  .L__const.foo.jt1:
    .quad .Ltmp0
    .quad .Ltmp1
    .size .L__const.foo.jt1, 16
    .type .L__const.foo.jt2,@object   # @__const.foo.jt2
    .p2align 3, 0x0
  .L__const.foo.jt2:
    .quad .Ltmp2
    .quad .Ltmp3
    .size .L__const.foo.jt2, 16

Note that for both examples above, the jump table section is '.rodata' and the labels have the '.L' prefix, which means they won't show up in the symbol table. As mentioned in the previous patch, we want to:
- move jump tables to a '.jumptables' section;
- rename '.L*' labels to proper labels which are visible in the symbol table.

Note that for Example 3, there are extra globals like .L__const.foo.jt1 and .L__const.foo.jt2 which we are not able to remove. But they won't show up in the symbol table either.
For jumptables from switch statements, generate 'llvm-readelf -s' visible symbols and put the jumptables into a dedicated section. Most work is from Eduard.

For the previous Example 1,
Compile: clang --target=bpf -mcpu=v4 -O2 -S test.c
Asm code:

  ...
  # %bb.1:                # %entry
    r1 <<= 3
    r2 = BPF.JT.0.0 ll
    r2 += r1
    r1 = *(u64 *)(r2 + 0)
    gotox r1
  LBB0_2:
    w1 = 18
    goto LBB0_9
  ...
  # %bb.10:               # %sw.epilog
    r1 <<= 3
    r2 = BPF.JT.0.1 ll
    r2 += r1
    r1 = *(u64 *)(r2 + 0)
    gotox r1
  LBB0_11:
    w1 = 8
    goto LBB0_16
  ...
    .section .jumptables,"",@progbits
  BPF.JT.0.0:
    .quad LBB0_2
    .quad LBB0_8
    ...
    .quad LBB0_7
    .size BPF.JT.0.0, 240
  BPF.JT.0.1:
    .quad LBB0_11
    .quad LBB0_13
    ...
    .quad LBB0_12
    .size BPF.JT.0.1, 256

And the symbols BPF.JT.0.{0,1} will be in the symbol table. The final binary:

     4: 67 01 00 00 03 00 00 00  r1 <<= 0x3
     5: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00  r2 = 0x0 ll
        0000000000000028: R_BPF_64_64 BPF.JT.0.0
     7: 0f 12 00 00 00 00 00 00  r2 += r1
  ...
    29: 67 01 00 00 03 00 00 00  r1 <<= 0x3
    30: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00  r2 = 0x0 ll
        00000000000000f0: R_BPF_64_64 BPF.JT.0.1
    32: 0f 12 00 00 00 00 00 00  r2 += r1
  ...

Symbol table:

    4: 0000000000000000   240 OBJECT  GLOBAL DEFAULT     4 BPF.JT.0.0
    5: 0000000000000000     4 OBJECT  GLOBAL DEFAULT     6 ret_user
    6: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND bar
    7: 00000000000000f0   256 OBJECT  GLOBAL DEFAULT     4 BPF.JT.0.1

and

  [ 4] .jumptables  PROGBITS  0000000000000000 0001c8 0001f0 00      0   0  1

For the previous Example 2,
Compile: clang --target=bpf -mcpu=v4 -O2 -S test1.c
Asm code:

  ...
  # %bb.0:                # %entry
    r2 = BPF.JT.0.0 ll
    r2 = *(u64 *)(r2 + 0)
    r3 = BPF.JT.0.1 ll
    r3 = *(u64 *)(r3 + 0)
    if w1 == 0 goto LBB0_2
  # %bb.1:                # %entry
    r3 = r2
  LBB0_2:                 # %entry
    *(u64 *)(r10 - 8) = r3
    r1 = *(u64 *)(r10 - 8)
    gotox r1
  ...
    .section .jumptables,"",@progbits
  BPF.JT.0.0:
    .quad LBB0_3
    .size BPF.JT.0.0, 8
  BPF.JT.0.1:
    .quad LBB0_4
    .size BPF.JT.0.1, 8

The binary: clang --target=bpf -mcpu=v4 -O2 -c test1.c

     0: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00  r2 = 0x0 ll
        0000000000000000: R_BPF_64_64 BPF.JT.0.0
     2: 79 22 00 00 00 00 00 00  r2 = *(u64 *)(r2 + 0x0)
     3: 18 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00  r3 = 0x0 ll
        0000000000000018: R_BPF_64_64 BPF.JT.0.1
     5: 79 33 00 00 00 00 00 00  r3 = *(u64 *)(r3 + 0x0)
     6: 16 01 01 00 00 00 00 00  if w1 == 0x0 goto +0x1 <bar+0x40>
     7: bf 23 00 00 00 00 00 00  r3 = r2
     8: 7b 3a f8 ff 00 00 00 00  *(u64 *)(r10 - 0x8) = r3
     9: 79 a1 f8 ff 00 00 00 00  r1 = *(u64 *)(r10 - 0x8)
    10: 0d 01 00 00 00 00 00 00  gotox r1

    4: 0000000000000000     8 OBJECT  GLOBAL DEFAULT     4 BPF.JT.0.0
    5: 0000000000000008     8 OBJECT  GLOBAL DEFAULT     4 BPF.JT.0.1

  [ 4] .jumptables  PROGBITS  0000000000000000 0000b8 000010 00      0   0  1

For the previous Example 3,
Compile: clang --target=bpf -mcpu=v4 -O2 -S test.c
Asm code:

  ...
    r3 = (s32)r2
    r3 <<= 3
    r2 = BPF.JT.0.0 ll
    r2 += r3
    r1 = (s32)r1
    r1 <<= 3
    r3 = BPF.JT.0.1 ll
    r3 += r1
    w0 = 0
    r1 = *(u64 *)(r3 + 0)
    gotox r1
  .Ltmp0:                 # Block address taken
  LBB0_1:                 # %l1
                          # =>This Inner Loop Header: Depth=1
    w0 += 1
  ...
    .section .jumptables,"",@progbits
  BPF.JT.0.0:
    .quad LBB0_3
    .quad LBB0_4
    .size BPF.JT.0.0, 16
  BPF.JT.0.1:
    .quad LBB0_1
    .quad LBB0_2
    .size BPF.JT.0.1, 16

The binary: clang --target=bpf -mcpu=v4 -O2 -c test2.c

    12: bf 23 20 00 00 00 00 00  r3 = (s32)r2
    13: 67 03 00 00 03 00 00 00  r3 <<= 0x3
    14: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00  r2 = 0x0 ll
        0000000000000070: R_BPF_64_64 BPF.JT.0.0
    16: 0f 32 00 00 00 00 00 00  r2 += r3
    17: bf 11 20 00 00 00 00 00  r1 = (s32)r1
    18: 67 01 00 00 03 00 00 00  r1 <<= 0x3
    19: 18 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00  r3 = 0x0 ll
        0000000000000098: R_BPF_64_64 BPF.JT.0.1
    21: 0f 13 00 00 00 00 00 00  r3 += r1

    4: 0000000000000000    16 OBJECT  GLOBAL DEFAULT     4 BPF.JT.0.0
    5: 0000000000000010    16 OBJECT  GLOBAL DEFAULT     4 BPF.JT.0.1

  [ 4] .jumptables  PROGBITS  0000000000000000 000160 000020 00      0   0  1
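With named, sized BPF.JT.<m>.<n> symbols in a dedicated .jumptables section, a loader can size each table straight from the ELF symbol instead of scanning relocations. A minimal sketch (illustrative only, not libbpf code; uses libelf's <gelf.h> and assumes the 8-byte .quad entries emitted here):

  #include <gelf.h>
  #include <stdio.h>

  /* Print offset and entry count for one jump-table symbol, assuming
   * 8-byte (.quad) entries as emitted by this patch. */
  static void print_jump_table(const GElf_Sym *sym, const char *name)
  {
      printf("%s: section offset 0x%llx, %llu entries\n", name,
             (unsigned long long)sym->st_value,
             (unsigned long long)(sym->st_size / 8));
  }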
This is temporary, and it makes it easy to run bpf selftests. Once the kernel side is ready, we will implement CPU V5, which will support jump tables.
To adjust it, add '-mllvm -bpf-min-jump-table-entries=<n>' to the compilation flags.
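For example, an illustrative invocation (the source file name is arbitrary):

  clang --target=bpf -mcpu=v4 -O2 -mllvm -bpf-min-jump-table-entries=4 -c prog.c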
@aspsk I just rebased on top of the latest llvm-project main branch. With this llvm version, I tried the latest bpf-next with your current patch set. Building the bpf selftests, I got libbpf warnings like
I did a hack like below
to allow jump tables with 3 or more switch cases. When running, I got
and a crash
I think maybe it is time to send another revision?
Yes, makes sense, plus there is potentially yet another instruction to be added for static keys.
Thanks! The current version is here: https://github.com/aspsk/bpf-next/tree/wip/indirect-jumps
Do you think this is related to the patch or just to the latest llvm?
Thanks a lot! I will run all tests with
Planning to send it this week. (WIP, but I still need to address one comment from Eduard and one from Andrii.)
and
I checked the libbpf source code. Looks like libbpf needs to handle this.
The above is just a hack to add
but change to cpuv4 for progs/bpf_goto_x.c.
Thanks!
; CHECK-NEXT: LBB0_5: # %sw.epilog
; CHECK-NEXT: w0 = 0
; CHECK-NEXT: exit
You can use
; UTC_ARGS: --disable
; CHECK: .section ...
...
; UTC_ARGS: --enable
and then use the UTC script for the rest.
Sounds good. Will do.
; return ret;
; }
;
; Compilation Flags:
You may search for ';--- gen' within llvm/test. Those files utilize https://llvm.org/docs/TestingGuide.html#extra-files to make updates easier.
I tried the following:
$ cat test/CodeGen/BPF/jump_table_blockaddr.ll
; RUN: rm -rf %t && split-file %s %t && cd %t
; RUN: llc -march=bpf -mcpu=v4 < %s | FileCheck %s
; CHECK: bar
;--- test.c
int bar(int a) {
__label__ l1, l2;
void * volatile tgt;
int ret = 0;
if (a)
tgt = &&l1; // synthetic jump table generated here
else
tgt = &&l2; // another synthetic jump table
goto *tgt;
l1: ret += 1;
l2: ret += 2;
return ret;
}
;--- gen
clang --target=bpf -mcpu=v4 -O2 -emit-llvm -S test.c -o -
;--- test.ll
Then run utils/update_test_body.py test/CodeGen/BPF/jump_table_blockaddr.ll. It does generate the .ll file properly.
...
;--- test.ll
; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "bpf"
; Function Attrs: nofree norecurse nounwind memory(inaccessiblemem: readwrite)
define dso_local range(i32 2, 4) i32 @bar(i32 noundef %0) local_unnamed_addr #0 {
%2 = alloca ptr, align 8
%3 = icmp eq i32 %0, 0
%4 = select i1 %3, ptr blockaddress(@bar, %7), ptr blockaddress(@bar, %6)
store volatile ptr %4, ptr %2, align 8, !tbaa !2
%5 = load volatile ptr, ptr %2, align 8, !tbaa !2
indirectbr ptr %5, [label %6, label %7]
...
But with the above generated .ll file, update_llc_test_checks.py does not work any more.
$ utils/update_llc_test_checks.py test/CodeGen/BPF/jump_table_blockaddr.ll
WARNING: Skipping unparsable RUN line: rm -rf %t && split-file %s %t && cd %t
llc: error: llc: <stdin>:7:1: error: expected top-level entity
int bar(int a) {
...
But we do want to have IR in the test and also want update_llc_test_checks.py support, so for now I will stick with update_llc_test_checks.py.
Add jump table (switch statement and computed goto) support for the BPF backend.

A 'gotox <reg>' insn is implemented, where <reg> holds the address of the target insn the gotox will jump to.

For a switch statement like

and the final binary

Note that for the above example, '-mllvm -bpf-min-jump-table-entries=5' should be in the compilation flags, as the current default bpf-min-jump-table-entries is 13. For example:

For computed goto like

The final binary:

A more complicated test with both a switch-statement-triggered jump table and computed gotos:

Compile with

The binary:

You can see the jump table symbols are all different.
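For reference, here is a small BPF C program of the kind the elided test above describes; this is a sketch reconstructed from Examples 1 and 3 earlier in the thread, not the author's exact test. Built with the flags shown above, the switch and the computed-goto array each get their own BPF.JT.<m>.<n> symbol:

  /* Sketch only: combines a jump-table-eligible switch (cf. Example 1)
   * with a computed goto (cf. Example 3). */
  int ret_user;

  int combined(int a, int b)
  {
    __label__ l1, l2;
    /* Array of label addresses: lowered into its own BPF.JT.* table. */
    void *jt[] = { [0] = &&l1, [1] = &&l2 };
    int ret = 0;

    /* With enough cases (>= bpf-min-jump-table-entries), this switch
     * is lowered through a separate BPF.JT.* jump table. */
    switch (a) {
    case 1:  ret_user = 18; break;
    case 6:  ret_user = 16; break;
    case 8:  ret_user = 14; break;
    case 16: ret_user = 9;  break;
    case 20: ret_user = 6;  break;
    default: ret_user = 1;  break;
    }

    goto *jt[b % 2];
  l1: ret += 1;
  l2: ret += 2;
    return ret;
  }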