[BPF] Support Jump Table #149715
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter.
Thanks! (Building this && rebasing my branch.)
Everything looks to be compiling properly; I should be able to make this work altogether. I have two questions & nits below, and will split them into two comments. Question 1: relocations look a bit different for switches and for computed gotos. Switch: ... computed gotos: ... The latter two point to properly defined symbols, but it is another step to find those. Can the relocation point to the symbol directly, as with the switch?
Question 2: currently jump tables contain 8 bytes per entry; is this intentional? (The offsets will never be greater than 4 bytes.) One related bug: the size of BPF.JT.0.0 is computed as if it points to 4-byte entries. Here the first two have size 2 (as in your example above), while BPF.JT.0.0 actually has size 5 (a switch from my test); however, its symbol size is ...
Signed-off-by: Anton Protopopov <[email protected]>
Ok, my ...
This part is generated by the compiler directly, and the BPF backend is not involved. I need to do some investigation to find out why, and whether we could make a change to relocate to the symbol directly or not.
Thanks. I will fix the bug. Regarding why we have an 8-byte jump table entry: I guess this is probably because the address is calculated from the start of the section.
Great. I will try to address the bug you found and the relocation difference between switch statements and computed gotos ASAP.
Just pushed a fix for the bug (BPF.JT.0.0/1 size) discovered by @aspsk above.
Currently JT offsets are calculated in bytes, but I think it still would be simpler for libbpf/kernel if offsets were calculated in instructions. Also, there would be no need to track offsets as 8 bytes; 4 bytes would suffice. The following part was responsible for this in the old PR:

void BPFAsmPrinter::emitJumpTableInfo()
...
  SmallPtrSet<const MachineBasicBlock *, 16> EmittedSets;
  const MCSymbolRefExpr *Base =
      MCSymbolRefExpr::create(getJXAnchorSymbol(JTI), OutContext);
  for (const MachineBasicBlock *MBB : JTBBs) {
    if (!EmittedSets.insert(MBB).second)
      continue;
    // Offset from gotox to target basic block expressed in number
    // of instructions, e.g.:
    //
    //   .L0_0_set_4 = ((LBB0_4 - .LBPF.JX.0.0) >> 3) - 1
    const MCExpr *LHS = MCSymbolRefExpr::create(MBB->getSymbol(), OutContext);
    OutStreamer->emitAssignment(
        GetJTSetSymbol(JTI, MBB->getNumber()),
        MCBinaryExpr::createSub(
            MCBinaryExpr::createAShr(
                MCBinaryExpr::createSub(LHS, Base, OutContext),
                MCConstantExpr::create(3, OutContext), OutContext),
            MCConstantExpr::create(1, OutContext), OutContext));
  }
  // BPF.JT.0.0:
  //         .long .L0_0_set_4
  //         .long .L0_0_set_2
  //         ...
  //         .size BPF.JT.0.0, 128
  MCSymbol *JTStart = getJTPublicSymbol(JTI);
  OutStreamer->emitLabel(JTStart);
  for (const MachineBasicBlock *MBB : JTBBs) {
    MCSymbol *SetSymbol = GetJTSetSymbol(JTI, MBB->getNumber());
    const MCExpr *V = MCSymbolRefExpr::create(SetSymbol, OutContext);
    OutStreamer->emitValue(V, EntrySize);
  }
  const MCExpr *JTSize = MCConstantExpr::create(JTBBs.size() * 4, OutContext);
  OutStreamer->emitELFSize(JTStart, JTSize);
}
...

The expression for ...
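To make the arithmetic concrete, here is a small standalone sketch of that encoding (my own illustration; the byte offsets are made up, not taken from a real object file). An entry stores the distance from the gotox anchor to the target, counted in 8-byte BPF instructions and off by one, which is also why 4-byte entries would suffice:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Encode: entry = ((target - anchor) >> 3) - 1, i.e. the number of
 * 8-byte BPF instructions between the gotox anchor and the target,
 * off by one. Small enough to fit in 4 bytes for any BPF program. */
static int32_t jt_encode(uint64_t anchor, uint64_t target)
{
    return (int32_t)(((int64_t)(target - anchor) >> 3) - 1);
}

/* Decode: invert the expression to recover the target byte offset. */
static uint64_t jt_decode(uint64_t anchor, int32_t entry)
{
    return anchor + (((uint64_t)entry + 1) << 3);
}

int main(void)
{
    uint64_t anchor = 0x28; /* hypothetical .LBPF.JX.0.0 position */
    uint64_t target = 0x58; /* hypothetical LBB0_4 position */
    int32_t entry = jt_encode(anchor, target);

    printf("entry = %d instructions\n", (int)entry); /* prints 5 */
    assert(jt_decode(anchor, entry) == target);
    return 0;
}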
The following example is not handled:

int bar(int a) {
__label__ l1, l2;
void * volatile tgt;
int ret = 0;
if (a)
tgt = &&l1;
else
tgt = &&l2;
goto *tgt;
l1: ret += 1;
l2: ret += 2;
return ret;
}

Currently the following code is produced: ... As discussed previously, ...
For computed goto, with 'const MachineJumpTableInfo *MJTI = MF->getJumpTableInfo();' the MJTI will be nullptr. I didn't use the above in order to be consistent with the computed-goto jump table.
Yes, makes sense. Missed this one.
One more thing to check: all labels, e.g. BPF.JT.0.0, are global. What if two BPF programs both have a BPF.JT.0.0 and they need to be linked together? Can libbpf handle this properly? We need to double-check libbpf for this.
For the new version, computed goto has the same relocation mechanism, based on symbols.
Good point, I think the libbpf linker should be modified to take care of this.
Well, arrays of labels declared in ".jumptables" sections can be lowered as jump tables; this way ...
Let me take a look.
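To make the concern concrete, here is a hypothetical pair of translation units (my illustration, not from the PR). Jump table numbering restarts in each module, so with a low enough -bpf-min-jump-table-entries both objects would define a global BPF.JT.0.0, and statically linking the two .o files would hit a duplicate-symbol conflict unless the linker renames or localizes the tables:

/* a.c -- compiled separately: the switch is lowered to a jump table
 * whose symbol would be named BPF.JT.0.0 in a.o. */
int pick_a(int x)
{
    switch (x) {
    case 0: return 10;
    case 1: return 11;
    case 2: return 12;
    case 3: return 13;
    case 4: return 14;
    case 5: return 15;
    default: return -1;
    }
}

/* b.c -- a different file with its own first jump table, which would
 * also be named BPF.JT.0.0 in b.o: same global symbol, two objects. */
int pick_b(int x)
{
    switch (x) {
    case 0: return 20;
    case 1: return 21;
    case 2: return 22;
    case 3: return 23;
    case 4: return 24;
    case 5: return 25;
    default: return -1;
    }
}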
Ok. With the current patch (and my dev kernel branch) something like this compiles into two map loads and one gotox (and it verifies and runs properly). So far all looks good for me to start working on cleaning things up & more examples of computed gotos.
Uh-oh, that's because I forgot to pass ...
Below is a slight modification of what Yonghong tried; it handles removal of the leftover globals:

--- a/llvm/lib/Target/BPF/BPFMIPeephole.cpp
+++ b/llvm/lib/Target/BPF/BPFMIPeephole.cpp
@@ -30,6 +30,9 @@
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Module.h"
#include "llvm/Support/Debug.h"
#include <set>
@@ -321,6 +324,7 @@ private:
bool insertMissingCallerSavedSpills();
bool removeMayGotoZero();
bool addExitAfterUnreachable();
+ bool removeUnusedGV();
public:
@@ -338,6 +342,7 @@ public:
Changed |= insertMissingCallerSavedSpills();
Changed |= removeMayGotoZero();
Changed |= addExitAfterUnreachable();
+ Changed |= removeUnusedGV();
return Changed;
}
};
@@ -750,6 +755,29 @@ bool BPFMIPreEmitPeephole::addExitAfterUnreachable() {
return true;
}
+bool BPFMIPreEmitPeephole::removeUnusedGV() {
+ Module *M = MF->getFunction().getParent();
+ std::vector<GlobalVariable *> Targets;
+ for (GlobalVariable &Global : M->globals()) {
+ if (Global.getLinkage() != GlobalValue::PrivateLinkage)
+ continue;
+ if (!Global.isConstant() || !Global.hasInitializer())
+ continue;
+ Constant *CV = dyn_cast<Constant>(Global.getInitializer());
+ if (!CV)
+ continue;
+ ConstantArray *CA = dyn_cast<ConstantArray>(CV);
+ if (!CA)
+ continue;
+ Targets.push_back(&Global);
+ }
+ for (auto *G: Targets) {
+ G->replaceAllUsesWith(PoisonValue::get(G->getType())); // <----- Key change
+ G->eraseFromParent();
+ }
+ return true;
+}
+
} // end default namespace
INITIALIZE_PASS(BPFMIPreEmitPeephole, "bpf-mi-pemit-peephole", ...
Just updated the pull request to address the unused global variable issue. I moved the above code to doFinalization(), which operates at the module level.
Thanks, tested it with my current branch. |
@aspsk I just rebased on top of the latest llvm-project main branch. With this llvm version, I tried the latest bpf-next with your current patch set. To build the bpf selftests, I got libbpf warnings like ... I did a hack like the one below to allow jump tables for switches with 3 or more cases. When running, I hit a crash ... I think maybe it is time to send another revision?
Yes, makes sense, plus there is potentially yet another instruction to be added for static keys.
Thanks! The current version is here: https://github.com/aspsk/bpf-next/tree/wip/indirect-jumps
Do you think this is related to the patch or just to the latest llvm?
Thanks a lot! I will run all tests with ...
Planning to send it this week. (WIP, but I still need to address one comment from Eduard and one from Andrii.) |
... and I checked the libbpf source code. Looks like libbpf needs to handle this.
The above is just a hack to add ..., but change to cpuv4 for progs/bpf_goto_x.c.
Thanks!
; CHECK-NEXT: LBB0_5: # %sw.epilog
; CHECK-NEXT: w0 = 0
; CHECK-NEXT: exit
You can use

; UTC_ARGS: --disable
; CHECK: .section ...
...
; UTC_ARGS: --enable

and then use the UTC script for the rest.
Sounds good. Will do.
; return ret;
; }
;
; Compilation Flags:
You may search for ;--- gen within llvm/test. Those files utilize https://llvm.org/docs/TestingGuide.html#extra-files to make updates easier.
I tried the following:
$ cat test/CodeGen/BPF/jump_table_blockaddr.ll
; RUN: rm -rf %t && split-file %s %t && cd %t
; RUN: llc -march=bpf -mcpu=v4 < %s | FileCheck %s
; CHECK: bar
;--- test.c
int bar(int a) {
__label__ l1, l2;
void * volatile tgt;
int ret = 0;
if (a)
tgt = &&l1; // synthetic jump table generated here
else
tgt = &&l2; // another synthetic jump table
goto *tgt;
l1: ret += 1;
l2: ret += 2;
return ret;
}
;--- gen
clang --target=bpf -mcpu=v4 -O2 -emit-llvm -S test.c -o -
;--- test.ll
Then run utils/update_test_body.py test/CodeGen/BPF/jump_table_blockaddr.ll. It generates the .ll file properly.
...
;--- test.ll
; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "bpf"
; Function Attrs: nofree norecurse nounwind memory(inaccessiblemem: readwrite)
define dso_local range(i32 2, 4) i32 @bar(i32 noundef %0) local_unnamed_addr #0 {
%2 = alloca ptr, align 8
%3 = icmp eq i32 %0, 0
%4 = select i1 %3, ptr blockaddress(@bar, %7), ptr blockaddress(@bar, %6)
store volatile ptr %4, ptr %2, align 8, !tbaa !2
%5 = load volatile ptr, ptr %2, align 8, !tbaa !2
indirectbr ptr %5, [label %6, label %7]
...
But with the above generated .ll file, update_llc_test_checks.py does not work any more.
$ utils/update_llc_test_checks.py test/CodeGen/BPF/jump_table_blockaddr.ll
WARNING: Skipping unparsable RUN line: rm -rf %t && split-file %s %t && cd %t
llc: error: llc: <stdin>:7:1: error: expected top-level entity
int bar(int a) {
...
But we do want to have IR in the test and also want to use update_llc_test_checks.py, so for now I will stick with update_llc_test_checks.py.
@yonghong-song I've run all the ...
Hi @yonghong-song,
Did a second pass for this pull request, all looks good.
Left two nits.
SDValue Addr = DAG.getTargetGlobalAddress(GVal, DL, MVT::i64);
...
// Emit pseudo instruction
return SDValue(DAG.getMachineNode(BPF::LDIMM64, DL, MVT::i64, Addr), 0);
Question regarding direct BPF::LDIMM64 injection, like here, and BPFISD::Wrapper injection, like in getAddr.
As far as I understand, BPFISD::Wrapper ends up lowered to BPF::LDIMM64 because of the Pat rules in the BPFInstrInfo.td, hence these two techniques are effectively identical.
After landing this pull request:
- LowerJumpTable and LowerConstantPool will use BPFISD::Wrapper through getAddr;
- LowerGlobalAddress and LowerBlockAddress will use BPF::LDIMM64 directly.
Would it make sense to do a small refactoring first, removing BPFISD::Wrapper and replacing it with DAG.getMachineNode(BPF::LDIMM64, DL, MVT::i64, Addr)?
We could do this (removing BPFISD::Wrapper and handling all global addresses with DAG.getMachineNode()). This would add more cases in CustomInserter. I still prefer to keep LowerJumpTable and LowerConstantPool as in the current implementation, as they are easier to understand at the DAG level and follow similar patterns in other architectures.
cl::desc("Expand memcpy into load/store pairs in order"));
...
static cl::opt<unsigned> BPFMinimumJumpTableEntries(
    "bpf-min-jump-table-entries", cl::init(13), cl::Hidden,
Just out of curiosity, why 13?
I see that setMinimumJumpTableEntries is called only for a couple of archs: AVR effectively disables jump tables by using UINT_MAX, WebAssembly uses the value of 2 (always introduce jump tables), PPC uses 64, and everything else uses the default: 4.
Just out of curiosity, why 13?
This was suggested by Alexei in that private thread spawned from my RFC series, citing:
For now I would pick 13 to align with arm64 to make it slightly less
random and hopefully won't regress performance.
; CHECK: .cfi_startproc
; CHECK: # %bb.0: # %entry
; CHECK: r2 = BPF.JT.0.0 ll
; CHECK: r2 = *(u64 *)(r2 + 0)
Nit: I'd mask the register numbers in the tests using awk, to keep the tests a bit more stable.
I think we are fine here. If anything changes, the CHECKs can easily be regenerated with the update_test_body.py script. Specific register numbers are helpful to ensure asm code correctness.
NOTE 1: We probably need cpu v5 or another flag to enable this feature. We can add it later when necessary; let us use cpu v4 for now.
NOTE 2: An option -bpf-min-jump-table-entries is implemented to control the minimum
number of entries required to use a jump table on BPF. The default value is 5, to
make testing easy. Eventually we will increase the minimum jump table entries to 13.
This patch adds jump table support. A new insn 'gotox <reg>' is
added to allow a goto through a register. The register holds the
target address in the current section.
Example 1 (switch statement):
=============================
Code:
struct simple_ctx {
int x;
int y;
int z;
};
int ret_user, ret_user2;
void bar(void);
int foo(struct simple_ctx *ctx, struct simple_ctx *ctx2)
{
switch (ctx->x) {
case 1: ret_user = 18; break;
case 20: ret_user = 6; break;
case 16: ret_user = 9; break;
case 6: ret_user = 16; break;
case 8: ret_user = 14; break;
case 30: ret_user = 2; break;
default: ret_user = 1; break;
}
bar();
switch (ctx2->x) {
case 0: ret_user2 = 8; break;
case 31: ret_user2 = 5; break;
case 13: ret_user2 = 8; break;
case 1: ret_user2 = 3; break;
case 11: ret_user2 = 4; break;
default: ret_user2 = 29; break;
}
return 0;
}
Run: clang --target=bpf -mcpu=v4 -O2 -S test.c
The assembly code:
...
# %bb.1: # %entry
r1 <<= 3
r2 = .LJTI0_0 ll
r2 += r1
r1 = *(u64 *)(r2 + 0)
gotox r1
LBB0_2:
w1 = 18
goto LBB0_9
...
# %bb.10: # %sw.epilog
r1 <<= 3
r2 = .LJTI0_1 ll
r2 += r1
r1 = *(u64 *)(r2 + 0)
gotox r1
LBB0_11:
w1 = 8
goto LBB0_16
...
.section .rodata,"a",@progbits
.p2align 3, 0x0
.LJTI0_0:
.quad LBB0_2
.quad LBB0_8
...
.quad LBB0_7
.LJTI0_1:
.quad LBB0_11
.quad LBB0_13
...
Although we do have labels .LJTI0_0 and .LJTI0_1, they have the '.L'
prefix, so they won't appear in the .o file like other symbols do.
Run: llvm-objdump -Sr test.o
...
4: 67 01 00 00 03 00 00 00 r1 <<= 0x3
5: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
0000000000000028: R_BPF_64_64 .rodata
7: 0f 12 00 00 00 00 00 00 r2 += r1
...
29: 67 01 00 00 03 00 00 00 r1 <<= 0x3
30: 18 02 00 00 f0 00 00 00 00 00 00 00 00 00 00 00 r2 = 0xf0 ll
00000000000000f0: R_BPF_64_64 .rodata
32: 0f 12 00 00 00 00 00 00 r2 += r1
The size of the jump table is not obvious. libbpf needs to check all relocations
against the .rodata section in order to get the precise size needed to construct
BPF maps.
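A standalone sketch of the bookkeeping this implies (my illustration, not libbpf code): the only hints are the R_BPF_64_64 relocation addends pointing into .rodata (0x0 and 0xf0 in the dump above), so a loader has to sort the distinct targets and treat each table as running to the next target, or to the section end. This is only an upper bound if .rodata also holds unrelated constants, which is part of the motivation for a dedicated section later:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    /* Hypothetical input: byte offsets into .rodata collected from
     * R_BPF_64_64 relocations, and the size of .rodata itself. */
    uint64_t targets[] = { 0xf0, 0x0 };
    size_t n = sizeof(targets) / sizeof(targets[0]);
    uint64_t rodata_size = 0x1f0;

    qsort(targets, n, sizeof(targets[0]), cmp_u64);
    for (size_t i = 0; i < n; i++) {
        /* Each table extends to the next referenced table or to the
         * end of the section; entries are 8 bytes each. */
        uint64_t end = (i + 1 < n) ? targets[i + 1] : rodata_size;
        printf("table at 0x%llx: %llu bytes, %llu entries\n",
               (unsigned long long)targets[i],
               (unsigned long long)(end - targets[i]),
               (unsigned long long)((end - targets[i]) / 8));
    }
    return 0;
}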
Example 2 (Simple computed goto):
=================================
Code:
int bar(int a) {
__label__ l1, l2;
void * volatile tgt;
int ret = 0;
if (a)
tgt = &&l1; // synthetic jump table generated here
else
tgt = &&l2; // another synthetic jump table
goto *tgt;
l1: ret += 1;
l2: ret += 2;
return ret;
}
Compile: clang --target=bpf -mcpu=v4 -O2 -c test1.c
Objdump: llvm-objdump -Sr test1.o
0: 18 02 00 00 50 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x50 ll
0000000000000000: R_BPF_64_64 .text
2: 16 01 02 00 00 00 00 00 if w1 == 0x0 goto +0x2 <bar+0x28>
3: 18 02 00 00 40 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x40 ll
0000000000000018: R_BPF_64_64 .text
5: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 0x8) = r2
6: 79 a1 f8 ff 00 00 00 00 r1 = *(u64 *)(r10 - 0x8)
7: 0d 01 00 00 00 00 00 00 gotox r1
8: b4 00 00 00 03 00 00 00 w0 = 0x3
9: 05 00 01 00 00 00 00 00 goto +0x1 <bar+0x58>
10: b4 00 00 00 02 00 00 00 w0 = 0x2
11: 95 00 00 00 00 00 00 00 exit
For this case there is no jump table, so it would be hard to track the offset
during verification, especially when the offset needs adjustment. So practically
we need to create two jump tables, for '&&l1' and '&&l2' respectively.
Example 3 (More complicated computed goto):
===========================================
Code:
int foo(int a, int b) {
__label__ l1, l2, l3, l4;
void *jt1[] = {[0]=&&l1, [1]=&&l2};
void *jt2[] = {[0]=&&l3, [1]=&&l4};
int ret = 0;
goto *jt1[a % 2];
l1: ret += 1;
l2: ret += 3;
goto *jt2[b % 2];
l3: ret += 5;
l4: ret += 7;
return ret;
}
Compile: clang --target=bpf -mcpu=v4 -O2 -S test2.c
Asm code:
...
r3 = (s32)r2
r3 <<= 3
r2 = .L__const.foo.jt2 ll
r2 += r3
r1 = (s32)r1
r1 <<= 3
r3 = .L__const.foo.jt1 ll
r3 += r1
w0 = 0
r1 = *(u64 *)(r3 + 0)
gotox r1
.Ltmp0: # Block address taken
LBB0_1: # %l1
# =>This Inner Loop Header: Depth=1
w0 += 1
w0 += 3
r1 = *(u64 *)(r2 + 0)
gotox r1
.Ltmp1: # Block address taken
LBB0_2: # %l2
...
.type .L__const.foo.jt1,@object # @__const.foo.jt1
.section .rodata,"a",@progbits
.p2align 3, 0x0
.L__const.foo.jt1:
.quad .Ltmp0
.quad .Ltmp1
.size .L__const.foo.jt1, 16
.type .L__const.foo.jt2,@object # @__const.foo.jt2
.p2align 3, 0x0
.L__const.foo.jt2:
.quad .Ltmp2
.quad .Ltmp3
.size .L__const.foo.jt2, 16
Similar to the switch statement case, in the binary the symbols
.L__const.foo.jt* will not show up in the symbol table, and the jump tables
will be in the .rodata section.
We need to resolve the Example 2 case.
Also, with more libbpf work (dealing with .rodata sections etc.),
everything should work fine for Examples 1 and 3. But we could do
better by
- replacing symbols like .L<...> with symbols that appear in the
symbol table, and
- adding jump tables to a .jumptables section instead of the .rodata section.
This should make things easier for libbpf. Users also benefit,
since relocations/sections become easy to check.
The next two patches will fix Example 2 and improve all of the above
as described.
Example 2, Asm code:
...
# %bb.0: # %entry
r2 = .LJTI0_0 ll
r2 = *(u64 *)(r2 + 0)
r3 = .LJTI0_1 ll
r3 = *(u64 *)(r3 + 0)
if w1 == 0 goto LBB0_2
# %bb.1: # %entry
r3 = r2
LBB0_2: # %entry
*(u64 *)(r10 - 8) = r3
r1 = *(u64 *)(r10 - 8)
gotox r1
.Ltmp0: # Block address taken
LBB0_3: # %l1
w0 = 3
goto LBB0_5
.Ltmp1: # Block address taken
LBB0_4: # %l2
w0 = 2
LBB0_5: # %.split
exit
...
.section .rodata,"a",@progbits
.p2align 3, 0x0
.LJTI0_0:
.quad LBB0_3
.LJTI0_1:
.quad LBB0_4
Example 3, Asm Code:
r3 = (s32)r2
r3 <<= 3
r2 = .LJTI0_0 ll
r2 += r3
r1 = (s32)r1
r1 <<= 3
r3 = .LJTI0_1 ll
r3 += r1
w0 = 0
r1 = *(u64 *)(r3 + 0)
gotox r1
.Ltmp0: # Block address taken
LBB0_1: # %l1
# =>This Inner Loop Header: Depth=1
w0 += 1
w0 += 3
r1 = *(u64 *)(r2 + 0)
gotox r1
.Ltmp1: # Block address taken
LBB0_2: # %l2
# =>This Inner Loop Header: Depth=1
w0 += 3
r1 = *(u64 *)(r2 + 0)
gotox r1
.Ltmp2: # Block address taken
LBB0_3: # %l3
w0 += 5
goto LBB0_5
.Ltmp3: # Block address taken
LBB0_4: # %l4
LBB0_5: # %.split17
w0 += 7
exit
...
.section .rodata,"a",@progbits
.p2align 3, 0x0
.LJTI0_0:
.quad LBB0_3
.quad LBB0_4
.LJTI0_1:
.quad LBB0_1
.quad LBB0_2
# -- End function
.type .L__const.foo.jt1,@object # @__const.foo.jt1
.p2align 3, 0x0
.L__const.foo.jt1:
.quad .Ltmp0
.quad .Ltmp1
.size .L__const.foo.jt1, 16
.type .L__const.foo.jt2,@object # @__const.foo.jt2
.p2align 3, 0x0
.L__const.foo.jt2:
.quad .Ltmp2
.quad .Ltmp3
.size .L__const.foo.jt2, 16
Note that for both examples above, the jump table section is '.rodata'
and the labels have the '.L' prefix, which means the labels won't show up
in the symbol table. As mentioned in the previous patch, we want to
- move jump tables to a '.jumptables' section, and
- rename '.L*' labels to proper labels which are visible in the symbol table.
Note that for Example 3, there are extra globals like
.L__const.foo.jt1 and .L__const.foo.jt2
which we are not able to remove. But they won't show up in the symbol
table either.
For jump tables from switch statements, generate symbols visible to
'llvm-readelf -s' and put the jump tables into a dedicated section.
Most of this work is from Eduard.
For the previous example 1,
Compile: clang --target=bpf -mcpu=v4 -O2 -S test.c
Asm code:
...
# %bb.1: # %entry
r1 <<= 3
r2 = BPF.JT.0.0 ll
r2 += r1
r1 = *(u64 *)(r2 + 0)
gotox r1
LBB0_2:
w1 = 18
goto LBB0_9
...
# %bb.10: # %sw.epilog
r1 <<= 3
r2 = BPF.JT.0.1 ll
r2 += r1
r1 = *(u64 *)(r2 + 0)
gotox r1
LBB0_11:
w1 = 8
goto LBB0_16
...
.section .jumptables,"",@progbits
BPF.JT.0.0:
.quad LBB0_2
.quad LBB0_8
...
.quad LBB0_7
.size BPF.JT.0.0, 240
BPF.JT.0.1:
.quad LBB0_11
.quad LBB0_13
...
.quad LBB0_12
.size BPF.JT.0.1, 256
And the symbols BPF.JT.0.{0,1} will be in the symbol table.
The final binary:
4: 67 01 00 00 03 00 00 00 r1 <<= 0x3
5: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
0000000000000028: R_BPF_64_64 BPF.JT.0.0
7: 0f 12 00 00 00 00 00 00 r2 += r1
...
29: 67 01 00 00 03 00 00 00 r1 <<= 0x3
30: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
00000000000000f0: R_BPF_64_64 BPF.JT.0.1
32: 0f 12 00 00 00 00 00 00 r2 += r1
...
Symbol table:
4: 0000000000000000 240 OBJECT GLOBAL DEFAULT 4 BPF.JT.0.0
5: 0000000000000000 4 OBJECT GLOBAL DEFAULT 6 ret_user
6: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND bar
7: 00000000000000f0 256 OBJECT GLOBAL DEFAULT 4 BPF.JT.0.1
and
[ 4] .jumptables PROGBITS 0000000000000000 0001c8 0001f0 00 0 0 1
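With the tables now carried by real GLOBAL OBJECT symbols, the guessing described earlier goes away: a loader can take each table's offset and byte size straight from st_value/st_size in the symbol table. A minimal sketch (my illustration, using the numbers from the dump above):

#include <stdint.h>
#include <stdio.h>

struct jt_sym {
    const char *name;
    uint64_t st_value; /* offset within .jumptables */
    uint64_t st_size;  /* byte size, from the .size directive */
};

int main(void)
{
    /* Values copied from the symbol table dump above. */
    struct jt_sym syms[] = {
        { "BPF.JT.0.0", 0x00, 240 },
        { "BPF.JT.0.1", 0xf0, 256 },
    };

    for (unsigned i = 0; i < 2; i++)
        printf("%s: offset 0x%llx, %llu entries of 8 bytes\n",
               syms[i].name,
               (unsigned long long)syms[i].st_value,
               (unsigned long long)(syms[i].st_size / 8));
    return 0;
}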
For the previous example 2,
Compile: clang --target=bpf -mcpu=v4 -O2 -S test1.c
Asm code:
...
# %bb.0: # %entry
r2 = BPF.JT.0.0 ll
r2 = *(u64 *)(r2 + 0)
r3 = BPF.JT.0.1 ll
r3 = *(u64 *)(r3 + 0)
if w1 == 0 goto LBB0_2
# %bb.1: # %entry
r3 = r2
LBB0_2: # %entry
*(u64 *)(r10 - 8) = r3
r1 = *(u64 *)(r10 - 8)
gotox r1
...
.section .jumptables,"",@progbits
BPF.JT.0.0:
.quad LBB0_3
.size BPF.JT.0.0, 8
BPF.JT.0.1:
.quad LBB0_4
.size BPF.JT.0.1, 8
The binary: clang --target=bpf -mcpu=v4 -O2 -c test1.c
0: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
0000000000000000: R_BPF_64_64 BPF.JT.0.0
2: 79 22 00 00 00 00 00 00 r2 = *(u64 *)(r2 + 0x0)
3: 18 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r3 = 0x0 ll
0000000000000018: R_BPF_64_64 BPF.JT.0.1
5: 79 33 00 00 00 00 00 00 r3 = *(u64 *)(r3 + 0x0)
6: 16 01 01 00 00 00 00 00 if w1 == 0x0 goto +0x1 <bar+0x40>
7: bf 23 00 00 00 00 00 00 r3 = r2
8: 7b 3a f8 ff 00 00 00 00 *(u64 *)(r10 - 0x8) = r3
9: 79 a1 f8 ff 00 00 00 00 r1 = *(u64 *)(r10 - 0x8)
10: 0d 01 00 00 00 00 00 00 gotox r1
4: 0000000000000000 8 OBJECT GLOBAL DEFAULT 4 BPF.JT.0.0
5: 0000000000000008 8 OBJECT GLOBAL DEFAULT 4 BPF.JT.0.1
[ 4] .jumptables PROGBITS 0000000000000000 0000b8 000010 00 0 0 1
For the previous example 3,
Compile: clang --target=bpf -mcpu=v4 -O2 -S test.c
Asm code:
...
r3 = (s32)r2
r3 <<= 3
r2 = BPF.JT.0.0 ll
r2 += r3
r1 = (s32)r1
r1 <<= 3
r3 = BPF.JT.0.1 ll
r3 += r1
w0 = 0
r1 = *(u64 *)(r3 + 0)
gotox r1
.Ltmp0: # Block address taken
LBB0_1: # %l1
# =>This Inner Loop Header: Depth=1
w0 += 1
...
.section .jumptables,"",@progbits
BPF.JT.0.0:
.quad LBB0_3
.quad LBB0_4
.size BPF.JT.0.0, 16
BPF.JT.0.1:
.quad LBB0_1
.quad LBB0_2
.size BPF.JT.0.1, 16
The binary: clang --target=bpf -mcpu=v4 -O2 -c test2.c
12: bf 23 20 00 00 00 00 00 r3 = (s32)r2
13: 67 03 00 00 03 00 00 00 r3 <<= 0x3
14: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
0000000000000070: R_BPF_64_64 BPF.JT.0.0
16: 0f 32 00 00 00 00 00 00 r2 += r3
17: bf 11 20 00 00 00 00 00 r1 = (s32)r1
18: 67 01 00 00 03 00 00 00 r1 <<= 0x3
19: 18 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r3 = 0x0 ll
0000000000000098: R_BPF_64_64 BPF.JT.0.1
21: 0f 13 00 00 00 00 00 00 r3 += r1
4: 0000000000000000 16 OBJECT GLOBAL DEFAULT 4 BPF.JT.0.0
5: 0000000000000010 16 OBJECT GLOBAL DEFAULT 4 BPF.JT.0.1
[ 4] .jumptables PROGBITS 0000000000000000 000160 000020 00 0 0 1
This is temporary, and it makes it easy to run BPF selftests. Once the kernel side is ready, we will implement cpu v5, which will support jump tables.
To adjust the threshold, add '-mllvm -bpf-min-jump-table-entries=<n>' to the compilation flags.
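For example, a hypothetical invocation that raises the threshold back to 13 (the file name is arbitrary):

clang --target=bpf -mcpu=v4 -mllvm -bpf-min-jump-table-entries=13 -O2 -c test.c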
LLVM Buildbot has detected a new failure on builder ... Full details are available at: https://lab.llvm.org/buildbot/#/builders/27/builds/16135. Here is the relevant piece of the build log for reference: ...
Anton Protopopov says:
====================
BPF indirect jumps
This patchset implements a new type of map, instruction set, and uses
it to build support for indirect branches in BPF (on x86). (The same
map will be later used to provide support for indirect calls and static
keys.) See [1], [2] for more context.
Short table of contents:
* Patches 1-6 implement the new map of type
BPF_MAP_TYPE_INSN_SET and corresponding selftests. This map can
be used to track the "original -> xlated -> jitted mapping" for
a given program.
* Patches 7-12 implement the support for indirect jumps on x86 and add libbpf
support for LLVM-compiled programs containing indirect jumps, and selftests.
The jump table support was merged to LLVM and now can be
enabled with -mcpu=v4, see [3]. The __BPF_FEATURE_GOTOX
macro can be used to check whether the compiler supports the
feature or not.
See individual patches for more implementation details.
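As a sketch of how a program might gate on that macro (hypothetical code, not taken from the selftests):

/* Use the __BPF_FEATURE_GOTOX feature macro to compile the
 * computed-goto path only when the compiler supports gotox. */
#ifdef __BPF_FEATURE_GOTOX
static int dispatch(int x)
{
	__label__ a, b;
	void * volatile tgt = x ? &&a : &&b;

	goto *tgt;	/* lowered to a gotox through a jump table */
a:	return 1;
b:	return 2;
}
#else
/* Older compiler: fall back (or skip the test). */
static int dispatch(int x)
{
	return x ? 1 : 2;
}
#endif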
v10 -> v11 (this series):
* rearranged patches and split libbpf patch such that first 6 patches
implementing instruction arrays can be applied independently
* instruction arrays:
* move [fake] aux->used_maps assignment in this patch
* indirect jumps:
* call clear_insn_aux_data before bpf_remove_insns (AI)
* libbpf:
* remove the relocations check after the new LLVM is released (Eduard, Yonghong)
* libbpf: fix an index printed in pr_warn (AI)
* selftests:
* protect programs triggered by nanosleep from fake runs (Eduard)
* patch verifier_gotox to not emit .rel.jumptables
v9 -> v10 (https://lore.kernel.org/bpf/[email protected]/T/#t):
* Three bugs were noticed by AI in v9 (two old, one introduced by v9):
* [new] insn_array_alloc_size could overflow u32, switched to u64 (AI)
* map_ptr should be compared in regsafe for PTR_TO_INSN (AI)
* duplicate elements were copied in jt_from_map (AI)
* added a selftest in verifier_gotox with a jump table containing non-unique entries
v8 -> v9 (https://lore.kernel.org/bpf/[email protected]/T/#t):
* instruction arrays:
* remove the size restriction of 256 elements
* add a comments about addrs usage, old and new (Alexei)
* libbpf:
* properly prefix warnings (Andrii)
* cast j[t] to long long for printf and some other minor cleanups (Andrii)
* selftests:
* use __BPF_FEATURE_GOTOX in selftests and skip tests if it's not set (Eduard)
* fix a typo in a selftest assembly (AI)
v7 -> v8 (https://lore.kernel.org/bpf/[email protected]/T/#u):
* instruction arrays:
* simplify the bpf_prog_update_insn_ptrs function (Eduard)
* remove a semicolon after a function definition (AI)
* libbpf:
* add a proper error path in libbpf patch (AI)
* re-re-factor the create_jt_map & find_subprog_idx (Eduard)
* selftests:
* verifier_gotox: add a test for a jump table pointing to outside of a subprog (Eduard)
* used test__skip instead of just running an empty test
* split tests in bpf_gotox into subtests for convenience
* random:
* drop the docs commit for now
v6 -> v7 (https://lore.kernel.org/bpf/[email protected]/T/#t):
* rebased and dropped already merged commits
* instruction arrays
* use jit_data to find mappings from insn to jit (Alexei)
* alloc `ips` as part of the main allocation (Eduard)
* the `jitted_ip` member wasn't actually used (Eduard)
* remove the bpf_insn_ptr structure, which is not needed for this patch
* indirect jumps, kernel:
* fix a memory leak in `create_jt` (AI)
* use proper reg+8*ereg in `its_static_thunk` (AI)
* some minor cleanups (Eduard)
* indirect jumps, libbpf:
* refactor the `jt_adjust_off()` piece (Eduard)
* move "JUMPTABLES_SEC" into libbpf_internal.h (Eduard)
* remove an unnecessary if (Eduard)
* verifier_gotox: add tests to verify that `gotox rX` works with all registers
v5 -> v6 (https://lore.kernel.org/bpf/[email protected]/T/#u):
* instruction arrays:
* better document `struct bpf_insn_array_value` (Eduard)
* remove a condition in `bpf_insn_array_adjust_after_remove` (Eduard)
* make userspace see original, xlated, and jitted indexes (+original) (Eduard)
* indirect jumps, kernel:
* reject writes to the map
* reject unaligned ops
* add a check that `w` is not outside the program in check_cfg for `gotox` (Eduard)
* do not introduce unneeded `bpf_find_containing_subprog_idx`
* simplify error processing for `bpf_find_containing_subprog` (Eduard)
* add `insn_state |= DISCOVERED` when it's discovered (Eduard)
* support SUB operations on PTR_TO_INSN (Eduard)
* make `gotox_tmp_buf` a bpf_iarray and use helper to relocate it (Eduard)
* rename fields of `bpf_iarray` to more generic (Eduard)
* re-implement `visit_gotox_insn` in a loop (Eduard)
* some minor cleanups (Eduard)
* libbpf:
* `struct reloc_desc`: add a comment about `union` (Eduard)
* rename parameters of (and one other place in code) `{create,add}_jt_map` to `sym_off` (Eduard)
* `create_jt_map`: check that size/off are 8-byte aligned (Eduard)
* Selftests:
* instruction array selftests:
* only run tests on x86_64
* write a more generic function to test things to reduce code (Eduard)
* errno wasn't used in checks, so don't reset it (Eduard)
* print `i`, `xlated_off` and `map_out[i]` here (Eduard)
* added `verifier_gotox` selftests which do not depend on LLVM:
* disabled `bpf_gotox` tests by default
* other changes:
* remove an extra function in bpf disasm (Eduard)
* some minor cleanups in the insn_successors patch (Eduard)
* update documentation in `Documentation/bpf/linux-notes.html` about jumps, now it is supported :)
v3 -> v4 -> v5 (https://lore.kernel.org/bpf/[email protected]/):
* [v4 -> v5] rebased on top of the last bpf-next/master
* instruction arrays:
* add copyright (Alexei)
* remove mutexes, add frozen back (Alexei)
* setup 1:1 prog-map correspondence using atomic_xchg
* do not copy/paste array_map_get_next_key, add a common helper (Alexei)
* misc minor code cleanups (Alexei)
* indirect jumps, kernel side:
* remove jt_allocated, just check if insn is gotox (Eduard)
* use copy_register_state instead of individual copies (Eduard)
* in push_stack is_speculative should be inherited (Eduard)
* a few cleanups for insn_successors, including omitting error path (Eduard)
* check if reserved fields are used when considering `gotox` instruction (Eduard)
* read size and alignment of read from insn_array should be 8 (Eduard)
* put buffer for sorting in subfun info and realloc to grow as needed (Eduard)
* properly do `jump_point` / `prune_point` from `push_gotox_edge` (Eduard)
* use range_within to check states (Eduard)
* some minor cleanups and fix commit message (Eduard)
* indirect jumps, libbpf side:
* close map_fd in some error paths in create_jt_map (Andrii)
* maps for jump tables are actually not closed at all, fix this (Andrii)
* rename map from `jt` to `.jumptables` (Andrii)
* use `errstr` in an error message (Andrii)
* rephrase error message to look more standard (Andrii)
* misc other minor renames and cleanups (Andrii)
* selftests:
* add the frozen selftest back
* add a selftest for two jumps loading same table
* some other changes:
* rebase and split insn_successor changes into separate patch
* use PTR_ERR_OR_ZERO in the push stack patch (Eduard)
* indirect jumps on x86: properly re-read *pprog (Eduard)
v2 -> v3 (https://lore.kernel.org/bpf/[email protected]/):
* fix build failure when CONFIG_BPF_SYSCALL is not set (kbuild-bot)
* reformat bpftool help messages (Quentin)
v1 -> v2 (https://lore.kernel.org/bpf/[email protected]/):
* push_stack changes:
* sanitize_speculative_path should just return int (Eduard)
* return code from sanitize_speculative_path, not EFAULT (Eduard)
* when BPF_COMPLEXITY_LIMIT_JMP_SEQ is reached, return E2BIG (Eduard)
* indirect jumps:
* omit support for .imm=fd in gotox, as we're not using it for now (Eduard)
* struct jt -> struct bpf_iarray (Eduard)
* insn_successors: rewrite the interface to just return a pointer (Eduard)
* remove min_index/max_index, use umin_value/umax_value instead (Alexei, Eduard)
* move emit_indirect_jump args change to the previous patch (Eduard)
* add a comment to map_mem_size() (Eduard)
* use verifier_bug for some error cases in check_indirect_jump (Eduard)
* clear_insn_aux_data: use start,len instead of start,end (Eduard)
* make regs[insn->dst_reg].type = PTR_TO_INSN part of check_mem_access (Eduard)
* constant blinding changes:
* make subprog_start adjustment better readable (Eduard)
* do not set subprog len, it is already set (Eduard)
* libbpf:
* remove check that relocations from .rodata are ok (Anton)
* do not freeze the map, it is not necessary anymore (Anton)
* rename the goto_x -> gotox everywhere (Anton)
* use u64 when parsing LLVM jump tables (Eduard)
* split patch in two due to spaces->tabs change (Eduard)
* split bpftool changes to bpftool patch (Andrii)
* make sym_size it a union with ext_idx (Andrii)
* properly copy/free the jumptables_data section from elf (Andrii)
* a few cosmetic changes around create_jt_map (Andrii)
* fix some comments + rewrite patch description (Andrii)
* inline bpf_prog__append_subprog_offsets (Andrii)
* subprog_sec_offst -> subprog_sec_off (Andrii)
* !strcmp -> strcmp() == 0 (Andrii)
* make some function names more readable (Andrii)
* allocate table of subfunc offsets via libbpf_reallocarray (Andrii)
* selftests:
* squash insn_array* tests together (Anton)
* fixed build warnings (kernel test robot)
RFC -> v1 (https://lore.kernel.org/bpf/[email protected]/):
* I've tried to address all the comments provided by Alexei and
Eduard in RFC. Will try to list the most important of them below.
* One big change: move from older LLVM version [5] to newer [4].
Now LLVM generates jump tables as symbols in the new special
section ".jumptables". Another part of this change is that
libbpf now doesn't try to link map load and goto *rX, as
1) this is absolutely not reliable 2) for some use cases this
is impossible (namely, when more than one jump table can be used
in the same gotox instruction).
* Added insn_successors() support (Alexei, Eduard). This includes
getting rid of the ugly bpf_insn_set_iter_xlated_offset()
interface (Eduard).
* Removed hack for the unreachable instruction, as the new LLVM (thanks to
Eduard) doesn't generate it.
* Set mem_size for direct map access properly instead of hacking.
Remove off>0 check. (Alexei)
* Do not allocate new memory for min_index/max_index (Alexei, Eduard)
* Information required during check_cfg is now cached to be reused
later (Alexei + general logic for supporting multiple JT per jump)
* Properly compare registers in regsafe (Alexei, Eduard)
* Remove support for JMP32 (Eduard)
* Better checks in adjust_ptr_min_max_vals (Eduard)
* More selftests were added (but still there's room for more) which
directly use gotox (Alexei)
* More checks and verbose messages added
* "unique pointers" are no more in the map
Links:
1. https://lpc.events/event/18/contributions/1941/
2. https://lwn.net/Articles/1017439/
3. llvm/llvm-project#149715
4. llvm/llvm-project#149715 (comment)
6. rfc: https://lore.kernel.org/bpf/[email protected]/
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Add jump table (switch statement and computed goto) support for the BPF backend.
A 'gotox <reg>' insn is implemented, and the <reg> holds the target insn where the gotox will go. For a switch statement like ... the final binary is ...
Note that for the above example, '-mllvm -bpf-min-jump-table-entries=5' should be in the compilation flags, as the current default bpf-min-jump-table-entries is 13.
For a computed goto like ... the final binary is ...
A more complicated test with both a switch-statement-triggered jump table and computed gotos: compile with ... The binary: ...
You can see that the jump table symbols are all different.
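As a closing illustration, here is a representative source sketch (my own, not the exact test from this description) that produces both kinds of jump tables in one object file; compiling with something like clang --target=bpf -mcpu=v4 -mllvm -bpf-min-jump-table-entries=5 -O2 -c should yield distinct BPF.JT.<func>.<n> symbols:

int ret_user;

/* One switch big enough to be lowered as a jump table, plus a
 * computed goto, which gets its own synthetic jump tables. */
int combined(int a, int b)
{
    __label__ l1, l2;
    void * volatile tgt;

    switch (a) {
    case 0: ret_user = 1; break;
    case 1: ret_user = 2; break;
    case 2: ret_user = 3; break;
    case 3: ret_user = 4; break;
    case 4: ret_user = 5; break;
    case 5: ret_user = 6; break;
    default: ret_user = 0; break;
    }

    tgt = b ? &&l1 : &&l2;
    goto *tgt;
l1: ret_user += 1;
l2: ret_user += 2;
    return ret_user;
}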