
Conversation

@Andres-Salamanca
Contributor

This PR addresses the feedback from llvm/llvm-project#142041 (comment). Our algorithm for accumulating bitfields has diverged from CodeGen since Clang 19.

There is one key difference: in CIR, we use the function `getBitfieldStorageType`, which checks whether the bit width of the current accumulation run is a valid fundamental width (i.e., a power of two: 8, 16, 32, 64). If it is, it returns a CIR integer type of that size; otherwise, it returns a byte array rounded up to the nearest whole-byte size.

For example, given the following struct:

```cpp
struct S {
  int a : 4;
  int b : 27;
  int c : 17;
  int d : 2;
  int e : 15;
  unsigned f;
};
```

The CodeGen output is:

```llvm
%struct.S = type { i64, i16, i32 }
```

Whereas the new CIR algorithm produces:

```mlir
!cir.record<struct "S" {!cir.array<!u8i x 7>, !u16i, !u32i}>
```

In CIR, the algorithm accumulates up to field d, resulting in 50 accumulated bits. Since 50 is not a fundamental width, the closest alignment is 56 bits, which leads to the type !cir.array<!u8i x 7>. The algorithm stops before accumulating field e because including it would exceed the register size (64), which is not ideal.
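The stopping rule described above can be sketched as follows. This is a simplified model of the accumulation loop, not the actual CIRRecordLowering code; `accumulateBits` is a hypothetical helper written for illustration:

```cpp
#include <cstddef>

// Simplified model of the accumulation rule described above: keep adding
// bitfield widths to the current run and stop before the first field that
// would push the run past the register size.
int accumulateBits(const int *widths, size_t n, int regBits) {
  int run = 0;
  for (size_t i = 0; i < n; ++i) {
    if (run + widths[i] > regBits)
      break; // e.g. adding e (15 bits) to 50 would give 65 > 64
    run += widths[i];
  }
  return run;
}
```

For `struct S`, the widths {4, 27, 17, 2, 15} accumulate to 50 bits before `e` is rejected.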

At this point, it's unclear whether this divergence from CodeGen represents an improvement. If we wanted to match CodeGen exactly, we would need to replace the use of getBitfieldStorageType with getUIntNType. The difference is that getUIntNType always returns the closest power-of-two integer type instead of falling back to an array when the size is not a fundamental width.
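The difference between the two policies comes down to how a non-fundamental run width is rounded. A minimal sketch of the arithmetic, under the assumption that only the resulting size matters (these are hypothetical helpers; the real functions return MLIR types, not bit counts):

```cpp
#include <cstdint>

// Mimics getUIntNType's sizing: always round the run up to the next
// power-of-two integer width.
uint64_t nextPow2Bits(uint64_t bits) {
  uint64_t w = 8;
  while (w < bits)
    w *= 2;
  return w; // 50 -> 64, i.e. a single i64
}

// Mimics getBitfieldStorageType's sizing: keep fundamental widths as-is,
// otherwise round up to whole bytes and fall back to a byte array.
uint64_t storageBits(uint64_t bits) {
  if (bits == 8 || bits == 16 || bits == 32 || bits == 64)
    return bits;
  return ((bits + 7) / 8) * 8; // 50 -> 56, i.e. !cir.array<!u8i x 7>
}
```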

With this change, CIR would match CodeGen's layout exactly. It would require the following small code change:

```diff
diff --git a/clang/lib/CIR/CodeGen/CIRRecordLayoutBuilder.cpp b/clang/lib/CIR/CodeGen/CIRRecordLayoutBuilder.cpp
index 7c1802bff077..17538b191738 100644
--- a/clang/lib/CIR/CodeGen/CIRRecordLayoutBuilder.cpp
+++ b/clang/lib/CIR/CodeGen/CIRRecordLayoutBuilder.cpp
@@ -616,7 +616,7 @@ CIRRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
       if (!InstallBest) {
         // Determine if accumulating the just-seen span will create an expensive
         // access unit or not.
-        mlir::Type Type = getBitfieldStorageType(astContext.toBits(AccessSize));
+        mlir::Type Type = getUIntNType(astContext.toBits(AccessSize));
         if (!astContext.getTargetInfo().hasCheapUnalignedBitFieldAccess())
           llvm_unreachable("NYI");

@@ -674,12 +674,12 @@ CIRRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
         // remain there after a stable sort.
         mlir::Type Type;
         if (BestClipped) {
-          assert(getSize(getBitfieldStorageType(
+          assert(getSize(getUIntNType(
                      astContext.toBits(AccessSize))) > AccessSize &&
                  "Clipped access need not be clipped");
           Type = getByteArrayType(AccessSize);
         } else {
-          Type = getBitfieldStorageType(astContext.toBits(AccessSize));
+          Type = getUIntNType(astContext.toBits(AccessSize));
           assert(getSize(Type) == AccessSize &&
                  "Unclipped access must be clipped");
         }
```

You can see a comparison between the two functions at https://godbolt.org/z/qjx1MaEWG.

I'm currently unsure whether using one function over the other has performance implications. Regarding the ARM error I mentioned in the meeting: it was an `assert` I had forgotten to update. It's now fixed; sorry for the confusion.

@andykaylor
Collaborator

It's difficult to compare the implementations without seeing a side-by-side comparison of the resulting LLVM IR for accessing elements in a structure. I suspect that representing the memory using an array of bytes is going to be bad compared to using power-of-two integers because they result in different physical representations on little-endian machines.

Consider this example:

https://godbolt.org/z/3eWjee55f

```cpp
struct S {
  int a : 4;
  int b : 27;
  int c : 17;
  int d : 2;
  int e : 15;
  unsigned f;
};

S s;

int f() {
  return s.c;
}
```

The current incubator produces

```llvm
%struct.S = type { i32, [3 x i8], i16, i32 }
```

I think we could get similar results with your PR by replacing a and b with a single integer field.

The LLVM IR generated looks like this:

```llvm
define dso_local range(i32 -65536, 65536) i32 @_Z1fv() local_unnamed_addr #0 !dbg !4 {
  %1 = load i24, ptr getelementptr inbounds nuw (i8, ptr @s, i64 4), align 4, !dbg !7
  %2 = shl i24 %1, 7, !dbg !7
  %3 = ashr exact i24 %2, 7, !dbg !7
  %4 = sext i24 %3 to i32, !dbg !7
  ret i32 %4, !dbg !8
}
```

That doesn't seem too bad until you consider that c spans all three bytes of the i24 unit at struct offset 4, and a 24-bit load gets legalized into separate 16-bit and 8-bit loads that must be recombined. The x86-64 assembly generated looks like this:

```asm
_Z1fv:
.L_Z1fv$local:
  movzx eax, byte ptr [rip + .Ls$local+6]
  movzx ecx, word ptr [rip + .Ls$local+4]
  shl ecx, 15
  shl eax, 31
  or eax, ecx
  sar eax, 15
  ret
```

We had to load two different values and combine them to get the correct result!

By contrast, classic codegen emits this:

```llvm
%struct.S = type { i32, i24, i16, i32 }
```

```llvm
define dso_local noundef i32 @_Z1fv() local_unnamed_addr #0 !dbg !30 {
  %1 = load i32, ptr getelementptr inbounds (%struct.S, ptr @s, i64 0, i32 1), align 4, !dbg !33
  %2 = shl i32 %1, 15, !dbg !33
  %3 = ashr exact i32 %2, 15, !dbg !33
  ret i32 %3, !dbg !34
}
```

And the generated x86-64 assembly is this:

```asm
f():
  mov eax, dword ptr [rip + s+4]
  shl eax, 15
  sar eax, 15
  ret
```

It almost looks like classic codegen is cheating by telling LLVM IR to load an i32 instead of an i24, but it can do that because using an i24 in the structure definition means that the endianness is baked into the structure.
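The shl/ashr pair in the IR above is just a sign extension from bit 16. A sketch of the same extraction in C++ (`extractC` is a hypothetical helper; it assumes c sits in the low 17 bits of a 32-bit access unit, as in the classic layout):

```cpp
#include <cstdint>

// Recover the signed 17-bit field c from a 32-bit access unit, mirroring
// the i32 load + shl 15 + ashr 15 sequence classic codegen emits.
int extractC(uint32_t unit) {
  // Shift c's sign bit (bit 16) up to bit 31, then arithmetic-shift back.
  return (int32_t)(unit << 15) >> 15;
}
```

Right-shifting a negative value is arithmetic on all mainstream targets (and guaranteed since C++20), which is exactly what `ashr` gives the IR.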

@andykaylor
Collaborator

That is to say, I think you should make the change to get this to emit the same structure that classic codegen emits.

@Andres-Salamanca
Contributor Author

Yes, @andykaylor, you're 100% correct.
Previously, with the array, it looked like this on my machine:

```asm
_Z1fv:                                  # @_Z1fv
.L_Z1fv$local:
    .type   .L_Z1fv$local,@function
    .cfi_startproc
# %bb.0:
    movzbl  .Ls$local+6(%rip), %eax
    shll    $16, %eax
    movzwl  .Ls$local+4(%rip), %ecx
    orl     %eax, %ecx

    movl    %ecx, %eax
    shlq    $47, %rax
    sarq    $47, %rax

    movl    %eax, -4(%rsp)
    movl    -4(%rsp), %eax
    retq
```

And now, using power-of-two types, it looks much better, exactly like CodeGen:

```asm
_Z1fv:                                  # @_Z1fv
.L_Z1fv$local:
    .type   .L_Z1fv$local,@function
    .cfi_startproc
# %bb.0:
    movq    .Ls$local(%rip), %rax
    shlq    $15, %rax
    sarq    $47, %rax

    retq
```

Thanks for the detailed explanation, Andy ;)

@bcardosolopes
Member

Thanks for fixing this. I agree with Andy here too, matching classic codegen is the way to go!

@bcardosolopes bcardosolopes merged commit 6ddc48e into llvm:main Jun 17, 2025
9 checks passed
terapines-osc-cir pushed a commit to Terapines/clangir that referenced this pull request Sep 2, 2025
