Skip to content

Conversation

bogner
Copy link
Contributor

@bogner bogner commented Sep 4, 2025

This should demonstrate #147352 well enough to look at how it will affect the backends, but it still needs a fair amount of work and cleanup.

  • We need a lot more tests - I have checked that the offload test suite still passes with these changes but haven't verified all of the numerous related bugs
  • I still need to figure out how to stage this, so the git history is a complete mess.

Copy link

github-actions bot commented Sep 4, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions h,c,cpp -- clang/include/clang/AST/CharUnits.h clang/lib/AST/APValue.cpp clang/lib/AST/RecordLayoutBuilder.cpp clang/lib/Basic/Targets/DirectX.h clang/lib/CodeGen/CGAtomic.cpp clang/lib/CodeGen/CGExpr.cpp clang/lib/CodeGen/CGExprAgg.cpp clang/lib/CodeGen/CGExprConstant.cpp clang/lib/CodeGen/CGHLSLRuntime.cpp clang/lib/CodeGen/CGHLSLRuntime.h clang/lib/CodeGen/CGObjCMac.cpp clang/lib/CodeGen/CGRecordLayoutBuilder.cpp clang/lib/CodeGen/HLSLBufferLayoutBuilder.cpp clang/lib/CodeGen/HLSLBufferLayoutBuilder.h clang/lib/CodeGen/TargetInfo.h clang/lib/CodeGen/Targets/DirectX.cpp clang/lib/CodeGen/Targets/SPIR.cpp clang/lib/Sema/SemaChecking.cpp clang/lib/StaticAnalyzer/Core/Store.cpp clang/test/CodeGenHLSL/basic-target.c llvm/include/llvm/Analysis/DXILResource.h llvm/include/llvm/Frontend/HLSL/CBuffer.h llvm/lib/Analysis/DXILResource.cpp llvm/lib/Frontend/HLSL/CBuffer.cpp llvm/lib/IR/Type.cpp llvm/lib/Target/DirectX/DXILCBufferAccess.cpp llvm/lib/Target/SPIRV/SPIRVCBufferAccess.cpp llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp llvm/lib/TargetParser/TargetDataLayout.cpp llvm/unittests/Analysis/DXILResourceTest.cpp

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.
diff --git a/clang/include/clang/AST/CharUnits.h b/clang/include/clang/AST/CharUnits.h
index bd7457f1c..e570bfae6 100644
--- a/clang/include/clang/AST/CharUnits.h
+++ b/clang/include/clang/AST/CharUnits.h
@@ -165,7 +165,7 @@ namespace clang {
       CharUnits operator% (QuantityType N) const {
         return CharUnits(Quantity % N);
       }
-      CharUnits operator% (const CharUnits &Other) const {
+      CharUnits operator%(const CharUnits &Other) const {
         return CharUnits(Quantity % Other.Quantity);
       }
       CharUnits operator+ (const CharUnits &Other) const {
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 6c6a84709..5571356e7 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -4681,7 +4681,7 @@ LValue CodeGenFunction::EmitArraySubscriptExpr(const ArraySubscriptExpr *E,
   } else if (E->getType().getAddressSpace() == LangAS::hlsl_constant) {
     // This is an array inside of a cbuffer.
     Addr = EmitPointerWithAlignment(E->getBase(), &EltBaseInfo, &EltTBAAInfo);
-    auto *Idx = EmitIdxAfterBase(/*Promote*/true);
+    auto *Idx = EmitIdxAfterBase(/*Promote*/ true);
 
     // ...
     CharUnits RowAlignedSize = getContext()
@@ -4693,7 +4693,7 @@ LValue CodeGenFunction::EmitArraySubscriptExpr(const ArraySubscriptExpr *E,
     llvm::Value *ScaledIdx = Builder.CreateMul(Idx, RowAlignedSizeVal);
 
     CharUnits EltAlign =
-      getArrayElementAlign(Addr.getAlignment(), Idx, RowAlignedSize);
+        getArrayElementAlign(Addr.getAlignment(), Idx, RowAlignedSize);
     llvm::Value *EltPtr =
         emitArraySubscriptGEP(*this, Int8Ty, Addr.emitRawPointer(*this),
                               ScaledIdx, false, SignedIndices, E->getExprLoc());
diff --git a/clang/lib/CodeGen/HLSLBufferLayoutBuilder.h b/clang/lib/CodeGen/HLSLBufferLayoutBuilder.h
index 0515b469f..20ebd6bb3 100644
--- a/clang/lib/CodeGen/HLSLBufferLayoutBuilder.h
+++ b/clang/lib/CodeGen/HLSLBufferLayoutBuilder.h
@@ -46,8 +46,7 @@ public:
                const llvm::SmallVector<int32_t> *Packoffsets = nullptr);
 
   /// Lays out an array type following HLSL buffer rules.
-  llvm::Type *
-  layOutArray(const ConstantArrayType *AT);
+  llvm::Type *layOutArray(const ConstantArrayType *AT);
 
   /// Lays out a type following HLSL buffer rules. Arrays and structures will be
   /// padded appropriately and nested objects will be converted as appropriate.
diff --git a/clang/lib/CodeGen/TargetInfo.h b/clang/lib/CodeGen/TargetInfo.h
index 8b59fde4b..ba26f9a98 100644
--- a/clang/lib/CodeGen/TargetInfo.h
+++ b/clang/lib/CodeGen/TargetInfo.h
@@ -448,8 +448,8 @@ public:
     return nullptr;
   }
 
-  virtual llvm::Type *
-  getHLSLPadding(CodeGenModule &CGM, CharUnits NumBytes) const {
+  virtual llvm::Type *getHLSLPadding(CodeGenModule &CGM,
+                                     CharUnits NumBytes) const {
     return llvm::ArrayType::get(llvm::Type::getInt8Ty(CGM.getLLVMContext()),
                                 NumBytes.getQuantity());
   }
diff --git a/clang/lib/CodeGen/Targets/SPIR.cpp b/clang/lib/CodeGen/Targets/SPIR.cpp
index 9db3daebd..3aa2b4554 100644
--- a/clang/lib/CodeGen/Targets/SPIR.cpp
+++ b/clang/lib/CodeGen/Targets/SPIR.cpp
@@ -57,8 +57,8 @@ public:
   getHLSLType(CodeGenModule &CGM, const Type *Ty,
               const SmallVector<int32_t> *Packoffsets = nullptr) const override;
 
-  llvm::Type *
-  getHLSLPadding(CodeGenModule &CGM, CharUnits NumBytes) const override {
+  llvm::Type *getHLSLPadding(CodeGenModule &CGM,
+                             CharUnits NumBytes) const override {
     unsigned Size = NumBytes.getQuantity();
     return llvm::TargetExtType::get(CGM.getLLVMContext(), "spirv.Padding", {},
                                     {Size});

@Icohedron
Copy link
Contributor

Icohedron commented Sep 4, 2025

This shader fails to compile due to hitting an assert:

// compile args: -T cs_6_2 -E CSMain
typedef uint32_t4 uint32_t8[2];
void Foo(uint32_t8) {}
cbuffer Constants {
  uint32_t8 s;
}
[numthreads(1, 1, 1)]
void CSMain() {
  Foo(s);
}
clang-dxc: /workspace/llvm-project/llvm/include/llvm/Support/Casting.h:572: decltype(auto) llvm::cast(From &) [To = clang::ConstantArrayType, From = clang::QualType]: Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: bin/clang-dxc -T cs_6_2 -E CSMain test.hlsl
1.      <eof> parser at end of file
2.      test.hlsl:7:6: LLVM IR generation of declaration 'CSMain'
3.      test.hlsl:7:6: Generating code for declaration 'CSMain'
 #0 0x00005f0f884f1e1b llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /workspace/llvm-project/llvm/lib/Support/Unix/Signals.inc:834:13
 #1 0x00005f0f884ef7b3 llvm::sys::RunSignalHandlers() /workspace/llvm-project/llvm/lib/Support/Signals.cpp:105:18
 #2 0x00005f0f8845b670 (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) /workspace/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:73:5
 #3 0x00005f0f8845b670 CrashRecoverySignalHandler(int) /workspace/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:390:51
 #4 0x00007fd04ae419c0 __restore_rt (/nix/store/lmn7lwydprqibdkghw7wgcn21yhllz13-glibc-2.40-66/lib/libc.so.6+0x419c0)
 #5 0x00007fd04ae9cf3c __pthread_kill_implementation (/nix/store/lmn7lwydprqibdkghw7wgcn21yhllz13-glibc-2.40-66/lib/libc.so.6+0x9cf3c)
 #6 0x00007fd04ae4190e gsignal (/nix/store/lmn7lwydprqibdkghw7wgcn21yhllz13-glibc-2.40-66/lib/libc.so.6+0x4190e)
 #7 0x00007fd04ae28942 abort (/nix/store/lmn7lwydprqibdkghw7wgcn21yhllz13-glibc-2.40-66/lib/libc.so.6+0x28942)
 #8 0x00007fd04ae2885e _nl_load_domain.cold (/nix/store/lmn7lwydprqibdkghw7wgcn21yhllz13-glibc-2.40-66/lib/libc.so.6+0x2885e)
 #9 0x00007fd04ae396f7 (/nix/store/lmn7lwydprqibdkghw7wgcn21yhllz13-glibc-2.40-66/lib/libc.so.6+0x396f7)
#10 0x00005f0f88a80990 clang::CodeGen::CodeGenFunction::EmitAggregateCopy(clang::CodeGen::LValue, clang::CodeGen::LValue, clang::QualType, clang::CodeGen::AggValueSlot::Overlap_t, bool) /workspace/llvm-project/clang/lib/CodeGen/CGExprAgg.cpp:2306:7
...

Removing the typedef and just using uint32_t4 [2] lets the shader compile without hitting the assertion error.
Making the typedef a scalar or a vector also does not cause the assert to be hit.
Therefore the issue appears to be due to the use of a typedef that is defined to be an array of any type.

Edit: This issue has been fixed!

@bogner bogner force-pushed the 2025-09-cbuffer-layout branch from 6aa172d to 24a61c4 Compare September 8, 2025 17:12
@s-perron
Copy link
Contributor

s-perron commented Sep 9, 2025

I'm seeing a problem for SPIR-V that we may need to work out. I understand why you do it, and I don't have a solution yet.

// This struct has a size of 12 bytes (float + float2).
// In a cbuffer array, HLSL layout rules require each element to start on a
// 16-byte boundary. This means the compiler must insert 4 bytes of padding
// after each element.
struct PaddedStruct
{
    float f;
    float2 v2;
};

cbuffer MyCBuffer : register(b0)
{

    // The last element of the array is peeled.
    // @myArray = external hidden addrspace(12) global <{ [3 x <{ %PaddedStruct, [4 x i8] }>], %PaddedStruct }>, align 1
    PaddedStruct myArray[4];
    float anotherValue;
};

RWStructuredBuffer<float4> output : register(u0);

[numthreads(1, 1, 1)]
void main(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    uint index = dispatchThreadID.x % 4;
    // The GEP associated with indexing into the array will use a GEP on i8 and could overflow the
    // array on the struct. The array overflow to get the next element in the struct is illegal in spir-v.
    //  %4 = mul i64 %idxprom, 32
    //  %arrayidx = getelementptr i8, ptr addrspace(12) @myArray, i64 %4
    float2 value = myArray[index].v2;

    // Use the value after the array to ensure it's not optimized out.
    output[0] = float4(value, 0.0f, anotherValue);
}

@bogner bogner force-pushed the 2025-09-cbuffer-layout branch from 24a61c4 to 8e728c5 Compare September 10, 2025 02:06
Comment on lines 4697 to 4699
llvm::Value *EltPtr =
emitArraySubscriptGEP(*this, Int8Ty, Addr.emitRawPointer(*this),
ScaledIdx, false, SignedIndices, E->getExprLoc());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing an i8 gep does work for SPIR-V. We want to avoid that as much as possible. Can this be turned into a typed GEP with the padded type when needed? I tried writing it myself so I could make a suggestion, but I can't get it right.

See https://discourse.llvm.org/t/type-based-gep-plans/87183/14

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bogner You are still generating this GEP with an i8. Can we change this to a GEP on the array type.

Start with the test at the end. Compile with clang-dxc test_minimal_peeling.hlsl -T cs_6_8 -spirv -fcgl

The access to myArray is:

  %3 = load i32, ptr %index, align 4, !tbaa !4
  %idxprom = zext i32 %3 to i64
  %4 = mul i64 %idxprom, 16
  %arrayidx = getelementptr i8, ptr addrspace(12) @myArray, i64 %4 ; <-- problem for SPIR-V.
  %f = getelementptr inbounds nuw %struct.OrigType, ptr addrspace(12) %arrayidx, i32 0, i32 0
  %5 = load float, ptr addrspace(12) %f, align 1, !tbaa !14

It would be better if it could be

  %3 = load i32, ptr %index, align 4, !tbaa !4
  %idxprom = zext i32 %3 to i64
  %arrayidx = getelementptr [4 x <{ %OrigType, target("spirv.Padding", 12) }>], ptr addrspace(12) @myArray, i64 %idxprom ; <-- Explicit array
  %f = getelementptr inbounds nuw %struct.OrigType, ptr addrspace(12) %arrayidx, i32 0, i32 0
  %5 = load float, ptr addrspace(12) %f, align 1, !tbaa !14

You use the original array size with the padding.

struct OrigType {
  float f;
};

cbuffer MyCBuffer {
  OrigType myArray[4];
  float anotherValue;
};

RWBuffer<float4> output;

[numthreads(1, 1, 1)]
void main(uint3 DTid : SV_DispatchThreadID) {
  uint index = DTid.x % 4;
  float v = myArray[index].f;
  float f = anotherValue;
  output[0] = float4(v, f, 0, 0);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried this out in 92bd225 (which still needs test updates). This looks mostly reasonable with the caveat that we do need a bit of a fictional type for this to work (the array with padding on all elements including the last one). Since we don't actually read the padding this is probably fine.

@bogner bogner force-pushed the 2025-09-cbuffer-layout branch from 8e728c5 to 90e49a2 Compare September 25, 2025 16:26
@bogner
Copy link
Contributor Author

bogner commented Sep 25, 2025

Updated with typed GEPs and padding types if you want to take a look at what code we're generating, but the diff is kind of unreadable at the moment. I'll be cleaning this up and getting things ready for staging next.

This explicitly adds two 3-element vectors to the DataLayout so that
they'll be element-aligned. We need to do this more generally for
vectors, but this unblocks some very common cases.

Workaround for llvm#123968
This introduces the `dx.Padding` type as an alternative to the
`dx.Layout` types that are currently used for cbuffers. Later, we'll
remove the `dx.Layout` types completely, but making the backend handle
either makes it easier to stage the necessary changes to get there.

See llvm#147352 for details.
Update the `operator%` overload that accepts `CharUnits` to return
`CharUnits` to match the other `operator%`. This is more logical than
returning an `int64` and cleans up users that want to continue to do
math with the result.

Many users of this were explicitly comparing against 0. I considered
updating these to compare against `CharUnits::Zero` or even introducing
an `explicit operator bool()`, but they all feel clearer if we update
them to use the existing `isMultipleOf()` function instead.
CHECK-lines ignore whitespace, so we can remove some here and make this
a bit easier to read.
DXILResource was falling over trying to name a resource type that
contained an array, such as `StructuredBuffer<float[3][2]>`. Handle this
by walking through array types to gather the dimensions.
This abandons the `dx.Layout` idea and just uses explicit padding.

Note: Reordered fields break stuff, including ones from implicit bindings.
TODO: This must go in after the frontend changes
@bogner bogner force-pushed the 2025-09-cbuffer-layout branch from 90e49a2 to c73d2a3 Compare September 28, 2025 17:25
This is a bit tricky - when we have a padded array we have a struct with
a slightly smaller array followed by the last element, since that last
element doesn't itself have padding. Here, we create an imaginary array
type with the padding on every element and index into that - this looks
like we index out of bounds of the smaller array for the last element
but this is okay because we only read the memory of the value that is
legitimately there.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants