Skip to content

Conversation

@bogner
Copy link
Contributor

@bogner bogner commented Nov 10, 2025

This change drops the use of the "Layout" type and instead uses explicit padding throughout the compiler to represent types in HLSL buffers.

There are a few parts to this, though it's difficult to split them up as they're very interdependent:

  1. Refactor HLSLBufferLayoutBuilder to allow us to calculate the padding of arbitrary types.
  2. Teach Clang CodeGen to use HLSL specific paths for cbuffers when generating aggregate copies, array accesses, and structure accesses.
  3. Simplify DXILCBufferAccesses such that it directly replaces accesses with dx.resource.getpointer rather than recalculating the layout.
  4. Basic infrastructure for SPIR-V handling, but the implementation itself will need work in follow ups.

Fixes several issues, including #138996, #144573, and #156084.
Resolves #147352.

@bogner bogner changed the title [HLSL][DirectX] Use a padding types for HLSL buffers. [HLSL][DirectX] Use a padding type for HLSL buffers. Nov 10, 2025
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. backend:DirectX HLSL HLSL Language Support backend:SPIR-V llvm:ir labels Nov 10, 2025
@bogner bogner requested review from hekota and s-perron November 10, 2025 22:26
@llvmbot
Copy link
Member

llvmbot commented Nov 10, 2025

@llvm/pr-subscribers-backend-spir-v
@llvm/pr-subscribers-backend-directx

@llvm/pr-subscribers-llvm-ir

Author: Justin Bogner (bogner)

Changes

This change drops the use of the "Layout" type and instead uses explicit padding throughout the compiler to represent types in HLSL buffers.

There are a few parts to this, though it's difficult to split them up as they're very interdependent:

  1. Refactor HLSLBufferLayoutBuilder to allow us to calculate the padding of arbitrary types.
  2. Teach Clang CodeGen to use HLSL specific paths for cbuffers when generating aggregate copies, array accesses, and structure accesses.
  3. Simplify DXILCBufferAccesses such that it directly replaces accesses with dx.resource.getpointer rather than recalculating the layout.
  4. Basic infrastructure for SPIR-V handling, but the implementation itself will need work in follow ups.

Fixes several issues, including #138996, #144573, and #156084.
Resolves #147352.


Patch is 167.65 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/167404.diff

37 Files Affected:

  • (modified) clang/lib/CodeGen/CGExpr.cpp (+16-5)
  • (modified) clang/lib/CodeGen/CGExprAgg.cpp (+4)
  • (modified) clang/lib/CodeGen/CGHLSLRuntime.cpp (+266-25)
  • (modified) clang/lib/CodeGen/CGHLSLRuntime.h (+25-12)
  • (modified) clang/lib/CodeGen/HLSLBufferLayoutBuilder.cpp (+84-223)
  • (modified) clang/lib/CodeGen/HLSLBufferLayoutBuilder.h (+28-19)
  • (modified) clang/lib/CodeGen/TargetInfo.h (+10)
  • (modified) clang/lib/CodeGen/Targets/DirectX.cpp (+16-4)
  • (modified) clang/lib/CodeGen/Targets/SPIR.cpp (+17-4)
  • (modified) clang/test/CodeGenHLSL/ArrayAssignable.hlsl (+51-19)
  • (modified) clang/test/CodeGenHLSL/GlobalConstructorFunction.hlsl (+3-3)
  • (modified) clang/test/CodeGenHLSL/resources/cbuffer.hlsl (+164-87)
  • (modified) clang/test/CodeGenHLSL/resources/cbuffer_and_namespaces.hlsl (+5-5)
  • (added) clang/test/CodeGenHLSL/resources/cbuffer_geps.hlsl (+117)
  • (modified) clang/test/CodeGenHLSL/resources/cbuffer_with_packoffset.hlsl (+24-14)
  • (modified) clang/test/CodeGenHLSL/resources/cbuffer_with_static_global_and_function.hlsl (+1-1)
  • (modified) clang/test/CodeGenHLSL/resources/default_cbuffer.hlsl (+5-6)
  • (modified) clang/test/CodeGenHLSL/resources/default_cbuffer_with_layout.hlsl (+18-10)
  • (modified) llvm/docs/DirectX/DXILResources.rst (+5-5)
  • (modified) llvm/include/llvm/Frontend/HLSL/CBuffer.h (+2-4)
  • (modified) llvm/lib/Frontend/HLSL/CBuffer.cpp (+17-18)
  • (modified) llvm/lib/IR/Type.cpp (+4)
  • (modified) llvm/lib/Target/DirectX/DXILCBufferAccess.cpp (+28-282)
  • (modified) llvm/lib/Target/DirectX/DirectXTargetMachine.cpp (+1-1)
  • (modified) llvm/lib/Target/SPIRV/SPIRVCBufferAccess.cpp (+6-1)
  • (modified) llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp (-3)
  • (removed) llvm/test/CodeGen/DirectX/CBufferAccess/array-typedgep.ll (-32)
  • (modified) llvm/test/CodeGen/DirectX/CBufferAccess/arrays.ll (+23-97)
  • (removed) llvm/test/CodeGen/DirectX/CBufferAccess/float.ll (-25)
  • (modified) llvm/test/CodeGen/DirectX/CBufferAccess/gep-ce-two-uses.ll (+9-13)
  • (removed) llvm/test/CodeGen/DirectX/CBufferAccess/memcpy.ll (-216)
  • (modified) llvm/test/CodeGen/DirectX/CBufferAccess/scalars.ll (+8-63)
  • (modified) llvm/test/CodeGen/DirectX/CBufferAccess/unused.ll (+1-1)
  • (removed) llvm/test/CodeGen/DirectX/CBufferAccess/vectors.ll (-119)
  • (modified) llvm/test/CodeGen/DirectX/llc-pipeline.ll (+1-1)
  • (modified) llvm/test/CodeGen/SPIRV/hlsl-resources/cbuffer.ll (+4-4)
  • (modified) llvm/test/CodeGen/SPIRV/hlsl-resources/cbuffer_unused.ll (+9-9)
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index a837f00732748..f2451b16e78be 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -4641,11 +4641,17 @@ LValue CodeGenFunction::EmitArraySubscriptExpr(const ArraySubscriptExpr *E,
                                  LHS.getBaseInfo(), TBAAAccessInfo());
   }
 
-  // The HLSL runtime handle the subscript expression on global resource arrays.
-  if (getLangOpts().HLSL && (E->getType()->isHLSLResourceRecord() ||
-                             E->getType()->isHLSLResourceRecordArray())) {
-    std::optional<LValue> LV =
-        CGM.getHLSLRuntime().emitResourceArraySubscriptExpr(E, *this);
+  // The HLSL runtime handles subscript expressions on global resource arrays
+  // and objects with HLSL buffer layouts.
+  if (getLangOpts().HLSL) {
+    std::optional<LValue> LV;
+    if (E->getType()->isHLSLResourceRecord() ||
+        E->getType()->isHLSLResourceRecordArray()) {
+      LV = CGM.getHLSLRuntime().emitResourceArraySubscriptExpr(E, *this);
+    } else if (E->getType().getAddressSpace() == LangAS::hlsl_constant) {
+      LV = CGM.getHLSLRuntime().emitBufferArraySubscriptExpr(E, *this,
+                                                             EmitIdxAfterBase);
+    }
     if (LV.has_value())
       return *LV;
   }
@@ -5110,6 +5116,11 @@ LValue CodeGenFunction::EmitMemberExpr(const MemberExpr *E) {
     EmitIgnoredExpr(E->getBase());
     return EmitDeclRefLValue(DRE);
   }
+  if (getLangOpts().HLSL &&
+      E->getType().getAddressSpace() == LangAS::hlsl_constant) {
+    // We have an HLSL buffer - emit using HLSL's layout rules.
+    return CGM.getHLSLRuntime().emitBufferMemberExpr(*this, E);
+  }
 
   Expr *BaseExpr = E->getBase();
   // Check whether the underlying base pointer is a constant null.
diff --git a/clang/lib/CodeGen/CGExprAgg.cpp b/clang/lib/CodeGen/CGExprAgg.cpp
index eee397f1f3d19..bd7d30b10d4d0 100644
--- a/clang/lib/CodeGen/CGExprAgg.cpp
+++ b/clang/lib/CodeGen/CGExprAgg.cpp
@@ -2279,6 +2279,10 @@ void CodeGenFunction::EmitAggregateCopy(LValue Dest, LValue Src, QualType Ty,
     }
   }
 
+  if (getLangOpts().HLSL && Ty.getAddressSpace() == LangAS::hlsl_constant)
+    if (CGM.getHLSLRuntime().emitBufferCopy(*this, DestPtr, SrcPtr, Ty))
+      return;
+
   // Aggregate assignment turns into llvm.memcpy.  This is almost valid per
   // C99 6.5.16.1p3, which states "If the value being stored in an object is
   // read from another object that overlaps in anyway the storage of the first
diff --git a/clang/lib/CodeGen/CGHLSLRuntime.cpp b/clang/lib/CodeGen/CGHLSLRuntime.cpp
index 4bdba9b3da502..c42619541a87b 100644
--- a/clang/lib/CodeGen/CGHLSLRuntime.cpp
+++ b/clang/lib/CodeGen/CGHLSLRuntime.cpp
@@ -13,10 +13,11 @@
 //===----------------------------------------------------------------------===//
 
 #include "CGHLSLRuntime.h"
-#include "Address.h"
 #include "CGDebugInfo.h"
+#include "CGRecordLayout.h"
 #include "CodeGenFunction.h"
 #include "CodeGenModule.h"
+#include "HLSLBufferLayoutBuilder.h"
 #include "TargetInfo.h"
 #include "clang/AST/ASTContext.h"
 #include "clang/AST/Attrs.inc"
@@ -26,6 +27,7 @@
 #include "clang/AST/Type.h"
 #include "clang/Basic/TargetOptions.h"
 #include "clang/Frontend/FrontendDiagnostic.h"
+#include "llvm/ADT/ScopeExit.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/Frontend/HLSL/RootSignatureMetadata.h"
@@ -265,9 +267,9 @@ CGHLSLRuntime::convertHLSLSpecificType(const Type *T,
   assert(T->isHLSLSpecificType() && "Not an HLSL specific type!");
 
   // Check if the target has a specific translation for this type first.
-  if (llvm::Type *TargetTy =
+  if (llvm::Type *LayoutTy =
           CGM.getTargetCodeGenInfo().getHLSLType(CGM, T, OffsetInfo))
-    return TargetTy;
+    return LayoutTy;
 
   llvm_unreachable("Generic handling of HLSL types is not supported.");
 }
@@ -278,23 +280,18 @@ llvm::Triple::ArchType CGHLSLRuntime::getArch() {
 
 // Emits constant global variables for buffer constants declarations
 // and creates metadata linking the constant globals with the buffer global.
-void CGHLSLRuntime::emitBufferGlobalsAndMetadata(const HLSLBufferDecl *BufDecl,
-                                                 llvm::GlobalVariable *BufGV) {
+void CGHLSLRuntime::emitBufferGlobalsAndMetadata(
+    const HLSLBufferDecl *BufDecl, llvm::GlobalVariable *BufGV,
+    const CGHLSLOffsetInfo &OffsetInfo) {
   LLVMContext &Ctx = CGM.getLLVMContext();
 
   // get the layout struct from constant buffer target type
   llvm::Type *BufType = BufGV->getValueType();
-  llvm::Type *BufLayoutType =
-      cast<llvm::TargetExtType>(BufType)->getTypeParameter(0);
   llvm::StructType *LayoutStruct = cast<llvm::StructType>(
-      cast<llvm::TargetExtType>(BufLayoutType)->getTypeParameter(0));
+      cast<llvm::TargetExtType>(BufType)->getTypeParameter(0));
 
-  // Start metadata list associating the buffer global variable with its
-  // constatns
-  SmallVector<llvm::Metadata *> BufGlobals;
-  BufGlobals.push_back(ValueAsMetadata::get(BufGV));
-
-  const auto *ElemIt = LayoutStruct->element_begin();
+  SmallVector<std::pair<VarDecl *, uint32_t>> DeclsWithOffset;
+  size_t OffsetIdx = 0;
   for (Decl *D : BufDecl->buffer_decls()) {
     if (isa<CXXRecordDecl, EmptyDecl>(D))
       // Nothing to do for this declaration.
@@ -326,14 +323,27 @@ void CGHLSLRuntime::emitBufferGlobalsAndMetadata(const HLSLBufferDecl *BufDecl,
       continue;
     }
 
+    DeclsWithOffset.emplace_back(VD, OffsetInfo[OffsetIdx++]);
+  }
+
+  llvm::stable_sort(DeclsWithOffset, [](const auto &LHS, const auto &RHS) {
+    return CGHLSLOffsetInfo::compareOffsets(LHS.second, RHS.second);
+  });
+
+  // Associate the buffer global variable with its constants
+  SmallVector<llvm::Metadata *> BufGlobals;
+  BufGlobals.reserve(DeclsWithOffset.size() + 1);
+  BufGlobals.push_back(ValueAsMetadata::get(BufGV));
+
+  auto ElemIt = LayoutStruct->element_begin();
+  for (auto &[VD, _] : DeclsWithOffset) {
+    if (CGM.getTargetCodeGenInfo().isHLSLPadding(*ElemIt))
+      ++ElemIt;
+
     assert(ElemIt != LayoutStruct->element_end() &&
            "number of elements in layout struct does not match");
     llvm::Type *LayoutType = *ElemIt++;
 
-    // FIXME: handle resources inside user defined structs
-    // (llvm/wg-hlsl#175)
-
-    // create global variable for the constant and to metadata list
     GlobalVariable *ElemGV =
         cast<GlobalVariable>(CGM.GetAddrOfGlobalVar(VD, LayoutType));
     BufGlobals.push_back(ValueAsMetadata::get(ElemGV));
@@ -410,18 +420,17 @@ void CGHLSLRuntime::addBuffer(const HLSLBufferDecl *BufDecl) {
 
   // create global variable for the constant buffer
   CGHLSLOffsetInfo OffsetInfo = CGHLSLOffsetInfo::fromDecl(*BufDecl);
-  llvm::TargetExtType *TargetTy = cast<llvm::TargetExtType>(
-      convertHLSLSpecificType(ResHandleTy, OffsetInfo));
+  llvm::Type *LayoutTy = convertHLSLSpecificType(ResHandleTy, OffsetInfo);
   llvm::GlobalVariable *BufGV = new GlobalVariable(
-      TargetTy, /*isConstant*/ false,
-      GlobalValue::LinkageTypes::ExternalLinkage, PoisonValue::get(TargetTy),
+      LayoutTy, /*isConstant*/ false,
+      GlobalValue::LinkageTypes::ExternalLinkage, PoisonValue::get(LayoutTy),
       llvm::formatv("{0}{1}", BufDecl->getName(),
                     BufDecl->isCBuffer() ? ".cb" : ".tb"),
       GlobalValue::NotThreadLocal);
   CGM.getModule().insertGlobalVariable(BufGV);
 
   // Add globals for constant buffer elements and create metadata nodes
-  emitBufferGlobalsAndMetadata(BufDecl, BufGV);
+  emitBufferGlobalsAndMetadata(BufDecl, BufGV, OffsetInfo);
 
   // Initialize cbuffer from binding (implicit or explicit)
   initializeBufferFromBinding(BufDecl, BufGV);
@@ -440,7 +449,7 @@ void CGHLSLRuntime::addRootSignature(
                      SignatureDecl->getRootElements(), nullptr, M);
 }
 
-llvm::TargetExtType *
+llvm::StructType *
 CGHLSLRuntime::getHLSLBufferLayoutType(const RecordType *StructType) {
   const auto Entry = LayoutTypes.find(StructType);
   if (Entry != LayoutTypes.end())
@@ -449,7 +458,7 @@ CGHLSLRuntime::getHLSLBufferLayoutType(const RecordType *StructType) {
 }
 
 void CGHLSLRuntime::addHLSLBufferLayoutType(const RecordType *StructType,
-                                            llvm::TargetExtType *LayoutTy) {
+                                            llvm::StructType *LayoutTy) {
   assert(getHLSLBufferLayoutType(StructType) == nullptr &&
          "layout type for this struct already exist");
   LayoutTypes[StructType] = LayoutTy;
@@ -1101,3 +1110,235 @@ std::optional<LValue> CGHLSLRuntime::emitResourceArraySubscriptExpr(
   }
   return CGF.MakeAddrLValue(TmpVar, ResultTy, AlignmentSource::Decl);
 }
+
+std::optional<LValue> CGHLSLRuntime::emitBufferArraySubscriptExpr(
+    const ArraySubscriptExpr *E, CodeGenFunction &CGF,
+    llvm::function_ref<llvm::Value *(bool Promote)> EmitIdxAfterBase) {
+  // Find the element type to index by first padding the element type per HLSL
+  // buffer rules, and then padding out to a 16-byte register boundary if
+  // necessary.
+  llvm::Type *LayoutTy =
+      HLSLBufferLayoutBuilder(CGF.CGM).layOutType(E->getType());
+  uint64_t LayoutSizeInBits =
+      CGM.getDataLayout().getTypeSizeInBits(LayoutTy).getFixedValue();
+  CharUnits ElementSize = CharUnits::fromQuantity(LayoutSizeInBits / 8);
+  CharUnits RowAlignedSize = ElementSize.alignTo(CharUnits::fromQuantity(16));
+  if (RowAlignedSize > ElementSize) {
+    llvm::Type *Padding = CGM.getTargetCodeGenInfo().getHLSLPadding(
+        CGM, RowAlignedSize - ElementSize);
+    assert(Padding && "No padding type for target?");
+    LayoutTy = llvm::StructType::get(CGF.getLLVMContext(), {LayoutTy, Padding},
+                                     /*isPacked=*/true);
+  }
+
+  // If the layout type doesn't introduce any padding, we don't need to do
+  // anything special.
+  llvm::Type *OrigTy = CGF.CGM.getTypes().ConvertTypeForMem(E->getType());
+  if (LayoutTy == OrigTy)
+    return std::nullopt;
+
+  LValueBaseInfo EltBaseInfo;
+  TBAAAccessInfo EltTBAAInfo;
+  Address Addr =
+      CGF.EmitPointerWithAlignment(E->getBase(), &EltBaseInfo, &EltTBAAInfo);
+  llvm::Value *Idx = EmitIdxAfterBase(/*Promote*/ true);
+
+  // Index into the object as-if we have an array of the padded element type,
+  // and then dereference the element itself to avoid reading padding that may
+  // be past the end of the in-memory object.
+  SmallVector<llvm::Value *, 2> Indices;
+  Indices.push_back(Idx);
+  Indices.push_back(llvm::ConstantInt::get(Indices[0]->getType(), 0));
+
+  llvm::Value *GEP = CGF.Builder.CreateGEP(LayoutTy, Addr.emitRawPointer(CGF),
+                                           Indices, "cbufferidx");
+  Addr = Address(GEP, Addr.getElementType(), RowAlignedSize, KnownNonNull);
+
+  return CGF.MakeAddrLValue(Addr, E->getType(), EltBaseInfo, EltTBAAInfo);
+}
+
+namespace {
+/// Utility for emitting copies following the HLSL buffer layout rules (ie,
+/// copying out of a cbuffer).
+class HLSLBufferCopyEmitter {
+  CodeGenFunction &CGF;
+  Address DestPtr;
+  Address SrcPtr;
+  llvm::Type *LayoutTy = nullptr;
+
+  SmallVector<llvm::Value *> CurStoreIndices;
+  SmallVector<llvm::Value *> CurLoadIndices;
+
+  void emitCopyAtIndices(llvm::Type *FieldTy, llvm::ConstantInt *StoreIndex,
+                         llvm::ConstantInt *LoadIndex) {
+    CurStoreIndices.push_back(StoreIndex);
+    CurLoadIndices.push_back(LoadIndex);
+    auto RestoreIndices = llvm::make_scope_exit([&]() {
+      CurStoreIndices.pop_back();
+      CurLoadIndices.pop_back();
+    });
+
+    // First, see if this is some kind of aggregate and recurse.
+    if (processArray(FieldTy))
+      return;
+    if (processBufferLayoutArray(FieldTy))
+      return;
+    if (processStruct(FieldTy))
+      return;
+
+    // When we have a scalar or vector element we can emit the copy.
+    CharUnits Align = CharUnits::fromQuantity(
+        CGF.CGM.getDataLayout().getABITypeAlign(FieldTy));
+    Address SrcGEP = RawAddress(
+        CGF.Builder.CreateInBoundsGEP(LayoutTy, SrcPtr.getBasePointer(),
+                                      CurLoadIndices, "cbuf.src"),
+        FieldTy, Align, SrcPtr.isKnownNonNull());
+    Address DestGEP = CGF.Builder.CreateInBoundsGEP(
+        DestPtr, CurStoreIndices, FieldTy, Align, "cbuf.dest");
+    llvm::Value *Load = CGF.Builder.CreateLoad(SrcGEP, "cbuf.load");
+    CGF.Builder.CreateStore(Load, DestGEP);
+  }
+
+  bool processArray(llvm::Type *FieldTy) {
+    auto *AT = dyn_cast<llvm::ArrayType>(FieldTy);
+    if (!AT)
+      return false;
+
+    // If we have an llvm::ArrayType this is just a regular array with no top
+    // level padding, so all we need to do is copy each member.
+    for (unsigned I = 0, E = AT->getNumElements(); I < E; ++I)
+      emitCopyAtIndices(AT->getElementType(),
+                        llvm::ConstantInt::get(CGF.SizeTy, I),
+                        llvm::ConstantInt::get(CGF.SizeTy, I));
+    return true;
+  }
+
+  bool processBufferLayoutArray(llvm::Type *FieldTy) {
+    // A buffer layout array is a struct with two elements: the padded array,
+    // and the last element. That is, is should look something like this:
+    //
+    //   { [%n x { %type, %padding }], %type }
+    //
+    auto *ST = dyn_cast<llvm::StructType>(FieldTy);
+    if (!ST || ST->getNumElements() != 2)
+      return false;
+
+    auto *PaddedEltsTy = dyn_cast<llvm::ArrayType>(ST->getElementType(0));
+    if (!PaddedEltsTy)
+      return false;
+
+    auto *PaddedTy = dyn_cast<llvm::StructType>(PaddedEltsTy->getElementType());
+    if (!PaddedTy || PaddedTy->getNumElements() != 2)
+      return false;
+
+    if (!CGF.CGM.getTargetCodeGenInfo().isHLSLPadding(
+            PaddedTy->getElementType(1)))
+      return false;
+
+    llvm::Type *ElementTy = ST->getElementType(1);
+    if (PaddedTy->getElementType(0) != ElementTy)
+      return false;
+
+    // All but the last of the logical array elements are in the padded array.
+    unsigned NumElts = PaddedEltsTy->getNumElements() + 1;
+
+    // Add an extra indirection to the load for the struct and walk the
+    // array prefix.
+    CurLoadIndices.push_back(llvm::ConstantInt::get(CGF.Int32Ty, 0));
+    for (unsigned I = 0; I < NumElts - 1; ++I) {
+      // We need to copy the element itself, without the padding.
+      CurLoadIndices.push_back(llvm::ConstantInt::get(CGF.SizeTy, I));
+      emitCopyAtIndices(ElementTy, llvm::ConstantInt::get(CGF.SizeTy, I),
+                        llvm::ConstantInt::get(CGF.Int32Ty, 0));
+      CurLoadIndices.pop_back();
+    }
+    CurLoadIndices.pop_back();
+
+    // Now copy the last element.
+    emitCopyAtIndices(ElementTy,
+                      llvm::ConstantInt::get(CGF.SizeTy, NumElts - 1),
+                      llvm::ConstantInt::get(CGF.Int32Ty, 1));
+
+    return true;
+  }
+
+  bool processStruct(llvm::Type *FieldTy) {
+    auto *ST = dyn_cast<llvm::StructType>(FieldTy);
+    if (!ST)
+      return false;
+
+    // Copy the struct field by field, but skip any explicit padding.
+    unsigned Skipped = 0;
+    for (unsigned I = 0, E = ST->getNumElements(); I < E; ++I) {
+      llvm::Type *ElementTy = ST->getElementType(I);
+      if (CGF.CGM.getTargetCodeGenInfo().isHLSLPadding(ElementTy))
+        ++Skipped;
+      else
+        emitCopyAtIndices(ElementTy, llvm::ConstantInt::get(CGF.Int32Ty, I),
+                          llvm::ConstantInt::get(CGF.Int32Ty, I + Skipped));
+    }
+    return true;
+  }
+
+public:
+  HLSLBufferCopyEmitter(CodeGenFunction &CGF, Address DestPtr, Address SrcPtr)
+      : CGF(CGF), DestPtr(DestPtr), SrcPtr(SrcPtr) {}
+
+  bool emitCopy(QualType CType) {
+    LayoutTy = HLSLBufferLayoutBuilder(CGF.CGM).layOutType(CType);
+
+    // TODO: We should be able to fall back to a regular memcpy if the layout
+    // type doesn't have any padding, but that runs into issues in the backend
+    // currently.
+    //
+    // See https://github.com/llvm/wg-hlsl/issues/351
+    emitCopyAtIndices(LayoutTy, llvm::ConstantInt::get(CGF.SizeTy, 0),
+                      llvm::ConstantInt::get(CGF.SizeTy, 0));
+    return true;
+  }
+};
+} // namespace
+
+bool CGHLSLRuntime::emitBufferCopy(CodeGenFunction &CGF, Address DestPtr,
+                                   Address SrcPtr, QualType CType) {
+  return HLSLBufferCopyEmitter(CGF, DestPtr, SrcPtr).emitCopy(CType);
+}
+
+LValue CGHLSLRuntime::emitBufferMemberExpr(CodeGenFunction &CGF,
+                                           const MemberExpr *E) {
+  LValue Base =
+      CGF.EmitCheckedLValue(E->getBase(), CodeGenFunction::TCK_MemberAccess);
+  auto *Field = dyn_cast<FieldDecl>(E->getMemberDecl());
+  assert(Field && "Unexpected access into HLSL buffer");
+
+  // Get the field index for the struct.
+  const RecordDecl *Rec = Field->getParent();
+  unsigned FieldIdx =
+      CGM.getTypes().getCGRecordLayout(Rec).getLLVMFieldNo(Field);
+
+  // Work out the buffer layout type to index into.
+  QualType RecType = CGM.getContext().getCanonicalTagType(Rec);
+  assert(RecType->isStructureOrClassType() && "Invalid type in HLSL buffer");
+  // Since this is a member of the buffer and not the buffer itself, we
+  // shouldn't have any offsets we need to contend with.
+  CGHLSLOffsetInfo EmptyOffsets;
+  llvm::StructType *LayoutTy = HLSLBufferLayoutBuilder(CGM).layOutStruct(
+      RecType->getAsCanonical<RecordType>(), EmptyOffsets);
+
+  // Now index into the struct, making sure that the type we return is the
+  // buffer layout type rather than the original type in the AST.
+  QualType FieldType = Field->getType();
+  llvm::Type *FieldLLVMTy = CGM.getTypes().ConvertTypeForMem(FieldType);
+  CharUnits Align = CharUnits::fromQuantity(
+      CGF.CGM.getDataLayout().getABITypeAlign(FieldLLVMTy));
+  Address Addr(CGF.Builder.CreateStructGEP(LayoutTy, Base.getPointer(CGF),
+                                           FieldIdx, Field->getName()),
+               FieldLLVMTy, Align, KnownNonNull);
+
+  LValue LV = LValue::MakeAddr(Addr, FieldType, CGM.getContext(),
+                               LValueBaseInfo(AlignmentSource::Type),
+                               CGM.getTBAAAccessInfo(FieldType));
+  LV.getQuals().addCVRQualifiers(Base.getVRQualifiers());
+
+  return LV;
+}
diff --git a/clang/lib/CodeGen/CGHLSLRuntime.h b/clang/lib/CodeGen/CGHLSLRuntime.h
index 488a322ca7569..07d6ab87305bb 100644
--- a/clang/lib/CodeGen/CGHLSLRuntime.h
+++ b/clang/lib/CodeGen/CGHLSLRuntime.h
@@ -15,20 +15,19 @@
 #ifndef LLVM_CLANG_LIB_CODEGEN_CGHLSLRUNTIME_H
 #define LLVM_CLANG_LIB_CODEGEN_CGHLSLRUNTIME_H
 
-#include "llvm/ADT/DenseMap.h"
-#include "llvm/IR/IRBuilder.h"
-#include "llvm/IR/Intrinsics.h"
-#include "llvm/IR/IntrinsicsDirectX.h"
-#include "llvm/IR/IntrinsicsSPIRV.h"
-
+#include "Address.h"
 #include "clang/AST/Attr.h"
 #include "clang/AST/Decl.h"
 #include "clang/Basic/Builtins.h"
 #include "clang/Basic/HLSLRuntime.h"
-
+#include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Frontend/HLSL/HLSLResource.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/IntrinsicsDirectX.h"
+#include "llvm/IR/IntrinsicsSPIRV.h"
 
 #include <optional>
 #include <vector>
@@ -100,6 +99,11 @@ class CGHLSLOffsetInfo {
   /// of the HLSL buffer after all of the elements with specified offset.
   static CGHLSLOffsetInfo fromDecl(const HLSLBufferDecl &BufDecl);
 
+  /// Comparison function for offsets received from `operator[]` suitable for
+  /// use in a `stable_sort`. This will order implicit bindings after explicit
+  /// offsets.
+  static bool compareOffsets(uint32_t LHS, uint32_t RHS) { return LHS < RHS; }
+
   /// Get the given offset, or `~0U` if there is no offset for the member.
   uint32_t operator[](size_t I) const {
     if (Offsets.empty())
@@ -214,19 +218,28 @@ class CGHLSLRuntime {
 
   llvm::Instruction *getConvergenceToken(llvm::BasicBlock &BB);
 
-  llvm::TargetExtType *
-  getHLSLBufferLayoutType(const RecordType *LayoutStructTy);
+  llvm::StructType *getHLSLBufferLayoutType(const RecordType *LayoutStructTy);
   void addHLSLBufferLayoutType(const RecordType *LayoutStructTy,
-                               llvm::TargetExtType *LayoutTy);
+                               llvm::StructType *LayoutTy);
   void emitInitListOpaqueValues(CodeGenFunction &CGF, InitListExpr *E);
...
[truncated]

This change drops the use of the "Layout" type and instead uses explicit
padding throughout the compiler to represent types in HLSL buffers.

There are a few parts to this, though it's difficult to split them up as
they're very interdependent:

1. Refactor HLSLBufferLayoutBuilder to allow us to calculate the padding
   of arbitrary types.
2. Teach Clang CodeGen to use HLSL specific paths for cbuffers when
   generating aggregate copies, array accesses, and structure accesses.
3. Simplify DXILCBufferAccesses such that it directly replaces accesses
   with dx.resource.getpointer rather than recalculating the layout.
4. Basic infrastructure for SPIR-V handling, but the implementation
   itself will need work in follow ups.

Fixes several issues, including llvm#138996, llvm#144573, and llvm#156084.
Resolves llvm#147352.
@bogner bogner force-pushed the 2025-11-10-cbuffer-padding branch from e974162 to c4a9c64 Compare November 11, 2025 20:09
use(s3.c2.a1);
// CHECK: load <2 x float>, ptr addrspace(2) getelementptr (<{ %A, target("dx.Padding", 8) }>, ptr addrspace(2) @s4, i32 2, i32 0), align 8
use(s4[2].a1);
// CHECK: load <3 x i16>, ptr addrspace(2) getelementptr inbounds nuw (%B, ptr addrspace(2) getelementptr (<{ %B, target("dx.Padding", 2) }>, ptr addrspace(2) getelementptr (<{ <{ [5 x <{ %B, target("dx.Padding", 2) }>], %B }>, target("dx.Padding", 2) }>, ptr addrspace(2) @s5, i32 3, i32 0), i32 5, i32 0), i32 0, i32 1), align 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to see what the spir-v backend does with this. I was expect the array sizes to be part of the GEPs, but it is not. A direct translation in SPIR-V would use the OpPtrAccessChain intructions, but they are not available in shaders.

I'll have to see how we can translate that. Also, I'm not sure how this will work for the structured GEP that @Keenuts is working on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks OK to me: we'll have to change codegen to emit the sgep instruction, and remove the zero indices, but seems OK

Copy link
Member

@hekota hekota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this taking shape! I just reviewed the Clang part so far and it is looking good - just a few comments.

Comment on lines 31 to 45
/// Lays out a struct type following HLSL buffer rules and considering
/// PackOffsets, if provided. Previously created layout structs are cached by
/// CGHLSLRuntime.
///
/// The function iterates over all fields of the record type (including base
/// classes) and calls layoutField to converts each field to its corresponding
/// LLVM type and to calculate its HLSL constant buffer layout. Any embedded
/// structs (or arrays of structs) are converted to layout types as well.
///
/// When PackOffsets are specified the elements will be placed based on the
/// user-specified offsets. Not all elements must have a
/// packoffset/register(c#) annotation though. For those that don't, the
/// PackOffsets array will contain -1 value instead. These elements must be
/// placed at the end of the layout after all of the elements with specific
/// offset.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment needs to be updated. It talks about PackOffsets, but the code uses OffsetInfo. And layoutField function does not exist.

Comment on lines 268 to 272
if (llvm::Type *TargetTy =
if (llvm::Type *LayoutTy =
CGM.getTargetCodeGenInfo().getHLSLType(CGM, T, OffsetInfo))
return TargetTy;
return LayoutTy;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these should be renamed. All resource types go thought this.

Comment on lines 36 to 37
// TODO: We should have target("dx.Padding", 2) padding after %B, but we don't correctly handle
// 2- and 3-element vectors inside structs yet because of DataLayout rules.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an issue for this? It might be worth adding a link to it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linked to #123968

// INLINE-NEXT-SAME: @"llvm.dx.resource.handlefromimplicitbinding.tdx.CBuffer_tdx.Layout_s___cblayout_$Globalss_4_0tt"(i32 0, i32 0, i32 1, i32 0, i1 false)
// INLINE-NEXT: store target("dx.CBuffer", target("dx.Layout", %"__cblayout_$Globals", 4, 0)) %[[HANDLE]], ptr @"$Globals.cb", align 4
// INLINE-NEXT: %[[HANDLE:.*]] = call target("dx.CBuffer", %"__cblayout_$Globals")
// INLINE-NEXT-SAME: @"llvm.dx.resource.handlefromimplicitbinding.tdx.CBuffer_s___cblayout_$Globalsst"(i32 0, i32 0, i32 1, i32 0, i1 false, ptr @"$Globals.str")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// INLINE-NEXT-SAME: @"llvm.dx.resource.handlefromimplicitbinding.tdx.CBuffer_s___cblayout_$Globalsst"(i32 0, i32 0, i32 1, i32 0, i1 false, ptr @"$Globals.str")
// INLINE-NEXT-SAME: @"llvm.dx.resource.handlefromimplicitbinding.tdx.CBuffer_s___cblayout_$Globalsst"(i32 0, i32 0, i32 1, i32 0, ptr @"$Globals.str")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was curious why this wasn't broken, turns out this had INLINE-NEXT-SAME instead of INLINE-SAME, which isn't a FileCheck directive at all :/

I've adjusted this to just check the whole thing on one line.

Comment on lines 329 to 331
llvm::stable_sort(DeclsWithOffset, [](const auto &LHS, const auto &RHS) {
return CGHLSLOffsetInfo::compareOffsets(LHS.second, RHS.second);
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the vast majority of cbuffers does not use packoffset, let alone in the way that changes the order of the members, I am wondering if we could skip some of the work:

Suggested change
llvm::stable_sort(DeclsWithOffset, [](const auto &LHS, const auto &RHS) {
return CGHLSLOffsetInfo::compareOffsets(LHS.second, RHS.second);
});
if (!OffsetInfo.empty())
llvm::stable_sort(DeclsWithOffset, [](const auto &LHS, const auto &RHS) {
return CGHLSLOffsetInfo::compareOffsets(LHS.second, RHS.second);
});

// be past the end of the in-memory object.
SmallVector<llvm::Value *, 2> Indices;
Indices.push_back(Idx);
Indices.push_back(llvm::ConstantInt::get(Indices[0]->getType(), 0));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Indices.push_back(llvm::ConstantInt::get(Indices[0]->getType(), 0));
Indices.push_back(llvm::ConstantInt::get(Idx->getType(), 0));

Comment on lines 1322 to 1323
// Since this is a member of the buffer and not the buffer itself, we
// shouldn't have any offsets we need to contend with.
Copy link
Member

@hekota hekota Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This took me a while to figure out what it means :)

Suggested change
// Since this is a member of the buffer and not the buffer itself, we
// shouldn't have any offsets we need to contend with.
// Since this is a member of struct/class in the buffer and not the struct/class itself, we
// shouldn't have any offsets on the members we need to contend with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded

Comment on lines 55 to 57
llvm::stable_sort(FieldsWithOffset, [](const auto &LHS, const auto &RHS) {
return CGHLSLOffsetInfo::compareOffsets(LHS.second, RHS.second);
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as in one of my comments above, I wonder if we could save some of the work that is not necessary in the common case:

Suggested change
llvm::stable_sort(FieldsWithOffset, [](const auto &LHS, const auto &RHS) {
return CGHLSLOffsetInfo::compareOffsets(LHS.second, RHS.second);
});
if (!OffsetInfo.empty())
llvm::stable_sort(FieldsWithOffset, [](const auto &LHS, const auto &RHS) {
return CGHLSLOffsetInfo::compareOffsets(LHS.second, RHS.second);
});

Comment on lines 75 to 76
// TODO: Do we need to diagnose invalid packoffsets here, or will they have
// already been taken care of?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overlapping packoffsets are already diagnosed here:

S.Diag(NextVar->getLocation(), diag::err_hlsl_packoffset_overlap)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, makes sense. Dropped the TODO, but left the assert.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:DirectX backend:SPIR-V clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category HLSL HLSL Language Support llvm:ir

Projects

None yet

5 participants