[AMDGPU] Fixed llvm-debuginfo-analyzer for AMDGPU. #145125

adam-yang · 2025-06-21T00:08:50Z

Constructing Target triple with ObjectFile::makeTriple instead of just with Arch and leaving the rest unknown. Also creating the subtarget with the CPU. AMDGPU needs the full triple and CPU to disassemble correctly.

To run a full test, also fixed a failure in SIPreAllocateWWMRegs with the $noreg operand in DBG_VALUE.

github-actions · 2025-06-21T00:10:23Z

⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo.
Please turn off Keep my email addresses private setting in your account.
See LLVM Developer Policy and LLVM Discourse for more information.

…te DWARF correctly in AMDGPU

slinder1

Thank you for the patch!

I have some small nits, and one larger concern around changing AMDPAL relocation behavior. If I understand the intent of the change, I think that part of the change should be removed and instead support for the REL relocations added in LogicalView or whatever library it depends on

llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h

llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUAsmBackend.cpp

llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp

llvm/test/CodeGen/AMDGPU/amdgpu-llvm-debuginfo-analyzer.ll

llvmbot · 2025-07-25T19:03:45Z

@llvm/pr-subscribers-debuginfo

Author: Adam Yang (adam-yang)

Changes

Constructing Target triple with ObjectFile::makeTriple instead of just with Arch and leaving the rest unknown. Also creating the subtarget with the CPU. AMDGPU needs the full triple and CPU to disassemble correctly.

To run a full test, also fixed a failure with the $noreg operand in DBG_VALUE, and writing relocation with AMDPAL.

Patch is 24.49 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/145125.diff

7 Files Affected:

(modified) llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h (+2-1)
(modified) llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp (+3-3)
(modified) llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp (+8-2)
(modified) llvm/lib/DebugInfo/LogicalView/Readers/LVDWARFReader.cpp (+7-5)
(modified) llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp (+5)
(added) llvm/test/CodeGen/AMDGPU/amdgpu-llvm-debuginfo-analyzer.ll (+101)
(added) llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir (+210)

diff --git a/llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h b/llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h
index 1847fa8323480..2cf4a8ec6a37f 100644
--- a/llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h
+++ b/llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h
@@ -159,7 +159,8 @@ class LVBinaryReader : public LVReader {
   LVAddress WasmCodeSectionOffset = 0;
 
   // Loads all info for the architecture of the provided object file.
-  Error loadGenericTargetInfo(StringRef TheTriple, StringRef TheFeatures);
+  Error loadGenericTargetInfo(StringRef TheTriple, StringRef TheFeatures,
+                              StringRef TheCPU);
 
   virtual void mapRangeAddress(const object::ObjectFile &Obj) {}
   virtual void mapRangeAddress(const object::ObjectFile &Obj,
diff --git a/llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp b/llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp
index 80b4185b7c600..0df9137a3bd37 100644
--- a/llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp
+++ b/llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp
@@ -275,7 +275,8 @@ void LVBinaryReader::mapVirtualAddress(const object::COFFObjectFile &COFFObj) {
 }
 
 Error LVBinaryReader::loadGenericTargetInfo(StringRef TheTriple,
-                                            StringRef TheFeatures) {
+                                            StringRef TheFeatures,
+                                            StringRef TheCPU) {
   std::string TargetLookupError;
   const Target *TheTarget =
       TargetRegistry::lookupTarget(TheTriple, TargetLookupError);
@@ -298,9 +299,8 @@ Error LVBinaryReader::loadGenericTargetInfo(StringRef TheTriple,
   MAI.reset(AsmInfo);
 
   // Target subtargets.
-  StringRef CPU;
   MCSubtargetInfo *SubtargetInfo(
-      TheTarget->createMCSubtargetInfo(TheTriple, CPU, TheFeatures));
+      TheTarget->createMCSubtargetInfo(TheTriple, TheCPU, TheFeatures));
   if (!SubtargetInfo)
     return createStringError(errc::invalid_argument,
                              "no subtarget info for target " + TheTriple);
diff --git a/llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp b/llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
index e5895516b5e77..2ff70816b4bf1 100644
--- a/llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
+++ b/llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
@@ -1190,7 +1190,12 @@ Error LVCodeViewReader::loadTargetInfo(const ObjectFile &Obj) {
     FeaturesValue = SubtargetFeatures();
   }
   FeaturesValue = *Features;
-  return loadGenericTargetInfo(TT.str(), FeaturesValue.getString());
+
+  StringRef CPU;
+  if (auto OptCPU = Obj.tryGetCPUName())
+    CPU = *OptCPU;
+
+  return loadGenericTargetInfo(TT.str(), FeaturesValue.getString(), CPU);
 }
 
 Error LVCodeViewReader::loadTargetInfo(const PDBFile &Pdb) {
@@ -1200,8 +1205,9 @@ Error LVCodeViewReader::loadTargetInfo(const PDBFile &Pdb) {
   TT.setOS(Triple::Win32);
 
   StringRef TheFeature = "";
+  StringRef TheCPU = "";
 
-  return loadGenericTargetInfo(TT.str(), TheFeature);
+  return loadGenericTargetInfo(TT.str(), TheFeature, TheCPU);
 }
 
 std::string LVCodeViewReader::getRegisterName(LVSmall Opcode,
diff --git a/llvm/lib/DebugInfo/LogicalView/Readers/LVDWARFReader.cpp b/llvm/lib/DebugInfo/LogicalView/Readers/LVDWARFReader.cpp
index 696e2bc948a2e..62134dfdadf46 100644
--- a/llvm/lib/DebugInfo/LogicalView/Readers/LVDWARFReader.cpp
+++ b/llvm/lib/DebugInfo/LogicalView/Readers/LVDWARFReader.cpp
@@ -956,10 +956,7 @@ LVElement *LVDWARFReader::getElementForOffset(LVOffset Offset,
 Error LVDWARFReader::loadTargetInfo(const ObjectFile &Obj) {
   // Detect the architecture from the object file. We usually don't need OS
   // info to lookup a target and create register info.
-  Triple TT;
-  TT.setArch(Triple::ArchType(Obj.getArch()));
-  TT.setVendor(Triple::UnknownVendor);
-  TT.setOS(Triple::UnknownOS);
+  Triple TT = Obj.makeTriple();
 
   // Features to be passed to target/subtarget
   Expected<SubtargetFeatures> Features = Obj.getFeatures();
@@ -969,7 +966,12 @@ Error LVDWARFReader::loadTargetInfo(const ObjectFile &Obj) {
     FeaturesValue = SubtargetFeatures();
   }
   FeaturesValue = *Features;
-  return loadGenericTargetInfo(TT.str(), FeaturesValue.getString());
+
+  StringRef CPU;
+  if (auto OptCPU = Obj.tryGetCPUName())
+    CPU = *OptCPU;
+
+  return loadGenericTargetInfo(TT.str(), FeaturesValue.getString(), CPU);
 }
 
 void LVDWARFReader::mapRangeAddress(const ObjectFile &Obj) {
diff --git a/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp b/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
index 205a45a045a42..f807c567efa2f 100644
--- a/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
+++ b/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
@@ -130,6 +130,11 @@ void SIPreAllocateWWMRegs::rewriteRegs(MachineFunction &MF) {
         if (VirtReg.isPhysical())
           continue;
 
+        if (!VirtReg.isValid()) {
+          assert(MI.isDebugInstr() && "non-debug use of noreg");
+          continue;
+        }
+
         if (!VRM->hasPhys(VirtReg))
           continue;
 
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-llvm-debuginfo-analyzer.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-llvm-debuginfo-analyzer.ll
new file mode 100644
index 0000000000000..2cff21c66172d
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-llvm-debuginfo-analyzer.ll
@@ -0,0 +1,101 @@
+; RUN: llc %s -o %t.o -mcpu=gfx1030 -filetype=obj -O0
+; RUN: llvm-debuginfo-analyzer %t.o --print=all --attribute=all | FileCheck %s
+
+; This test compiles this module with AMDGPU backend under -O0,
+; and makes sure llvm-debuginfo-analyzer works for it.
+
+; Simple checks to make sure llvm-debuginfo-analzyer didn't fail early.
+; CHECK: Logical View:
+; CHECK: {CompileUnit}
+; CHECK-DAG: {Parameter} 'dtid' -> [0x{{[a-f0-9]+}}]'uint3'
+; CHECK-DAG: {Variable} 'my_var2' -> [0x{{[a-f0-9]+}}]'float'
+; CHECK-DAG: {Line} {{.+}}basic_var.hlsl
+; CHECK: {Code} 's_endpgm'
+
+source_filename = "module"
+target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-p10:32:32-p11:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9-p32:32:32-v8:8-v16:16-v32:32-v48:32-v64:32-v80:32-v96:32-v112:32-v128:32-v144:32-v160:32-v176:32-v192:32-v208:32-v224:32-v240:32-v256:32-i1:32-i8:8-i16:16-i32:32-i64:32-f16:16-f32:32-f64:32"
+target triple = "amdgcn-amd-amdpal"
+
+%dx.types.ResRet.f32 = type { float, float, float, float, i32 }
+
+define dllexport amdgpu_cs void @_amdgpu_cs_main(i32 inreg noundef %globalTable, i32 inreg noundef %userdata4, <3 x i32> inreg noundef %WorkgroupId, i32 inreg noundef %MultiDispatchInfo, <3 x i32> noundef %LocalInvocationId) #0 !dbg !14 {
+  %LocalInvocationId.i0 = extractelement <3 x i32> %LocalInvocationId, i64 0, !dbg !28
+  %WorkgroupId.i0 = extractelement <3 x i32> %WorkgroupId, i64 0, !dbg !28
+  %1 = call i64 @llvm.amdgcn.s.getpc(), !dbg !28
+  %2 = shl i32 %WorkgroupId.i0, 6, !dbg !28
+  %3 = add i32 %LocalInvocationId.i0, %2, !dbg !28
+    #dbg_value(i32 %3, !29, !DIExpression(DW_OP_LLVM_fragment, 0, 32), !28)
+  %4 = and i64 %1, -4294967296, !dbg !30
+  %5 = zext i32 %userdata4 to i64, !dbg !30
+  %6 = or disjoint i64 %4, %5, !dbg !30
+  %7 = inttoptr i64 %6 to ptr addrspace(4), !dbg !30
+  call void @llvm.assume(i1 true) [ "align"(ptr addrspace(4) %7, i32 4), "dereferenceable"(ptr addrspace(4) %7, i32 -1) ], !dbg !30
+  %8 = load <4 x i32>, ptr addrspace(4) %7, align 4, !dbg !30, !invariant.load !2
+  %9 = call float @llvm.amdgcn.struct.buffer.load.format.f32(<4 x i32> %8, i32 %3, i32 0, i32 0, i32 0), !dbg !30
+    #dbg_value(%dx.types.ResRet.f32 poison, !31, !DIExpression(), !32)
+  %10 = fmul reassoc arcp contract afn float %9, 2.000000e+00, !dbg !33
+    #dbg_value(float %10, !34, !DIExpression(), !35)
+  call void @llvm.assume(i1 true) [ "align"(ptr addrspace(4) %7, i32 4), "dereferenceable"(ptr addrspace(4) %7, i32 -1) ], !dbg !36
+  %11 = getelementptr i8, ptr addrspace(4) %7, i64 32, !dbg !36
+  %.upto01 = insertelement <4 x float> poison, float %10, i64 0, !dbg !36
+  %12 = shufflevector <4 x float> %.upto01, <4 x float> poison, <4 x i32> zeroinitializer, !dbg !36
+  %13 = load <4 x i32>, ptr addrspace(4) %11, align 4, !dbg !36, !invariant.load !2
+  call void @llvm.amdgcn.struct.buffer.store.format.v4f32(<4 x float> %12, <4 x i32> %13, i32 %3, i32 0, i32 0, i32 0), !dbg !36
+  ret void, !dbg !37
+}
+
+declare noundef i64 @llvm.amdgcn.s.getpc() #1
+
+declare void @llvm.assume(i1 noundef) #2
+
+declare void @llvm.amdgcn.struct.buffer.store.format.v4f32(<4 x float>, <4 x i32>, i32, i32, i32, i32 immarg) #3
+
+declare float @llvm.amdgcn.struct.buffer.load.format.f32(<4 x i32>, i32, i32, i32, i32 immarg) #4
+
+attributes #0 = { memory(readwrite) "amdgpu-flat-work-group-size"="64,64" "amdgpu-memory-bound"="false" "amdgpu-num-sgpr"="4294967295" "amdgpu-num-vgpr"="4294967295" "amdgpu-prealloc-sgpr-spill-vgprs" "amdgpu-unroll-threshold"="1200" "amdgpu-wave-limiter"="false" "amdgpu-work-group-info-arg-no"="3" "denormal-fp-math"="ieee" "denormal-fp-math-f32"="preserve-sign" "target-features"=",+wavefrontsize64,+cumode,+enable-flat-scratch" }
+attributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
+attributes #2 = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) }
+attributes #3 = { nocallback nofree nosync nounwind willreturn memory(write) }
+attributes #4 = { nocallback nofree nosync nounwind willreturn memory(read) }
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!12, !13}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "dxcoob 1.7.2308.16 (52da17e29)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, globals: !3)
+!1 = !DIFile(filename: "tests\\basic_var.hlsl", directory: "")
+!2 = !{}
+!3 = !{!4, !10}
+!4 = distinct !DIGlobalVariableExpression(var: !5, expr: !DIExpression())
+!5 = !DIGlobalVariable(name: "u0", linkageName: "\01?u0@@3V?$RWBuffer@M@@A", scope: !0, file: !1, line: 2, type: !6, isLocal: false, isDefinition: true)
+!6 = !DICompositeType(tag: DW_TAG_class_type, name: "RWBuffer<float>", file: !1, line: 2, size: 32, align: 32, elements: !2, templateParams: !7)
+!7 = !{!8}
+!8 = !DITemplateTypeParameter(name: "element", type: !9)
+!9 = !DIBasicType(name: "float", size: 32, align: 32, encoding: DW_ATE_float)
+!10 = distinct !DIGlobalVariableExpression(var: !11, expr: !DIExpression())
+!11 = !DIGlobalVariable(name: "u1", linkageName: "\01?u1@@3V?$RWBuffer@M@@A", scope: !0, file: !1, line: 3, type: !6, isLocal: false, isDefinition: true)
+!12 = !{i32 2, !"Dwarf Version", i32 5}
+!13 = !{i32 2, !"Debug Info Version", i32 3}
+!14 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 7, type: !15, scopeLine: 7, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0)
+!15 = !DISubroutineType(types: !16)
+!16 = !{null, !17}
+!17 = !DIDerivedType(tag: DW_TAG_typedef, name: "uint3", file: !1, baseType: !18)
+!18 = !DICompositeType(tag: DW_TAG_class_type, name: "vector<unsigned int, 3>", file: !1, size: 96, align: 32, elements: !19, templateParams: !24)
+!19 = !{!20, !22, !23}
+!20 = !DIDerivedType(tag: DW_TAG_member, name: "x", scope: !18, file: !1, baseType: !21, size: 32, align: 32, flags: DIFlagPublic)
+!21 = !DIBasicType(name: "unsigned int", size: 32, align: 32, encoding: DW_ATE_unsigned)
+!22 = !DIDerivedType(tag: DW_TAG_member, name: "y", scope: !18, file: !1, baseType: !21, size: 32, align: 32, offset: 32, flags: DIFlagPublic)
+!23 = !DIDerivedType(tag: DW_TAG_member, name: "z", scope: !18, file: !1, baseType: !21, size: 32, align: 32, offset: 64, flags: DIFlagPublic)
+!24 = !{!25, !26}
+!25 = !DITemplateTypeParameter(name: "element", type: !21)
+!26 = !DITemplateValueParameter(name: "element_count", type: !27, value: i32 3)
+!27 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
+!28 = !DILocation(line: 7, column: 17, scope: !14)
+!29 = !DILocalVariable(name: "dtid", arg: 1, scope: !14, file: !1, line: 7, type: !17)
+!30 = !DILocation(line: 11, column: 18, scope: !14)
+!31 = !DILocalVariable(name: "my_var", scope: !14, file: !1, line: 11, type: !9)
+!32 = !DILocation(line: 11, column: 9, scope: !14)
+!33 = !DILocation(line: 14, column: 26, scope: !14)
+!34 = !DILocalVariable(name: "my_var2", scope: !14, file: !1, line: 14, type: !9)
+!35 = !DILocation(line: 14, column: 9, scope: !14)
+!36 = !DILocation(line: 17, column: 14, scope: !14)
+!37 = !DILocation(line: 19, column: 1, scope: !14)
diff --git a/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir b/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir
new file mode 100644
index 0000000000000..4b5fea863289b
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir
@@ -0,0 +1,210 @@
+# RUN: llc %s -o - -mcpu=gfx1030 -O0 -run-pass=si-pre-allocate-wwm-regs | FileCheck %s
+
+# Simple regression test to make sure DBG_VALUE $noreg does not assert in the pass
+
+# CHECK: S_ENDPGM
+
+--- |
+  source_filename = "module"
+  target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128:128:48-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
+  target triple = "amdgcn-amd-amdpal"
+
+  %dx.types.ResRet.f32 = type { float, float, float, float, i32 }
+
+  define dllexport amdgpu_cs void @_amdgpu_cs_main(i32 inreg noundef %globalTable, i32 inreg noundef %userdata4, <3 x i32> inreg noundef %WorkgroupId, i32 inreg noundef %MultiDispatchInfo, <3 x i32> noundef %LocalInvocationId) #0 !dbg !14 {
+    %LocalInvocationId.i0 = extractelement <3 x i32> %LocalInvocationId, i64 0, !dbg !28
+    %WorkgroupId.i0 = extractelement <3 x i32> %WorkgroupId, i64 0, !dbg !28
+    %1 = call i64 @llvm.amdgcn.s.getpc(), !dbg !28
+    %2 = shl i32 %WorkgroupId.i0, 6, !dbg !28
+    %3 = add i32 %LocalInvocationId.i0, %2, !dbg !28
+      #dbg_value(i32 %3, !29, !DIExpression(DW_OP_LLVM_fragment, 0, 32), !28)
+    %4 = and i64 %1, -4294967296, !dbg !30
+    %5 = zext i32 %userdata4 to i64, !dbg !30
+    %6 = or disjoint i64 %4, %5, !dbg !30
+    %7 = inttoptr i64 %6 to ptr addrspace(4), !dbg !30, !amdgpu.uniform !2
+    %8 = load <4 x i32>, ptr addrspace(4) %7, align 4, !dbg !30, !invariant.load !2
+    %9 = call float @llvm.amdgcn.struct.buffer.load.format.f32(<4 x i32> %8, i32 %3, i32 0, i32 0, i32 0), !dbg !30
+      #dbg_value(%dx.types.ResRet.f32 poison, !31, !DIExpression(), !32)
+    %10 = fmul reassoc arcp contract afn float %9, 2.000000e+00, !dbg !33
+      #dbg_value(float %10, !34, !DIExpression(), !35)
+    %11 = getelementptr i8, ptr addrspace(4) %7, i64 32, !dbg !36, !amdgpu.uniform !2
+    %.upto01 = insertelement <4 x float> poison, float %10, i64 0, !dbg !36
+    %12 = shufflevector <4 x float> %.upto01, <4 x float> poison, <4 x i32> zeroinitializer, !dbg !36
+    %13 = load <4 x i32>, ptr addrspace(4) %11, align 4, !dbg !36, !invariant.load !2
+    call void @llvm.amdgcn.struct.buffer.store.format.v4f32(<4 x float> %12, <4 x i32> %13, i32 %3, i32 0, i32 0, i32 0), !dbg !36
+    ret void, !dbg !37
+  }
+
+  declare noundef i64 @llvm.amdgcn.s.getpc() #1
+  declare void @llvm.amdgcn.struct.buffer.store.format.v4f32(<4 x float>, <4 x i32>, i32, i32, i32, i32 immarg) #3
+  declare float @llvm.amdgcn.struct.buffer.load.format.f32(<4 x i32>, i32, i32, i32, i32 immarg) #4
+
+  attributes #0 = { memory(readwrite) "amdgpu-flat-work-group-size"="64,64" "amdgpu-memory-bound"="false" "amdgpu-num-sgpr"="4294967295" "amdgpu-num-vgpr"="4294967295" "amdgpu-prealloc-sgpr-spill-vgprs" "amdgpu-unroll-threshold"="1200" "amdgpu-wave-limiter"="false" "amdgpu-work-group-info-arg-no"="3" "denormal-fp-math"="ieee" "denormal-fp-math-f32"="preserve-sign" "target-cpu"="gfx1030" "target-features"=",+wavefrontsize64,+cumode,+enable-flat-scratch" }
+  attributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "target-cpu"="gfx1030" }
+  attributes #2 = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) "target-cpu"="gfx1030" }
+  attributes #3 = { nocallback nofree nosync nounwind willreturn memory(write) "target-cpu"="gfx1030" }
+  attributes #4 = { nocallback nofree nosync nounwind willreturn memory(read) "target-cpu"="gfx1030" }
+
+  !llvm.dbg.cu = !{!0}
+  !llvm.module.flags = !{!12, !13}
+
+  !0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "dxcoob 1.7.2308.16 (52da17e29)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, globals: !3)
+  !1 = !DIFile(filename: "tests\\basic_var.hlsl", directory: "")
+  !2 = !{}
+  !3 = !{!4, !10}
+  !4 = distinct !DIGlobalVariableExpression(var: !5, expr: !DIExpression())
+  !5 = !DIGlobalVariable(name: "u0", linkageName: "\01?u0@@3V?$RWBuffer@M@@A", scope: !0, file: !1, line: 2, type: !6, isLocal: false, isDefinition: true)
+  !6 = !DICompositeType(tag: DW_TAG_class_type, name: "RWBuffer<float>", file: !1, line: 2, size: 32, align: 32, elements: !2, templateParams: !7)
+  !7 = !{!8}
+  !8 = !DITemplateTypeParameter(name: "element", type: !9)
+  !9 = !DIBasicType(name: "float", size: 32, align: 32, encoding: DW_ATE_float)
+  !10 = distinct !DIGlobalVariableExpression(var: !11, expr: !DIExpression())
+  !11 = !DIGlobalVariable(name: "u1", linkageName: "\01?u1@@3V?$RWBuffer@M@@A", scope: !0, file: !1, line: 3, type: !6, isLocal: false, isDefinition: true)
+  !12 = !{i32 2, !"Dwarf Version", i32 5}
+  !13 = !{i32 2, !"Debug Info Version", i32 3}
+  !14 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 7, type: !15, scopeLine: 7, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0)
+  !15 = !DISubroutineType(types: !16)
+  !16 = !{null, !17}
+  !17 = !DIDerivedType(tag: DW_TAG_typedef, name: "uint3", file: !1, baseType: !18)
+  !18 = !DICompositeType(tag: DW_TAG_class_type, name: "vector<unsigned int, 3>", file: !1, size: 96, align: 32, elements: !19, templateParams: !24)
+  !19 = !{!20, !22, !23}
+  !20 = !DIDerivedType(tag: DW_TAG_member, name: "x", scope: !18, file: !1, baseType: !21, size: 32, align: 32, flags: DIFlagPublic)
+  !21 = !DIBasicType(name: "unsigned int", size: 32, align: 32, encoding: DW_ATE_unsigned)
+  !22 = !DIDerivedType(tag: DW_TAG_member, name: "y", scope: !18, file: !1, baseType: !21, size: 32, align: 32, offset: 32, flags: DIFlagPublic)
+  !23 = !DIDerivedType(tag: DW_TAG_member, name: "z", scope: !18, file: !1, baseType: !21, size: 32, align: 32, offset: 64, flags: DIFlagPublic)
+  !24 = !{!25, !26}
+  !25 = !DITemplateTypeParameter(name: "element", type: !21)
+  !26 = !DITemplateValueParameter(name: "element_count", type: !27, value: i32 3)
+  !27 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
+  !28 = !DILocation(line: 7, column: 17, scope: !14)
+  !29 = !DILocalVariable(name: "dtid", arg: 1, scope: !14, file: !1, line: 7, type: !17)
+  !30 = !DILocation(line: 11, column: 18, scope: !14)
+  !31 = !DILocalVariable(name: "my_var", scope: !14, file: !1, line: 11, type: !9)
+  !32 = !DILocation(line: 11, column: 9, scope: !14)
+  !33 = !DILocation(line: 14, column: 26, scope: !14)
+  !34 = !DILocalVariable(name: "my_var2", scope: !14, file: !1, line: 14, type: !9)
+  !35 = !DILocation(line: 14, column: 9, scope: !14)
+  !36 = !DILocation(line: 17, column: 14, scope: !14)
+  !37 = !DILocation(line: 19, column: 1, scope: !14)
+...
+---
+name:            _amdgpu_cs_main
+alignment:       1
+exposesReturnsTwice: false
+legalized:       false
+regBankSelected: false
+selected:        false
+failedISel:      false
+tracksRegLiveness: true
+hasWinCFI:       false
+noPhis:          true
+isSSA:           false
+noVRegs:         false
+hasFakeUses:     false
+callsEHReturn:   false
+callsUnwindInit: false
+hasEHContTarget: false
+hasEHScopes:     false
+hasEHFunclets:   false
+isOutlined:      false
+debugInstrRef:   false
+failsVerification: false
+tracksDebugUserValues: false
+fixedStack:      []
+stack:           []
+entry_values:    []
+callSites:       []
+debugValueSubstitutions: []
+constants:       []
+machineFunctionInfo:
+  explicitKernArgSize: 0
+  maxKernArgAlign: 4
+  ldsSize:         0
+  gdsSize: ...
[truncated]

llvmbot · 2025-07-25T19:03:46Z

@llvm/pr-subscribers-backend-amdgpu

Author: Adam Yang (adam-yang)

Changes

Constructing Target triple with ObjectFile::makeTriple instead of just with Arch and leaving the rest unknown. Also creating the subtarget with the CPU. AMDGPU needs the full triple and CPU to disassemble correctly.

To run a full test, also fixed a failure with the $noreg operand in DBG_VALUE, and writing relocation with AMDPAL.

Patch is 24.49 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/145125.diff

7 Files Affected:

(modified) llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h (+2-1)
(modified) llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp (+3-3)
(modified) llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp (+8-2)
(modified) llvm/lib/DebugInfo/LogicalView/Readers/LVDWARFReader.cpp (+7-5)
(modified) llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp (+5)
(added) llvm/test/CodeGen/AMDGPU/amdgpu-llvm-debuginfo-analyzer.ll (+101)
(added) llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir (+210)

diff --git a/llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h b/llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h
index 1847fa8323480..2cf4a8ec6a37f 100644
--- a/llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h
+++ b/llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h
@@ -159,7 +159,8 @@ class LVBinaryReader : public LVReader {
   LVAddress WasmCodeSectionOffset = 0;
 
   // Loads all info for the architecture of the provided object file.
-  Error loadGenericTargetInfo(StringRef TheTriple, StringRef TheFeatures);
+  Error loadGenericTargetInfo(StringRef TheTriple, StringRef TheFeatures,
+                              StringRef TheCPU);
 
   virtual void mapRangeAddress(const object::ObjectFile &Obj) {}
   virtual void mapRangeAddress(const object::ObjectFile &Obj,
diff --git a/llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp b/llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp
index 80b4185b7c600..0df9137a3bd37 100644
--- a/llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp
+++ b/llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp
@@ -275,7 +275,8 @@ void LVBinaryReader::mapVirtualAddress(const object::COFFObjectFile &COFFObj) {
 }
 
 Error LVBinaryReader::loadGenericTargetInfo(StringRef TheTriple,
-                                            StringRef TheFeatures) {
+                                            StringRef TheFeatures,
+                                            StringRef TheCPU) {
   std::string TargetLookupError;
   const Target *TheTarget =
       TargetRegistry::lookupTarget(TheTriple, TargetLookupError);
@@ -298,9 +299,8 @@ Error LVBinaryReader::loadGenericTargetInfo(StringRef TheTriple,
   MAI.reset(AsmInfo);
 
   // Target subtargets.
-  StringRef CPU;
   MCSubtargetInfo *SubtargetInfo(
-      TheTarget->createMCSubtargetInfo(TheTriple, CPU, TheFeatures));
+      TheTarget->createMCSubtargetInfo(TheTriple, TheCPU, TheFeatures));
   if (!SubtargetInfo)
     return createStringError(errc::invalid_argument,
                              "no subtarget info for target " + TheTriple);
diff --git a/llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp b/llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
index e5895516b5e77..2ff70816b4bf1 100644
--- a/llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
+++ b/llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
@@ -1190,7 +1190,12 @@ Error LVCodeViewReader::loadTargetInfo(const ObjectFile &Obj) {
     FeaturesValue = SubtargetFeatures();
   }
   FeaturesValue = *Features;
-  return loadGenericTargetInfo(TT.str(), FeaturesValue.getString());
+
+  StringRef CPU;
+  if (auto OptCPU = Obj.tryGetCPUName())
+    CPU = *OptCPU;
+
+  return loadGenericTargetInfo(TT.str(), FeaturesValue.getString(), CPU);
 }
 
 Error LVCodeViewReader::loadTargetInfo(const PDBFile &Pdb) {
@@ -1200,8 +1205,9 @@ Error LVCodeViewReader::loadTargetInfo(const PDBFile &Pdb) {
   TT.setOS(Triple::Win32);
 
   StringRef TheFeature = "";
+  StringRef TheCPU = "";
 
-  return loadGenericTargetInfo(TT.str(), TheFeature);
+  return loadGenericTargetInfo(TT.str(), TheFeature, TheCPU);
 }
 
 std::string LVCodeViewReader::getRegisterName(LVSmall Opcode,
diff --git a/llvm/lib/DebugInfo/LogicalView/Readers/LVDWARFReader.cpp b/llvm/lib/DebugInfo/LogicalView/Readers/LVDWARFReader.cpp
index 696e2bc948a2e..62134dfdadf46 100644
--- a/llvm/lib/DebugInfo/LogicalView/Readers/LVDWARFReader.cpp
+++ b/llvm/lib/DebugInfo/LogicalView/Readers/LVDWARFReader.cpp
@@ -956,10 +956,7 @@ LVElement *LVDWARFReader::getElementForOffset(LVOffset Offset,
 Error LVDWARFReader::loadTargetInfo(const ObjectFile &Obj) {
   // Detect the architecture from the object file. We usually don't need OS
   // info to lookup a target and create register info.
-  Triple TT;
-  TT.setArch(Triple::ArchType(Obj.getArch()));
-  TT.setVendor(Triple::UnknownVendor);
-  TT.setOS(Triple::UnknownOS);
+  Triple TT = Obj.makeTriple();
 
   // Features to be passed to target/subtarget
   Expected<SubtargetFeatures> Features = Obj.getFeatures();
@@ -969,7 +966,12 @@ Error LVDWARFReader::loadTargetInfo(const ObjectFile &Obj) {
     FeaturesValue = SubtargetFeatures();
   }
   FeaturesValue = *Features;
-  return loadGenericTargetInfo(TT.str(), FeaturesValue.getString());
+
+  StringRef CPU;
+  if (auto OptCPU = Obj.tryGetCPUName())
+    CPU = *OptCPU;
+
+  return loadGenericTargetInfo(TT.str(), FeaturesValue.getString(), CPU);
 }
 
 void LVDWARFReader::mapRangeAddress(const ObjectFile &Obj) {
diff --git a/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp b/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
index 205a45a045a42..f807c567efa2f 100644
--- a/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
+++ b/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
@@ -130,6 +130,11 @@ void SIPreAllocateWWMRegs::rewriteRegs(MachineFunction &MF) {
         if (VirtReg.isPhysical())
           continue;
 
+        if (!VirtReg.isValid()) {
+          assert(MI.isDebugInstr() && "non-debug use of noreg");
+          continue;
+        }
+
         if (!VRM->hasPhys(VirtReg))
           continue;
 
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-llvm-debuginfo-analyzer.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-llvm-debuginfo-analyzer.ll
new file mode 100644
index 0000000000000..2cff21c66172d
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-llvm-debuginfo-analyzer.ll
@@ -0,0 +1,101 @@
+; RUN: llc %s -o %t.o -mcpu=gfx1030 -filetype=obj -O0
+; RUN: llvm-debuginfo-analyzer %t.o --print=all --attribute=all | FileCheck %s
+
+; This test compiles this module with AMDGPU backend under -O0,
+; and makes sure llvm-debuginfo-analyzer works for it.
+
+; Simple checks to make sure llvm-debuginfo-analzyer didn't fail early.
+; CHECK: Logical View:
+; CHECK: {CompileUnit}
+; CHECK-DAG: {Parameter} 'dtid' -> [0x{{[a-f0-9]+}}]'uint3'
+; CHECK-DAG: {Variable} 'my_var2' -> [0x{{[a-f0-9]+}}]'float'
+; CHECK-DAG: {Line} {{.+}}basic_var.hlsl
+; CHECK: {Code} 's_endpgm'
+
+source_filename = "module"
+target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-p10:32:32-p11:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9-p32:32:32-v8:8-v16:16-v32:32-v48:32-v64:32-v80:32-v96:32-v112:32-v128:32-v144:32-v160:32-v176:32-v192:32-v208:32-v224:32-v240:32-v256:32-i1:32-i8:8-i16:16-i32:32-i64:32-f16:16-f32:32-f64:32"
+target triple = "amdgcn-amd-amdpal"
+
+%dx.types.ResRet.f32 = type { float, float, float, float, i32 }
+
+define dllexport amdgpu_cs void @_amdgpu_cs_main(i32 inreg noundef %globalTable, i32 inreg noundef %userdata4, <3 x i32> inreg noundef %WorkgroupId, i32 inreg noundef %MultiDispatchInfo, <3 x i32> noundef %LocalInvocationId) #0 !dbg !14 {
+  %LocalInvocationId.i0 = extractelement <3 x i32> %LocalInvocationId, i64 0, !dbg !28
+  %WorkgroupId.i0 = extractelement <3 x i32> %WorkgroupId, i64 0, !dbg !28
+  %1 = call i64 @llvm.amdgcn.s.getpc(), !dbg !28
+  %2 = shl i32 %WorkgroupId.i0, 6, !dbg !28
+  %3 = add i32 %LocalInvocationId.i0, %2, !dbg !28
+    #dbg_value(i32 %3, !29, !DIExpression(DW_OP_LLVM_fragment, 0, 32), !28)
+  %4 = and i64 %1, -4294967296, !dbg !30
+  %5 = zext i32 %userdata4 to i64, !dbg !30
+  %6 = or disjoint i64 %4, %5, !dbg !30
+  %7 = inttoptr i64 %6 to ptr addrspace(4), !dbg !30
+  call void @llvm.assume(i1 true) [ "align"(ptr addrspace(4) %7, i32 4), "dereferenceable"(ptr addrspace(4) %7, i32 -1) ], !dbg !30
+  %8 = load <4 x i32>, ptr addrspace(4) %7, align 4, !dbg !30, !invariant.load !2
+  %9 = call float @llvm.amdgcn.struct.buffer.load.format.f32(<4 x i32> %8, i32 %3, i32 0, i32 0, i32 0), !dbg !30
+    #dbg_value(%dx.types.ResRet.f32 poison, !31, !DIExpression(), !32)
+  %10 = fmul reassoc arcp contract afn float %9, 2.000000e+00, !dbg !33
+    #dbg_value(float %10, !34, !DIExpression(), !35)
+  call void @llvm.assume(i1 true) [ "align"(ptr addrspace(4) %7, i32 4), "dereferenceable"(ptr addrspace(4) %7, i32 -1) ], !dbg !36
+  %11 = getelementptr i8, ptr addrspace(4) %7, i64 32, !dbg !36
+  %.upto01 = insertelement <4 x float> poison, float %10, i64 0, !dbg !36
+  %12 = shufflevector <4 x float> %.upto01, <4 x float> poison, <4 x i32> zeroinitializer, !dbg !36
+  %13 = load <4 x i32>, ptr addrspace(4) %11, align 4, !dbg !36, !invariant.load !2
+  call void @llvm.amdgcn.struct.buffer.store.format.v4f32(<4 x float> %12, <4 x i32> %13, i32 %3, i32 0, i32 0, i32 0), !dbg !36
+  ret void, !dbg !37
+}
+
+declare noundef i64 @llvm.amdgcn.s.getpc() #1
+
+declare void @llvm.assume(i1 noundef) #2
+
+declare void @llvm.amdgcn.struct.buffer.store.format.v4f32(<4 x float>, <4 x i32>, i32, i32, i32, i32 immarg) #3
+
+declare float @llvm.amdgcn.struct.buffer.load.format.f32(<4 x i32>, i32, i32, i32, i32 immarg) #4
+
+attributes #0 = { memory(readwrite) "amdgpu-flat-work-group-size"="64,64" "amdgpu-memory-bound"="false" "amdgpu-num-sgpr"="4294967295" "amdgpu-num-vgpr"="4294967295" "amdgpu-prealloc-sgpr-spill-vgprs" "amdgpu-unroll-threshold"="1200" "amdgpu-wave-limiter"="false" "amdgpu-work-group-info-arg-no"="3" "denormal-fp-math"="ieee" "denormal-fp-math-f32"="preserve-sign" "target-features"=",+wavefrontsize64,+cumode,+enable-flat-scratch" }
+attributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
+attributes #2 = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) }
+attributes #3 = { nocallback nofree nosync nounwind willreturn memory(write) }
+attributes #4 = { nocallback nofree nosync nounwind willreturn memory(read) }
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!12, !13}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "dxcoob 1.7.2308.16 (52da17e29)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, globals: !3)
+!1 = !DIFile(filename: "tests\\basic_var.hlsl", directory: "")
+!2 = !{}
+!3 = !{!4, !10}
+!4 = distinct !DIGlobalVariableExpression(var: !5, expr: !DIExpression())
+!5 = !DIGlobalVariable(name: "u0", linkageName: "\01?u0@@3V?$RWBuffer@M@@A", scope: !0, file: !1, line: 2, type: !6, isLocal: false, isDefinition: true)
+!6 = !DICompositeType(tag: DW_TAG_class_type, name: "RWBuffer<float>", file: !1, line: 2, size: 32, align: 32, elements: !2, templateParams: !7)
+!7 = !{!8}
+!8 = !DITemplateTypeParameter(name: "element", type: !9)
+!9 = !DIBasicType(name: "float", size: 32, align: 32, encoding: DW_ATE_float)
+!10 = distinct !DIGlobalVariableExpression(var: !11, expr: !DIExpression())
+!11 = !DIGlobalVariable(name: "u1", linkageName: "\01?u1@@3V?$RWBuffer@M@@A", scope: !0, file: !1, line: 3, type: !6, isLocal: false, isDefinition: true)
+!12 = !{i32 2, !"Dwarf Version", i32 5}
+!13 = !{i32 2, !"Debug Info Version", i32 3}
+!14 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 7, type: !15, scopeLine: 7, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0)
+!15 = !DISubroutineType(types: !16)
+!16 = !{null, !17}
+!17 = !DIDerivedType(tag: DW_TAG_typedef, name: "uint3", file: !1, baseType: !18)
+!18 = !DICompositeType(tag: DW_TAG_class_type, name: "vector<unsigned int, 3>", file: !1, size: 96, align: 32, elements: !19, templateParams: !24)
+!19 = !{!20, !22, !23}
+!20 = !DIDerivedType(tag: DW_TAG_member, name: "x", scope: !18, file: !1, baseType: !21, size: 32, align: 32, flags: DIFlagPublic)
+!21 = !DIBasicType(name: "unsigned int", size: 32, align: 32, encoding: DW_ATE_unsigned)
+!22 = !DIDerivedType(tag: DW_TAG_member, name: "y", scope: !18, file: !1, baseType: !21, size: 32, align: 32, offset: 32, flags: DIFlagPublic)
+!23 = !DIDerivedType(tag: DW_TAG_member, name: "z", scope: !18, file: !1, baseType: !21, size: 32, align: 32, offset: 64, flags: DIFlagPublic)
+!24 = !{!25, !26}
+!25 = !DITemplateTypeParameter(name: "element", type: !21)
+!26 = !DITemplateValueParameter(name: "element_count", type: !27, value: i32 3)
+!27 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
+!28 = !DILocation(line: 7, column: 17, scope: !14)
+!29 = !DILocalVariable(name: "dtid", arg: 1, scope: !14, file: !1, line: 7, type: !17)
+!30 = !DILocation(line: 11, column: 18, scope: !14)
+!31 = !DILocalVariable(name: "my_var", scope: !14, file: !1, line: 11, type: !9)
+!32 = !DILocation(line: 11, column: 9, scope: !14)
+!33 = !DILocation(line: 14, column: 26, scope: !14)
+!34 = !DILocalVariable(name: "my_var2", scope: !14, file: !1, line: 14, type: !9)
+!35 = !DILocation(line: 14, column: 9, scope: !14)
+!36 = !DILocation(line: 17, column: 14, scope: !14)
+!37 = !DILocation(line: 19, column: 1, scope: !14)
diff --git a/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir b/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir
new file mode 100644
index 0000000000000..4b5fea863289b
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir
@@ -0,0 +1,210 @@
+# RUN: llc %s -o - -mcpu=gfx1030 -O0 -run-pass=si-pre-allocate-wwm-regs | FileCheck %s
+
+# Simple regression test to make sure DBG_VALUE $noreg does not assert in the pass
+
+# CHECK: S_ENDPGM
+
+--- |
+  source_filename = "module"
+  target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128:128:48-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
+  target triple = "amdgcn-amd-amdpal"
+
+  %dx.types.ResRet.f32 = type { float, float, float, float, i32 }
+
+  define dllexport amdgpu_cs void @_amdgpu_cs_main(i32 inreg noundef %globalTable, i32 inreg noundef %userdata4, <3 x i32> inreg noundef %WorkgroupId, i32 inreg noundef %MultiDispatchInfo, <3 x i32> noundef %LocalInvocationId) #0 !dbg !14 {
+    %LocalInvocationId.i0 = extractelement <3 x i32> %LocalInvocationId, i64 0, !dbg !28
+    %WorkgroupId.i0 = extractelement <3 x i32> %WorkgroupId, i64 0, !dbg !28
+    %1 = call i64 @llvm.amdgcn.s.getpc(), !dbg !28
+    %2 = shl i32 %WorkgroupId.i0, 6, !dbg !28
+    %3 = add i32 %LocalInvocationId.i0, %2, !dbg !28
+      #dbg_value(i32 %3, !29, !DIExpression(DW_OP_LLVM_fragment, 0, 32), !28)
+    %4 = and i64 %1, -4294967296, !dbg !30
+    %5 = zext i32 %userdata4 to i64, !dbg !30
+    %6 = or disjoint i64 %4, %5, !dbg !30
+    %7 = inttoptr i64 %6 to ptr addrspace(4), !dbg !30, !amdgpu.uniform !2
+    %8 = load <4 x i32>, ptr addrspace(4) %7, align 4, !dbg !30, !invariant.load !2
+    %9 = call float @llvm.amdgcn.struct.buffer.load.format.f32(<4 x i32> %8, i32 %3, i32 0, i32 0, i32 0), !dbg !30
+      #dbg_value(%dx.types.ResRet.f32 poison, !31, !DIExpression(), !32)
+    %10 = fmul reassoc arcp contract afn float %9, 2.000000e+00, !dbg !33
+      #dbg_value(float %10, !34, !DIExpression(), !35)
+    %11 = getelementptr i8, ptr addrspace(4) %7, i64 32, !dbg !36, !amdgpu.uniform !2
+    %.upto01 = insertelement <4 x float> poison, float %10, i64 0, !dbg !36
+    %12 = shufflevector <4 x float> %.upto01, <4 x float> poison, <4 x i32> zeroinitializer, !dbg !36
+    %13 = load <4 x i32>, ptr addrspace(4) %11, align 4, !dbg !36, !invariant.load !2
+    call void @llvm.amdgcn.struct.buffer.store.format.v4f32(<4 x float> %12, <4 x i32> %13, i32 %3, i32 0, i32 0, i32 0), !dbg !36
+    ret void, !dbg !37
+  }
+
+  declare noundef i64 @llvm.amdgcn.s.getpc() #1
+  declare void @llvm.amdgcn.struct.buffer.store.format.v4f32(<4 x float>, <4 x i32>, i32, i32, i32, i32 immarg) #3
+  declare float @llvm.amdgcn.struct.buffer.load.format.f32(<4 x i32>, i32, i32, i32, i32 immarg) #4
+
+  attributes #0 = { memory(readwrite) "amdgpu-flat-work-group-size"="64,64" "amdgpu-memory-bound"="false" "amdgpu-num-sgpr"="4294967295" "amdgpu-num-vgpr"="4294967295" "amdgpu-prealloc-sgpr-spill-vgprs" "amdgpu-unroll-threshold"="1200" "amdgpu-wave-limiter"="false" "amdgpu-work-group-info-arg-no"="3" "denormal-fp-math"="ieee" "denormal-fp-math-f32"="preserve-sign" "target-cpu"="gfx1030" "target-features"=",+wavefrontsize64,+cumode,+enable-flat-scratch" }
+  attributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "target-cpu"="gfx1030" }
+  attributes #2 = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) "target-cpu"="gfx1030" }
+  attributes #3 = { nocallback nofree nosync nounwind willreturn memory(write) "target-cpu"="gfx1030" }
+  attributes #4 = { nocallback nofree nosync nounwind willreturn memory(read) "target-cpu"="gfx1030" }
+
+  !llvm.dbg.cu = !{!0}
+  !llvm.module.flags = !{!12, !13}
+
+  !0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "dxcoob 1.7.2308.16 (52da17e29)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, globals: !3)
+  !1 = !DIFile(filename: "tests\\basic_var.hlsl", directory: "")
+  !2 = !{}
+  !3 = !{!4, !10}
+  !4 = distinct !DIGlobalVariableExpression(var: !5, expr: !DIExpression())
+  !5 = !DIGlobalVariable(name: "u0", linkageName: "\01?u0@@3V?$RWBuffer@M@@A", scope: !0, file: !1, line: 2, type: !6, isLocal: false, isDefinition: true)
+  !6 = !DICompositeType(tag: DW_TAG_class_type, name: "RWBuffer<float>", file: !1, line: 2, size: 32, align: 32, elements: !2, templateParams: !7)
+  !7 = !{!8}
+  !8 = !DITemplateTypeParameter(name: "element", type: !9)
+  !9 = !DIBasicType(name: "float", size: 32, align: 32, encoding: DW_ATE_float)
+  !10 = distinct !DIGlobalVariableExpression(var: !11, expr: !DIExpression())
+  !11 = !DIGlobalVariable(name: "u1", linkageName: "\01?u1@@3V?$RWBuffer@M@@A", scope: !0, file: !1, line: 3, type: !6, isLocal: false, isDefinition: true)
+  !12 = !{i32 2, !"Dwarf Version", i32 5}
+  !13 = !{i32 2, !"Debug Info Version", i32 3}
+  !14 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 7, type: !15, scopeLine: 7, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0)
+  !15 = !DISubroutineType(types: !16)
+  !16 = !{null, !17}
+  !17 = !DIDerivedType(tag: DW_TAG_typedef, name: "uint3", file: !1, baseType: !18)
+  !18 = !DICompositeType(tag: DW_TAG_class_type, name: "vector<unsigned int, 3>", file: !1, size: 96, align: 32, elements: !19, templateParams: !24)
+  !19 = !{!20, !22, !23}
+  !20 = !DIDerivedType(tag: DW_TAG_member, name: "x", scope: !18, file: !1, baseType: !21, size: 32, align: 32, flags: DIFlagPublic)
+  !21 = !DIBasicType(name: "unsigned int", size: 32, align: 32, encoding: DW_ATE_unsigned)
+  !22 = !DIDerivedType(tag: DW_TAG_member, name: "y", scope: !18, file: !1, baseType: !21, size: 32, align: 32, offset: 32, flags: DIFlagPublic)
+  !23 = !DIDerivedType(tag: DW_TAG_member, name: "z", scope: !18, file: !1, baseType: !21, size: 32, align: 32, offset: 64, flags: DIFlagPublic)
+  !24 = !{!25, !26}
+  !25 = !DITemplateTypeParameter(name: "element", type: !21)
+  !26 = !DITemplateValueParameter(name: "element_count", type: !27, value: i32 3)
+  !27 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
+  !28 = !DILocation(line: 7, column: 17, scope: !14)
+  !29 = !DILocalVariable(name: "dtid", arg: 1, scope: !14, file: !1, line: 7, type: !17)
+  !30 = !DILocation(line: 11, column: 18, scope: !14)
+  !31 = !DILocalVariable(name: "my_var", scope: !14, file: !1, line: 11, type: !9)
+  !32 = !DILocation(line: 11, column: 9, scope: !14)
+  !33 = !DILocation(line: 14, column: 26, scope: !14)
+  !34 = !DILocalVariable(name: "my_var2", scope: !14, file: !1, line: 14, type: !9)
+  !35 = !DILocation(line: 14, column: 9, scope: !14)
+  !36 = !DILocation(line: 17, column: 14, scope: !14)
+  !37 = !DILocation(line: 19, column: 1, scope: !14)
+...
+---
+name:            _amdgpu_cs_main
+alignment:       1
+exposesReturnsTwice: false
+legalized:       false
+regBankSelected: false
+selected:        false
+failedISel:      false
+tracksRegLiveness: true
+hasWinCFI:       false
+noPhis:          true
+isSSA:           false
+noVRegs:         false
+hasFakeUses:     false
+callsEHReturn:   false
+callsUnwindInit: false
+hasEHContTarget: false
+hasEHScopes:     false
+hasEHFunclets:   false
+isOutlined:      false
+debugInstrRef:   false
+failsVerification: false
+tracksDebugUserValues: false
+fixedStack:      []
+stack:           []
+entry_values:    []
+callSites:       []
+debugValueSubstitutions: []
+constants:       []
+machineFunctionInfo:
+  explicitKernArgSize: 0
+  maxKernArgAlign: 4
+  ldsSize:         0
+  gdsSize: ...
[truncated]

adam-yang · 2025-07-25T19:05:48Z

Thank you for the patch!

I have some small nits, and one larger concern around changing AMDPAL relocation behavior. If I understand the intent of the change, I think that part of the change should be removed and instead support for the REL relocations added in LogicalView or whatever library it depends on

Thanks for the review @slinder1! This change was the precursor for #149429, which is intended to be a replacement for this change. Did you have a chance to take a look at that? I'm open to get this change in first, and reduce #149429 to a smaller PR if that makes things easier.

llvm/test/CodeGen/AMDGPU/amdgpu-llvm-debuginfo-analyzer.ll

arsenm · 2025-07-28T23:23:49Z

llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp

          continue;

+        if (!VirtReg.isValid()) {
+          assert(MI.isDebugInstr() && "non-debug use of noreg");


This assert probably isn't true in general, but would belong in the machine verifier

Are there AMDGPU instructions that take noreg? I could see other targets failing verifier if the assert is run on all targets.

A generic instruction easily could. I think we use this as a placeholder in a few places. In any case you should just drop the assert

Also I thought noreg counted as a physical register, so won't the continue above catch it?

FirstPhysicalRegister is defined as 1, so NoReg is excluded. In any case, the assert is removed.

… be a debug inst

llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir

arsenm · 2025-08-08T03:59:37Z

llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir

+  attributes #0 = { memory(readwrite) "amdgpu-flat-work-group-size"="64,64" "amdgpu-memory-bound"="false" "amdgpu-num-sgpr"="4294967295" "amdgpu-num-vgpr"="4294967295" "amdgpu-prealloc-sgpr-spill-vgprs" "amdgpu-unroll-threshold"="1200" "amdgpu-wave-limiter"="false" "amdgpu-work-group-info-arg-no"="3" "denormal-fp-math"="ieee" "denormal-fp-math-f32"="preserve-sign" "target-cpu"="gfx1030" "target-features"=",+wavefrontsize64,+cumode,+enable-flat-scratch" }
+  attributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "target-cpu"="gfx1030" }
+  attributes #2 = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) "target-cpu"="gfx1030" }
+  attributes #3 = { nocallback nofree nosync nounwind willreturn memory(write) "target-cpu"="gfx1030" }
+  attributes #4 = { nocallback nofree nosync nounwind willreturn memory(read) "target-cpu"="gfx1030" }


Remove unnecessary attributes

llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir

slinder1

Sorry for the delay, LGTM

arsenm

lgtm with unnecessary mir fields dropped

llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir

Co-authored-by: Matt Arsenault <[email protected]>

adam-yang added 2 commits July 8, 2025 13:13

Made llvm-debuginfo-analyzer work for AMDGPU. A few changes to genera…

3254815

…te DWARF correctly in AMDGPU

Moved the test to amdgpu target tests

c6bacae

adam-yang force-pushed the amdgpu_o0_debug_analyzer branch from 4dfaf21 to c6bacae Compare July 8, 2025 20:15

slinder1 requested changes Jul 24, 2025

View reviewed changes

Addressed feedback

a4277f4

adam-yang marked this pull request as ready for review July 25, 2025 19:03

llvmbot added backend:AMDGPU debuginfo labels Jul 25, 2025

arsenm reviewed Jul 26, 2025

View reviewed changes

llvm/test/CodeGen/AMDGPU/amdgpu-llvm-debuginfo-analyzer.ll Outdated Show resolved Hide resolved

llvm/test/CodeGen/AMDGPU/amdgpu-llvm-debuginfo-analyzer.ll Outdated Show resolved Hide resolved

llvm/test/CodeGen/AMDGPU/amdgpu-llvm-debuginfo-analyzer.ll Outdated Show resolved Hide resolved

Addressed feedback

c938125

arsenm reviewed Jul 28, 2025

View reviewed changes

Removed the assert that the instruction that $noreg is part of has to…

dea0f42

… be a debug inst

arsenm reviewed Aug 8, 2025

View reviewed changes

Further reduced test.

af4f40e

slinder1 approved these changes Aug 11, 2025

View reviewed changes

arsenm approved these changes Aug 12, 2025

View reviewed changes

adam-yang and others added 5 commits August 12, 2025 14:28

Update llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir

bce027a

Co-authored-by: Matt Arsenault <[email protected]>

Update llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir

bdcbdc2

Co-authored-by: Matt Arsenault <[email protected]>

Update llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir

70dd953

Co-authored-by: Matt Arsenault <[email protected]>

Update llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir

9e014b8

Co-authored-by: Matt Arsenault <[email protected]>

Update llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir

f54cb3d

Co-authored-by: Matt Arsenault <[email protected]>

adam-yang enabled auto-merge (squash) August 12, 2025 21:44

adam-yang merged commit 8710571 into llvm:main Aug 12, 2025
9 checks passed

[AMDGPU] Fixed llvm-debuginfo-analyzer for AMDGPU. #145125

[AMDGPU] Fixed llvm-debuginfo-analyzer for AMDGPU. #145125

Uh oh!

Conversation

adam-yang commented Jun 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Jun 21, 2025

Uh oh!

slinder1 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

llvmbot commented Jul 25, 2025

Uh oh!

llvmbot commented Jul 25, 2025

Uh oh!

adam-yang commented Jul 25, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

arsenm Jul 28, 2025

Choose a reason for hiding this comment

Uh oh!

adam-yang Jul 31, 2025

Choose a reason for hiding this comment

Uh oh!

arsenm Aug 1, 2025

Choose a reason for hiding this comment

Uh oh!

arsenm Aug 1, 2025

Choose a reason for hiding this comment

Uh oh!

adam-yang Aug 7, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

arsenm Aug 8, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

slinder1 left a comment

Choose a reason for hiding this comment

Uh oh!

arsenm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

adam-yang commented Jun 21, 2025 •

edited

Loading