Skip to content

[AMDGPU] Fixed llvm-debuginfo-analyzer for AMDGPU. #145125

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

adam-yang
Copy link
Contributor

Constructing Target triple with ObjectFile::makeTriple instead of just with Arch and leaving the rest unknown. Also creating the subtarget with the CPU. AMDGPU needs the full triple and CPU to disassemble correctly.

To run a full test, also fixed a failure with the $noreg operand in DBG_VALUE, and writing relocation with AMDPAL.

Copy link

⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo.
Please turn off Keep my email addresses private setting in your account.
See LLVM Developer Policy and LLVM Discourse for more information.

@adam-yang adam-yang force-pushed the amdgpu_o0_debug_analyzer branch from 4dfaf21 to c6bacae Compare July 8, 2025 20:15
Copy link
Contributor

@slinder1 slinder1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the patch!

I have some small nits, and one larger concern around changing AMDPAL relocation behavior. If I understand the intent of the change, I think that part of the change should be removed and instead support for the REL relocations added in LogicalView or whatever library it depends on

@adam-yang adam-yang marked this pull request as ready for review July 25, 2025 19:03
@llvmbot
Copy link
Member

llvmbot commented Jul 25, 2025

@llvm/pr-subscribers-debuginfo

Author: Adam Yang (adam-yang)

Changes

Constructing Target triple with ObjectFile::makeTriple instead of just with Arch and leaving the rest unknown. Also creating the subtarget with the CPU. AMDGPU needs the full triple and CPU to disassemble correctly.

To run a full test, also fixed a failure with the $noreg operand in DBG_VALUE, and writing relocation with AMDPAL.


Patch is 24.49 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/145125.diff

7 Files Affected:

  • (modified) llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h (+2-1)
  • (modified) llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp (+3-3)
  • (modified) llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp (+8-2)
  • (modified) llvm/lib/DebugInfo/LogicalView/Readers/LVDWARFReader.cpp (+7-5)
  • (modified) llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp (+5)
  • (added) llvm/test/CodeGen/AMDGPU/amdgpu-llvm-debuginfo-analyzer.ll (+101)
  • (added) llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir (+210)
diff --git a/llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h b/llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h
index 1847fa8323480..2cf4a8ec6a37f 100644
--- a/llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h
+++ b/llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h
@@ -159,7 +159,8 @@ class LVBinaryReader : public LVReader {
   LVAddress WasmCodeSectionOffset = 0;
 
   // Loads all info for the architecture of the provided object file.
-  Error loadGenericTargetInfo(StringRef TheTriple, StringRef TheFeatures);
+  Error loadGenericTargetInfo(StringRef TheTriple, StringRef TheFeatures,
+                              StringRef TheCPU);
 
   virtual void mapRangeAddress(const object::ObjectFile &Obj) {}
   virtual void mapRangeAddress(const object::ObjectFile &Obj,
diff --git a/llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp b/llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp
index 80b4185b7c600..0df9137a3bd37 100644
--- a/llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp
+++ b/llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp
@@ -275,7 +275,8 @@ void LVBinaryReader::mapVirtualAddress(const object::COFFObjectFile &COFFObj) {
 }
 
 Error LVBinaryReader::loadGenericTargetInfo(StringRef TheTriple,
-                                            StringRef TheFeatures) {
+                                            StringRef TheFeatures,
+                                            StringRef TheCPU) {
   std::string TargetLookupError;
   const Target *TheTarget =
       TargetRegistry::lookupTarget(TheTriple, TargetLookupError);
@@ -298,9 +299,8 @@ Error LVBinaryReader::loadGenericTargetInfo(StringRef TheTriple,
   MAI.reset(AsmInfo);
 
   // Target subtargets.
-  StringRef CPU;
   MCSubtargetInfo *SubtargetInfo(
-      TheTarget->createMCSubtargetInfo(TheTriple, CPU, TheFeatures));
+      TheTarget->createMCSubtargetInfo(TheTriple, TheCPU, TheFeatures));
   if (!SubtargetInfo)
     return createStringError(errc::invalid_argument,
                              "no subtarget info for target " + TheTriple);
diff --git a/llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp b/llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
index e5895516b5e77..2ff70816b4bf1 100644
--- a/llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
+++ b/llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
@@ -1190,7 +1190,12 @@ Error LVCodeViewReader::loadTargetInfo(const ObjectFile &Obj) {
     FeaturesValue = SubtargetFeatures();
   }
   FeaturesValue = *Features;
-  return loadGenericTargetInfo(TT.str(), FeaturesValue.getString());
+
+  StringRef CPU;
+  if (auto OptCPU = Obj.tryGetCPUName())
+    CPU = *OptCPU;
+
+  return loadGenericTargetInfo(TT.str(), FeaturesValue.getString(), CPU);
 }
 
 Error LVCodeViewReader::loadTargetInfo(const PDBFile &Pdb) {
@@ -1200,8 +1205,9 @@ Error LVCodeViewReader::loadTargetInfo(const PDBFile &Pdb) {
   TT.setOS(Triple::Win32);
 
   StringRef TheFeature = "";
+  StringRef TheCPU = "";
 
-  return loadGenericTargetInfo(TT.str(), TheFeature);
+  return loadGenericTargetInfo(TT.str(), TheFeature, TheCPU);
 }
 
 std::string LVCodeViewReader::getRegisterName(LVSmall Opcode,
diff --git a/llvm/lib/DebugInfo/LogicalView/Readers/LVDWARFReader.cpp b/llvm/lib/DebugInfo/LogicalView/Readers/LVDWARFReader.cpp
index 696e2bc948a2e..62134dfdadf46 100644
--- a/llvm/lib/DebugInfo/LogicalView/Readers/LVDWARFReader.cpp
+++ b/llvm/lib/DebugInfo/LogicalView/Readers/LVDWARFReader.cpp
@@ -956,10 +956,7 @@ LVElement *LVDWARFReader::getElementForOffset(LVOffset Offset,
 Error LVDWARFReader::loadTargetInfo(const ObjectFile &Obj) {
   // Detect the architecture from the object file. We usually don't need OS
   // info to lookup a target and create register info.
-  Triple TT;
-  TT.setArch(Triple::ArchType(Obj.getArch()));
-  TT.setVendor(Triple::UnknownVendor);
-  TT.setOS(Triple::UnknownOS);
+  Triple TT = Obj.makeTriple();
 
   // Features to be passed to target/subtarget
   Expected<SubtargetFeatures> Features = Obj.getFeatures();
@@ -969,7 +966,12 @@ Error LVDWARFReader::loadTargetInfo(const ObjectFile &Obj) {
     FeaturesValue = SubtargetFeatures();
   }
   FeaturesValue = *Features;
-  return loadGenericTargetInfo(TT.str(), FeaturesValue.getString());
+
+  StringRef CPU;
+  if (auto OptCPU = Obj.tryGetCPUName())
+    CPU = *OptCPU;
+
+  return loadGenericTargetInfo(TT.str(), FeaturesValue.getString(), CPU);
 }
 
 void LVDWARFReader::mapRangeAddress(const ObjectFile &Obj) {
diff --git a/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp b/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
index 205a45a045a42..f807c567efa2f 100644
--- a/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
+++ b/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
@@ -130,6 +130,11 @@ void SIPreAllocateWWMRegs::rewriteRegs(MachineFunction &MF) {
         if (VirtReg.isPhysical())
           continue;
 
+        if (!VirtReg.isValid()) {
+          assert(MI.isDebugInstr() && "non-debug use of noreg");
+          continue;
+        }
+
         if (!VRM->hasPhys(VirtReg))
           continue;
 
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-llvm-debuginfo-analyzer.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-llvm-debuginfo-analyzer.ll
new file mode 100644
index 0000000000000..2cff21c66172d
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-llvm-debuginfo-analyzer.ll
@@ -0,0 +1,101 @@
+; RUN: llc %s -o %t.o -mcpu=gfx1030 -filetype=obj -O0
+; RUN: llvm-debuginfo-analyzer %t.o --print=all --attribute=all | FileCheck %s
+
+; This test compiles this module with AMDGPU backend under -O0,
+; and makes sure llvm-debuginfo-analyzer works for it.
+
+; Simple checks to make sure llvm-debuginfo-analzyer didn't fail early.
+; CHECK: Logical View:
+; CHECK: {CompileUnit}
+; CHECK-DAG: {Parameter} 'dtid' -> [0x{{[a-f0-9]+}}]'uint3'
+; CHECK-DAG: {Variable} 'my_var2' -> [0x{{[a-f0-9]+}}]'float'
+; CHECK-DAG: {Line} {{.+}}basic_var.hlsl
+; CHECK: {Code} 's_endpgm'
+
+source_filename = "module"
+target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-p10:32:32-p11:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9-p32:32:32-v8:8-v16:16-v32:32-v48:32-v64:32-v80:32-v96:32-v112:32-v128:32-v144:32-v160:32-v176:32-v192:32-v208:32-v224:32-v240:32-v256:32-i1:32-i8:8-i16:16-i32:32-i64:32-f16:16-f32:32-f64:32"
+target triple = "amdgcn-amd-amdpal"
+
+%dx.types.ResRet.f32 = type { float, float, float, float, i32 }
+
+define dllexport amdgpu_cs void @_amdgpu_cs_main(i32 inreg noundef %globalTable, i32 inreg noundef %userdata4, <3 x i32> inreg noundef %WorkgroupId, i32 inreg noundef %MultiDispatchInfo, <3 x i32> noundef %LocalInvocationId) #0 !dbg !14 {
+  %LocalInvocationId.i0 = extractelement <3 x i32> %LocalInvocationId, i64 0, !dbg !28
+  %WorkgroupId.i0 = extractelement <3 x i32> %WorkgroupId, i64 0, !dbg !28
+  %1 = call i64 @llvm.amdgcn.s.getpc(), !dbg !28
+  %2 = shl i32 %WorkgroupId.i0, 6, !dbg !28
+  %3 = add i32 %LocalInvocationId.i0, %2, !dbg !28
+    #dbg_value(i32 %3, !29, !DIExpression(DW_OP_LLVM_fragment, 0, 32), !28)
+  %4 = and i64 %1, -4294967296, !dbg !30
+  %5 = zext i32 %userdata4 to i64, !dbg !30
+  %6 = or disjoint i64 %4, %5, !dbg !30
+  %7 = inttoptr i64 %6 to ptr addrspace(4), !dbg !30
+  call void @llvm.assume(i1 true) [ "align"(ptr addrspace(4) %7, i32 4), "dereferenceable"(ptr addrspace(4) %7, i32 -1) ], !dbg !30
+  %8 = load <4 x i32>, ptr addrspace(4) %7, align 4, !dbg !30, !invariant.load !2
+  %9 = call float @llvm.amdgcn.struct.buffer.load.format.f32(<4 x i32> %8, i32 %3, i32 0, i32 0, i32 0), !dbg !30
+    #dbg_value(%dx.types.ResRet.f32 poison, !31, !DIExpression(), !32)
+  %10 = fmul reassoc arcp contract afn float %9, 2.000000e+00, !dbg !33
+    #dbg_value(float %10, !34, !DIExpression(), !35)
+  call void @llvm.assume(i1 true) [ "align"(ptr addrspace(4) %7, i32 4), "dereferenceable"(ptr addrspace(4) %7, i32 -1) ], !dbg !36
+  %11 = getelementptr i8, ptr addrspace(4) %7, i64 32, !dbg !36
+  %.upto01 = insertelement <4 x float> poison, float %10, i64 0, !dbg !36
+  %12 = shufflevector <4 x float> %.upto01, <4 x float> poison, <4 x i32> zeroinitializer, !dbg !36
+  %13 = load <4 x i32>, ptr addrspace(4) %11, align 4, !dbg !36, !invariant.load !2
+  call void @llvm.amdgcn.struct.buffer.store.format.v4f32(<4 x float> %12, <4 x i32> %13, i32 %3, i32 0, i32 0, i32 0), !dbg !36
+  ret void, !dbg !37
+}
+
+declare noundef i64 @llvm.amdgcn.s.getpc() #1
+
+declare void @llvm.assume(i1 noundef) #2
+
+declare void @llvm.amdgcn.struct.buffer.store.format.v4f32(<4 x float>, <4 x i32>, i32, i32, i32, i32 immarg) #3
+
+declare float @llvm.amdgcn.struct.buffer.load.format.f32(<4 x i32>, i32, i32, i32, i32 immarg) #4
+
+attributes #0 = { memory(readwrite) "amdgpu-flat-work-group-size"="64,64" "amdgpu-memory-bound"="false" "amdgpu-num-sgpr"="4294967295" "amdgpu-num-vgpr"="4294967295" "amdgpu-prealloc-sgpr-spill-vgprs" "amdgpu-unroll-threshold"="1200" "amdgpu-wave-limiter"="false" "amdgpu-work-group-info-arg-no"="3" "denormal-fp-math"="ieee" "denormal-fp-math-f32"="preserve-sign" "target-features"=",+wavefrontsize64,+cumode,+enable-flat-scratch" }
+attributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
+attributes #2 = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) }
+attributes #3 = { nocallback nofree nosync nounwind willreturn memory(write) }
+attributes #4 = { nocallback nofree nosync nounwind willreturn memory(read) }
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!12, !13}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "dxcoob 1.7.2308.16 (52da17e29)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, globals: !3)
+!1 = !DIFile(filename: "tests\\basic_var.hlsl", directory: "")
+!2 = !{}
+!3 = !{!4, !10}
+!4 = distinct !DIGlobalVariableExpression(var: !5, expr: !DIExpression())
+!5 = !DIGlobalVariable(name: "u0", linkageName: "\01?u0@@3V?$RWBuffer@M@@A", scope: !0, file: !1, line: 2, type: !6, isLocal: false, isDefinition: true)
+!6 = !DICompositeType(tag: DW_TAG_class_type, name: "RWBuffer<float>", file: !1, line: 2, size: 32, align: 32, elements: !2, templateParams: !7)
+!7 = !{!8}
+!8 = !DITemplateTypeParameter(name: "element", type: !9)
+!9 = !DIBasicType(name: "float", size: 32, align: 32, encoding: DW_ATE_float)
+!10 = distinct !DIGlobalVariableExpression(var: !11, expr: !DIExpression())
+!11 = !DIGlobalVariable(name: "u1", linkageName: "\01?u1@@3V?$RWBuffer@M@@A", scope: !0, file: !1, line: 3, type: !6, isLocal: false, isDefinition: true)
+!12 = !{i32 2, !"Dwarf Version", i32 5}
+!13 = !{i32 2, !"Debug Info Version", i32 3}
+!14 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 7, type: !15, scopeLine: 7, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0)
+!15 = !DISubroutineType(types: !16)
+!16 = !{null, !17}
+!17 = !DIDerivedType(tag: DW_TAG_typedef, name: "uint3", file: !1, baseType: !18)
+!18 = !DICompositeType(tag: DW_TAG_class_type, name: "vector<unsigned int, 3>", file: !1, size: 96, align: 32, elements: !19, templateParams: !24)
+!19 = !{!20, !22, !23}
+!20 = !DIDerivedType(tag: DW_TAG_member, name: "x", scope: !18, file: !1, baseType: !21, size: 32, align: 32, flags: DIFlagPublic)
+!21 = !DIBasicType(name: "unsigned int", size: 32, align: 32, encoding: DW_ATE_unsigned)
+!22 = !DIDerivedType(tag: DW_TAG_member, name: "y", scope: !18, file: !1, baseType: !21, size: 32, align: 32, offset: 32, flags: DIFlagPublic)
+!23 = !DIDerivedType(tag: DW_TAG_member, name: "z", scope: !18, file: !1, baseType: !21, size: 32, align: 32, offset: 64, flags: DIFlagPublic)
+!24 = !{!25, !26}
+!25 = !DITemplateTypeParameter(name: "element", type: !21)
+!26 = !DITemplateValueParameter(name: "element_count", type: !27, value: i32 3)
+!27 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
+!28 = !DILocation(line: 7, column: 17, scope: !14)
+!29 = !DILocalVariable(name: "dtid", arg: 1, scope: !14, file: !1, line: 7, type: !17)
+!30 = !DILocation(line: 11, column: 18, scope: !14)
+!31 = !DILocalVariable(name: "my_var", scope: !14, file: !1, line: 11, type: !9)
+!32 = !DILocation(line: 11, column: 9, scope: !14)
+!33 = !DILocation(line: 14, column: 26, scope: !14)
+!34 = !DILocalVariable(name: "my_var2", scope: !14, file: !1, line: 14, type: !9)
+!35 = !DILocation(line: 14, column: 9, scope: !14)
+!36 = !DILocation(line: 17, column: 14, scope: !14)
+!37 = !DILocation(line: 19, column: 1, scope: !14)
diff --git a/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir b/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir
new file mode 100644
index 0000000000000..4b5fea863289b
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir
@@ -0,0 +1,210 @@
+# RUN: llc %s -o - -mcpu=gfx1030 -O0 -run-pass=si-pre-allocate-wwm-regs | FileCheck %s
+
+# Simple regression test to make sure DBG_VALUE $noreg does not assert in the pass
+
+# CHECK: S_ENDPGM
+
+--- |
+  source_filename = "module"
+  target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128:128:48-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
+  target triple = "amdgcn-amd-amdpal"
+
+  %dx.types.ResRet.f32 = type { float, float, float, float, i32 }
+
+  define dllexport amdgpu_cs void @_amdgpu_cs_main(i32 inreg noundef %globalTable, i32 inreg noundef %userdata4, <3 x i32> inreg noundef %WorkgroupId, i32 inreg noundef %MultiDispatchInfo, <3 x i32> noundef %LocalInvocationId) #0 !dbg !14 {
+    %LocalInvocationId.i0 = extractelement <3 x i32> %LocalInvocationId, i64 0, !dbg !28
+    %WorkgroupId.i0 = extractelement <3 x i32> %WorkgroupId, i64 0, !dbg !28
+    %1 = call i64 @llvm.amdgcn.s.getpc(), !dbg !28
+    %2 = shl i32 %WorkgroupId.i0, 6, !dbg !28
+    %3 = add i32 %LocalInvocationId.i0, %2, !dbg !28
+      #dbg_value(i32 %3, !29, !DIExpression(DW_OP_LLVM_fragment, 0, 32), !28)
+    %4 = and i64 %1, -4294967296, !dbg !30
+    %5 = zext i32 %userdata4 to i64, !dbg !30
+    %6 = or disjoint i64 %4, %5, !dbg !30
+    %7 = inttoptr i64 %6 to ptr addrspace(4), !dbg !30, !amdgpu.uniform !2
+    %8 = load <4 x i32>, ptr addrspace(4) %7, align 4, !dbg !30, !invariant.load !2
+    %9 = call float @llvm.amdgcn.struct.buffer.load.format.f32(<4 x i32> %8, i32 %3, i32 0, i32 0, i32 0), !dbg !30
+      #dbg_value(%dx.types.ResRet.f32 poison, !31, !DIExpression(), !32)
+    %10 = fmul reassoc arcp contract afn float %9, 2.000000e+00, !dbg !33
+      #dbg_value(float %10, !34, !DIExpression(), !35)
+    %11 = getelementptr i8, ptr addrspace(4) %7, i64 32, !dbg !36, !amdgpu.uniform !2
+    %.upto01 = insertelement <4 x float> poison, float %10, i64 0, !dbg !36
+    %12 = shufflevector <4 x float> %.upto01, <4 x float> poison, <4 x i32> zeroinitializer, !dbg !36
+    %13 = load <4 x i32>, ptr addrspace(4) %11, align 4, !dbg !36, !invariant.load !2
+    call void @llvm.amdgcn.struct.buffer.store.format.v4f32(<4 x float> %12, <4 x i32> %13, i32 %3, i32 0, i32 0, i32 0), !dbg !36
+    ret void, !dbg !37
+  }
+
+  declare noundef i64 @llvm.amdgcn.s.getpc() #1
+  declare void @llvm.amdgcn.struct.buffer.store.format.v4f32(<4 x float>, <4 x i32>, i32, i32, i32, i32 immarg) #3
+  declare float @llvm.amdgcn.struct.buffer.load.format.f32(<4 x i32>, i32, i32, i32, i32 immarg) #4
+
+  attributes #0 = { memory(readwrite) "amdgpu-flat-work-group-size"="64,64" "amdgpu-memory-bound"="false" "amdgpu-num-sgpr"="4294967295" "amdgpu-num-vgpr"="4294967295" "amdgpu-prealloc-sgpr-spill-vgprs" "amdgpu-unroll-threshold"="1200" "amdgpu-wave-limiter"="false" "amdgpu-work-group-info-arg-no"="3" "denormal-fp-math"="ieee" "denormal-fp-math-f32"="preserve-sign" "target-cpu"="gfx1030" "target-features"=",+wavefrontsize64,+cumode,+enable-flat-scratch" }
+  attributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "target-cpu"="gfx1030" }
+  attributes #2 = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) "target-cpu"="gfx1030" }
+  attributes #3 = { nocallback nofree nosync nounwind willreturn memory(write) "target-cpu"="gfx1030" }
+  attributes #4 = { nocallback nofree nosync nounwind willreturn memory(read) "target-cpu"="gfx1030" }
+
+  !llvm.dbg.cu = !{!0}
+  !llvm.module.flags = !{!12, !13}
+
+  !0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "dxcoob 1.7.2308.16 (52da17e29)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, globals: !3)
+  !1 = !DIFile(filename: "tests\\basic_var.hlsl", directory: "")
+  !2 = !{}
+  !3 = !{!4, !10}
+  !4 = distinct !DIGlobalVariableExpression(var: !5, expr: !DIExpression())
+  !5 = !DIGlobalVariable(name: "u0", linkageName: "\01?u0@@3V?$RWBuffer@M@@A", scope: !0, file: !1, line: 2, type: !6, isLocal: false, isDefinition: true)
+  !6 = !DICompositeType(tag: DW_TAG_class_type, name: "RWBuffer<float>", file: !1, line: 2, size: 32, align: 32, elements: !2, templateParams: !7)
+  !7 = !{!8}
+  !8 = !DITemplateTypeParameter(name: "element", type: !9)
+  !9 = !DIBasicType(name: "float", size: 32, align: 32, encoding: DW_ATE_float)
+  !10 = distinct !DIGlobalVariableExpression(var: !11, expr: !DIExpression())
+  !11 = !DIGlobalVariable(name: "u1", linkageName: "\01?u1@@3V?$RWBuffer@M@@A", scope: !0, file: !1, line: 3, type: !6, isLocal: false, isDefinition: true)
+  !12 = !{i32 2, !"Dwarf Version", i32 5}
+  !13 = !{i32 2, !"Debug Info Version", i32 3}
+  !14 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 7, type: !15, scopeLine: 7, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0)
+  !15 = !DISubroutineType(types: !16)
+  !16 = !{null, !17}
+  !17 = !DIDerivedType(tag: DW_TAG_typedef, name: "uint3", file: !1, baseType: !18)
+  !18 = !DICompositeType(tag: DW_TAG_class_type, name: "vector<unsigned int, 3>", file: !1, size: 96, align: 32, elements: !19, templateParams: !24)
+  !19 = !{!20, !22, !23}
+  !20 = !DIDerivedType(tag: DW_TAG_member, name: "x", scope: !18, file: !1, baseType: !21, size: 32, align: 32, flags: DIFlagPublic)
+  !21 = !DIBasicType(name: "unsigned int", size: 32, align: 32, encoding: DW_ATE_unsigned)
+  !22 = !DIDerivedType(tag: DW_TAG_member, name: "y", scope: !18, file: !1, baseType: !21, size: 32, align: 32, offset: 32, flags: DIFlagPublic)
+  !23 = !DIDerivedType(tag: DW_TAG_member, name: "z", scope: !18, file: !1, baseType: !21, size: 32, align: 32, offset: 64, flags: DIFlagPublic)
+  !24 = !{!25, !26}
+  !25 = !DITemplateTypeParameter(name: "element", type: !21)
+  !26 = !DITemplateValueParameter(name: "element_count", type: !27, value: i32 3)
+  !27 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
+  !28 = !DILocation(line: 7, column: 17, scope: !14)
+  !29 = !DILocalVariable(name: "dtid", arg: 1, scope: !14, file: !1, line: 7, type: !17)
+  !30 = !DILocation(line: 11, column: 18, scope: !14)
+  !31 = !DILocalVariable(name: "my_var", scope: !14, file: !1, line: 11, type: !9)
+  !32 = !DILocation(line: 11, column: 9, scope: !14)
+  !33 = !DILocation(line: 14, column: 26, scope: !14)
+  !34 = !DILocalVariable(name: "my_var2", scope: !14, file: !1, line: 14, type: !9)
+  !35 = !DILocation(line: 14, column: 9, scope: !14)
+  !36 = !DILocation(line: 17, column: 14, scope: !14)
+  !37 = !DILocation(line: 19, column: 1, scope: !14)
+...
+---
+name:            _amdgpu_cs_main
+alignment:       1
+exposesReturnsTwice: false
+legalized:       false
+regBankSelected: false
+selected:        false
+failedISel:      false
+tracksRegLiveness: true
+hasWinCFI:       false
+noPhis:          true
+isSSA:           false
+noVRegs:         false
+hasFakeUses:     false
+callsEHReturn:   false
+callsUnwindInit: false
+hasEHContTarget: false
+hasEHScopes:     false
+hasEHFunclets:   false
+isOutlined:      false
+debugInstrRef:   false
+failsVerification: false
+tracksDebugUserValues: false
+fixedStack:      []
+stack:           []
+entry_values:    []
+callSites:       []
+debugValueSubstitutions: []
+constants:       []
+machineFunctionInfo:
+  explicitKernArgSize: 0
+  maxKernArgAlign: 4
+  ldsSize:         0
+  gdsSize: ...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jul 25, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Adam Yang (adam-yang)

Changes

Constructing Target triple with ObjectFile::makeTriple instead of just with Arch and leaving the rest unknown. Also creating the subtarget with the CPU. AMDGPU needs the full triple and CPU to disassemble correctly.

To run a full test, also fixed a failure with the $noreg operand in DBG_VALUE, and writing relocation with AMDPAL.


Patch is 24.49 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/145125.diff

7 Files Affected:

  • (modified) llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h (+2-1)
  • (modified) llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp (+3-3)
  • (modified) llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp (+8-2)
  • (modified) llvm/lib/DebugInfo/LogicalView/Readers/LVDWARFReader.cpp (+7-5)
  • (modified) llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp (+5)
  • (added) llvm/test/CodeGen/AMDGPU/amdgpu-llvm-debuginfo-analyzer.ll (+101)
  • (added) llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir (+210)
diff --git a/llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h b/llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h
index 1847fa8323480..2cf4a8ec6a37f 100644
--- a/llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h
+++ b/llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h
@@ -159,7 +159,8 @@ class LVBinaryReader : public LVReader {
   LVAddress WasmCodeSectionOffset = 0;
 
   // Loads all info for the architecture of the provided object file.
-  Error loadGenericTargetInfo(StringRef TheTriple, StringRef TheFeatures);
+  Error loadGenericTargetInfo(StringRef TheTriple, StringRef TheFeatures,
+                              StringRef TheCPU);
 
   virtual void mapRangeAddress(const object::ObjectFile &Obj) {}
   virtual void mapRangeAddress(const object::ObjectFile &Obj,
diff --git a/llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp b/llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp
index 80b4185b7c600..0df9137a3bd37 100644
--- a/llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp
+++ b/llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp
@@ -275,7 +275,8 @@ void LVBinaryReader::mapVirtualAddress(const object::COFFObjectFile &COFFObj) {
 }
 
 Error LVBinaryReader::loadGenericTargetInfo(StringRef TheTriple,
-                                            StringRef TheFeatures) {
+                                            StringRef TheFeatures,
+                                            StringRef TheCPU) {
   std::string TargetLookupError;
   const Target *TheTarget =
       TargetRegistry::lookupTarget(TheTriple, TargetLookupError);
@@ -298,9 +299,8 @@ Error LVBinaryReader::loadGenericTargetInfo(StringRef TheTriple,
   MAI.reset(AsmInfo);
 
   // Target subtargets.
-  StringRef CPU;
   MCSubtargetInfo *SubtargetInfo(
-      TheTarget->createMCSubtargetInfo(TheTriple, CPU, TheFeatures));
+      TheTarget->createMCSubtargetInfo(TheTriple, TheCPU, TheFeatures));
   if (!SubtargetInfo)
     return createStringError(errc::invalid_argument,
                              "no subtarget info for target " + TheTriple);
diff --git a/llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp b/llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
index e5895516b5e77..2ff70816b4bf1 100644
--- a/llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
+++ b/llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
@@ -1190,7 +1190,12 @@ Error LVCodeViewReader::loadTargetInfo(const ObjectFile &Obj) {
     FeaturesValue = SubtargetFeatures();
   }
   FeaturesValue = *Features;
-  return loadGenericTargetInfo(TT.str(), FeaturesValue.getString());
+
+  StringRef CPU;
+  if (auto OptCPU = Obj.tryGetCPUName())
+    CPU = *OptCPU;
+
+  return loadGenericTargetInfo(TT.str(), FeaturesValue.getString(), CPU);
 }
 
 Error LVCodeViewReader::loadTargetInfo(const PDBFile &Pdb) {
@@ -1200,8 +1205,9 @@ Error LVCodeViewReader::loadTargetInfo(const PDBFile &Pdb) {
   TT.setOS(Triple::Win32);
 
   StringRef TheFeature = "";
+  StringRef TheCPU = "";
 
-  return loadGenericTargetInfo(TT.str(), TheFeature);
+  return loadGenericTargetInfo(TT.str(), TheFeature, TheCPU);
 }
 
 std::string LVCodeViewReader::getRegisterName(LVSmall Opcode,
diff --git a/llvm/lib/DebugInfo/LogicalView/Readers/LVDWARFReader.cpp b/llvm/lib/DebugInfo/LogicalView/Readers/LVDWARFReader.cpp
index 696e2bc948a2e..62134dfdadf46 100644
--- a/llvm/lib/DebugInfo/LogicalView/Readers/LVDWARFReader.cpp
+++ b/llvm/lib/DebugInfo/LogicalView/Readers/LVDWARFReader.cpp
@@ -956,10 +956,7 @@ LVElement *LVDWARFReader::getElementForOffset(LVOffset Offset,
 Error LVDWARFReader::loadTargetInfo(const ObjectFile &Obj) {
   // Detect the architecture from the object file. We usually don't need OS
   // info to lookup a target and create register info.
-  Triple TT;
-  TT.setArch(Triple::ArchType(Obj.getArch()));
-  TT.setVendor(Triple::UnknownVendor);
-  TT.setOS(Triple::UnknownOS);
+  Triple TT = Obj.makeTriple();
 
   // Features to be passed to target/subtarget
   Expected<SubtargetFeatures> Features = Obj.getFeatures();
@@ -969,7 +966,12 @@ Error LVDWARFReader::loadTargetInfo(const ObjectFile &Obj) {
     FeaturesValue = SubtargetFeatures();
   }
   FeaturesValue = *Features;
-  return loadGenericTargetInfo(TT.str(), FeaturesValue.getString());
+
+  StringRef CPU;
+  if (auto OptCPU = Obj.tryGetCPUName())
+    CPU = *OptCPU;
+
+  return loadGenericTargetInfo(TT.str(), FeaturesValue.getString(), CPU);
 }
 
 void LVDWARFReader::mapRangeAddress(const ObjectFile &Obj) {
diff --git a/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp b/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
index 205a45a045a42..f807c567efa2f 100644
--- a/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
+++ b/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
@@ -130,6 +130,11 @@ void SIPreAllocateWWMRegs::rewriteRegs(MachineFunction &MF) {
         if (VirtReg.isPhysical())
           continue;
 
+        if (!VirtReg.isValid()) {
+          assert(MI.isDebugInstr() && "non-debug use of noreg");
+          continue;
+        }
+
         if (!VRM->hasPhys(VirtReg))
           continue;
 
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-llvm-debuginfo-analyzer.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-llvm-debuginfo-analyzer.ll
new file mode 100644
index 0000000000000..2cff21c66172d
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-llvm-debuginfo-analyzer.ll
@@ -0,0 +1,101 @@
+; RUN: llc %s -o %t.o -mcpu=gfx1030 -filetype=obj -O0
+; RUN: llvm-debuginfo-analyzer %t.o --print=all --attribute=all | FileCheck %s
+
+; This test compiles this module with AMDGPU backend under -O0,
+; and makes sure llvm-debuginfo-analyzer works for it.
+
+; Simple checks to make sure llvm-debuginfo-analzyer didn't fail early.
+; CHECK: Logical View:
+; CHECK: {CompileUnit}
+; CHECK-DAG: {Parameter} 'dtid' -> [0x{{[a-f0-9]+}}]'uint3'
+; CHECK-DAG: {Variable} 'my_var2' -> [0x{{[a-f0-9]+}}]'float'
+; CHECK-DAG: {Line} {{.+}}basic_var.hlsl
+; CHECK: {Code} 's_endpgm'
+
+source_filename = "module"
+target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-p10:32:32-p11:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9-p32:32:32-v8:8-v16:16-v32:32-v48:32-v64:32-v80:32-v96:32-v112:32-v128:32-v144:32-v160:32-v176:32-v192:32-v208:32-v224:32-v240:32-v256:32-i1:32-i8:8-i16:16-i32:32-i64:32-f16:16-f32:32-f64:32"
+target triple = "amdgcn-amd-amdpal"
+
+%dx.types.ResRet.f32 = type { float, float, float, float, i32 }
+
+define dllexport amdgpu_cs void @_amdgpu_cs_main(i32 inreg noundef %globalTable, i32 inreg noundef %userdata4, <3 x i32> inreg noundef %WorkgroupId, i32 inreg noundef %MultiDispatchInfo, <3 x i32> noundef %LocalInvocationId) #0 !dbg !14 {
+  %LocalInvocationId.i0 = extractelement <3 x i32> %LocalInvocationId, i64 0, !dbg !28
+  %WorkgroupId.i0 = extractelement <3 x i32> %WorkgroupId, i64 0, !dbg !28
+  %1 = call i64 @llvm.amdgcn.s.getpc(), !dbg !28
+  %2 = shl i32 %WorkgroupId.i0, 6, !dbg !28
+  %3 = add i32 %LocalInvocationId.i0, %2, !dbg !28
+    #dbg_value(i32 %3, !29, !DIExpression(DW_OP_LLVM_fragment, 0, 32), !28)
+  %4 = and i64 %1, -4294967296, !dbg !30
+  %5 = zext i32 %userdata4 to i64, !dbg !30
+  %6 = or disjoint i64 %4, %5, !dbg !30
+  %7 = inttoptr i64 %6 to ptr addrspace(4), !dbg !30
+  call void @llvm.assume(i1 true) [ "align"(ptr addrspace(4) %7, i32 4), "dereferenceable"(ptr addrspace(4) %7, i32 -1) ], !dbg !30
+  %8 = load <4 x i32>, ptr addrspace(4) %7, align 4, !dbg !30, !invariant.load !2
+  %9 = call float @llvm.amdgcn.struct.buffer.load.format.f32(<4 x i32> %8, i32 %3, i32 0, i32 0, i32 0), !dbg !30
+    #dbg_value(%dx.types.ResRet.f32 poison, !31, !DIExpression(), !32)
+  %10 = fmul reassoc arcp contract afn float %9, 2.000000e+00, !dbg !33
+    #dbg_value(float %10, !34, !DIExpression(), !35)
+  call void @llvm.assume(i1 true) [ "align"(ptr addrspace(4) %7, i32 4), "dereferenceable"(ptr addrspace(4) %7, i32 -1) ], !dbg !36
+  %11 = getelementptr i8, ptr addrspace(4) %7, i64 32, !dbg !36
+  %.upto01 = insertelement <4 x float> poison, float %10, i64 0, !dbg !36
+  %12 = shufflevector <4 x float> %.upto01, <4 x float> poison, <4 x i32> zeroinitializer, !dbg !36
+  %13 = load <4 x i32>, ptr addrspace(4) %11, align 4, !dbg !36, !invariant.load !2
+  call void @llvm.amdgcn.struct.buffer.store.format.v4f32(<4 x float> %12, <4 x i32> %13, i32 %3, i32 0, i32 0, i32 0), !dbg !36
+  ret void, !dbg !37
+}
+
+declare noundef i64 @llvm.amdgcn.s.getpc() #1
+
+declare void @llvm.assume(i1 noundef) #2
+
+declare void @llvm.amdgcn.struct.buffer.store.format.v4f32(<4 x float>, <4 x i32>, i32, i32, i32, i32 immarg) #3
+
+declare float @llvm.amdgcn.struct.buffer.load.format.f32(<4 x i32>, i32, i32, i32, i32 immarg) #4
+
+attributes #0 = { memory(readwrite) "amdgpu-flat-work-group-size"="64,64" "amdgpu-memory-bound"="false" "amdgpu-num-sgpr"="4294967295" "amdgpu-num-vgpr"="4294967295" "amdgpu-prealloc-sgpr-spill-vgprs" "amdgpu-unroll-threshold"="1200" "amdgpu-wave-limiter"="false" "amdgpu-work-group-info-arg-no"="3" "denormal-fp-math"="ieee" "denormal-fp-math-f32"="preserve-sign" "target-features"=",+wavefrontsize64,+cumode,+enable-flat-scratch" }
+attributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
+attributes #2 = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) }
+attributes #3 = { nocallback nofree nosync nounwind willreturn memory(write) }
+attributes #4 = { nocallback nofree nosync nounwind willreturn memory(read) }
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!12, !13}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "dxcoob 1.7.2308.16 (52da17e29)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, globals: !3)
+!1 = !DIFile(filename: "tests\\basic_var.hlsl", directory: "")
+!2 = !{}
+!3 = !{!4, !10}
+!4 = distinct !DIGlobalVariableExpression(var: !5, expr: !DIExpression())
+!5 = !DIGlobalVariable(name: "u0", linkageName: "\01?u0@@3V?$RWBuffer@M@@A", scope: !0, file: !1, line: 2, type: !6, isLocal: false, isDefinition: true)
+!6 = !DICompositeType(tag: DW_TAG_class_type, name: "RWBuffer<float>", file: !1, line: 2, size: 32, align: 32, elements: !2, templateParams: !7)
+!7 = !{!8}
+!8 = !DITemplateTypeParameter(name: "element", type: !9)
+!9 = !DIBasicType(name: "float", size: 32, align: 32, encoding: DW_ATE_float)
+!10 = distinct !DIGlobalVariableExpression(var: !11, expr: !DIExpression())
+!11 = !DIGlobalVariable(name: "u1", linkageName: "\01?u1@@3V?$RWBuffer@M@@A", scope: !0, file: !1, line: 3, type: !6, isLocal: false, isDefinition: true)
+!12 = !{i32 2, !"Dwarf Version", i32 5}
+!13 = !{i32 2, !"Debug Info Version", i32 3}
+!14 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 7, type: !15, scopeLine: 7, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0)
+!15 = !DISubroutineType(types: !16)
+!16 = !{null, !17}
+!17 = !DIDerivedType(tag: DW_TAG_typedef, name: "uint3", file: !1, baseType: !18)
+!18 = !DICompositeType(tag: DW_TAG_class_type, name: "vector<unsigned int, 3>", file: !1, size: 96, align: 32, elements: !19, templateParams: !24)
+!19 = !{!20, !22, !23}
+!20 = !DIDerivedType(tag: DW_TAG_member, name: "x", scope: !18, file: !1, baseType: !21, size: 32, align: 32, flags: DIFlagPublic)
+!21 = !DIBasicType(name: "unsigned int", size: 32, align: 32, encoding: DW_ATE_unsigned)
+!22 = !DIDerivedType(tag: DW_TAG_member, name: "y", scope: !18, file: !1, baseType: !21, size: 32, align: 32, offset: 32, flags: DIFlagPublic)
+!23 = !DIDerivedType(tag: DW_TAG_member, name: "z", scope: !18, file: !1, baseType: !21, size: 32, align: 32, offset: 64, flags: DIFlagPublic)
+!24 = !{!25, !26}
+!25 = !DITemplateTypeParameter(name: "element", type: !21)
+!26 = !DITemplateValueParameter(name: "element_count", type: !27, value: i32 3)
+!27 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
+!28 = !DILocation(line: 7, column: 17, scope: !14)
+!29 = !DILocalVariable(name: "dtid", arg: 1, scope: !14, file: !1, line: 7, type: !17)
+!30 = !DILocation(line: 11, column: 18, scope: !14)
+!31 = !DILocalVariable(name: "my_var", scope: !14, file: !1, line: 11, type: !9)
+!32 = !DILocation(line: 11, column: 9, scope: !14)
+!33 = !DILocation(line: 14, column: 26, scope: !14)
+!34 = !DILocalVariable(name: "my_var2", scope: !14, file: !1, line: 14, type: !9)
+!35 = !DILocation(line: 14, column: 9, scope: !14)
+!36 = !DILocation(line: 17, column: 14, scope: !14)
+!37 = !DILocation(line: 19, column: 1, scope: !14)
diff --git a/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir b/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir
new file mode 100644
index 0000000000000..4b5fea863289b
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwwmregs-dbg-noreg.mir
@@ -0,0 +1,210 @@
+# RUN: llc %s -o - -mcpu=gfx1030 -O0 -run-pass=si-pre-allocate-wwm-regs | FileCheck %s
+
+# Simple regression test to make sure DBG_VALUE $noreg does not assert in the pass
+
+# CHECK: S_ENDPGM
+
+--- |
+  source_filename = "module"
+  target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128:128:48-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
+  target triple = "amdgcn-amd-amdpal"
+
+  %dx.types.ResRet.f32 = type { float, float, float, float, i32 }
+
+  define dllexport amdgpu_cs void @_amdgpu_cs_main(i32 inreg noundef %globalTable, i32 inreg noundef %userdata4, <3 x i32> inreg noundef %WorkgroupId, i32 inreg noundef %MultiDispatchInfo, <3 x i32> noundef %LocalInvocationId) #0 !dbg !14 {
+    %LocalInvocationId.i0 = extractelement <3 x i32> %LocalInvocationId, i64 0, !dbg !28
+    %WorkgroupId.i0 = extractelement <3 x i32> %WorkgroupId, i64 0, !dbg !28
+    %1 = call i64 @llvm.amdgcn.s.getpc(), !dbg !28
+    %2 = shl i32 %WorkgroupId.i0, 6, !dbg !28
+    %3 = add i32 %LocalInvocationId.i0, %2, !dbg !28
+      #dbg_value(i32 %3, !29, !DIExpression(DW_OP_LLVM_fragment, 0, 32), !28)
+    %4 = and i64 %1, -4294967296, !dbg !30
+    %5 = zext i32 %userdata4 to i64, !dbg !30
+    %6 = or disjoint i64 %4, %5, !dbg !30
+    %7 = inttoptr i64 %6 to ptr addrspace(4), !dbg !30, !amdgpu.uniform !2
+    %8 = load <4 x i32>, ptr addrspace(4) %7, align 4, !dbg !30, !invariant.load !2
+    %9 = call float @llvm.amdgcn.struct.buffer.load.format.f32(<4 x i32> %8, i32 %3, i32 0, i32 0, i32 0), !dbg !30
+      #dbg_value(%dx.types.ResRet.f32 poison, !31, !DIExpression(), !32)
+    %10 = fmul reassoc arcp contract afn float %9, 2.000000e+00, !dbg !33
+      #dbg_value(float %10, !34, !DIExpression(), !35)
+    %11 = getelementptr i8, ptr addrspace(4) %7, i64 32, !dbg !36, !amdgpu.uniform !2
+    %.upto01 = insertelement <4 x float> poison, float %10, i64 0, !dbg !36
+    %12 = shufflevector <4 x float> %.upto01, <4 x float> poison, <4 x i32> zeroinitializer, !dbg !36
+    %13 = load <4 x i32>, ptr addrspace(4) %11, align 4, !dbg !36, !invariant.load !2
+    call void @llvm.amdgcn.struct.buffer.store.format.v4f32(<4 x float> %12, <4 x i32> %13, i32 %3, i32 0, i32 0, i32 0), !dbg !36
+    ret void, !dbg !37
+  }
+
+  declare noundef i64 @llvm.amdgcn.s.getpc() #1
+  declare void @llvm.amdgcn.struct.buffer.store.format.v4f32(<4 x float>, <4 x i32>, i32, i32, i32, i32 immarg) #3
+  declare float @llvm.amdgcn.struct.buffer.load.format.f32(<4 x i32>, i32, i32, i32, i32 immarg) #4
+
+  attributes #0 = { memory(readwrite) "amdgpu-flat-work-group-size"="64,64" "amdgpu-memory-bound"="false" "amdgpu-num-sgpr"="4294967295" "amdgpu-num-vgpr"="4294967295" "amdgpu-prealloc-sgpr-spill-vgprs" "amdgpu-unroll-threshold"="1200" "amdgpu-wave-limiter"="false" "amdgpu-work-group-info-arg-no"="3" "denormal-fp-math"="ieee" "denormal-fp-math-f32"="preserve-sign" "target-cpu"="gfx1030" "target-features"=",+wavefrontsize64,+cumode,+enable-flat-scratch" }
+  attributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "target-cpu"="gfx1030" }
+  attributes #2 = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) "target-cpu"="gfx1030" }
+  attributes #3 = { nocallback nofree nosync nounwind willreturn memory(write) "target-cpu"="gfx1030" }
+  attributes #4 = { nocallback nofree nosync nounwind willreturn memory(read) "target-cpu"="gfx1030" }
+
+  !llvm.dbg.cu = !{!0}
+  !llvm.module.flags = !{!12, !13}
+
+  !0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "dxcoob 1.7.2308.16 (52da17e29)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, globals: !3)
+  !1 = !DIFile(filename: "tests\\basic_var.hlsl", directory: "")
+  !2 = !{}
+  !3 = !{!4, !10}
+  !4 = distinct !DIGlobalVariableExpression(var: !5, expr: !DIExpression())
+  !5 = !DIGlobalVariable(name: "u0", linkageName: "\01?u0@@3V?$RWBuffer@M@@A", scope: !0, file: !1, line: 2, type: !6, isLocal: false, isDefinition: true)
+  !6 = !DICompositeType(tag: DW_TAG_class_type, name: "RWBuffer<float>", file: !1, line: 2, size: 32, align: 32, elements: !2, templateParams: !7)
+  !7 = !{!8}
+  !8 = !DITemplateTypeParameter(name: "element", type: !9)
+  !9 = !DIBasicType(name: "float", size: 32, align: 32, encoding: DW_ATE_float)
+  !10 = distinct !DIGlobalVariableExpression(var: !11, expr: !DIExpression())
+  !11 = !DIGlobalVariable(name: "u1", linkageName: "\01?u1@@3V?$RWBuffer@M@@A", scope: !0, file: !1, line: 3, type: !6, isLocal: false, isDefinition: true)
+  !12 = !{i32 2, !"Dwarf Version", i32 5}
+  !13 = !{i32 2, !"Debug Info Version", i32 3}
+  !14 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 7, type: !15, scopeLine: 7, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0)
+  !15 = !DISubroutineType(types: !16)
+  !16 = !{null, !17}
+  !17 = !DIDerivedType(tag: DW_TAG_typedef, name: "uint3", file: !1, baseType: !18)
+  !18 = !DICompositeType(tag: DW_TAG_class_type, name: "vector<unsigned int, 3>", file: !1, size: 96, align: 32, elements: !19, templateParams: !24)
+  !19 = !{!20, !22, !23}
+  !20 = !DIDerivedType(tag: DW_TAG_member, name: "x", scope: !18, file: !1, baseType: !21, size: 32, align: 32, flags: DIFlagPublic)
+  !21 = !DIBasicType(name: "unsigned int", size: 32, align: 32, encoding: DW_ATE_unsigned)
+  !22 = !DIDerivedType(tag: DW_TAG_member, name: "y", scope: !18, file: !1, baseType: !21, size: 32, align: 32, offset: 32, flags: DIFlagPublic)
+  !23 = !DIDerivedType(tag: DW_TAG_member, name: "z", scope: !18, file: !1, baseType: !21, size: 32, align: 32, offset: 64, flags: DIFlagPublic)
+  !24 = !{!25, !26}
+  !25 = !DITemplateTypeParameter(name: "element", type: !21)
+  !26 = !DITemplateValueParameter(name: "element_count", type: !27, value: i32 3)
+  !27 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
+  !28 = !DILocation(line: 7, column: 17, scope: !14)
+  !29 = !DILocalVariable(name: "dtid", arg: 1, scope: !14, file: !1, line: 7, type: !17)
+  !30 = !DILocation(line: 11, column: 18, scope: !14)
+  !31 = !DILocalVariable(name: "my_var", scope: !14, file: !1, line: 11, type: !9)
+  !32 = !DILocation(line: 11, column: 9, scope: !14)
+  !33 = !DILocation(line: 14, column: 26, scope: !14)
+  !34 = !DILocalVariable(name: "my_var2", scope: !14, file: !1, line: 14, type: !9)
+  !35 = !DILocation(line: 14, column: 9, scope: !14)
+  !36 = !DILocation(line: 17, column: 14, scope: !14)
+  !37 = !DILocation(line: 19, column: 1, scope: !14)
+...
+---
+name:            _amdgpu_cs_main
+alignment:       1
+exposesReturnsTwice: false
+legalized:       false
+regBankSelected: false
+selected:        false
+failedISel:      false
+tracksRegLiveness: true
+hasWinCFI:       false
+noPhis:          true
+isSSA:           false
+noVRegs:         false
+hasFakeUses:     false
+callsEHReturn:   false
+callsUnwindInit: false
+hasEHContTarget: false
+hasEHScopes:     false
+hasEHFunclets:   false
+isOutlined:      false
+debugInstrRef:   false
+failsVerification: false
+tracksDebugUserValues: false
+fixedStack:      []
+stack:           []
+entry_values:    []
+callSites:       []
+debugValueSubstitutions: []
+constants:       []
+machineFunctionInfo:
+  explicitKernArgSize: 0
+  maxKernArgAlign: 4
+  ldsSize:         0
+  gdsSize: ...
[truncated]

@adam-yang
Copy link
Contributor Author

Thank you for the patch!

I have some small nits, and one larger concern around changing AMDPAL relocation behavior. If I understand the intent of the change, I think that part of the change should be removed and instead support for the REL relocations added in LogicalView or whatever library it depends on

Thanks for the review @slinder1! This change was the precursor for #149429, which is intended to be a replacement for this change. Did you have a chance to take a look at that? I'm open to get this change in first, and reduce #149429 to a smaller PR if that makes things easier.

@@ -130,6 +130,11 @@ void SIPreAllocateWWMRegs::rewriteRegs(MachineFunction &MF) {
if (VirtReg.isPhysical())
continue;

if (!VirtReg.isValid()) {
assert(MI.isDebugInstr() && "non-debug use of noreg");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert probably isn't true in general, but would belong in the machine verifier

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there AMDGPU instructions that take noreg? I could see other targets failing verifier if the assert is run on all targets.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A generic instruction easily could. I think we use this as a placeholder in a few places. In any case you should just drop the assert

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I thought noreg counted as a physical register, so won't the continue above catch it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FirstPhysicalRegister is defined as 1, so NoReg is excluded. In any case, the assert is removed.

Comment on lines 42 to 46
attributes #0 = { memory(readwrite) "amdgpu-flat-work-group-size"="64,64" "amdgpu-memory-bound"="false" "amdgpu-num-sgpr"="4294967295" "amdgpu-num-vgpr"="4294967295" "amdgpu-prealloc-sgpr-spill-vgprs" "amdgpu-unroll-threshold"="1200" "amdgpu-wave-limiter"="false" "amdgpu-work-group-info-arg-no"="3" "denormal-fp-math"="ieee" "denormal-fp-math-f32"="preserve-sign" "target-cpu"="gfx1030" "target-features"=",+wavefrontsize64,+cumode,+enable-flat-scratch" }
attributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "target-cpu"="gfx1030" }
attributes #2 = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) "target-cpu"="gfx1030" }
attributes #3 = { nocallback nofree nosync nounwind willreturn memory(write) "target-cpu"="gfx1030" }
attributes #4 = { nocallback nofree nosync nounwind willreturn memory(read) "target-cpu"="gfx1030" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unnecessary attributes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants