Skip to content

Conversation

@rampitec
Copy link
Collaborator

Now it is placed after the kernel descriptor, even the section
is .rodata, which is wrong. This allows proper code size calculation
in MC.

@rampitec rampitec requested a review from arsenm February 17, 2025 23:01
Copy link
Collaborator Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@rampitec rampitec marked this pull request as ready for review February 17, 2025 23:02
@llvmbot
Copy link
Member

llvmbot commented Feb 17, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Stanislav Mekhanoshin (rampitec)

Changes

Now it is placed after the kernel descriptor, even the section
is .rodata, which is wrong. This allows proper code size calculation
in MC.


Full diff: https://github.com/llvm/llvm-project/pull/127549.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp (+23-3)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/hsa.ll (+6-3)
  • (modified) llvm/test/CodeGen/AMDGPU/stack-realign-kernel.ll (+12)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index 031d8f0560ff2..950e9d125763e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -33,6 +33,7 @@
 #include "Utils/SIDefinesUtils.h"
 #include "llvm/Analysis/OptimizationRemarkEmitter.h"
 #include "llvm/BinaryFormat/ELF.h"
+#include "llvm/CodeGen/AsmPrinterHandler.h"
 #include "llvm/CodeGen/MachineFrameInfo.h"
 #include "llvm/CodeGen/MachineModuleInfo.h"
 #include "llvm/CodeGen/MachineOptimizationRemarkEmitter.h"
@@ -90,6 +91,24 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUAsmPrinter() {
                                      createAMDGPUAsmPrinterPass);
 }
 
+namespace {
+class AMDGPUAsmPrinterHandler : public AsmPrinterHandler {
+protected:
+  AMDGPUAsmPrinter *Asm;
+
+public:
+  AMDGPUAsmPrinterHandler(AMDGPUAsmPrinter *A) : Asm(A) {}
+
+  virtual void beginFunction(const MachineFunction *MF) override {}
+
+  virtual void endFunction(const MachineFunction *MF) override {
+    Asm->endFunction(MF);
+  }
+
+  virtual void endModule() override {}
+};
+} // End anonymous namespace
+
 AMDGPUAsmPrinter::AMDGPUAsmPrinter(TargetMachine &TM,
                                    std::unique_ptr<MCStreamer> Streamer)
     : AsmPrinter(TM, std::move(Streamer)) {
@@ -209,13 +228,12 @@ void AMDGPUAsmPrinter::emitFunctionBodyStart() {
     HSAMetadataStream->emitKernel(*MF, CurrentProgramInfo);
 }
 
-void AMDGPUAsmPrinter::emitFunctionBodyEnd() {
+void AMDGPUAsmPrinter::endFunction(const MachineFunction *MF) {
   const SIMachineFunctionInfo &MFI = *MF->getInfo<SIMachineFunctionInfo>();
   if (!MFI.isEntryFunction())
     return;
 
-  if (TM.getTargetTriple().getOS() != Triple::AMDHSA)
-    return;
+  assert(TM.getTargetTriple().getOS() == Triple::AMDHSA);
 
   auto &Streamer = getTargetStreamer()->getStreamer();
   auto &Context = Streamer.getContext();
@@ -351,6 +369,8 @@ bool AMDGPUAsmPrinter::doInitialization(Module &M) {
     default:
       report_fatal_error("Unexpected code object version");
     }
+
+    addAsmPrinterHandler(std::make_unique<AMDGPUAsmPrinterHandler>(this));
   }
 
   return AsmPrinter::doInitialization(M);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
index cc8c4411805e2..3a0da0dc33d9d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
@@ -120,7 +120,7 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
 
   void emitFunctionBodyStart() override;
 
-  void emitFunctionBodyEnd() override;
+  void endFunction(const MachineFunction *MF);
 
   void emitImplicitDef(const MachineInstr *MI) const override;
 
diff --git a/llvm/test/CodeGen/AMDGPU/hsa.ll b/llvm/test/CodeGen/AMDGPU/hsa.ll
index 37476203fbfad..6e5f16feb0773 100644
--- a/llvm/test/CodeGen/AMDGPU/hsa.ll
+++ b/llvm/test/CodeGen/AMDGPU/hsa.ll
@@ -96,6 +96,12 @@
 ; PRE-GFX10: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}
 ; GFX10: global_store_{{dword|b32}} v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, off
 
+; HSA: s_endpgm
+; HSA-NEXT: .Lfunc_end0:
+; HSA-NEXT: .size   simple, .Lfunc_end0-simple
+
+; HSA: .section .rodata,"a",@progbits
+
 ; HSA: .amdhsa_user_sgpr_private_segment_buffer 1
 ; HSA: .amdhsa_user_sgpr_kernarg_segment_ptr 1
 
@@ -103,9 +109,6 @@
 ; GFX10-W32: .amdhsa_wavefront_size32 1
 ; GFX10-W64: .amdhsa_wavefront_size32 0
 
-; HSA: .Lfunc_end0:
-; HSA: .size   simple, .Lfunc_end0-simple
-
 define amdgpu_kernel void @simple(ptr addrspace(1) %out) #0 {
 entry:
   store i32 0, ptr addrspace(1) %out
diff --git a/llvm/test/CodeGen/AMDGPU/stack-realign-kernel.ll b/llvm/test/CodeGen/AMDGPU/stack-realign-kernel.ll
index 6ddf0986755f9..2d34169f9f34c 100644
--- a/llvm/test/CodeGen/AMDGPU/stack-realign-kernel.ll
+++ b/llvm/test/CodeGen/AMDGPU/stack-realign-kernel.ll
@@ -15,6 +15,8 @@ define amdgpu_kernel void @max_alignment_128() #0 {
 ; VI-NEXT:    buffer_store_dword v0, off, s[0:3], 0 offset:128
 ; VI-NEXT:    s_waitcnt vmcnt(0)
 ; VI-NEXT:    s_endpgm
+; VI-NEXT:    .Lfunc_end0:
+; VI-NEXT:    .size max_alignment_128, .Lfunc_end0-max_alignment_128
 ; VI-NEXT:    .section .rodata,"a"
 ; VI-NEXT:    .p2align 6
 ; VI-NEXT:    .amdhsa_kernel max_alignment_128
@@ -66,6 +68,8 @@ define amdgpu_kernel void @max_alignment_128() #0 {
 ; GFX9-NEXT:    buffer_store_dword v0, off, s[0:3], 0 offset:128
 ; GFX9-NEXT:    s_waitcnt vmcnt(0)
 ; GFX9-NEXT:    s_endpgm
+; GFX9-NEXT:    .Lfunc_end0:
+; GFX9-NEXT:    .size max_alignment_128, .Lfunc_end0-max_alignment_128
 ; GFX9-NEXT:    .section .rodata,"a"
 ; GFX9-NEXT:    .p2align 6
 ; GFX9-NEXT:    .amdhsa_kernel max_alignment_128
@@ -126,6 +130,8 @@ define amdgpu_kernel void @stackrealign_attr() #1 {
 ; VI-NEXT:    buffer_store_dword v0, off, s[0:3], 0 offset:4
 ; VI-NEXT:    s_waitcnt vmcnt(0)
 ; VI-NEXT:    s_endpgm
+; VI-NEXT:    .Lfunc_end1:
+; VI-NEXT:    .size stackrealign_attr, .Lfunc_end1-stackrealign_attr
 ; VI-NEXT:    .section .rodata,"a"
 ; VI-NEXT:    .p2align 6
 ; VI-NEXT:    .amdhsa_kernel stackrealign_attr
@@ -177,6 +183,8 @@ define amdgpu_kernel void @stackrealign_attr() #1 {
 ; GFX9-NEXT:    buffer_store_dword v0, off, s[0:3], 0 offset:4
 ; GFX9-NEXT:    s_waitcnt vmcnt(0)
 ; GFX9-NEXT:    s_endpgm
+; GFX9-NEXT:    .Lfunc_end1:
+; GFX9-NEXT:    .size stackrealign_attr, .Lfunc_end1-stackrealign_attr
 ; GFX9-NEXT:    .section .rodata,"a"
 ; GFX9-NEXT:    .p2align 6
 ; GFX9-NEXT:    .amdhsa_kernel stackrealign_attr
@@ -237,6 +245,8 @@ define amdgpu_kernel void @alignstack_attr() #2 {
 ; VI-NEXT:    buffer_store_dword v0, off, s[0:3], 0 offset:4
 ; VI-NEXT:    s_waitcnt vmcnt(0)
 ; VI-NEXT:    s_endpgm
+; VI-NEXT:    .Lfunc_end2:
+; VI-NEXT:    .size alignstack_attr, .Lfunc_end2-alignstack_attr
 ; VI-NEXT:    .section .rodata,"a"
 ; VI-NEXT:    .p2align 6
 ; VI-NEXT:    .amdhsa_kernel alignstack_attr
@@ -288,6 +298,8 @@ define amdgpu_kernel void @alignstack_attr() #2 {
 ; GFX9-NEXT:    buffer_store_dword v0, off, s[0:3], 0 offset:4
 ; GFX9-NEXT:    s_waitcnt vmcnt(0)
 ; GFX9-NEXT:    s_endpgm
+; GFX9-NEXT:    .Lfunc_end2:
+; GFX9-NEXT:    .size alignstack_attr, .Lfunc_end2-alignstack_attr
 ; GFX9-NEXT:    .section .rodata,"a"
 ; GFX9-NEXT:    .p2align 6
 ; GFX9-NEXT:    .amdhsa_kernel alignstack_attr

@rampitec rampitec requested a review from kzhuravl February 17, 2025 23:42
}

namespace {
class AMDGPUAsmPrinterHandler : public AsmPrinterHandler {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we need this fancy handler thing, I don't see other targets using it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other targets do not print kernel descriptor and do not switch sections inside emitFunctionBodyEnd too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e., the best thing I could do is to create a custom target hook to emit the descriptor. The hook was already there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already emitFunctionDescriptor, except that's artificially depends on AIX. It is also before the function. I guess it doesn't matter if we print these before or after?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Man, do you really expect me to consider AIX here? The whole this thing is enclosed inside the AMDGPU target. I see this AIX comment irrelevant. Just because you see similar words, it does not mean it is relevant.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am giving you a clean way to print this size. What are you trying to achieve?

@arsenm arsenm requested a review from jhuber6 February 18, 2025 02:08
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generic printer also has EmitFunctionSize, should we just be using that instead?

Now it is placed after the kernel descriptor, even the section
is .rodata, which is wrong. This allows proper code size calculation
in MC.
@rampitec rampitec force-pushed the users/rampitec/02-17-_amdgpu_fix_.lfunc_end_label_placement branch from cf668cc to c77efd9 Compare February 18, 2025 07:50
@rampitec
Copy link
Collaborator Author

EmitFunctionSize

EmitFunctionSize is a local bool variable inside AsmPrinter::emitFunctionBody() and it is set to true. This is how this label is emitted.

@rampitec
Copy link
Collaborator Author

I wish to even understand how is that possible to get a sub of an address in .rodata and .code. These may be parsecs apart, and we only would know after it is mapped to memory after it is actually loaded.

@rampitec
Copy link
Collaborator Author

That said, I am not sure, but I'd expect this .size to constant fold. When I moved it in the same section it must constant evaluate. So all the other uses of it must eventually properly accept MCExpr and eventually constant evaluate too, including CodeSize comment too.

@rampitec
Copy link
Collaborator Author

That said, I am not sure, but I'd expect this .size to constant fold. When I moved it in the same section it must constant evaluate. So all the other uses of it must eventually properly accept MCExpr and eventually constant evaluate too, including CodeSize comment too.

Heh. Unfortunately not. As a relocatable this expression is deffered until linking. Which is pretty dumb.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants