Skip to content

Conversation

aengelke
Copy link
Contributor

@aengelke aengelke commented Jun 12, 2024

Previously, the MCAsmInfo specified ELF mangling whereas the data layout specified no mangling. Avoid this inconsistency by adjusting the data layout to use ELF mangling.


For context: there's a getPrivateGlobalPrefix in the DataLayout and in the MCAsmInfo. The MAI version is mainly used in MCContext, which has no DataLayout, but determines whether a symbol gets added to the symbol table in the output. I think both versions of the prefix should be identical so that private symbols never end up in the symbol table, but they weren't identical for AMDGPU.

I'm not sure whether this is correct, causes problems down the line, etc., but I stumbled across this inconsistency and just wanted to let you know it exists. :)

Previously, the MCAsmInfo specified ELF mangling whereas the data
layout specified no mangling. Avoid this inconsistency by adjusting
the data layout to use ELF mangling.
@llvmbot
Copy link
Member

llvmbot commented Jun 12, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: None (aengelke)

Changes

Previously, the MCAsmInfo specified ELF mangling whereas the data layout specified no mangling. Avoid this inconsistency by adjusting the data layout to use ELF mangling.


For context: there's a getPrivateGlobalPrefix in the DataLayout and in the MCAsmInfo. The MAI version is mainly used in MCContext, which has no DataLayout, but determines whether a symbol get's added to the symbol table in the output. I think both versions of the prefix should be identical so that private symbols never end up in the symbol table, but they weren't identical for AMDGPU.

I'm not sure whether this is correct, causes problems down the line, etc., but I stumbled across this inconsistency and just wanted to let you know it exists. :)


Full diff: https://github.com/llvm/llvm-project/pull/95219.diff

6 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/global-constant.ll (+10-10)
  • (modified) llvm/test/CodeGen/AMDGPU/global-variable-relocs.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.memcpy.ll (+1-1)
  • (modified) llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_generated_funcs.ll.generated.expected (+4-4)
  • (modified) llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_generated_funcs.ll.nogenerated.expected (+4-4)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index ce997c659094a..d7574e57b3533 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -547,7 +547,7 @@ static MachineSchedRegistry GCNILPSchedRegistry(
 static StringRef computeDataLayout(const Triple &TT) {
   if (TT.getArch() == Triple::r600) {
     // 32-bit pointers.
-    return "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128"
+    return "e-m:e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128"
            "-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1";
   }
 
@@ -557,7 +557,7 @@ static StringRef computeDataLayout(const Triple &TT) {
   // (address space 7), and 128-bit non-integral buffer resourcees (address
   // space 8) which cannot be non-trivilally accessed by LLVM memory operations
   // like getelementptr.
-  return "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32"
+  return "e-m:e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32"
          "-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-"
          "v32:32-v48:64-v96:"
          "128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-"
diff --git a/llvm/test/CodeGen/AMDGPU/global-constant.ll b/llvm/test/CodeGen/AMDGPU/global-constant.ll
index 38b9c5df7faa1..2d5021782c5b8 100644
--- a/llvm/test/CodeGen/AMDGPU/global-constant.ll
+++ b/llvm/test/CodeGen/AMDGPU/global-constant.ll
@@ -12,21 +12,21 @@
 
 ; Non-R600 OSes use relocations.
 ; GCN-DEFAULT: s_getpc_b64 s[[[PC0_LO:[0-9]+]]:[[PC0_HI:[0-9]+]]]
-; GCN-DEFAULT: s_add_u32 s{{[0-9]+}}, s[[PC0_LO]], private1@rel32@lo+4
-; GCN-DEFAULT: s_addc_u32 s{{[0-9]+}}, s[[PC0_HI]], private1@rel32@hi+12
+; GCN-DEFAULT: s_add_u32 s{{[0-9]+}}, s[[PC0_LO]], .Lprivate1@rel32@lo+4
+; GCN-DEFAULT: s_addc_u32 s{{[0-9]+}}, s[[PC0_HI]], .Lprivate1@rel32@hi+12
 ; GCN-DEFAULT: s_getpc_b64 s[[[PC1_LO:[0-9]+]]:[[PC1_HI:[0-9]+]]]
-; GCN-DEFAULT: s_add_u32 s{{[0-9]+}}, s[[PC1_LO]], private2@rel32@lo+4
-; GCN-DEFAULT: s_addc_u32 s{{[0-9]+}}, s[[PC1_HI]], private2@rel32@hi+12
+; GCN-DEFAULT: s_add_u32 s{{[0-9]+}}, s[[PC1_LO]], .Lprivate2@rel32@lo+4
+; GCN-DEFAULT: s_addc_u32 s{{[0-9]+}}, s[[PC1_HI]], .Lprivate2@rel32@hi+12
 
 ; MESA uses absolute relocations.
-; GCN-MESA: s_add_u32 s2, private1@abs32@lo, s4
-; GCN-MESA: s_addc_u32 s3, private1@abs32@hi, s5
+; GCN-MESA: s_add_u32 s2, .Lprivate1@abs32@lo, s4
+; GCN-MESA: s_addc_u32 s3, .Lprivate1@abs32@hi, s5
 
 ; PAL uses absolute relocations.
-; GCN-PAL:    s_add_u32 s2, private1@abs32@lo, s4
-; GCN-PAL:    s_addc_u32 s3, private1@abs32@hi, s5
-; GCN-PAL:    s_add_u32 s4, private2@abs32@lo, s4
-; GCN-PAL:    s_addc_u32 s5, private2@abs32@hi, s5
+; GCN-PAL:    s_add_u32 s2, .Lprivate1@abs32@lo, s4
+; GCN-PAL:    s_addc_u32 s3, .Lprivate1@abs32@hi, s5
+; GCN-PAL:    s_add_u32 s4, .Lprivate2@abs32@lo, s4
+; GCN-PAL:    s_addc_u32 s5, .Lprivate2@abs32@hi, s5
 
 ; R600-LABEL: private_test
 define amdgpu_kernel void @private_test(i32 %index, ptr addrspace(1) %out) {
diff --git a/llvm/test/CodeGen/AMDGPU/global-variable-relocs.ll b/llvm/test/CodeGen/AMDGPU/global-variable-relocs.ll
index b8cfcbf2d2665..6d55e79edbef6 100644
--- a/llvm/test/CodeGen/AMDGPU/global-variable-relocs.ll
+++ b/llvm/test/CodeGen/AMDGPU/global-variable-relocs.ll
@@ -14,8 +14,8 @@
 
 ; CHECK-LABEL: private_test:
 ; CHECK: s_getpc_b64 s[[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]]
-; CHECK: s_add_u32 s[[ADDR_LO:[0-9]+]], s[[PC_LO]], private@rel32@lo+8
-; CHECK: s_addc_u32 s[[ADDR_HI:[0-9]+]], s[[PC_HI]], private@rel32@hi+16
+; CHECK: s_add_u32 s[[ADDR_LO:[0-9]+]], s[[PC_LO]], .Lprivate@rel32@lo+8
+; CHECK: s_addc_u32 s[[ADDR_HI:[0-9]+]], s[[PC_HI]], .Lprivate@rel32@hi+16
 ; CHECK: s_load_dword s{{[0-9]+}}, s[[[ADDR_LO]]:[[ADDR_HI]]]
 define amdgpu_kernel void @private_test(ptr addrspace(1) %out) {
   %ptr = getelementptr [256 x i32], ptr addrspace(1) @private, i32 0, i32 1
@@ -153,7 +153,7 @@ define amdgpu_kernel void @external_w_init_test(ptr addrspace(1) %out) {
   ret void
 }
 
-; CHECK: .local private
+; CHECK: .local .Lprivate
 ; CHECK: .local internal
 ; CHECK: .weak linkonce
 ; CHECK: .weak weak
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.memcpy.ll b/llvm/test/CodeGen/AMDGPU/llvm.memcpy.ll
index 41e8762311306..ac64d19043a3a 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.memcpy.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.memcpy.ll
@@ -315,7 +315,7 @@ define amdgpu_kernel void @test_small_memcpy_i64_global_to_global_align16(ptr ad
 
 ; FUNC-LABEL: {{^}}test_memcpy_const_string_align4:
 ; SI: s_getpc_b64
-; SI: s_add_u32 s{{[0-9]+}}, s{{[0-9]+}}, hello.align4@rel32@lo+4
+; SI: s_add_u32 s{{[0-9]+}}, s{{[0-9]+}}, .Lhello.align4@rel32@lo+4
 ; SI: s_addc_u32
 ; SI-DAG: s_load_dwordx8
 ; SI-DAG: s_load_dwordx2
diff --git a/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_generated_funcs.ll.generated.expected b/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_generated_funcs.ll.generated.expected
index d1500e002d7e9..6d5ad3704119a 100644
--- a/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_generated_funcs.ll.generated.expected
+++ b/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_generated_funcs.ll.generated.expected
@@ -65,8 +65,8 @@ define dso_local i32 @main() #0 {
 
 attributes #0 = { noredzone nounwind ssp uwtable "frame-pointer"="all" }
 ; CHECK-LABEL: check_boundaries:
-; CHECK:       check_boundaries$local:
-; CHECK-NEXT:    .type check_boundaries$local,@function
+; CHECK:       .Lcheck_boundaries$local:
+; CHECK-NEXT:    .type .Lcheck_boundaries$local,@function
 ; CHECK-NEXT:    .cfi_startproc
 ; CHECK-NEXT:  ; %bb.0:
 ; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
@@ -107,8 +107,8 @@ attributes #0 = { noredzone nounwind ssp uwtable "frame-pointer"="all" }
 ; CHECK-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; CHECK-LABEL: main:
-; CHECK:       main$local:
-; CHECK-NEXT:    .type main$local,@function
+; CHECK:       .Lmain$local:
+; CHECK-NEXT:    .type .Lmain$local,@function
 ; CHECK-NEXT:    .cfi_startproc
 ; CHECK-NEXT:  ; %bb.0:
 ; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
diff --git a/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_generated_funcs.ll.nogenerated.expected b/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_generated_funcs.ll.nogenerated.expected
index deadc4adb02c5..0d5afc6c36b60 100644
--- a/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_generated_funcs.ll.nogenerated.expected
+++ b/llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_generated_funcs.ll.nogenerated.expected
@@ -6,8 +6,8 @@
 
 define dso_local i32 @check_boundaries() #0 {
 ; CHECK-LABEL: check_boundaries:
-; CHECK:       check_boundaries$local:
-; CHECK-NEXT:    .type check_boundaries$local,@function
+; CHECK:       .Lcheck_boundaries$local:
+; CHECK-NEXT:    .type .Lcheck_boundaries$local,@function
 ; CHECK-NEXT:    .cfi_startproc
 ; CHECK-NEXT:  ; %bb.0:
 ; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
@@ -84,8 +84,8 @@ define dso_local i32 @check_boundaries() #0 {
 
 define dso_local i32 @main() #0 {
 ; CHECK-LABEL: main:
-; CHECK:       main$local:
-; CHECK-NEXT:    .type main$local,@function
+; CHECK:       .Lmain$local:
+; CHECK-NEXT:    .type .Lmain$local,@function
 ; CHECK-NEXT:    .cfi_startproc
 ; CHECK-NEXT:  ; %bb.0:
 ; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

@arsenm
Copy link
Contributor

arsenm commented Jun 12, 2024

What is the datalayout default, if not elf?

@aengelke
Copy link
Contributor Author

The mangling mode currently is MM_None.

@ssahasra
Copy link
Collaborator

Looks straightforward enough to me, but I don't really have an opinion. @kzhuravl, do we need to check with the loader or the runtime about the default mangling?

Copy link
Contributor

@slinder1 slinder1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I wonder if we want to mass update any tests with a datalayout, or if that isn't worth the churn?

I don't imagine this will have an impact on the loader or anything else downstream, or at least if it does it represents a bug there rather than in the compiler.

@arsenm
Copy link
Contributor

arsenm commented Oct 11, 2025

This should be resolved for the move of the DL specification to TargetParser

arsenm added a commit that referenced this pull request Oct 11, 2025
@aengelke
Copy link
Contributor Author

Closing in favor of #163011. Modifying data layout strings remains a mess.

@aengelke aengelke closed this Oct 11, 2025
arsenm added a commit that referenced this pull request Oct 12, 2025
arsenm added a commit that referenced this pull request Oct 12, 2025
arsenm added a commit that referenced this pull request Oct 12, 2025
arsenm added a commit that referenced this pull request Oct 13, 2025
arsenm added a commit that referenced this pull request Oct 13, 2025
DharuniRAcharya pushed a commit to DharuniRAcharya/llvm-project that referenced this pull request Oct 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants