Conversation

@shiltian
Contributor

No description provided.

Contributor Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

@llvmbot
Member

llvmbot commented Apr 23, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Shilei Tian (shiltian)

Changes

Patch is 251.25 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/136865.diff

5 Files Affected:

  • (modified) llvm/lib/Transforms/IPO/AttributorAttributes.cpp (+7)
  • (modified) llvm/test/Transforms/OpenMP/custom_state_machines.ll (+295-95)
  • (modified) llvm/test/Transforms/OpenMP/custom_state_machines_pre_lto.ll (+387-87)
  • (modified) llvm/test/Transforms/OpenMP/spmdization.ll (+144-60)
  • (modified) llvm/test/Transforms/OpenMP/spmdization_indirect.ll (+60-44)
diff --git a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
index ac56df3823e20..fce482dd9d5ba 100644
--- a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
+++ b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
@@ -12603,6 +12603,13 @@ struct AAAddressSpaceImpl : public AAAddressSpace {
     auto CheckAddressSpace = [&](Value &Obj) {
       if (isa<UndefValue>(&Obj))
         return true;
+      // Some targets relax the requirement for alloca to be in an exact address
+      // space, allowing it in certain other address spaces instead. These
+      // targets later lower alloca to the correct address space in the
+      // pipeline. Therefore, we need to query the data layout to determine the
+      // appropriate address space.
+      if (isa<AllocaInst>(&Obj))
+        return takeAddressSpace(A.getDataLayout().getAllocaAddrSpace());
       // If an argument in flat address space only has addrspace cast uses, and
       // those casts are same, then we take the dst addrspace.
       if (auto *Arg = dyn_cast<Argument>(&Obj)) {
diff --git a/llvm/test/Transforms/OpenMP/custom_state_machines.ll b/llvm/test/Transforms/OpenMP/custom_state_machines.ll
index 10e521bbfcc10..ef9004b081984 100644
--- a/llvm/test/Transforms/OpenMP/custom_state_machines.ll
+++ b/llvm/test/Transforms/OpenMP/custom_state_machines.ll
@@ -906,11 +906,13 @@ attributes #9 = { convergent nounwind readonly willreturn }
 ; AMDGPU-NEXT:  entry:
 ; AMDGPU-NEXT:    [[DOTZERO_ADDR:%.*]] = alloca i32, align 4
 ; AMDGPU-NEXT:    [[DOTTHREADID_TEMP_:%.*]] = alloca i32, align 4
-; AMDGPU-NEXT:    [[TMP0:%.*]] = call i32 @__kmpc_target_init(ptr @__omp_offloading_14_a36502b_no_state_machine_needed_l14_kernel_environment, ptr [[DYN]])
-; AMDGPU-NEXT:    [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP0]], -1
+; AMDGPU-NEXT:    [[TMP0:%.*]] = addrspacecast ptr [[DOTZERO_ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT:    [[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr @__omp_offloading_14_a36502b_no_state_machine_needed_l14_kernel_environment, ptr [[DYN]])
+; AMDGPU-NEXT:    [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
 ; AMDGPU-NEXT:    br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
 ; AMDGPU:       user_code.entry:
-; AMDGPU-NEXT:    [[TMP1:%.*]] = call i32 @__kmpc_global_thread_num(ptr @[[GLOB1]]) #[[ATTR3:[0-9]+]]
+; AMDGPU-NEXT:    [[TMP2:%.*]] = call i32 @__kmpc_global_thread_num(ptr @[[GLOB1]]) #[[ATTR3:[0-9]+]]
+; AMDGPU-NEXT:    [[TMP3:%.*]] = addrspacecast ptr [[DOTTHREADID_TEMP_]] to ptr addrspace(5)
 ; AMDGPU-NEXT:    call void @__omp_outlined__(ptr [[DOTTHREADID_TEMP_]], ptr [[DOTZERO_ADDR]]) #[[ATTR3]]
 ; AMDGPU-NEXT:    call void @__kmpc_target_deinit()
 ; AMDGPU-NEXT:    ret void
@@ -929,6 +931,8 @@ attributes #9 = { convergent nounwind readonly willreturn }
 ; AMDGPU-NEXT:  entry:
 ; AMDGPU-NEXT:    [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8
 ; AMDGPU-NEXT:    [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8
+; AMDGPU-NEXT:    [[TMP0:%.*]] = addrspacecast ptr [[DOTGLOBAL_TID__ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT:    [[TMP1:%.*]] = addrspacecast ptr [[DOTBOUND_TID__ADDR]] to ptr addrspace(5)
 ; AMDGPU-NEXT:    call void @no_parallel_region_in_here.internalized() #[[ATTR9:[0-9]+]]
 ; AMDGPU-NEXT:    call void @unknown_no_openmp() #[[ATTR10:[0-9]+]]
 ; AMDGPU-NEXT:    ret void
@@ -975,17 +979,18 @@ attributes #9 = { convergent nounwind readonly willreturn }
 ; AMDGPU-NEXT:    [[WORKER_WORK_FN_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
 ; AMDGPU-NEXT:    [[DOTZERO_ADDR:%.*]] = alloca i32, align 4
 ; AMDGPU-NEXT:    [[DOTTHREADID_TEMP_:%.*]] = alloca i32, align 4
-; AMDGPU-NEXT:    [[TMP0:%.*]] = call i32 @__kmpc_target_init(ptr @__omp_offloading_14_a36502b_simple_state_machine_l22_kernel_environment, ptr [[DYN]])
-; AMDGPU-NEXT:    [[THREAD_IS_WORKER:%.*]] = icmp ne i32 [[TMP0]], -1
+; AMDGPU-NEXT:    [[TMP0:%.*]] = addrspacecast ptr [[DOTZERO_ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT:    [[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr @__omp_offloading_14_a36502b_simple_state_machine_l22_kernel_environment, ptr [[DYN]])
+; AMDGPU-NEXT:    [[THREAD_IS_WORKER:%.*]] = icmp ne i32 [[TMP1]], -1
 ; AMDGPU-NEXT:    br i1 [[THREAD_IS_WORKER]], label [[IS_WORKER_CHECK:%.*]], label [[THREAD_USER_CODE_CHECK:%.*]]
 ; AMDGPU:       is_worker_check:
 ; AMDGPU-NEXT:    [[BLOCK_HW_SIZE:%.*]] = call i32 @__kmpc_get_hardware_num_threads_in_block()
 ; AMDGPU-NEXT:    [[WARP_SIZE:%.*]] = call i32 @__kmpc_get_warp_size()
 ; AMDGPU-NEXT:    [[BLOCK_SIZE:%.*]] = sub i32 [[BLOCK_HW_SIZE]], [[WARP_SIZE]]
-; AMDGPU-NEXT:    [[THREAD_IS_MAIN_OR_WORKER:%.*]] = icmp slt i32 [[TMP0]], [[BLOCK_SIZE]]
+; AMDGPU-NEXT:    [[THREAD_IS_MAIN_OR_WORKER:%.*]] = icmp slt i32 [[TMP1]], [[BLOCK_SIZE]]
 ; AMDGPU-NEXT:    br i1 [[THREAD_IS_MAIN_OR_WORKER]], label [[WORKER_STATE_MACHINE_BEGIN:%.*]], label [[WORKER_STATE_MACHINE_FINISHED:%.*]]
 ; AMDGPU:       worker_state_machine.begin:
-; AMDGPU-NEXT:    call void @__kmpc_barrier_simple_generic(ptr @[[GLOB1]], i32 [[TMP0]])
+; AMDGPU-NEXT:    call void @__kmpc_barrier_simple_generic(ptr @[[GLOB1]], i32 [[TMP1]])
 ; AMDGPU-NEXT:    [[WORKER_WORK_FN_ADDR_GENERIC:%.*]] = addrspacecast ptr addrspace(5) [[WORKER_WORK_FN_ADDR]] to ptr
 ; AMDGPU-NEXT:    [[WORKER_IS_ACTIVE:%.*]] = call i1 @__kmpc_kernel_parallel(ptr [[WORKER_WORK_FN_ADDR_GENERIC]])
 ; AMDGPU-NEXT:    [[WORKER_WORK_FN:%.*]] = load ptr, ptr [[WORKER_WORK_FN_ADDR_GENERIC]], align 8
@@ -999,12 +1004,12 @@ attributes #9 = { convergent nounwind readonly willreturn }
 ; AMDGPU-NEXT:    [[WORKER_CHECK_PARALLEL_REGION:%.*]] = icmp eq ptr [[WORKER_WORK_FN]], @__omp_outlined__2_wrapper.ID
 ; AMDGPU-NEXT:    br i1 [[WORKER_CHECK_PARALLEL_REGION]], label [[WORKER_STATE_MACHINE_PARALLEL_REGION_EXECUTE:%.*]], label [[WORKER_STATE_MACHINE_PARALLEL_REGION_CHECK1:%.*]]
 ; AMDGPU:       worker_state_machine.parallel_region.execute:
-; AMDGPU-NEXT:    call void @__omp_outlined__2_wrapper(i16 0, i32 [[TMP0]])
+; AMDGPU-NEXT:    call void @__omp_outlined__2_wrapper(i16 0, i32 [[TMP1]])
 ; AMDGPU-NEXT:    br label [[WORKER_STATE_MACHINE_PARALLEL_REGION_END:%.*]]
 ; AMDGPU:       worker_state_machine.parallel_region.check1:
 ; AMDGPU-NEXT:    br i1 true, label [[WORKER_STATE_MACHINE_PARALLEL_REGION_EXECUTE2:%.*]], label [[WORKER_STATE_MACHINE_PARALLEL_REGION_CHECK3:%.*]]
 ; AMDGPU:       worker_state_machine.parallel_region.execute2:
-; AMDGPU-NEXT:    call void @__omp_outlined__3_wrapper(i16 0, i32 [[TMP0]])
+; AMDGPU-NEXT:    call void @__omp_outlined__3_wrapper(i16 0, i32 [[TMP1]])
 ; AMDGPU-NEXT:    br label [[WORKER_STATE_MACHINE_PARALLEL_REGION_END]]
 ; AMDGPU:       worker_state_machine.parallel_region.check3:
 ; AMDGPU-NEXT:    br label [[WORKER_STATE_MACHINE_PARALLEL_REGION_END]]
@@ -1012,13 +1017,14 @@ attributes #9 = { convergent nounwind readonly willreturn }
 ; AMDGPU-NEXT:    call void @__kmpc_kernel_end_parallel()
 ; AMDGPU-NEXT:    br label [[WORKER_STATE_MACHINE_DONE_BARRIER]]
 ; AMDGPU:       worker_state_machine.done.barrier:
-; AMDGPU-NEXT:    call void @__kmpc_barrier_simple_generic(ptr @[[GLOB1]], i32 [[TMP0]])
+; AMDGPU-NEXT:    call void @__kmpc_barrier_simple_generic(ptr @[[GLOB1]], i32 [[TMP1]])
 ; AMDGPU-NEXT:    br label [[WORKER_STATE_MACHINE_BEGIN]]
 ; AMDGPU:       thread.user_code.check:
-; AMDGPU-NEXT:    [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP0]], -1
+; AMDGPU-NEXT:    [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
 ; AMDGPU-NEXT:    br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
 ; AMDGPU:       user_code.entry:
-; AMDGPU-NEXT:    [[TMP1:%.*]] = call i32 @__kmpc_global_thread_num(ptr @[[GLOB1]]) #[[ATTR3]]
+; AMDGPU-NEXT:    [[TMP2:%.*]] = call i32 @__kmpc_global_thread_num(ptr @[[GLOB1]]) #[[ATTR3]]
+; AMDGPU-NEXT:    [[TMP3:%.*]] = addrspacecast ptr [[DOTTHREADID_TEMP_]] to ptr addrspace(5)
 ; AMDGPU-NEXT:    call void @__omp_outlined__1(ptr [[DOTTHREADID_TEMP_]], ptr [[DOTZERO_ADDR]]) #[[ATTR3]]
 ; AMDGPU-NEXT:    call void @__kmpc_target_deinit()
 ; AMDGPU-NEXT:    ret void
@@ -1030,9 +1036,12 @@ attributes #9 = { convergent nounwind readonly willreturn }
 ; AMDGPU-LABEL: define {{[^@]+}}@__omp_outlined__1
 ; AMDGPU-SAME: (ptr noalias [[DOTGLOBAL_TID_:%.*]], ptr noalias [[DOTBOUND_TID_:%.*]]) #[[ATTR0]] {
 ; AMDGPU-NEXT:  entry:
+; AMDGPU-NEXT:    [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8
 ; AMDGPU-NEXT:    [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8
 ; AMDGPU-NEXT:    [[CAPTURED_VARS_ADDRS:%.*]] = alloca [0 x ptr], align 8
 ; AMDGPU-NEXT:    [[CAPTURED_VARS_ADDRS1:%.*]] = alloca [0 x ptr], align 8
+; AMDGPU-NEXT:    [[TMP0:%.*]] = addrspacecast ptr [[DOTGLOBAL_TID__ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT:    [[TMP1:%.*]] = addrspacecast ptr [[DOTBOUND_TID__ADDR]] to ptr addrspace(5)
 ; AMDGPU-NEXT:    call void @unknown_no_openmp() #[[ATTR10]]
 ; AMDGPU-NEXT:    call void @__kmpc_parallel_51(ptr @[[GLOB1]], i32 undef, i32 1, i32 -1, i32 -1, ptr @__omp_outlined__2, ptr @__omp_outlined__2_wrapper.ID, ptr [[CAPTURED_VARS_ADDRS]], i64 0)
 ; AMDGPU-NEXT:    call void @no_parallel_region_in_here.internalized() #[[ATTR9]]
@@ -1046,6 +1055,8 @@ attributes #9 = { convergent nounwind readonly willreturn }
 ; AMDGPU-NEXT:  entry:
 ; AMDGPU-NEXT:    [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8
 ; AMDGPU-NEXT:    [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8
+; AMDGPU-NEXT:    [[TMP0:%.*]] = addrspacecast ptr [[DOTGLOBAL_TID__ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT:    [[TMP1:%.*]] = addrspacecast ptr [[DOTBOUND_TID__ADDR]] to ptr addrspace(5)
 ; AMDGPU-NEXT:    call void @p0() #[[ATTR11:[0-9]+]]
 ; AMDGPU-NEXT:    ret void
 ;
@@ -1058,6 +1069,9 @@ attributes #9 = { convergent nounwind readonly willreturn }
 ; AMDGPU-NEXT:    [[DOTADDR1:%.*]] = alloca i32, align 4
 ; AMDGPU-NEXT:    [[DOTZERO_ADDR:%.*]] = alloca i32, align 4
 ; AMDGPU-NEXT:    [[GLOBAL_ARGS:%.*]] = alloca ptr, align 8
+; AMDGPU-NEXT:    [[TMP2:%.*]] = addrspacecast ptr [[DOTZERO_ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT:    [[TMP3:%.*]] = addrspacecast ptr [[DOTADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT:    [[TMP4:%.*]] = addrspacecast ptr [[DOTADDR1]] to ptr addrspace(5)
 ; AMDGPU-NEXT:    call void @__kmpc_get_shared_variables(ptr [[GLOBAL_ARGS]])
 ; AMDGPU-NEXT:    call void @__omp_outlined__2(ptr [[DOTADDR1]], ptr [[DOTZERO_ADDR]]) #[[ATTR3]]
 ; AMDGPU-NEXT:    ret void
@@ -1069,6 +1083,8 @@ attributes #9 = { convergent nounwind readonly willreturn }
 ; AMDGPU-NEXT:  entry:
 ; AMDGPU-NEXT:    [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8
 ; AMDGPU-NEXT:    [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8
+; AMDGPU-NEXT:    [[TMP0:%.*]] = addrspacecast ptr [[DOTGLOBAL_TID__ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT:    [[TMP1:%.*]] = addrspacecast ptr [[DOTBOUND_TID__ADDR]] to ptr addrspace(5)
 ; AMDGPU-NEXT:    call void @p1() #[[ATTR11]]
 ; AMDGPU-NEXT:    ret void
 ;
@@ -1081,6 +1097,9 @@ attributes #9 = { convergent nounwind readonly willreturn }
 ; AMDGPU-NEXT:    [[DOTADDR1:%.*]] = alloca i32, align 4
 ; AMDGPU-NEXT:    [[DOTZERO_ADDR:%.*]] = alloca i32, align 4
 ; AMDGPU-NEXT:    [[GLOBAL_ARGS:%.*]] = alloca ptr, align 8
+; AMDGPU-NEXT:    [[TMP2:%.*]] = addrspacecast ptr [[DOTZERO_ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT:    [[TMP3:%.*]] = addrspacecast ptr [[DOTADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT:    [[TMP4:%.*]] = addrspacecast ptr [[DOTADDR1]] to ptr addrspace(5)
 ; AMDGPU-NEXT:    call void @__kmpc_get_shared_variables(ptr [[GLOBAL_ARGS]])
 ; AMDGPU-NEXT:    call void @__omp_outlined__3(ptr [[DOTADDR1]], ptr [[DOTZERO_ADDR]]) #[[ATTR3]]
 ; AMDGPU-NEXT:    ret void
@@ -1093,17 +1112,18 @@ attributes #9 = { convergent nounwind readonly willreturn }
 ; AMDGPU-NEXT:    [[WORKER_WORK_FN_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
 ; AMDGPU-NEXT:    [[DOTZERO_ADDR:%.*]] = alloca i32, align 4
 ; AMDGPU-NEXT:    [[DOTTHREADID_TEMP_:%.*]] = alloca i32, align 4
-; AMDGPU-NEXT:    [[TMP0:%.*]] = call i32 @__kmpc_target_init(ptr @__omp_offloading_14_a36502b_simple_state_machine_interprocedural_l39_kernel_environment, ptr [[DYN]])
-; AMDGPU-NEXT:    [[THREAD_IS_WORKER:%.*]] = icmp ne i32 [[TMP0]], -1
+; AMDGPU-NEXT:    [[TMP0:%.*]] = addrspacecast ptr [[DOTZERO_ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT:    [[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr @__omp_offloading_14_a36502b_simple_state_machine_interprocedural_l39_kernel_environment, ptr [[DYN]])
+; AMDGPU-NEXT:    [[THREAD_IS_WORKER:%.*]] = icmp ne i32 [[TMP1]], -1
 ; AMDGPU-NEXT:    br i1 [[THREAD_IS_WORKER]], label [[IS_WORKER_CHECK:%.*]], label [[THREAD_USER_CODE_CHECK:%.*]]
 ; AMDGPU:       is_worker_check:
 ; AMDGPU-NEXT:    [[BLOCK_HW_SIZE:%.*]] = call i32 @__kmpc_get_hardware_num_threads_in_block()
 ; AMDGPU-NEXT:    [[WARP_SIZE:%.*]] = call i32 @__kmpc_get_warp_size()
 ; AMDGPU-NEXT:    [[BLOCK_SIZE:%.*]] = sub i32 [[BLOCK_HW_SIZE]], [[WARP_SIZE]]
-; AMDGPU-NEXT:    [[THREAD_IS_MAIN_OR_WORKER:%.*]] = icmp slt i32 [[TMP0]], [[BLOCK_SIZE]]
+; AMDGPU-NEXT:    [[THREAD_IS_MAIN_OR_WORKER:%.*]] = icmp slt i32 [[TMP1]], [[BLOCK_SIZE]]
 ; AMDGPU-NEXT:    br i1 [[THREAD_IS_MAIN_OR_WORKER]], label [[WORKER_STATE_MACHINE_BEGIN:%.*]], label [[WORKER_STATE_MACHINE_FINISHED:%.*]]
 ; AMDGPU:       worker_state_machine.begin:
-; AMDGPU-NEXT:    call void @__kmpc_barrier_simple_generic(ptr @[[GLOB1]], i32 [[TMP0]])
+; AMDGPU-NEXT:    call void @__kmpc_barrier_simple_generic(ptr @[[GLOB1]], i32 [[TMP1]])
 ; AMDGPU-NEXT:    [[WORKER_WORK_FN_ADDR_GENERIC:%.*]] = addrspacecast ptr addrspace(5) [[WORKER_WORK_FN_ADDR]] to ptr
 ; AMDGPU-NEXT:    [[WORKER_IS_ACTIVE:%.*]] = call i1 @__kmpc_kernel_parallel(ptr [[WORKER_WORK_FN_ADDR_GENERIC]])
 ; AMDGPU-NEXT:    [[WORKER_WORK_FN:%.*]] = load ptr, ptr [[WORKER_WORK_FN_ADDR_GENERIC]], align 8
@@ -1117,18 +1137,18 @@ attributes #9 = { convergent nounwind readonly willreturn }
 ; AMDGPU-NEXT:    [[WORKER_CHECK_PARALLEL_REGION:%.*]] = icmp eq ptr [[WORKER_WORK_FN]], @__omp_outlined__17_wrapper
 ; AMDGPU-NEXT:    br i1 [[WORKER_CHECK_PARALLEL_REGION]], label [[WORKER_STATE_MACHINE_PARALLEL_REGION_EXECUTE:%.*]], label [[WORKER_STATE_MACHINE_PARALLEL_REGION_CHECK1:%.*]]
 ; AMDGPU:       worker_state_machine.parallel_region.execute:
-; AMDGPU-NEXT:    call void @__omp_outlined__17_wrapper(i16 0, i32 [[TMP0]])
+; AMDGPU-NEXT:    call void @__omp_outlined__17_wrapper(i16 0, i32 [[TMP1]])
 ; AMDGPU-NEXT:    br label [[WORKER_STATE_MACHINE_PARALLEL_REGION_END:%.*]]
 ; AMDGPU:       worker_state_machine.parallel_region.check1:
 ; AMDGPU-NEXT:    [[WORKER_CHECK_PARALLEL_REGION4:%.*]] = icmp eq ptr [[WORKER_WORK_FN]], @__omp_outlined__5_wrapper.ID
 ; AMDGPU-NEXT:    br i1 [[WORKER_CHECK_PARALLEL_REGION4]], label [[WORKER_STATE_MACHINE_PARALLEL_REGION_EXECUTE2:%.*]], label [[WORKER_STATE_MACHINE_PARALLEL_REGION_CHECK3:%.*]]
 ; AMDGPU:       worker_state_machine.parallel_region.execute2:
-; AMDGPU-NEXT:    call void @__omp_outlined__5_wrapper(i16 0, i32 [[TMP0]])
+; AMDGPU-NEXT:    call void @__omp_outlined__5_wrapper(i16 0, i32 [[TMP1]])
 ; AMDGPU-NEXT:    br label [[WORKER_STATE_MACHINE_PARALLEL_REGION_END]]
 ; AMDGPU:       worker_state_machine.parallel_region.check3:
 ; AMDGPU-NEXT:    br i1 true, label [[WORKER_STATE_MACHINE_PARALLEL_REGION_EXECUTE5:%.*]], label [[WORKER_STATE_MACHINE_PARALLEL_REGION_CHECK6:%.*]]
 ; AMDGPU:       worker_state_machine.parallel_region.execute5:
-; AMDGPU-NEXT:    call void @__omp_outlined__18_wrapper(i16 0, i32 [[TMP0]])
+; AMDGPU-NEXT:    call void @__omp_outlined__18_wrapper(i16 0, i32 [[TMP1]])
 ; AMDGPU-NEXT:    br label [[WORKER_STATE_MACHINE_PARALLEL_REGION_END]]
 ; AMDGPU:       worker_state_machine.parallel_region.check6:
 ; AMDGPU-NEXT:    br label [[WORKER_STATE_MACHINE_PARALLEL_REGION_END]]
@@ -1136,13 +1156,14 @@ attributes #9 = { convergent nounwind readonly willreturn }
 ; AMDGPU-NEXT:    call void @__kmpc_kernel_end_parallel()
 ; AMDGPU-NEXT:    br label [[WORKER_STATE_MACHINE_DONE_BARRIER]]
 ; AMDGPU:       worker_state_machine.done.barrier:
-; AMDGPU-NEXT:    call void @__kmpc_barrier_simple_generic(ptr @[[GLOB1]], i32 [[TMP0]])
+; AMDGPU-NEXT:    call void @__kmpc_barrier_simple_generic(ptr @[[GLOB1]], i32 [[TMP1]])
 ; AMDGPU-NEXT:    br label [[WORKER_STATE_MACHINE_BEGIN]]
 ; AMDGPU:       thread.user_code.check:
-; AMDGPU-NEXT:    [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP0]], -1
+; AMDGPU-NEXT:    [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
 ; AMDGPU-NEXT:    br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
 ; AMDGPU:       user_code.entry:
-; AMDGPU-NEXT:    [[TMP1:%.*]] = call i32 @__kmpc_global_thread_num(ptr @[[GLOB1]]) #[[ATTR3]]
+; AMDGPU-NEXT:    [[TMP2:%.*]] = call i32 @__kmpc_global_thread_num(ptr @[[GLOB1]]) #[[ATTR3]]
+; AMDGPU-NEXT:    [[TMP3:%.*]] = addrspacecast ptr [[DOTTHREADID_TEMP_]] to ptr addrspace(5)
 ; AMDGPU-NEXT:    call void @__omp_outlined__4(ptr [[DOTTHREADID_TEMP_]], ptr [[DOTZERO_ADDR]]) #[[ATTR3]]
 ; AMDGPU-NEXT:    call void @__kmpc_target_deinit()
 ; AMDGPU-NEXT:    ret void
@@ -1154,8 +1175,11 @@ attributes #9 = { convergent nounwind readonly willreturn }
 ; AMDGPU-LABEL: define {{[^@]+}}@__omp_outlined__4
 ; AMDGPU-SAME: (ptr noalias [[DOTGLOBAL_TID_:%.*]], ptr noalias [[DOTBOUND_TID_:%.*]]) #[[ATTR0]] {
 ; AMDGPU-NEXT:  entry:
+; AMDGPU-NEXT:    [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8
 ; AMDGPU-NEXT:    [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8
 ; AMDGPU-NEXT:    [[CAPTURED_VARS_ADDRS:%.*]] = alloca [0 x ptr], align 8
+; AMDGPU-NEXT:    [[TMP0:%.*]] = addrspacecast ptr [[DOTGLOBAL_TID__ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT:    [[TMP1:%.*]] = addrspacecast ptr [[DOTBOUND_TID__ADDR]] to ptr addrspace(5)
 ; AMDGPU-NEXT:    call void @unknown_no_openmp() #[[ATTR10]]
 ; AMDGPU-NEXT:    call void @simple_state_machine_interprocedural_before.internalized() #[[ATTR9]]
 ; AMDGPU-NEXT:    call void @no_parallel_region_in_here.internalized() #[[ATTR9]]
@@ -1190,6 +1214,8 @@ attributes #9 = { convergent nounwind readonly willreturn }
 ; AMDGPU-NEXT:  entry:
 ; AMDGPU-NEXT:    [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8
 ; AMDGPU-NEXT:    [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8
+; AMDGPU-NEXT:    [[TMP0:%.*]] = addrspacecast ptr [[DOTGLOBAL_TID__ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT:    [[TMP1:%.*]] = addrspacecast ptr [[DOTBOUND_TID__ADDR]] to ptr addrspace(5)
 ; AMDGPU-NEXT:    call void @p1() #[[ATTR11]]
 ; AMDGPU-NEXT:    ret void
 ;
@@ -1202,6 +1228,9 @@ attributes #9 = { convergent nounwind readonly willreturn }
 ; AMDGPU-NEXT:    [[DOTADDR1:%.*]] = alloca i32, align 4
 ; AMDGPU-NEXT:    [[DOTZERO_ADDR:%.*]] = alloca i32, align 4
 ; AMDGPU-NEXT:    [[GLOBAL_ARGS:%.*]] = alloca ptr, align 8
+; AMDGPU-NEXT:    [[TMP2:%.*]] = addrspacecast ptr [[DOTZERO_ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT:    [[TMP3:%.*]] = addrspacecast ptr [[DOTADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT:    [[TMP4:%.*]] = addrspacecast ptr [[DOTADDR1]] to ptr addrspace(5)
 ; AMDGPU-NEXT:    call void @__kmpc_get_shared_variables(ptr [[GLOBAL_ARGS]])
 ; AMDGPU-NEXT:    call void @__omp_outlined__5(ptr [[DOTADDR1]], ptr [[DOTZERO_ADDR]]) #[[ATTR3]]
 ; AMDGPU-NEXT:    ret void
@@ -1234,17 +1263,18 @@ attributes #9 = { convergent nounwind readonly willreturn }
 ; AMDGPU-NEXT:    [[WORKER_WORK_FN_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
 ; AMDGPU-NEXT:    [[DOTZERO_ADDR:%.*]] = alloca i32, align 4
 ; AMDGPU-NEXT:    [[DOTTHREADID_TEMP_:%.*]] = alloca i32, align 4
-; AMDGPU-NEXT:    [[TMP0:%.*]] = call i32 @__kmpc_target_init(ptr @__omp_offloading_14_a36502b_simple_state_machine_with_fallback_l55_kernel_environment, ptr [[DYN]])
-; AMDGPU-NEXT:    [[THREAD_IS_WORKER:%.*]] = icmp ne i32 [[TMP0]], -1
+; AMDGPU-NEXT:    [[TMP0:%.*]] = addrspacecast ptr [[DOTZERO_ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT:    [[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr @__omp_offloading_14_a36502b_simple_state_machine_with_fallback_l55_kernel_environment, ptr [[DYN]])
+; AMDGPU-NEXT:    [[THREAD_IS_WORKER:%.*]] = icmp ne i32 [[TMP1]], -1
 ; AMDGPU-NEXT:    br i1 [[THREAD_IS_WORKER]], label [[IS_WORKER_CHECK:%.*]], label [[THREAD_USER_CODE_CHECK:%.*]]
 ; AMDGPU:       is_worker_check:
 ; AMDGPU-NEXT:    [[BLOCK_HW_SIZE:%.*]] = call i32 @__kmpc_get_hardware_num_threads_in_block()
 ; AMDGPU-NEXT:    [[WARP_SIZE:%.*]] = call i32 @__kmpc_get_warp_size()
 ; AMDGPU-NEXT:    [[BLOCK_SIZE:%.*]] = sub i32 [[BLOCK_HW_SIZE]], [[...
[truncated]

Contributor

@arsenm left a comment


The address space should just come directly from the alloca. You don't know if it's correct to just replace the addrspace with whatever the datalayout says is the alloca addrspace. The datalayout value is for new allocas where the code has no additional context
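(For illustration only, not from the thread: a minimal C++ sketch of the distinction being drawn here, with hypothetical helper names. An existing alloca already encodes its address space in its pointer type, while `DataLayout::getAllocaAddrSpace()` is only the default for allocas created without further context.)

```cpp
// Illustrative sketch, not part of this patch. Helper names are hypothetical.
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Instructions.h"

// The address space an existing alloca actually lives in comes from its type.
unsigned existingAllocaAS(const llvm::AllocaInst &AI) {
  return AI.getAddressSpace();
}

// The DataLayout 'A' field is only the default used when materializing a new
// alloca with no additional context.
unsigned defaultNewAllocaAS(const llvm::DataLayout &DL) {
  return DL.getAllocaAddrSpace();
}
```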

@shiltian
Contributor Author

> The address space should just come directly from the alloca. You don't know if it's correct to just replace the addrspace with whatever the datalayout says is the alloca addrspace. The datalayout value is for new allocas where the code has no additional context

If the data layout doesn't match the target, the module is already broken to begin with, and any optimization that relies on data layout information can't be expected to work correctly. If that's the case, what's the point of having a data layout at all? Why not just pull every piece of information from the backend anyway?

@arsenm
Contributor

arsenm commented Apr 23, 2025

> If the data layout doesn't match the target, the module is already broken to begin with, and any optimization that relies on data layout information can't be expected to work correctly.

This is not how the semantics of the datalayout work, particularly with the alloca addrspace. The datalayout is not asserting that allocas must use this address space. It tells you nothing other than that, if you are creating a temporary variable, generic code should use this address space.

> If that's the case, what's the point of having a data layout at all? Why not just pull every piece of information from the backend anyway?

Part of the point of the datalayout is to have semantics independent of the backend. The IR has standalone semantics.

@shiltian
Contributor Author

The LLVM Lang Ref says:

> The function of the data layout string may not be what you expect. Notably, this is not a specification from the frontend of what alignment the code generator should use.
>
> Instead, if specified, the target data layout is required to match what the ultimate code generator expects. This string is used by the mid-level optimizers to improve code, and this only works if it matches what the ultimate code generator uses.

My reading is that it has to match the ultimate code generator, and middle-end optimization relies on it to improve the code.

@arsenm
Contributor

arsenm commented Apr 23, 2025

> My reading is that it has to match the ultimate code generator, and middle-end optimization relies on it to improve the code.

The definition of "match" leaves room for interpretation, and it would be a better system if we allowed more dynamic configuration for some fields. However, this conversation is off topic. This is not about whether the datalayout matches the target or not, but about the interpretation of the datalayout. The A field does not assert anything about the content of the module. It does not assert that any alloca in a non-A address space can be replaced with an A address space alloca. An alloca that does not match this address space is not invalid, and you cannot say anything about it.

@shiltian
Contributor Author

> The A field does not assert anything about the content of the module. It does not assert that any alloca in a non-A address space can be replaced with an A address space alloca. An alloca that does not match this address space is not invalid, and you cannot say anything about it.

If I understand correctly, you're suggesting that there's no reliable way for the middle end to determine which address space an alloca will ultimately end up in, other than the one it is already in, unless it pulls that information directly from the backend, as InferAddressSpacesPass does. The data layout itself doesn't assert anything: first, it doesn't necessarily match the final code generator; and second, even if it does, the A field in the data layout doesn't guarantee or assert anything about existing allocas.

@nikic @efriedma-quic what do you think?

@nikic
Contributor

nikic commented Apr 23, 2025

@shiltian I'm not entirely sure what you're asking here. As @arsenm said, the alloca address space in the data layout is merely a hint on the address space to use when materializing allocas "out of thin air". There are targets that use multiple alloca address spaces, this just specifies a default one.
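(Illustrative only, not from the PR: a minimal sketch of what "materializing allocas out of thin air" looks like, which is the case where the DataLayout default is the natural choice. The function name is hypothetical.)

```cpp
// Sketch: generic middle-end code creating a brand-new temporary. Only here is
// DataLayout::getAllocaAddrSpace() the obvious address space to pick.
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/IRBuilder.h"

llvm::AllocaInst *createScratch(llvm::IRBuilder<> &Builder,
                                const llvm::DataLayout &DL, llvm::Type *Ty) {
  return Builder.CreateAlloca(Ty, DL.getAllocaAddrSpace(),
                              /*ArraySize=*/nullptr, "scratch");
}
```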

@shiltian force-pushed the users/shiltian/use-get-assumed-as-in-aa branch from 1f14ec2 to 0063330 on April 23, 2025 16:23
@shiltian changed the title from "[Attributor] Use getAllocaAddrSpace to get address space for AllocaInst" to "[Attributor] Use getAssumedAddrSpace to get address space for AllocaInst" on Apr 23, 2025
@shiltian
Contributor Author

I've updated the PR to use getAssumedAddrSpace, which is the same as what InferAddressSpacesPass does. @arsenm @nikic

@nikic
Contributor

nikic commented Apr 23, 2025

I think I'm missing some pretty big context on this whole patch stack. Why are you even trying to do #136584? This looks like an attempt to fix up broken IR producers, but I guess that's not it?

@shiltian
Contributor Author

shiltian commented Apr 23, 2025

> This looks like an attempt to fix up broken IR producers, but I guess that's not it?

Yeah, I initially thought that was broken IR too. At first, I was in favor of not allowing alloca in AS0 at all and just making it a verifier error, like what was done in c9c1eef. But @arsenm convinced me otherwise in #135820 (comment), where the idea is to relax the restriction and fix things up later in the backend.

What do you think about alloca in AS0 @nikic?

@arsenm
Contributor

arsenm commented Apr 23, 2025

> I think I'm missing some pretty big context on this whole patch stack. Why are you even trying to do #136584? This looks like an attempt to fix up broken IR producers, but I guess that's not it?

Depends on your definition of "broken". Ideally we would never see IR using an address space other than 5. In the real world, people emit address space 0 allocas all over the place and then report backend bugs when it fails in codegen. We can interpret it and emit code for it, so at this point I'm surrendering and think we should just tolerate it

Comment on lines +12612 to +12616
Function *Fn = AI->getFunction();
auto *TTI =
A.getInfoCache().getAnalysisResultForFunction<TargetIRAnalysis>(
*Fn);
return takeAddressSpace(TTI->getAssumedAddrSpace(AI));
Contributor


Shouldn't need to do this dance to get the function-contextual TTI? getAssumedAddrSpace is a module-level TargetMachine hook?

Contributor Author


This file is in the middle end, so we can't access the target machine directly, same as in InferAddressSpacesPass.
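(Illustrative only, not from the PR: a minimal sketch of the TTI hook the updated patch queries, with a hypothetical helper name. In the middle end, target knowledge is reached through TargetTransformInfo rather than the TargetMachine itself.)

```cpp
// Sketch: query the address space the target assumes for a value; this is the
// same TargetTransformInfo hook that InferAddressSpaces consults.
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Instructions.h"

unsigned assumedASForAlloca(const llvm::TargetTransformInfo &TTI,
                            const llvm::AllocaInst *AI) {
  return TTI.getAssumedAddrSpace(AI);
}
```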

@shiltian
Contributor Author

shiltian commented Apr 23, 2025

> In the real world, people emit address space 0 allocas all over the place and then report backend bugs when it fails in codegen

Technically we can avoid that by emitting a hard error in the verifier with a clear error message. At least that way a fuzzer is not going to generate code like that.

@nikic
Contributor

nikic commented Apr 23, 2025

> > I think I'm missing some pretty big context on this whole patch stack. Why are you even trying to do #136584? This looks like an attempt to fix up broken IR producers, but I guess that's not it?
>
> Depends on your definition of "broken". Ideally we would never see IR using an address space other than 5. In the real world, people emit address space 0 allocas all over the place and then report backend bugs when it fails in codegen. We can interpret it and emit code for it, so at this point I'm surrendering and think we should just tolerate it

I don't really get what you get out of that though? Wouldn't it be better all around to emit a backend error (as in, without crash dialog, making it clear it is user error) than to introduce additional code to deal with it, especially if that code has to leak into other components as well?

Taking IR for one target and then running the backend for a different target on it just isn't something that's going to work. I don't think we should expend effort trying to make that work -- beyond generating some better errors, possibly.

@shiltian force-pushed the users/shiltian/use-get-assumed-as-in-aa branch 2 times, most recently from c0e6a62 to b467610 on April 23, 2025 20:48
@shiltian force-pushed the users/shiltian/alloca-as0-inferas branch from 8b975d2 to 9d2612c on April 23, 2025 20:48
@shiltian force-pushed the users/shiltian/alloca-as0-inferas branch from 9d2612c to 2d75ec2 on April 23, 2025 20:51
@shiltian force-pushed the users/shiltian/use-get-assumed-as-in-aa branch from b467610 to e1b0dc5 on April 23, 2025 20:51
@jdoerfert
Member

FWIW, the nvptx backend unfortunately works by "fixing stuff up" late. It shouldn't, but it does. I'd prefer not to fix stuff up at all, and maybe the best way is to have proper assertions in the creation of allocas/globals/... and/or the verifier.

@shiltian closed this Apr 25, 2025
@shiltian deleted the users/shiltian/use-get-assumed-as-in-aa branch on April 25, 2025 14:33

Labels

clang:openmp OpenMP related changes to Clang llvm:transforms
