[Attributor] Use getAssumedAddrSpace to get address space for AllocaInst
#136865
Conversation
Warning: This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking.
@llvm/pr-subscribers-llvm-transforms
Author: Shilei Tian (shiltian)
Changes: Patch is 251.25 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/136865.diff
5 Files Affected:
diff --git a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
index ac56df3823e20..fce482dd9d5ba 100644
--- a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
+++ b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
@@ -12603,6 +12603,13 @@ struct AAAddressSpaceImpl : public AAAddressSpace {
auto CheckAddressSpace = [&](Value &Obj) {
if (isa<UndefValue>(&Obj))
return true;
+ // Some targets relax the requirement that an alloca live in a specific
+ // address space and instead allow it in certain other address spaces.
+ // Such targets lower the alloca to the correct address space later in the
+ // pipeline, so query the data layout here to determine the appropriate
+ // address space.
+ if (isa<AllocaInst>(&Obj))
+ return takeAddressSpace(A.getDataLayout().getAllocaAddrSpace());
// If an argument in flat address space only has addrspace cast uses, and
// those casts are same, then we take the dst addrspace.
if (auto *Arg = dyn_cast<Argument>(&Obj)) {
diff --git a/llvm/test/Transforms/OpenMP/custom_state_machines.ll b/llvm/test/Transforms/OpenMP/custom_state_machines.ll
index 10e521bbfcc10..ef9004b081984 100644
--- a/llvm/test/Transforms/OpenMP/custom_state_machines.ll
+++ b/llvm/test/Transforms/OpenMP/custom_state_machines.ll
@@ -906,11 +906,13 @@ attributes #9 = { convergent nounwind readonly willreturn }
; AMDGPU-NEXT: entry:
; AMDGPU-NEXT: [[DOTZERO_ADDR:%.*]] = alloca i32, align 4
; AMDGPU-NEXT: [[DOTTHREADID_TEMP_:%.*]] = alloca i32, align 4
-; AMDGPU-NEXT: [[TMP0:%.*]] = call i32 @__kmpc_target_init(ptr @__omp_offloading_14_a36502b_no_state_machine_needed_l14_kernel_environment, ptr [[DYN]])
-; AMDGPU-NEXT: [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP0]], -1
+; AMDGPU-NEXT: [[TMP0:%.*]] = addrspacecast ptr [[DOTZERO_ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT: [[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr @__omp_offloading_14_a36502b_no_state_machine_needed_l14_kernel_environment, ptr [[DYN]])
+; AMDGPU-NEXT: [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
; AMDGPU-NEXT: br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
; AMDGPU: user_code.entry:
-; AMDGPU-NEXT: [[TMP1:%.*]] = call i32 @__kmpc_global_thread_num(ptr @[[GLOB1]]) #[[ATTR3:[0-9]+]]
+; AMDGPU-NEXT: [[TMP2:%.*]] = call i32 @__kmpc_global_thread_num(ptr @[[GLOB1]]) #[[ATTR3:[0-9]+]]
+; AMDGPU-NEXT: [[TMP3:%.*]] = addrspacecast ptr [[DOTTHREADID_TEMP_]] to ptr addrspace(5)
; AMDGPU-NEXT: call void @__omp_outlined__(ptr [[DOTTHREADID_TEMP_]], ptr [[DOTZERO_ADDR]]) #[[ATTR3]]
; AMDGPU-NEXT: call void @__kmpc_target_deinit()
; AMDGPU-NEXT: ret void
@@ -929,6 +931,8 @@ attributes #9 = { convergent nounwind readonly willreturn }
; AMDGPU-NEXT: entry:
; AMDGPU-NEXT: [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8
; AMDGPU-NEXT: [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8
+; AMDGPU-NEXT: [[TMP0:%.*]] = addrspacecast ptr [[DOTGLOBAL_TID__ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT: [[TMP1:%.*]] = addrspacecast ptr [[DOTBOUND_TID__ADDR]] to ptr addrspace(5)
; AMDGPU-NEXT: call void @no_parallel_region_in_here.internalized() #[[ATTR9:[0-9]+]]
; AMDGPU-NEXT: call void @unknown_no_openmp() #[[ATTR10:[0-9]+]]
; AMDGPU-NEXT: ret void
@@ -975,17 +979,18 @@ attributes #9 = { convergent nounwind readonly willreturn }
; AMDGPU-NEXT: [[WORKER_WORK_FN_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
; AMDGPU-NEXT: [[DOTZERO_ADDR:%.*]] = alloca i32, align 4
; AMDGPU-NEXT: [[DOTTHREADID_TEMP_:%.*]] = alloca i32, align 4
-; AMDGPU-NEXT: [[TMP0:%.*]] = call i32 @__kmpc_target_init(ptr @__omp_offloading_14_a36502b_simple_state_machine_l22_kernel_environment, ptr [[DYN]])
-; AMDGPU-NEXT: [[THREAD_IS_WORKER:%.*]] = icmp ne i32 [[TMP0]], -1
+; AMDGPU-NEXT: [[TMP0:%.*]] = addrspacecast ptr [[DOTZERO_ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT: [[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr @__omp_offloading_14_a36502b_simple_state_machine_l22_kernel_environment, ptr [[DYN]])
+; AMDGPU-NEXT: [[THREAD_IS_WORKER:%.*]] = icmp ne i32 [[TMP1]], -1
; AMDGPU-NEXT: br i1 [[THREAD_IS_WORKER]], label [[IS_WORKER_CHECK:%.*]], label [[THREAD_USER_CODE_CHECK:%.*]]
; AMDGPU: is_worker_check:
; AMDGPU-NEXT: [[BLOCK_HW_SIZE:%.*]] = call i32 @__kmpc_get_hardware_num_threads_in_block()
; AMDGPU-NEXT: [[WARP_SIZE:%.*]] = call i32 @__kmpc_get_warp_size()
; AMDGPU-NEXT: [[BLOCK_SIZE:%.*]] = sub i32 [[BLOCK_HW_SIZE]], [[WARP_SIZE]]
-; AMDGPU-NEXT: [[THREAD_IS_MAIN_OR_WORKER:%.*]] = icmp slt i32 [[TMP0]], [[BLOCK_SIZE]]
+; AMDGPU-NEXT: [[THREAD_IS_MAIN_OR_WORKER:%.*]] = icmp slt i32 [[TMP1]], [[BLOCK_SIZE]]
; AMDGPU-NEXT: br i1 [[THREAD_IS_MAIN_OR_WORKER]], label [[WORKER_STATE_MACHINE_BEGIN:%.*]], label [[WORKER_STATE_MACHINE_FINISHED:%.*]]
; AMDGPU: worker_state_machine.begin:
-; AMDGPU-NEXT: call void @__kmpc_barrier_simple_generic(ptr @[[GLOB1]], i32 [[TMP0]])
+; AMDGPU-NEXT: call void @__kmpc_barrier_simple_generic(ptr @[[GLOB1]], i32 [[TMP1]])
; AMDGPU-NEXT: [[WORKER_WORK_FN_ADDR_GENERIC:%.*]] = addrspacecast ptr addrspace(5) [[WORKER_WORK_FN_ADDR]] to ptr
; AMDGPU-NEXT: [[WORKER_IS_ACTIVE:%.*]] = call i1 @__kmpc_kernel_parallel(ptr [[WORKER_WORK_FN_ADDR_GENERIC]])
; AMDGPU-NEXT: [[WORKER_WORK_FN:%.*]] = load ptr, ptr [[WORKER_WORK_FN_ADDR_GENERIC]], align 8
@@ -999,12 +1004,12 @@ attributes #9 = { convergent nounwind readonly willreturn }
; AMDGPU-NEXT: [[WORKER_CHECK_PARALLEL_REGION:%.*]] = icmp eq ptr [[WORKER_WORK_FN]], @__omp_outlined__2_wrapper.ID
; AMDGPU-NEXT: br i1 [[WORKER_CHECK_PARALLEL_REGION]], label [[WORKER_STATE_MACHINE_PARALLEL_REGION_EXECUTE:%.*]], label [[WORKER_STATE_MACHINE_PARALLEL_REGION_CHECK1:%.*]]
; AMDGPU: worker_state_machine.parallel_region.execute:
-; AMDGPU-NEXT: call void @__omp_outlined__2_wrapper(i16 0, i32 [[TMP0]])
+; AMDGPU-NEXT: call void @__omp_outlined__2_wrapper(i16 0, i32 [[TMP1]])
; AMDGPU-NEXT: br label [[WORKER_STATE_MACHINE_PARALLEL_REGION_END:%.*]]
; AMDGPU: worker_state_machine.parallel_region.check1:
; AMDGPU-NEXT: br i1 true, label [[WORKER_STATE_MACHINE_PARALLEL_REGION_EXECUTE2:%.*]], label [[WORKER_STATE_MACHINE_PARALLEL_REGION_CHECK3:%.*]]
; AMDGPU: worker_state_machine.parallel_region.execute2:
-; AMDGPU-NEXT: call void @__omp_outlined__3_wrapper(i16 0, i32 [[TMP0]])
+; AMDGPU-NEXT: call void @__omp_outlined__3_wrapper(i16 0, i32 [[TMP1]])
; AMDGPU-NEXT: br label [[WORKER_STATE_MACHINE_PARALLEL_REGION_END]]
; AMDGPU: worker_state_machine.parallel_region.check3:
; AMDGPU-NEXT: br label [[WORKER_STATE_MACHINE_PARALLEL_REGION_END]]
@@ -1012,13 +1017,14 @@ attributes #9 = { convergent nounwind readonly willreturn }
; AMDGPU-NEXT: call void @__kmpc_kernel_end_parallel()
; AMDGPU-NEXT: br label [[WORKER_STATE_MACHINE_DONE_BARRIER]]
; AMDGPU: worker_state_machine.done.barrier:
-; AMDGPU-NEXT: call void @__kmpc_barrier_simple_generic(ptr @[[GLOB1]], i32 [[TMP0]])
+; AMDGPU-NEXT: call void @__kmpc_barrier_simple_generic(ptr @[[GLOB1]], i32 [[TMP1]])
; AMDGPU-NEXT: br label [[WORKER_STATE_MACHINE_BEGIN]]
; AMDGPU: thread.user_code.check:
-; AMDGPU-NEXT: [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP0]], -1
+; AMDGPU-NEXT: [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
; AMDGPU-NEXT: br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
; AMDGPU: user_code.entry:
-; AMDGPU-NEXT: [[TMP1:%.*]] = call i32 @__kmpc_global_thread_num(ptr @[[GLOB1]]) #[[ATTR3]]
+; AMDGPU-NEXT: [[TMP2:%.*]] = call i32 @__kmpc_global_thread_num(ptr @[[GLOB1]]) #[[ATTR3]]
+; AMDGPU-NEXT: [[TMP3:%.*]] = addrspacecast ptr [[DOTTHREADID_TEMP_]] to ptr addrspace(5)
; AMDGPU-NEXT: call void @__omp_outlined__1(ptr [[DOTTHREADID_TEMP_]], ptr [[DOTZERO_ADDR]]) #[[ATTR3]]
; AMDGPU-NEXT: call void @__kmpc_target_deinit()
; AMDGPU-NEXT: ret void
@@ -1030,9 +1036,12 @@ attributes #9 = { convergent nounwind readonly willreturn }
; AMDGPU-LABEL: define {{[^@]+}}@__omp_outlined__1
; AMDGPU-SAME: (ptr noalias [[DOTGLOBAL_TID_:%.*]], ptr noalias [[DOTBOUND_TID_:%.*]]) #[[ATTR0]] {
; AMDGPU-NEXT: entry:
+; AMDGPU-NEXT: [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8
; AMDGPU-NEXT: [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8
; AMDGPU-NEXT: [[CAPTURED_VARS_ADDRS:%.*]] = alloca [0 x ptr], align 8
; AMDGPU-NEXT: [[CAPTURED_VARS_ADDRS1:%.*]] = alloca [0 x ptr], align 8
+; AMDGPU-NEXT: [[TMP0:%.*]] = addrspacecast ptr [[DOTGLOBAL_TID__ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT: [[TMP1:%.*]] = addrspacecast ptr [[DOTBOUND_TID__ADDR]] to ptr addrspace(5)
; AMDGPU-NEXT: call void @unknown_no_openmp() #[[ATTR10]]
; AMDGPU-NEXT: call void @__kmpc_parallel_51(ptr @[[GLOB1]], i32 undef, i32 1, i32 -1, i32 -1, ptr @__omp_outlined__2, ptr @__omp_outlined__2_wrapper.ID, ptr [[CAPTURED_VARS_ADDRS]], i64 0)
; AMDGPU-NEXT: call void @no_parallel_region_in_here.internalized() #[[ATTR9]]
@@ -1046,6 +1055,8 @@ attributes #9 = { convergent nounwind readonly willreturn }
; AMDGPU-NEXT: entry:
; AMDGPU-NEXT: [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8
; AMDGPU-NEXT: [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8
+; AMDGPU-NEXT: [[TMP0:%.*]] = addrspacecast ptr [[DOTGLOBAL_TID__ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT: [[TMP1:%.*]] = addrspacecast ptr [[DOTBOUND_TID__ADDR]] to ptr addrspace(5)
; AMDGPU-NEXT: call void @p0() #[[ATTR11:[0-9]+]]
; AMDGPU-NEXT: ret void
;
@@ -1058,6 +1069,9 @@ attributes #9 = { convergent nounwind readonly willreturn }
; AMDGPU-NEXT: [[DOTADDR1:%.*]] = alloca i32, align 4
; AMDGPU-NEXT: [[DOTZERO_ADDR:%.*]] = alloca i32, align 4
; AMDGPU-NEXT: [[GLOBAL_ARGS:%.*]] = alloca ptr, align 8
+; AMDGPU-NEXT: [[TMP2:%.*]] = addrspacecast ptr [[DOTZERO_ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT: [[TMP3:%.*]] = addrspacecast ptr [[DOTADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT: [[TMP4:%.*]] = addrspacecast ptr [[DOTADDR1]] to ptr addrspace(5)
; AMDGPU-NEXT: call void @__kmpc_get_shared_variables(ptr [[GLOBAL_ARGS]])
; AMDGPU-NEXT: call void @__omp_outlined__2(ptr [[DOTADDR1]], ptr [[DOTZERO_ADDR]]) #[[ATTR3]]
; AMDGPU-NEXT: ret void
@@ -1069,6 +1083,8 @@ attributes #9 = { convergent nounwind readonly willreturn }
; AMDGPU-NEXT: entry:
; AMDGPU-NEXT: [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8
; AMDGPU-NEXT: [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8
+; AMDGPU-NEXT: [[TMP0:%.*]] = addrspacecast ptr [[DOTGLOBAL_TID__ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT: [[TMP1:%.*]] = addrspacecast ptr [[DOTBOUND_TID__ADDR]] to ptr addrspace(5)
; AMDGPU-NEXT: call void @p1() #[[ATTR11]]
; AMDGPU-NEXT: ret void
;
@@ -1081,6 +1097,9 @@ attributes #9 = { convergent nounwind readonly willreturn }
; AMDGPU-NEXT: [[DOTADDR1:%.*]] = alloca i32, align 4
; AMDGPU-NEXT: [[DOTZERO_ADDR:%.*]] = alloca i32, align 4
; AMDGPU-NEXT: [[GLOBAL_ARGS:%.*]] = alloca ptr, align 8
+; AMDGPU-NEXT: [[TMP2:%.*]] = addrspacecast ptr [[DOTZERO_ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT: [[TMP3:%.*]] = addrspacecast ptr [[DOTADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT: [[TMP4:%.*]] = addrspacecast ptr [[DOTADDR1]] to ptr addrspace(5)
; AMDGPU-NEXT: call void @__kmpc_get_shared_variables(ptr [[GLOBAL_ARGS]])
; AMDGPU-NEXT: call void @__omp_outlined__3(ptr [[DOTADDR1]], ptr [[DOTZERO_ADDR]]) #[[ATTR3]]
; AMDGPU-NEXT: ret void
@@ -1093,17 +1112,18 @@ attributes #9 = { convergent nounwind readonly willreturn }
; AMDGPU-NEXT: [[WORKER_WORK_FN_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
; AMDGPU-NEXT: [[DOTZERO_ADDR:%.*]] = alloca i32, align 4
; AMDGPU-NEXT: [[DOTTHREADID_TEMP_:%.*]] = alloca i32, align 4
-; AMDGPU-NEXT: [[TMP0:%.*]] = call i32 @__kmpc_target_init(ptr @__omp_offloading_14_a36502b_simple_state_machine_interprocedural_l39_kernel_environment, ptr [[DYN]])
-; AMDGPU-NEXT: [[THREAD_IS_WORKER:%.*]] = icmp ne i32 [[TMP0]], -1
+; AMDGPU-NEXT: [[TMP0:%.*]] = addrspacecast ptr [[DOTZERO_ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT: [[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr @__omp_offloading_14_a36502b_simple_state_machine_interprocedural_l39_kernel_environment, ptr [[DYN]])
+; AMDGPU-NEXT: [[THREAD_IS_WORKER:%.*]] = icmp ne i32 [[TMP1]], -1
; AMDGPU-NEXT: br i1 [[THREAD_IS_WORKER]], label [[IS_WORKER_CHECK:%.*]], label [[THREAD_USER_CODE_CHECK:%.*]]
; AMDGPU: is_worker_check:
; AMDGPU-NEXT: [[BLOCK_HW_SIZE:%.*]] = call i32 @__kmpc_get_hardware_num_threads_in_block()
; AMDGPU-NEXT: [[WARP_SIZE:%.*]] = call i32 @__kmpc_get_warp_size()
; AMDGPU-NEXT: [[BLOCK_SIZE:%.*]] = sub i32 [[BLOCK_HW_SIZE]], [[WARP_SIZE]]
-; AMDGPU-NEXT: [[THREAD_IS_MAIN_OR_WORKER:%.*]] = icmp slt i32 [[TMP0]], [[BLOCK_SIZE]]
+; AMDGPU-NEXT: [[THREAD_IS_MAIN_OR_WORKER:%.*]] = icmp slt i32 [[TMP1]], [[BLOCK_SIZE]]
; AMDGPU-NEXT: br i1 [[THREAD_IS_MAIN_OR_WORKER]], label [[WORKER_STATE_MACHINE_BEGIN:%.*]], label [[WORKER_STATE_MACHINE_FINISHED:%.*]]
; AMDGPU: worker_state_machine.begin:
-; AMDGPU-NEXT: call void @__kmpc_barrier_simple_generic(ptr @[[GLOB1]], i32 [[TMP0]])
+; AMDGPU-NEXT: call void @__kmpc_barrier_simple_generic(ptr @[[GLOB1]], i32 [[TMP1]])
; AMDGPU-NEXT: [[WORKER_WORK_FN_ADDR_GENERIC:%.*]] = addrspacecast ptr addrspace(5) [[WORKER_WORK_FN_ADDR]] to ptr
; AMDGPU-NEXT: [[WORKER_IS_ACTIVE:%.*]] = call i1 @__kmpc_kernel_parallel(ptr [[WORKER_WORK_FN_ADDR_GENERIC]])
; AMDGPU-NEXT: [[WORKER_WORK_FN:%.*]] = load ptr, ptr [[WORKER_WORK_FN_ADDR_GENERIC]], align 8
@@ -1117,18 +1137,18 @@ attributes #9 = { convergent nounwind readonly willreturn }
; AMDGPU-NEXT: [[WORKER_CHECK_PARALLEL_REGION:%.*]] = icmp eq ptr [[WORKER_WORK_FN]], @__omp_outlined__17_wrapper
; AMDGPU-NEXT: br i1 [[WORKER_CHECK_PARALLEL_REGION]], label [[WORKER_STATE_MACHINE_PARALLEL_REGION_EXECUTE:%.*]], label [[WORKER_STATE_MACHINE_PARALLEL_REGION_CHECK1:%.*]]
; AMDGPU: worker_state_machine.parallel_region.execute:
-; AMDGPU-NEXT: call void @__omp_outlined__17_wrapper(i16 0, i32 [[TMP0]])
+; AMDGPU-NEXT: call void @__omp_outlined__17_wrapper(i16 0, i32 [[TMP1]])
; AMDGPU-NEXT: br label [[WORKER_STATE_MACHINE_PARALLEL_REGION_END:%.*]]
; AMDGPU: worker_state_machine.parallel_region.check1:
; AMDGPU-NEXT: [[WORKER_CHECK_PARALLEL_REGION4:%.*]] = icmp eq ptr [[WORKER_WORK_FN]], @__omp_outlined__5_wrapper.ID
; AMDGPU-NEXT: br i1 [[WORKER_CHECK_PARALLEL_REGION4]], label [[WORKER_STATE_MACHINE_PARALLEL_REGION_EXECUTE2:%.*]], label [[WORKER_STATE_MACHINE_PARALLEL_REGION_CHECK3:%.*]]
; AMDGPU: worker_state_machine.parallel_region.execute2:
-; AMDGPU-NEXT: call void @__omp_outlined__5_wrapper(i16 0, i32 [[TMP0]])
+; AMDGPU-NEXT: call void @__omp_outlined__5_wrapper(i16 0, i32 [[TMP1]])
; AMDGPU-NEXT: br label [[WORKER_STATE_MACHINE_PARALLEL_REGION_END]]
; AMDGPU: worker_state_machine.parallel_region.check3:
; AMDGPU-NEXT: br i1 true, label [[WORKER_STATE_MACHINE_PARALLEL_REGION_EXECUTE5:%.*]], label [[WORKER_STATE_MACHINE_PARALLEL_REGION_CHECK6:%.*]]
; AMDGPU: worker_state_machine.parallel_region.execute5:
-; AMDGPU-NEXT: call void @__omp_outlined__18_wrapper(i16 0, i32 [[TMP0]])
+; AMDGPU-NEXT: call void @__omp_outlined__18_wrapper(i16 0, i32 [[TMP1]])
; AMDGPU-NEXT: br label [[WORKER_STATE_MACHINE_PARALLEL_REGION_END]]
; AMDGPU: worker_state_machine.parallel_region.check6:
; AMDGPU-NEXT: br label [[WORKER_STATE_MACHINE_PARALLEL_REGION_END]]
@@ -1136,13 +1156,14 @@ attributes #9 = { convergent nounwind readonly willreturn }
; AMDGPU-NEXT: call void @__kmpc_kernel_end_parallel()
; AMDGPU-NEXT: br label [[WORKER_STATE_MACHINE_DONE_BARRIER]]
; AMDGPU: worker_state_machine.done.barrier:
-; AMDGPU-NEXT: call void @__kmpc_barrier_simple_generic(ptr @[[GLOB1]], i32 [[TMP0]])
+; AMDGPU-NEXT: call void @__kmpc_barrier_simple_generic(ptr @[[GLOB1]], i32 [[TMP1]])
; AMDGPU-NEXT: br label [[WORKER_STATE_MACHINE_BEGIN]]
; AMDGPU: thread.user_code.check:
-; AMDGPU-NEXT: [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP0]], -1
+; AMDGPU-NEXT: [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
; AMDGPU-NEXT: br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
; AMDGPU: user_code.entry:
-; AMDGPU-NEXT: [[TMP1:%.*]] = call i32 @__kmpc_global_thread_num(ptr @[[GLOB1]]) #[[ATTR3]]
+; AMDGPU-NEXT: [[TMP2:%.*]] = call i32 @__kmpc_global_thread_num(ptr @[[GLOB1]]) #[[ATTR3]]
+; AMDGPU-NEXT: [[TMP3:%.*]] = addrspacecast ptr [[DOTTHREADID_TEMP_]] to ptr addrspace(5)
; AMDGPU-NEXT: call void @__omp_outlined__4(ptr [[DOTTHREADID_TEMP_]], ptr [[DOTZERO_ADDR]]) #[[ATTR3]]
; AMDGPU-NEXT: call void @__kmpc_target_deinit()
; AMDGPU-NEXT: ret void
@@ -1154,8 +1175,11 @@ attributes #9 = { convergent nounwind readonly willreturn }
; AMDGPU-LABEL: define {{[^@]+}}@__omp_outlined__4
; AMDGPU-SAME: (ptr noalias [[DOTGLOBAL_TID_:%.*]], ptr noalias [[DOTBOUND_TID_:%.*]]) #[[ATTR0]] {
; AMDGPU-NEXT: entry:
+; AMDGPU-NEXT: [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8
; AMDGPU-NEXT: [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8
; AMDGPU-NEXT: [[CAPTURED_VARS_ADDRS:%.*]] = alloca [0 x ptr], align 8
+; AMDGPU-NEXT: [[TMP0:%.*]] = addrspacecast ptr [[DOTGLOBAL_TID__ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT: [[TMP1:%.*]] = addrspacecast ptr [[DOTBOUND_TID__ADDR]] to ptr addrspace(5)
; AMDGPU-NEXT: call void @unknown_no_openmp() #[[ATTR10]]
; AMDGPU-NEXT: call void @simple_state_machine_interprocedural_before.internalized() #[[ATTR9]]
; AMDGPU-NEXT: call void @no_parallel_region_in_here.internalized() #[[ATTR9]]
@@ -1190,6 +1214,8 @@ attributes #9 = { convergent nounwind readonly willreturn }
; AMDGPU-NEXT: entry:
; AMDGPU-NEXT: [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8
; AMDGPU-NEXT: [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8
+; AMDGPU-NEXT: [[TMP0:%.*]] = addrspacecast ptr [[DOTGLOBAL_TID__ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT: [[TMP1:%.*]] = addrspacecast ptr [[DOTBOUND_TID__ADDR]] to ptr addrspace(5)
; AMDGPU-NEXT: call void @p1() #[[ATTR11]]
; AMDGPU-NEXT: ret void
;
@@ -1202,6 +1228,9 @@ attributes #9 = { convergent nounwind readonly willreturn }
; AMDGPU-NEXT: [[DOTADDR1:%.*]] = alloca i32, align 4
; AMDGPU-NEXT: [[DOTZERO_ADDR:%.*]] = alloca i32, align 4
; AMDGPU-NEXT: [[GLOBAL_ARGS:%.*]] = alloca ptr, align 8
+; AMDGPU-NEXT: [[TMP2:%.*]] = addrspacecast ptr [[DOTZERO_ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT: [[TMP3:%.*]] = addrspacecast ptr [[DOTADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT: [[TMP4:%.*]] = addrspacecast ptr [[DOTADDR1]] to ptr addrspace(5)
; AMDGPU-NEXT: call void @__kmpc_get_shared_variables(ptr [[GLOBAL_ARGS]])
; AMDGPU-NEXT: call void @__omp_outlined__5(ptr [[DOTADDR1]], ptr [[DOTZERO_ADDR]]) #[[ATTR3]]
; AMDGPU-NEXT: ret void
@@ -1234,17 +1263,18 @@ attributes #9 = { convergent nounwind readonly willreturn }
; AMDGPU-NEXT: [[WORKER_WORK_FN_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
; AMDGPU-NEXT: [[DOTZERO_ADDR:%.*]] = alloca i32, align 4
; AMDGPU-NEXT: [[DOTTHREADID_TEMP_:%.*]] = alloca i32, align 4
-; AMDGPU-NEXT: [[TMP0:%.*]] = call i32 @__kmpc_target_init(ptr @__omp_offloading_14_a36502b_simple_state_machine_with_fallback_l55_kernel_environment, ptr [[DYN]])
-; AMDGPU-NEXT: [[THREAD_IS_WORKER:%.*]] = icmp ne i32 [[TMP0]], -1
+; AMDGPU-NEXT: [[TMP0:%.*]] = addrspacecast ptr [[DOTZERO_ADDR]] to ptr addrspace(5)
+; AMDGPU-NEXT: [[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr @__omp_offloading_14_a36502b_simple_state_machine_with_fallback_l55_kernel_environment, ptr [[DYN]])
+; AMDGPU-NEXT: [[THREAD_IS_WORKER:%.*]] = icmp ne i32 [[TMP1]], -1
; AMDGPU-NEXT: br i1 [[THREAD_IS_WORKER]], label [[IS_WORKER_CHECK:%.*]], label [[THREAD_USER_CODE_CHECK:%.*]]
; AMDGPU: is_worker_check:
; AMDGPU-NEXT: [[BLOCK_HW_SIZE:%.*]] = call i32 @__kmpc_get_hardware_num_threads_in_block()
; AMDGPU-NEXT: [[WARP_SIZE:%.*]] = call i32 @__kmpc_get_warp_size()
; AMDGPU-NEXT: [[BLOCK_SIZE:%.*]] = sub i32 [[BLOCK_HW_SIZE]], [[...
[truncated]
arsenm left a comment:
The address space should just come directly from the alloca. You don't know if it's correct to just replace the addrspace with whatever the datalayout says is the alloca addrspace. The datalayout value is for new allocas, where the code has no additional context.
If the data layout doesn't match the target, the module is already broken to begin with, and any optimization that relies on data layout information can't be expected to work correctly. If that's the case, what's the point of having a data layout at all? Why not just pull every piece of information from the backend anyway?
This is not how the semantics of the datalayout works, particularly with the alloca addrspace. The datalayout is not asserting allocas must use this address space. It tells you nothing other than if you are creating a temporary variable, generic code should use this address space.
Part of the point of the datalayout is to have semantics independent of the backend. The IR has standalone semantics.
The LLVM Lang Ref says:
My reading is that it has to match the ultimate code generator; middle-end optimization relies on it to improve the code.
The definition of "match" leaves room for interpretation, and it would be a better system if we allowed more dynamic configuration for some fields. However, this conversation is off topic. This is not about whether the datalayout matches the target, but about the interpretation of the datalayout. The A field does not assert anything about the content of the module. It does not assert that an alloca in a non-A address space can be replaced with an A address space alloca. An alloca that does not match this address space is not invalid, and you cannot say anything about it.
If I understand correctly, you're suggesting that there's no reliable way for the middle end to determine which address space an alloca will ultimately end up in, aside from cases where it's already in one, unless it pulls that information directly from the backend. @nikic @efriedma-quic, what do you think?
@shiltian I'm not entirely sure what you're asking here. As @arsenm said, the alloca address space in the data layout is merely a hint on the address space to use when materializing allocas "out of thin air". There are targets that use multiple alloca address spaces; this just specifies a default one.
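To make that semantics concrete, here is a minimal, hand-written IR sketch (the datalayout string and function are hypothetical, trimmed to the relevant components). The A5 component only tells generic code, e.g. code calling DataLayout::getAllocaAddrSpace(), which address space to use when it materializes a brand-new alloca; an existing alloca in another address space is still valid IR:

target datalayout = "e-p:64:64-p5:32:32-A5"   ; A5: default address space for new allocas

define void @example() {
entry:
  ; Valid IR even though it does not match the A field: the field is only a
  ; default for newly created allocas, not an assertion about existing ones.
  %flat = alloca i32, align 4
  ; What generic code consulting the datalayout would create "out of thin air":
  %priv = alloca i32, align 4, addrspace(5)
  ret void
}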
Force-pushed from 1f14ec2 to 0063330.
Title changed from "[Attributor] Use getAllocaAddrSpace to get address space for AllocaInst" to "[Attributor] Use getAssumedAddrSpace to get address space for AllocaInst".
I think I'm missing some pretty big context on this whole patch stack. Why are you even trying to do #136584? This looks like an attempt to fix up broken IR producers, but I guess that's not it?
Yeah, I initially thought that was broken IR too. At first, I was in favor of not allowing alloca in AS0 at all and just making it a verifier error, like what was done in c9c1eef. But @arsenm convinced me otherwise in #135820 (comment), where the idea is to relax the restriction and fix things up later in the backend. What do you think about alloca in AS0 @nikic? |
Depends on your definition of "broken". Ideally we would never see IR using an address space other than 5. In the real world, people emit address space 0 allocas all over the place and then report backend bugs when it fails in codegen. We can interpret it and emit code for it, so at this point I'm surrendering and think we should just tolerate it.
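To illustrate what tolerating such allocas looks like after this patch, here is a hedged before/after sketch in the spirit of the test updates above (hand-written and simplified; which users actually get rewritten depends on what the Attributor can prove):

; Before: a flat (addrspace(0)) alloca, as commonly emitted by frontends.
  %x = alloca i32, align 4
  store i32 0, ptr %x, align 4

; After, assuming the alloca is inferred to be in addrspace(5):
  %x = alloca i32, align 4
  %x.as5 = addrspacecast ptr %x to ptr addrspace(5)
  store i32 0, ptr addrspace(5) %x.as5, align 4

On AMDGPU this is profitable because accesses through addrspace(5) (private) pointers are generally cheaper than flat accesses.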
Function *Fn = AI->getFunction();
auto *TTI =
    A.getInfoCache().getAnalysisResultForFunction<TargetIRAnalysis>(
        *Fn);
return takeAddressSpace(TTI->getAssumedAddrSpace(AI));
Shouldn't need to do this dance to get the function-contextual TTI? getAssumedAddrSpace is a module-level TargetMachine hook?
This file is in the middle end, so we can't access the target machine, same as InferAddressSpacePass.
Technically we could avoid that by just emitting a hard error in the verifier with a clear error message. At least that way the fuzzer is not going to generate code like that.
I don't really get what you get out of that though? Wouldn't it be better all around to emit a backend error (as in, without crash dialog, making it clear it is user error) than to introduce additional code to deal with it, especially if that code has to leak into other components as well? Taking IR for one target and then running the backend for a different target on it just isn't something that's going to work. I don't think we should expend effort trying to make that work -- beyond generating some better errors, possibly.
Force-pushed from c0e6a62 to b467610.
Force-pushed from 8b975d2 to 9d2612c.
Force-pushed from 9d2612c to 2d75ec2.
Force-pushed from b467610 to e1b0dc5.
FWIW, the nvptx backend unfortunately works by "fixing stuff up" late. It shouldn't, but it does. I'd prefer not to fix stuff up at all, and maybe the best way is to have proper assertions in the creation of allocas/globals/... and/or the verifier.

No description provided.