Skip to content
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions llvm/include/llvm/IR/IntrinsicsDirectX.td
Original file line number Diff line number Diff line change
Expand Up @@ -28,12 +28,12 @@ def int_dx_handle_fromBinding
[IntrNoMem]>;

def int_dx_typedBufferLoad
: DefaultAttrsIntrinsic<[llvm_any_ty], [llvm_any_ty, llvm_i32_ty]>;
: DefaultAttrsIntrinsic<[llvm_any_ty], [llvm_any_ty, llvm_i32_ty], [IntrReadMem]>;
def int_dx_typedBufferLoad_checkbit
: DefaultAttrsIntrinsic<[llvm_any_ty, llvm_i1_ty],
[llvm_any_ty, llvm_i32_ty]>;
[llvm_any_ty, llvm_i32_ty], [IntrReadMem]>;
def int_dx_typedBufferStore
: DefaultAttrsIntrinsic<[], [llvm_any_ty, llvm_i32_ty, llvm_anyvector_ty]>;
: DefaultAttrsIntrinsic<[], [llvm_any_ty, llvm_i32_ty, llvm_anyvector_ty], [IntrWriteMem]>;
Copy link
Contributor

@bogner bogner Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file isn't exactly clang-format clean, but it would be nice to keep this formatting for this part to what clang-format says, ie:

def int_dx_typedBufferLoad
    : DefaultAttrsIntrinsic<[llvm_any_ty], [llvm_any_ty, llvm_i32_ty],
                            [IntrReadMem]>;
def int_dx_typedBufferLoad_checkbit
    : DefaultAttrsIntrinsic<[llvm_any_ty, llvm_i1_ty],
                            [llvm_any_ty, llvm_i32_ty], [IntrReadMem]>;
def int_dx_typedBufferStore
    : DefaultAttrsIntrinsic<[], [llvm_any_ty, llvm_i32_ty, llvm_anyvector_ty],
                            [IntrWriteMem]>;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I don't know how to apply clang-format-diff to a specific file that's not included by the existing rules


// Cast between target extension handle types and dxil-style opaque handles
def int_dx_cast_handle : Intrinsic<[llvm_any_ty], [llvm_any_ty]>;
Expand Down
20 changes: 20 additions & 0 deletions llvm/lib/Target/DirectX/DXILOpLowering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -204,6 +204,22 @@ class OpLowerer {
CleanupCasts.clear();
}

// Remove the resource global associated with the handleFromBinding call
// instruction and their uses as they aren't needed anymore.
void removeResourceGlobals(CallInst *CI) {
for (User *User : make_early_inc_range(CI->users())) {
if (StoreInst *Store = dyn_cast<StoreInst>(User)) {
Value *V = Store->getOperand(1);
Store->eraseFromParent();
if (GlobalVariable *GV = dyn_cast<GlobalVariable>(V))
if (GV->use_empty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth leaving a TODO comment around here somewhere that says we should really validate that all of the globals do eventually get removed, since otherwise we'll generate a broken module. Actually implementing that validation can probably be left for later for now, since it would be quite difficult to do locally here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

GV->removeDeadConstantUsers();
GV->eraseFromParent();
}
}
}
}

[[nodiscard]] bool lowerToCreateHandle(Function &F) {
IRBuilder<> &IRB = OpBuilder.getIRB();
Type *Int8Ty = IRB.getInt8Ty();
Expand All @@ -228,6 +244,8 @@ class OpLowerer {

Value *Cast = createTmpHandleCast(*OpCall, CI->getType());

removeResourceGlobals(CI);

CI->replaceAllUsesWith(Cast);
CI->eraseFromParent();
return Error::success();
Expand Down Expand Up @@ -272,6 +290,8 @@ class OpLowerer {

Value *Cast = createTmpHandleCast(*OpAnnotate, CI->getType());

removeResourceGlobals(CI);

CI->replaceAllUsesWith(Cast);
CI->eraseFromParent();

Expand Down
94 changes: 94 additions & 0 deletions llvm/test/CodeGen/DirectX/ResourceGlobalElimination.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,94 @@
; RUN: opt -S -passes='early-cse<memssa>' %s -o %t
; RUN: FileCheck --check-prefixes=CSE,CHECK %s < %t
; finish compiling to verify that dxil-op-lower removes the globals entirely
; RUN: opt -S -dxil-op-lower %t -o - | FileCheck --check-prefixes=LLC,CHECK %s
; RUN: llc -mtriple=dxil-pc-shadermodel6.0-compute --filetype=asm -o - %t | FileCheck --check-prefixes=LLC,CHECK %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are both shader model 6.0 and 6.6 used here? Is there some special difference?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the difference is whether the handle is created or created and annotated. We have separate paths for them, so even though they call the same utility function, I thought it better to test both paths.

; RUN: llc -mtriple=dxil-pc-shadermodel6.6-compute --filetype=asm -o - %t | FileCheck --check-prefixes=LLC,CHECK %s

; Ensure that EarlyCSE is able to eliminate unneeded loads of resource globals across typedBufferLoad.
; Also that DXILOpLowering eliminates the globals entirely.

target datalayout = "e-m:e-p:32:32-i1:32-i8:8-i16:16-i32:32-i64:64-f16:16-f32:32-f64:64-n8:16:32:64"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the default datalayout for the triple. Best to remove it since it doesn't really affect anything

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

target triple = "dxilv1.6-unknown-shadermodel6.6-compute"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this triple get overridden by the -mtriple set above in the tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does. We should only set the triple one way or the other here, so I would probably drop the truple from the llc command line

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather be able to run both options above, I've removed this too.


%"class.hlsl::RWBuffer" = type { target("dx.TypedBuffer", <4 x float>, 1, 0, 0) }

; LLC-NOT: @In = global
; LLC-NOT: @Out = global
@In = global %"class.hlsl::RWBuffer" zeroinitializer, align 4
@Out = global %"class.hlsl::RWBuffer" zeroinitializer, align 4

; Function Attrs: convergent noinline norecurse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally drop these function attr comments in tests - they're mostly just noise and can easily get out of sync with the actual attributes if we update those for one reason or another.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, removed.

; CHECK-LABEL define void @main()
define void @main() local_unnamed_addr #0 {
entry:
; LLC: %In_h.i1 = call %dx.types.Handle @dx.op.createHandle
; LLC: %Out_h.i2 = call %dx.types.Handle @dx.op.createHandle
%In_h.i = call target("dx.TypedBuffer", <4 x float>, 1, 0, 0) @llvm.dx.handle.fromBinding.tdx.TypedBuffer_v4f32_1_0_0t(i32 0, i32 0, i32 1, i32 0, i1 false)
store target("dx.TypedBuffer", <4 x float>, 1, 0, 0) %In_h.i, ptr @In, align 4
%Out_h.i = call target("dx.TypedBuffer", <4 x float>, 1, 0, 0) @llvm.dx.handle.fromBinding.tdx.TypedBuffer_v4f32_1_0_0t(i32 4, i32 1, i32 1, i32 0, i1 false)
store target("dx.TypedBuffer", <4 x float>, 1, 0, 0) %Out_h.i, ptr @Out, align 4
; CSE: call i32 @llvm.dx.flattened.thread.id.in.group()
%0 = call i32 @llvm.dx.flattened.thread.id.in.group()
; CHECK-NOT: load {{.*}} ptr @In
%1 = load target("dx.TypedBuffer", <4 x float>, 1, 0, 0), ptr @In, align 4
; CSE: call noundef <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t
%2 = call noundef <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t(target("dx.TypedBuffer", <4 x float>, 1, 0, 0) %1, i32 %0)
; CHECK-NOT: load {{.*}} ptr @In
%3 = load target("dx.TypedBuffer", <4 x float>, 1, 0, 0), ptr @In, align 4
%4 = call noundef <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t(target("dx.TypedBuffer", <4 x float>, 1, 0, 0) %3, i32 %0)
%add.i = fadd <4 x float> %2, %4
call void @llvm.dx.typedBufferStore.tdx.TypedBuffer_v4f32_1_0_0t.v4f32(target("dx.TypedBuffer", <4 x float>, 1, 0, 0) %Out_h.i, i32 %0, <4 x float> %add.i)
; CHECK: ret void
ret void
}

; Function Attrs: mustprogress nofree nosync nounwind willreturn memory(none)
declare i32 @llvm.dx.flattened.thread.id.in.group() #1

; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn
; CSE: declare <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t(target("dx.TypedBuffer", <4 x float>, 1, 0, 0), i32) [[ROAttr:#[0-9]+]]
declare <4 x float> @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_1_0_0t(target("dx.TypedBuffer", <4 x float>, 1, 0, 0), i32) #2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're declaring the intrinsic with the wrong attributes here and then testing that it gets overwritten with the correct ones? That's kind of a weird thing to do...

I'd probably leave the declarations of the intrinsics out entirely since they'll be declared implicitly anyway.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the only way I have at the moment to directly test the attributes applied in IntrinsicsDirectX.td. That does go better with a CodeGen test when these can be generated. I'm happy to leave that for that change and limit this testing to the indirect effects of the attribute here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the LLVM IR parser has a feature where if you use an intrinsic without declaring it, it will find the declaration implicitly. So what I'm saying is that you could omit the declare <4 x float> @llvm.dx.typedBufferLoad... line entirely and still check the generated attributes, rather than having a declaration whose attributes are intentionally wrong. The only potential pitfall is that the order of the declarations might not be guaranteed - it probably depends on the traversal order of some pass or other over the IR.

For example, consider https://hlsl.godbolt.org/z/6bohv41s5 - the source doesn't declare any intrinsics, but they're all present in the output of opt running no passes. Bonus - you don't even have to spell out the overload parts of the intrinsic names if you don't want to.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've restored the tests for intrinsic attributes as you suggested.


; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn
; CSE: declare void @llvm.dx.typedBufferStore.tdx.TypedBuffer_v4f32_1_0_0t.v4f32(target("dx.TypedBuffer", <4 x float>, 1, 0, 0), i32, <4 x float>) [[WOAttr:#[0-9]+]]
declare void @llvm.dx.typedBufferStore.tdx.TypedBuffer_v4f32_1_0_0t.v4f32(target("dx.TypedBuffer", <4 x float>, 1, 0, 0), i32, <4 x float>) #2

; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
declare target("dx.TypedBuffer", <4 x float>, 1, 0, 0) @llvm.dx.handle.fromBinding.tdx.TypedBuffer_v4f32_1_0_0t(i32, i32, i32, i32, i1) #3

; CSE: attributes [[ROAttr]] = { {{.*}} memory(read) }
; CSE: attributes [[WOAttr]] = { {{.*}} memory(write) }

attributes #0 = { convergent noinline norecurse "frame-pointer"="all" "hlsl.numthreads"="8,1,1" "hlsl.shader"="compute" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
attributes #1 = { mustprogress nofree nosync nounwind willreturn memory(none) }
attributes #2 = { mustprogress nocallback nofree nosync nounwind willreturn }
attributes #3 = { mustprogress nocallback nofree nosync nounwind willreturn memory(none) }

!llvm.module.flags = !{!0, !1}
!dx.valver = !{!2}
!llvm.ident = !{!3}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"frame-pointer", i32 2}
!2 = !{i32 1, i32 8}
!3 = !{!"clang version 20.0.0git ([email protected]:llvm/llvm-project.git 54dc966bd3d375d7c1604fac5fdac20989c1072a)"}
!4 = !{!5}
!5 = distinct !{!5, !6, !"_ZN4hlsl8RWBufferIDv4_fEixEi: %agg.result"}
!6 = distinct !{!6, !"_ZN4hlsl8RWBufferIDv4_fEixEi"}
!7 = !{!8, !9, i64 0}
!8 = !{!"_ZTSN4hlsl8RWBufferIDv4_fEE", !9, i64 0}
!9 = !{!"omnipotent char", !10, i64 0}
!10 = !{!"Simple C++ TBAA"}
!11 = !{!12}
!12 = distinct !{!12, !13, !"_ZN4hlsl8RWBufferIDv4_fEixEi: %agg.result"}
!13 = distinct !{!13, !"_ZN4hlsl8RWBufferIDv4_fEixEi"}
!14 = !{!15}
!15 = distinct !{!15, !16, !"_ZN4hlsl8RWBufferIDv4_fEixEi: %agg.result"}
!16 = distinct !{!16, !"_ZN4hlsl8RWBufferIDv4_fEixEi"}
!17 = !{!18, !9, i64 0}
!18 = !{!"_ZTSN4hlsl8__detail18TypedResourceProxyIU9_Res_u_CTDv4_fu17__hlsl_resource_tS2_EE", !9, i64 0, !19, i64 4}
!19 = !{!"int", !9, i64 0}
!20 = !{!21}
!21 = distinct !{!21, !22, !"_ZN4hlsl8__detail18TypedResourceProxyIU9_Res_u_CTDv4_fu17__hlsl_resource_tS2_EaSES2_: %agg.result"}
!22 = distinct !{!22, !"_ZN4hlsl8__detail18TypedResourceProxyIU9_Res_u_CTDv4_fu17__hlsl_resource_tS2_EaSES2_"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to remove all of this metadata. Most of it isn't even referenced currently.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. Funny that this shows as commenting on a large number of lines, but besides the text at the top, there's no visual indication of it.

Loading