[Clang] Assume unaligned in masked load / store builtins #156063
Summary: Right now these builtins enforce alignment, which isn't convenient for the user on platforms that support unaligned accesses. The options are to either permit passing the alignment manually, or just assume the access is unaligned unless the user specifies otherwise. I've added llvm#156057, which should make the requested alignment show up on the intrinsic if the user passed `__builtin_assume_aligned`; however, that only happens with optimizations enabled. This shouldn't cause issues unless the backend categorically decides to reject an unaligned access.
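For readers unfamiliar with the intrinsic, here is a minimal scalar sketch in C of what a masked load with an alignment argument of 1 is assumed to mean: only enabled lanes are read, disabled lanes take the pass-through value, and no alignment is assumed for the pointer. `masked_load_model` is a hypothetical helper written for illustration; it is not a Clang builtin and not the code this PR emits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of an 8 x i32 masked load with alignment 1.
 * Bit i of `mask` enables lane i. Disabled lanes copy `passthru` and their
 * memory is never touched. */
static void masked_load_model(uint8_t mask, const void *p,
                              const int32_t passthru[8], int32_t out[8]) {
    const unsigned char *bytes = (const unsigned char *)p;
    for (int lane = 0; lane < 8; ++lane) {
        if (mask & (1u << lane)) {
            /* memcpy makes no alignment assumption about the source,
             * which mirrors passing `i32 1` as the alignment operand. */
            memcpy(&out[lane], bytes + lane * sizeof(int32_t),
                   sizeof(int32_t));
        } else {
            out[lane] = passthru[lane];
        }
    }
}
```

Calling the helper with a pointer that is offset by one byte from an aligned buffer is well defined in this model, which is the situation the previous `i32 32` alignment operand ruled out.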
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-codegen
Author: Joseph Huber (jhuber6)
Full diff: https://github.com/llvm/llvm-project/pull/156063.diff
3 Files Affected:
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index cbe59124d5b99..3c280c7ea3a1d 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -948,7 +948,8 @@ Each builtin accesses memory according to a provided boolean mask. These are
provided as ``__builtin_masked_load`` and ``__builtin_masked_store``. The first
argument is always boolean mask vector. The ``__builtin_masked_load`` builtin
takes an optional third vector argument that will be used for the result of the
-masked-off lanes. These builtins assume the memory is always aligned.
+masked-off lanes. These builtins assume the memory is unaligned, use
+``__builtin_assume_aligned`` if alignment is desired.
The ``__builtin_masked_expand_load`` and ``__builtin_masked_compress_store``
builtins have the same interface but store the result in consecutive indices.
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 172a521e63c17..b00b9dddd75fb 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -4277,10 +4277,6 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
llvm::Value *Ptr = EmitScalarExpr(E->getArg(1));
llvm::Type *RetTy = CGM.getTypes().ConvertType(E->getType());
- CharUnits Align = CGM.getNaturalTypeAlignment(E->getType(), nullptr);
- llvm::Value *AlignVal =
- llvm::ConstantInt::get(Int32Ty, Align.getQuantity());
-
llvm::Value *PassThru = llvm::PoisonValue::get(RetTy);
if (E->getNumArgs() > 2)
PassThru = EmitScalarExpr(E->getArg(2));
@@ -4289,8 +4285,9 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
if (BuiltinID == Builtin::BI__builtin_masked_load) {
Function *F =
CGM.getIntrinsic(Intrinsic::masked_load, {RetTy, UnqualPtrTy});
- Result =
- Builder.CreateCall(F, {Ptr, AlignVal, Mask, PassThru}, "masked_load");
+ Result = Builder.CreateCall(
+ F, {Ptr, llvm::ConstantInt::get(Int32Ty, 1), Mask, PassThru},
+ "masked_load");
} else {
Function *F = CGM.getIntrinsic(Intrinsic::masked_expandload, {RetTy});
Result =
@@ -4308,14 +4305,11 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
llvm::Type *ValLLTy = CGM.getTypes().ConvertType(ValTy);
llvm::Type *PtrTy = Ptr->getType();
- CharUnits Align = CGM.getNaturalTypeAlignment(ValTy, nullptr);
- llvm::Value *AlignVal =
- llvm::ConstantInt::get(Int32Ty, Align.getQuantity());
-
if (BuiltinID == Builtin::BI__builtin_masked_store) {
llvm::Function *F =
CGM.getIntrinsic(llvm::Intrinsic::masked_store, {ValLLTy, PtrTy});
- Builder.CreateCall(F, {Val, Ptr, AlignVal, Mask});
+ Builder.CreateCall(F,
+ {Val, Ptr, llvm::ConstantInt::get(Int32Ty, 1), Mask});
} else {
llvm::Function *F =
CGM.getIntrinsic(llvm::Intrinsic::masked_compressstore, {ValLLTy});
diff --git a/clang/test/CodeGen/builtin-masked.c b/clang/test/CodeGen/builtin-masked.c
index 579cf5c413c9b..1e0f63d280840 100644
--- a/clang/test/CodeGen/builtin-masked.c
+++ b/clang/test/CodeGen/builtin-masked.c
@@ -19,7 +19,7 @@ typedef _Bool v8b __attribute__((ext_vector_type(8)));
// CHECK-NEXT: [[LOAD_BITS2:%.*]] = load i8, ptr [[M_ADDR]], align 1
// CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[LOAD_BITS2]] to <8 x i1>
// CHECK-NEXT: [[TMP2:%.*]] = load ptr, ptr [[P_ADDR]], align 8
-// CHECK-NEXT: [[MASKED_LOAD:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0(ptr [[TMP2]], i32 32, <8 x i1> [[TMP1]], <8 x i32> poison)
+// CHECK-NEXT: [[MASKED_LOAD:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0(ptr [[TMP2]], i32 1, <8 x i1> [[TMP1]], <8 x i32> poison)
// CHECK-NEXT: ret <8 x i32> [[MASKED_LOAD]]
//
v8i test_load(v8b m, v8i *p) {
@@ -45,7 +45,7 @@ v8i test_load(v8b m, v8i *p) {
// CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[LOAD_BITS2]] to <8 x i1>
// CHECK-NEXT: [[TMP3:%.*]] = load ptr, ptr [[P_ADDR]], align 8
// CHECK-NEXT: [[TMP4:%.*]] = load <8 x i32>, ptr [[T_ADDR]], align 32
-// CHECK-NEXT: [[MASKED_LOAD:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0(ptr [[TMP3]], i32 32, <8 x i1> [[TMP2]], <8 x i32> [[TMP4]])
+// CHECK-NEXT: [[MASKED_LOAD:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0(ptr [[TMP3]], i32 1, <8 x i1> [[TMP2]], <8 x i32> [[TMP4]])
// CHECK-NEXT: ret <8 x i32> [[MASKED_LOAD]]
//
v8i test_load_passthru(v8b m, v8i *p, v8i t) {
@@ -97,7 +97,7 @@ v8i test_load_expand(v8b m, v8i *p, v8i t) {
// CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[LOAD_BITS2]] to <8 x i1>
// CHECK-NEXT: [[TMP3:%.*]] = load <8 x i32>, ptr [[V_ADDR]], align 32
// CHECK-NEXT: [[TMP4:%.*]] = load ptr, ptr [[P_ADDR]], align 8
-// CHECK-NEXT: call void @llvm.masked.store.v8i32.p0(<8 x i32> [[TMP3]], ptr [[TMP4]], i32 32, <8 x i1> [[TMP2]])
+// CHECK-NEXT: call void @llvm.masked.store.v8i32.p0(<8 x i32> [[TMP3]], ptr [[TMP4]], i32 1, <8 x i1> [[TMP2]])
// CHECK-NEXT: ret void
//
void test_store(v8b m, v8i v, v8i *p) {
In theory we can lower masked load/store on any target; it's just a question of how terrible the resulting code is. On a target that doesn't support unaligned load/store, the answer is: pretty terrible. But basically everything that supports vectors has unaligned access support these days.

I'm a little concerned the argument types aren't what we want; it isn't great that … We could use EmitPointerWithAlignment() here and infer whatever alignment that implies, which would allow users to perform an unaligned load if they really want to. But unaligned vector types are weird: we allow using an "aligned" attribute on typedefs, but we don't represent it as a qualifier internally, so bad things happen with type canonicalization.

Maybe we could consider changing the interface so it takes a pointer to the element type of the returned vector, instead of a pointer to the returned vector type.
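As a rough illustration of the element-pointer idea, the sketch below models a load that takes `const int32_t *` instead of a pointer to the vector type, so the only alignment implied by the argument is that of the element. Both function names are hypothetical and exist only for this example; `__builtin_assume_aligned` is the existing GCC/Clang builtin, used here to show how a caller might opt in to a stronger alignment.

```c
#include <stdint.h>

/* Hypothetical element-pointer interface: the pointer type implies only
 * the alignment of int32_t, not the 32-byte alignment of an 8 x i32
 * vector. Disabled lanes are never read from memory. */
static void masked_load_elems(uint8_t mask, const int32_t *p,
                              const int32_t passthru[8], int32_t out[8]) {
    for (int lane = 0; lane < 8; ++lane)
        out[lane] = ((mask >> lane) & 1) ? p[lane] : passthru[lane];
}

/* A caller that knows the buffer is 32-byte aligned can say so explicitly.
 * Assumption: the compiler is GCC or Clang, which provide this builtin. */
static void masked_load_aligned32(uint8_t mask, const int32_t *p,
                                  const int32_t passthru[8],
                                  int32_t out[8]) {
    const int32_t *ap = (const int32_t *)__builtin_assume_aligned(p, 32);
    masked_load_elems(mask, ap, passthru, out);
}
```

With this shape, a pointer such as `buf + 1` into an `int32_t` array is a valid argument to `masked_load_elems` even though it is not 32-byte aligned, and the stronger guarantee is stated only where the caller can back it up.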
I don't know why these intrinsics take an alignment in the first place; the compress / expand variants don't. I think it'd be reasonable to change the intrinsics to derive the alignment from the pointer rather than from an extra argument.