Skip to content

Commit d35931c

Browse files
authored
[Clang][CodeGen][X86] don't coerce int128 into {i64,i64} for SysV-like ABIs (#135230)
Currently, clang coerces (u)int128_t to two i64 IR parameters when they are passed in registers. This leads to broken debug info for them after applying SROA+InstCombine. SROA generates IR like this ([godbolt](https://godbolt.org/z/YrTa4chfc)): ```llvm define dso_local { i64, i64 } @add(i64 noundef %a.coerce0, i64 noundef %a.coerce1) { entry: %a.sroa.2.0.insert.ext = zext i64 %a.coerce1 to i128 %a.sroa.2.0.insert.shift = shl nuw i128 %a.sroa.2.0.insert.ext, 64 %a.sroa.0.0.insert.ext = zext i64 %a.coerce0 to i128 %a.sroa.0.0.insert.insert = or i128 %a.sroa.2.0.insert.shift, %a.sroa.0.0.insert.ext #dbg_value(i128 %a.sroa.0.0.insert.insert, !17, !DIExpression(), !18) // ... !17 = !DILocalVariable(name: "a", arg: 1, scope: !10, file: !11, line: 1, type: !14) // ... ``` and InstCombine then removes the `or`, moving it into the `DIExpression`, and the `shl` at which point the debug info salvaging in `Transforms/Local` replaces the arguments with `poison` as it does not allow constants larger than 64 bit in `DIExpression`s. I'm working under the assumption that there is interest in fixing this. If not, please tell me. By not coercing `int128_t`s into `{i64, i64}` but keeping them as `i128`, the debug info stays intact and SelectionDAG then generates two `DW_OP_LLVM_fragment` expressions for the two corresponding argument registers. Given that the ABI code for x64 seems to not coerce the argument when it is passed on the stack, it should not lead to any problems keeping it as an `i128` when it is passed in registers. Alternatively, this could be fixed by checking if a constant value fits in 64 bits in the debug info salvaging code and then extending the value on the expression stack to the necessary width. This fixes InstCombine breaking the debug info but then SelectionDAG removes the expression and that seems significantly more complex to debug. Another fix may be to generate `DW_OP_LLVM_fragment` expressions when removing the `or` as it gets marked as disjoint by InstCombine. However, I don't know if the KnownBits information is still available at the time the `or` gets removed and it would probably require refactoring of the debug info salvaging code as that currently only seems to replace single expressions and is not designed to support generating new debug records. Converting `(u)int128_t` arguments to `i128` in the IR seems like the simpler solution, if it doesn't cause any ABI issues.
1 parent e4a3541 commit d35931c

File tree

7 files changed

+96
-67
lines changed

7 files changed

+96
-67
lines changed

clang/lib/CodeGen/Targets/X86.cpp

Lines changed: 24 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -2470,13 +2470,12 @@ GetSSETypeAtOffset(llvm::Type *IRType, unsigned IROffset,
24702470
return llvm::Type::getDoubleTy(getVMContext());
24712471
}
24722472

2473-
24742473
/// GetINTEGERTypeAtOffset - The ABI specifies that a value should be passed in
2475-
/// an 8-byte GPR. This means that we either have a scalar or we are talking
2476-
/// about the high or low part of an up-to-16-byte struct. This routine picks
2477-
/// the best LLVM IR type to represent this, which may be i64 or may be anything
2478-
/// else that the backend will pass in a GPR that works better (e.g. i8, %foo*,
2479-
/// etc).
2474+
/// one or more 8-byte GPRs. This means that we either have a scalar or we are
2475+
/// talking about the high and/or low part of an up-to-16-byte struct. This
2476+
/// routine picks the best LLVM IR type to represent this, which may be i64 or
2477+
/// may be anything else that the backend will pass in GPRs that works better
2478+
/// (e.g. i8, %foo*, etc).
24802479
///
24812480
/// PrefType is an LLVM IR type that corresponds to (part of) the IR type for
24822481
/// the source type. IROffset is an offset in bytes into the LLVM IR type that
@@ -2534,6 +2533,13 @@ GetINTEGERTypeAtOffset(llvm::Type *IRType, unsigned IROffset,
25342533
SourceOffset);
25352534
}
25362535

2536+
// if we have a 128-bit integer, we can pass it safely using an i128
2537+
// so we return that
2538+
if (IRType->isIntegerTy(128)) {
2539+
assert(IROffset == 0);
2540+
return IRType;
2541+
}
2542+
25372543
// Okay, we don't have any better idea of what to pass, so we pass this in an
25382544
// integer register that isn't too big to fit the rest of the struct.
25392545
unsigned TySizeInBytes =
@@ -2591,8 +2597,7 @@ GetX86_64ByValArgumentPair(llvm::Type *Lo, llvm::Type *Hi,
25912597
return Result;
25922598
}
25932599

2594-
ABIArgInfo X86_64ABIInfo::
2595-
classifyReturnType(QualType RetTy) const {
2600+
ABIArgInfo X86_64ABIInfo::classifyReturnType(QualType RetTy) const {
25962601
// AMD64-ABI 3.2.3p4: Rule 1. Classify the return type with the
25972602
// classification algorithm.
25982603
X86_64ABIInfo::Class Lo, Hi;
@@ -2638,6 +2643,12 @@ classifyReturnType(QualType RetTy) const {
26382643
isPromotableIntegerTypeForABI(RetTy))
26392644
return ABIArgInfo::getExtend(RetTy);
26402645
}
2646+
2647+
if (ResType->isIntegerTy(128)) {
2648+
// i128 are passed directly
2649+
assert(Hi == Integer);
2650+
return ABIArgInfo::getDirect(ResType);
2651+
}
26412652
break;
26422653

26432654
// AMD64-ABI 3.2.3p4: Rule 4. If the class is SSE, the next
@@ -2783,6 +2794,11 @@ X86_64ABIInfo::classifyArgumentType(QualType Ty, unsigned freeIntRegs,
27832794
return ABIArgInfo::getExtend(Ty, CGT.ConvertType(Ty));
27842795
}
27852796

2797+
if (ResType->isIntegerTy(128)) {
2798+
assert(Hi == Integer);
2799+
++neededInt;
2800+
return ABIArgInfo::getDirect(ResType);
2801+
}
27862802
break;
27872803

27882804
// AMD64-ABI 3.2.3p3: Rule 3. If the class is SSE, the next
Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
// no autogeneration since update_cc_test_checks does not support -g
2+
// RUN: %clang_cc1 -triple x86_64-pc-linux -O1 -debug-info-kind=limited -emit-llvm -o - %s | FileCheck %s
3+
4+
// CHECK-LABEL: define{{.*}} i128 @add(i128 noundef %a)
5+
// CHECK: #dbg_value(i128 %a, ![[DI:.*]], !DIExpression()
6+
__int128_t add(__int128_t a) {
7+
return a + a;
8+
}
9+
10+
// CHECK: ![[DI]] = !DILocalVariable(name: "a", arg: 1

clang/test/CodeGen/X86/x86_64-arguments.c

Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -551,6 +551,45 @@ struct s68 {
551551
void f68(struct s68 x) {
552552
}
553553

554+
// CHECK-LABEL: define{{.*}} i128 @f69(i128 noundef %a)
555+
__int128_t f69(__int128_t a) {
556+
return a;
557+
}
558+
559+
// CHECK-LABEL: define{{.*}} i128 @f70(i128 noundef %a)
560+
__uint128_t f70(__uint128_t a) {
561+
return a;
562+
}
563+
564+
// check that registers are correctly counted for (u)int128_t arguments
565+
struct s71 {
566+
long long a, b;
567+
};
568+
// CHECK-LABEL: define{{.*}} void @f71(i128 noundef %a, i128 noundef %b, i64 noundef %c, ptr noundef byval(%struct.s71) align 8 %d)
569+
void f71(__int128_t a, __int128_t b, long long c, struct s71 d) {
570+
}
571+
// CHECK-LABEL: define{{.*}} void @f72(i128 noundef %a, i128 noundef %b, i64 %d.coerce0, i64 %d.coerce1)
572+
void f72(__int128_t a, __int128_t b, struct s71 d) {
573+
}
574+
575+
// check that structs containing (u)int128_t are passed correctly
576+
struct s73 {
577+
struct inner {
578+
__uint128_t a;
579+
};
580+
struct inner in;
581+
};
582+
// CHECK-LABEL: define{{.*}} i128 @f73(i128 %a.coerce)
583+
struct s73 f73(struct s73 a) {
584+
return a;
585+
}
586+
587+
// check that _BitInt(128) is still passed correctly on the stack
588+
// CHECK-LABEL: define{{.*}} i128 @f74(i128 noundef %b, i128 noundef %c, i128 noundef %d, i64 noundef %e, ptr noundef byval(i128) align 8 %0)
589+
_BitInt(128) f74(__uint128_t b, __uint128_t c, __uint128_t d, long e, _BitInt(128) a) {
590+
return a;
591+
}
592+
554593
/// The synthesized __va_list_tag does not have file/line fields.
555594
// CHECK: = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "__va_list_tag",
556595
// CHECK-NOT: file:

clang/test/CodeGen/alloc-align-attr.c

Lines changed: 17 additions & 41 deletions
Original file line numberDiff line numberDiff line change
@@ -70,66 +70,42 @@ __INT32_TYPE__ test4(__SIZE_TYPE__ a) {
7070

7171
struct Empty {};
7272
struct MultiArgs { __INT64_TYPE__ a, b;};
73-
// Struct parameter doesn't take up an IR parameter, 'i' takes up 2.
73+
// Struct parameter doesn't take up an IR parameter, 'i' takes up 1.
7474
// Truncation to i64 is permissible, since alignments of greater than 2^64 are insane.
7575
__INT32_TYPE__ *m3(struct Empty s, __int128_t i) __attribute__((alloc_align(2)));
7676
// CHECK-LABEL: @test5(
7777
// CHECK-NEXT: entry:
78-
// CHECK-NEXT: [[A:%.*]] = alloca i128, align 16
7978
// CHECK-NEXT: [[A_ADDR:%.*]] = alloca i128, align 16
8079
// CHECK-NEXT: [[E:%.*]] = alloca [[STRUCT_EMPTY:%.*]], align 1
81-
// CHECK-NEXT: [[COERCE:%.*]] = alloca i128, align 16
82-
// CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[A]], i32 0, i32 0
83-
// CHECK-NEXT: store i64 [[A_COERCE0:%.*]], ptr [[TMP0]], align 16
84-
// CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[A]], i32 0, i32 1
85-
// CHECK-NEXT: store i64 [[A_COERCE1:%.*]], ptr [[TMP1]], align 8
86-
// CHECK-NEXT: [[A1:%.*]] = load i128, ptr [[A]], align 16
87-
// CHECK-NEXT: store i128 [[A1]], ptr [[A_ADDR]], align 16
88-
// CHECK-NEXT: [[TMP2:%.*]] = load i128, ptr [[A_ADDR]], align 16
89-
// CHECK-NEXT: store i128 [[TMP2]], ptr [[COERCE]], align 16
90-
// CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[COERCE]], i32 0, i32 0
91-
// CHECK-NEXT: [[TMP4:%.*]] = load i64, ptr [[TMP3]], align 16
92-
// CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[COERCE]], i32 0, i32 1
93-
// CHECK-NEXT: [[TMP6:%.*]] = load i64, ptr [[TMP5]], align 8
94-
// CHECK-NEXT: [[CALL:%.*]] = call ptr @m3(i64 noundef [[TMP4]], i64 noundef [[TMP6]])
95-
// CHECK-NEXT: [[CASTED_ALIGN:%.*]] = trunc i128 [[TMP2]] to i64
80+
// CHECK-NEXT: store i128 [[A:%.*]], ptr [[A_ADDR]], align 16
81+
// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr [[A_ADDR]], align 16
82+
// CHECK-NEXT: [[CALL:%.*]] = call ptr @m3(i128 noundef [[TMP0]])
83+
// CHECK-NEXT: [[CASTED_ALIGN:%.*]] = trunc i128 [[TMP0]] to i64
9684
// CHECK-NEXT: call void @llvm.assume(i1 true) [ "align"(ptr [[CALL]], i64 [[CASTED_ALIGN]]) ]
97-
// CHECK-NEXT: [[TMP7:%.*]] = load i32, ptr [[CALL]], align 4
98-
// CHECK-NEXT: ret i32 [[TMP7]]
85+
// CHECK-NEXT: [[TMP1:%.*]] = load i32, ptr [[CALL]], align 4
86+
// CHECK-NEXT: ret i32 [[TMP1]]
9987
//
10088
__INT32_TYPE__ test5(__int128_t a) {
10189
struct Empty e;
10290
return *m3(e, a);
10391
}
104-
// Struct parameter takes up 2 parameters, 'i' takes up 2.
92+
// Struct parameter takes up 2 parameters, 'i' takes up 1.
10593
__INT32_TYPE__ *m4(struct MultiArgs s, __int128_t i) __attribute__((alloc_align(2)));
10694
// CHECK-LABEL: @test6(
10795
// CHECK-NEXT: entry:
108-
// CHECK-NEXT: [[A:%.*]] = alloca i128, align 16
10996
// CHECK-NEXT: [[A_ADDR:%.*]] = alloca i128, align 16
11097
// CHECK-NEXT: [[E:%.*]] = alloca [[STRUCT_MULTIARGS:%.*]], align 8
111-
// CHECK-NEXT: [[COERCE:%.*]] = alloca i128, align 16
112-
// CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[A]], i32 0, i32 0
113-
// CHECK-NEXT: store i64 [[A_COERCE0:%.*]], ptr [[TMP0]], align 16
114-
// CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[A]], i32 0, i32 1
115-
// CHECK-NEXT: store i64 [[A_COERCE1:%.*]], ptr [[TMP1]], align 8
116-
// CHECK-NEXT: [[A1:%.*]] = load i128, ptr [[A]], align 16
117-
// CHECK-NEXT: store i128 [[A1]], ptr [[A_ADDR]], align 16
118-
// CHECK-NEXT: [[TMP2:%.*]] = load i128, ptr [[A_ADDR]], align 16
119-
// CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[E]], i32 0, i32 0
98+
// CHECK-NEXT: store i128 [[A:%.*]], ptr [[A_ADDR]], align 16
99+
// CHECK-NEXT: [[TMP0:%.*]] = load i128, ptr [[A_ADDR]], align 16
100+
// CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[E]], i32 0, i32 0
101+
// CHECK-NEXT: [[TMP2:%.*]] = load i64, ptr [[TMP1]], align 8
102+
// CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[E]], i32 0, i32 1
120103
// CHECK-NEXT: [[TMP4:%.*]] = load i64, ptr [[TMP3]], align 8
121-
// CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[E]], i32 0, i32 1
122-
// CHECK-NEXT: [[TMP6:%.*]] = load i64, ptr [[TMP5]], align 8
123-
// CHECK-NEXT: store i128 [[TMP2]], ptr [[COERCE]], align 16
124-
// CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[COERCE]], i32 0, i32 0
125-
// CHECK-NEXT: [[TMP8:%.*]] = load i64, ptr [[TMP7]], align 16
126-
// CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[COERCE]], i32 0, i32 1
127-
// CHECK-NEXT: [[TMP10:%.*]] = load i64, ptr [[TMP9]], align 8
128-
// CHECK-NEXT: [[CALL:%.*]] = call ptr @m4(i64 [[TMP4]], i64 [[TMP6]], i64 noundef [[TMP8]], i64 noundef [[TMP10]])
129-
// CHECK-NEXT: [[CASTED_ALIGN:%.*]] = trunc i128 [[TMP2]] to i64
104+
// CHECK-NEXT: [[CALL:%.*]] = call ptr @m4(i64 [[TMP2]], i64 [[TMP4]], i128 noundef [[TMP0]])
105+
// CHECK-NEXT: [[CASTED_ALIGN:%.*]] = trunc i128 [[TMP0]] to i64
130106
// CHECK-NEXT: call void @llvm.assume(i1 true) [ "align"(ptr [[CALL]], i64 [[CASTED_ALIGN]]) ]
131-
// CHECK-NEXT: [[TMP11:%.*]] = load i32, ptr [[CALL]], align 4
132-
// CHECK-NEXT: ret i32 [[TMP11]]
107+
// CHECK-NEXT: [[TMP5:%.*]] = load i32, ptr [[CALL]], align 4
108+
// CHECK-NEXT: ret i32 [[TMP5]]
133109
//
134110
__INT32_TYPE__ test6(__int128_t a) {
135111
struct MultiArgs e;

clang/test/CodeGen/builtins.c

Lines changed: 3 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -956,36 +956,24 @@ void test_builtin_os_log_errno(void) {
956956
void test_builtin_os_log_long_double(void *buf, long double ld) {
957957
// CHECK: %[[BUF_ADDR:.*]] = alloca ptr, align 8
958958
// CHECK: %[[LD_ADDR:.*]] = alloca x86_fp80, align 16
959-
// CHECK: %[[COERCE:.*]] = alloca i128, align 16
960959
// CHECK: store ptr %[[BUF]], ptr %[[BUF_ADDR]], align 8
961960
// CHECK: store x86_fp80 %[[LD]], ptr %[[LD_ADDR]], align 16
962961
// CHECK: %[[V0:.*]] = load ptr, ptr %[[BUF_ADDR]], align 8
963962
// CHECK: %[[V1:.*]] = load x86_fp80, ptr %[[LD_ADDR]], align 16
964963
// CHECK: %[[V2:.*]] = bitcast x86_fp80 %[[V1]] to i80
965964
// CHECK: %[[V3:.*]] = zext i80 %[[V2]] to i128
966-
// CHECK: store i128 %[[V3]], ptr %[[COERCE]], align 16
967-
// CHECK: %[[V5:.*]] = getelementptr inbounds nuw { i64, i64 }, ptr %[[COERCE]], i32 0, i32 0
968-
// CHECK: %[[V6:.*]] = load i64, ptr %[[V5]], align 16
969-
// CHECK: %[[V7:.*]] = getelementptr inbounds nuw { i64, i64 }, ptr %[[COERCE]], i32 0, i32 1
970-
// CHECK: %[[V8:.*]] = load i64, ptr %[[V7]], align 8
971-
// CHECK: call void @__os_log_helper_1_0_1_16_0(ptr noundef %[[V0]], i64 noundef %[[V6]], i64 noundef %[[V8]])
965+
// CHECK: call void @__os_log_helper_1_0_1_16_0(ptr noundef %[[V0]], i128 noundef %[[V3]])
972966

973967
__builtin_os_log_format(buf, "%Lf", ld);
974968
}
975969

976970
// CHECK-LABEL: define linkonce_odr hidden void @__os_log_helper_1_0_1_16_0
977-
// CHECK: (ptr noundef %[[BUFFER:.*]], i64 noundef %[[ARG0_COERCE0:.*]], i64 noundef %[[ARG0_COERCE1:.*]])
971+
// CHECK: (ptr noundef %[[BUFFER:.*]], i128 noundef %[[ARG0:.*]])
978972

979-
// CHECK: %[[ARG0:.*]] = alloca i128, align 16
980973
// CHECK: %[[BUFFER_ADDR:.*]] = alloca ptr, align 8
981974
// CHECK: %[[ARG0_ADDR:.*]] = alloca i128, align 16
982-
// CHECK: %[[V1:.*]] = getelementptr inbounds nuw { i64, i64 }, ptr %[[ARG0]], i32 0, i32 0
983-
// CHECK: store i64 %[[ARG0_COERCE0]], ptr %[[V1]], align 16
984-
// CHECK: %[[V2:.*]] = getelementptr inbounds nuw { i64, i64 }, ptr %[[ARG0]], i32 0, i32 1
985-
// CHECK: store i64 %[[ARG0_COERCE1]], ptr %[[V2]], align 8
986-
// CHECK: %[[ARG01:.*]] = load i128, ptr %[[ARG0]], align 16
987975
// CHECK: store ptr %[[BUFFER]], ptr %[[BUFFER_ADDR]], align 8
988-
// CHECK: store i128 %[[ARG01]], ptr %[[ARG0_ADDR]], align 16
976+
// CHECK: store i128 %[[ARG0]], ptr %[[ARG0_ADDR]], align 16
989977
// CHECK: %[[BUF:.*]] = load ptr, ptr %[[BUFFER_ADDR]], align 8
990978
// CHECK: %[[SUMMARY:.*]] = getelementptr i8, ptr %[[BUF]], i64 0
991979
// CHECK: store i8 0, ptr %[[SUMMARY]], align 1

clang/test/CodeGen/ext-int-cc.c

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,7 @@
3232

3333
// Make sure 128 and 64 bit versions are passed like integers.
3434
void ParamPassing(_BitInt(128) b, _BitInt(64) c) {}
35-
// LIN64: define{{.*}} void @ParamPassing(i64 %{{.+}}, i64 %{{.+}}, i64 %{{.+}})
35+
// LIN64: define{{.*}} void @ParamPassing(i128 %{{.+}}, i64 %{{.+}})
3636
// WIN64: define dso_local void @ParamPassing(ptr %{{.+}}, i64 %{{.+}})
3737
// LIN32: define{{.*}} void @ParamPassing(ptr %{{.+}}, i64 %{{.+}})
3838
// WIN32: define dso_local void @ParamPassing(ptr %{{.+}}, i64 %{{.+}})
@@ -251,7 +251,7 @@ _BitInt(127) ReturnPassing3(void) { return 0; }
251251
// LA32: define{{.*}} void @ReturnPassing3(ptr dead_on_unwind noalias writable sret
252252

253253
_BitInt(128) ReturnPassing4(void) { return 0; }
254-
// LIN64: define{{.*}} { i64, i64 } @ReturnPassing4(
254+
// LIN64: define{{.*}} i128 @ReturnPassing4(
255255
// WIN64: define dso_local void @ReturnPassing4(ptr dead_on_unwind noalias writable sret
256256
// LIN32: define{{.*}} void @ReturnPassing4(ptr dead_on_unwind noalias writable sret
257257
// WIN32: define dso_local void @ReturnPassing4(ptr dead_on_unwind noalias writable sret

clang/test/CodeGen/extend-arg-64.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -84,7 +84,7 @@ int test(void) {
8484
#ifdef D128
8585
knr(i128);
8686
// CHECKEXT: load i128
87-
// CHECKEXT: call{{.*}} void (i64, i64, ...) @knr
87+
// CHECKEXT: call{{.*}} void (i128, ...) @knr
8888
#endif
8989

9090
knr(u32, s32, u16, s16, u8, s8);

0 commit comments

Comments
 (0)