[Clang] VectorExprEvaluator::VisitCallExpr / InterpretBuiltin - allow AVX/AVX512 subvector extraction intrinsics to be used in constexpr #157712 #158853
e9b71e5 to 3028c5f
9412543 to 1292451
Hi @RKSimon, could you please take a look at this PR when you have time?
@SeongjaeP Sorry, but I missed this in my notifications. Please can you resolve the merge conflicts?
Remove test.o and fold-nested-max.ll
pushInteger(S, N == Value.getBitWidth() ? 0 : N + 1, Call->getType());
return true;
}
Those two functions were removed upstream.
Looks like a merge went wrong somewhere, causing these issues.
I’ll resolve the conflicts and fix it soon — thanks for the review!
You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions h,cpp,c -- clang/lib/AST/ByteCode/InterpBuiltin.cpp clang/lib/AST/ExprConstant.cpp clang/lib/Headers/avx512dqintrin.h clang/lib/Headers/avx512fintrin.h clang/lib/Headers/avx512vldqintrin.h clang/lib/Headers/avx512vlintrin.h clang/test/CodeGen/X86/avx-builtins.c clang/test/CodeGen/X86/avx2-builtins.c clang/test/CodeGen/X86/avx512dq-builtins.c clang/test/CodeGen/X86/avx512f-builtins.c clang/test/CodeGen/X86/avx512vl-builtins.c clang/test/CodeGen/X86/avx512vldq-builtins.c
View the diff from clang-format here.
diff --git a/clang/lib/AST/ByteCode/InterpBuiltin.cpp b/clang/lib/AST/ByteCode/InterpBuiltin.cpp
index c8479b9b0..da5cd25ec 100644
--- a/clang/lib/AST/ByteCode/InterpBuiltin.cpp
+++ b/clang/lib/AST/ByteCode/InterpBuiltin.cpp
@@ -736,7 +736,6 @@ static bool interp__builtin_expect(InterpState &S, CodePtr OpPC,
return true;
}
-
/// rotateleft(value, amount)
static bool interp__builtin_rotate(InterpState &S, CodePtr OpPC,
const InterpFrame *Frame,
@@ -2839,8 +2838,8 @@ static bool interp__builtin_elementwise_triop(
//_builtin_extract
static bool interp__builtin_x86_extract_vector(InterpState &S, CodePtr OpPC,
- const CallExpr *Call,
- unsigned ID) {
+ const CallExpr *Call,
+ unsigned ID) {
assert(Call->getNumArgs() == 2);
APSInt ImmAPS = popToAPSInt(S, Call->getArg(1));
@@ -2878,7 +2877,8 @@ static bool interp__builtin_x86_extract_vector(InterpState &S, CodePtr OpPC,
return true;
}
-static bool interp__builtin_x86_extract_vector_masked(InterpState &S, CodePtr OpPC,
+static bool interp__builtin_x86_extract_vector_masked(InterpState &S,
+ CodePtr OpPC,
const CallExpr *Call,
unsigned ID) {
assert(Call->getNumArgs() == 4);
@@ -2888,7 +2888,8 @@ static bool interp__builtin_x86_extract_vector_masked(InterpState &S, CodePtr Op
APSInt ImmAPS = popToAPSInt(S, Call->getArg(1));
const Pointer &Src = S.Stk.pop<Pointer>();
- if (!Src.getFieldDesc()->isPrimitiveArray() || !Merge.getFieldDesc()->isPrimitiveArray())
+ if (!Src.getFieldDesc()->isPrimitiveArray() ||
+ !Merge.getFieldDesc()->isPrimitiveArray())
return false;
const Pointer &Dst = S.Stk.peek<Pointer>();
@@ -2916,7 +2917,7 @@ static bool interp__builtin_x86_extract_vector_masked(InterpState &S, CodePtr Op
if ((Mask >> I) & 1)
Dst.elem<T>(I) = Src.elem<T>(Base + I);
else
- Dst.elem<T>(I) = Merge.elem<T>(I);
+ Dst.elem<T>(I) = Merge.elem<T>(I);
}
});
@@ -2924,7 +2925,6 @@ static bool interp__builtin_x86_extract_vector_masked(InterpState &S, CodePtr Op
return true;
}
-
static bool interp__builtin_x86_insert_subvector(InterpState &S, CodePtr OpPC,
const CallExpr *Call,
unsigned ID) {
@@ -3444,13 +3444,13 @@ bool InterpretBuiltin(InterpState &S, CodePtr OpPC, const CallExpr *Call,
return LHS.isSigned() ? LHS.ssub_sat(RHS) : LHS.usub_sat(RHS);
});
- case X86::BI__builtin_ia32_extract128i256: // _mm256_extracti128
- case X86::BI__builtin_ia32_vextractf128_pd256: // _mm256_extractf128_ps
- case X86::BI__builtin_ia32_vextractf128_ps256: // _mm256_extractf128_pd
- case X86::BI__builtin_ia32_vextractf128_si256: // _mm256_extracti128_si256
+ case X86::BI__builtin_ia32_extract128i256: // _mm256_extracti128
+ case X86::BI__builtin_ia32_vextractf128_pd256: // _mm256_extractf128_ps
+ case X86::BI__builtin_ia32_vextractf128_ps256: // _mm256_extractf128_pd
+ case X86::BI__builtin_ia32_vextractf128_si256: // _mm256_extracti128_si256
return interp__builtin_x86_extract_vector(S, OpPC, Call, BuiltinID);
- // AVX-512 / AVX-512VL / AVX-512DQ
+ // AVX-512 / AVX-512VL / AVX-512DQ
case X86::BI__builtin_ia32_extractf32x4_256_mask:
case X86::BI__builtin_ia32_extractf32x4_mask:
case X86::BI__builtin_ia32_extractf32x8_mask:
@@ -3465,7 +3465,6 @@ bool InterpretBuiltin(InterpState &S, CodePtr OpPC, const CallExpr *Call,
case X86::BI__builtin_ia32_extracti64x4_mask:
return interp__builtin_x86_extract_vector_masked(S, OpPC, Call, BuiltinID);
-
case clang::X86::BI__builtin_ia32_pavgb128:
case clang::X86::BI__builtin_ia32_pavgw128:
case clang::X86::BI__builtin_ia32_pavgb256:
diff --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp
index 281f9a360..510011d0f 100644
--- a/clang/lib/AST/ExprConstant.cpp
+++ b/clang/lib/AST/ExprConstant.cpp
@@ -12037,56 +12037,59 @@ bool VectorExprEvaluator::VisitCallExpr(const CallExpr *E) {
if (!EvaluateAsRValue(Info, E->getArg(0), SourceVec) ||
!EvaluateAsRValue(Info, E->getArg(1), SourceImm))
return false;
-
+
if (!SourceVec.isVector())
return false;
const auto *RetVT = E->getType()->castAs<VectorType>();
- if (!RetVT) return false;
+ if (!RetVT)
+ return false;
unsigned RetLen = RetVT->getNumElements();
unsigned SrcLen = SourceVec.getVectorLength();
- if (SrcLen != RetLen * 2)
+ if (SrcLen != RetLen * 2)
return false;
unsigned Idx = SourceImm.getInt().getZExtValue() & 1;
-
+
SmallVector<APValue, 32> ResultElements;
ResultElements.reserve(RetLen);
for (unsigned I = 0; I < RetLen; I++)
ResultElements.push_back(SourceVec.getVectorElt(Idx * RetLen + I));
-
+
return Success(APValue(ResultElements.data(), RetLen), E);
}
- case X86::BI__builtin_ia32_extracti32x4_256_mask:
+ case X86::BI__builtin_ia32_extracti32x4_256_mask:
case X86::BI__builtin_ia32_extractf32x4_256_mask:
- case X86::BI__builtin_ia32_extracti32x4_mask:
- case X86::BI__builtin_ia32_extractf32x4_mask:
- case X86::BI__builtin_ia32_extracti32x8_mask:
- case X86::BI__builtin_ia32_extractf32x8_mask:
- case X86::BI__builtin_ia32_extracti64x2_256_mask:
- case X86::BI__builtin_ia32_extractf64x2_256_mask:
- case X86::BI__builtin_ia32_extracti64x2_512_mask:
+ case X86::BI__builtin_ia32_extracti32x4_mask:
+ case X86::BI__builtin_ia32_extractf32x4_mask:
+ case X86::BI__builtin_ia32_extracti32x8_mask:
+ case X86::BI__builtin_ia32_extractf32x8_mask:
+ case X86::BI__builtin_ia32_extracti64x2_256_mask:
+ case X86::BI__builtin_ia32_extractf64x2_256_mask:
+ case X86::BI__builtin_ia32_extracti64x2_512_mask:
case X86::BI__builtin_ia32_extractf64x2_512_mask:
case X86::BI__builtin_ia32_extracti64x4_mask:
- case X86::BI__builtin_ia32_extractf64x4_mask:{
+ case X86::BI__builtin_ia32_extractf64x4_mask: {
APValue SourceVec, MergeVec;
APSInt Imm, MaskImm;
- if (!EvaluateAsRValue(Info, E->getArg(0), SourceVec) ||
- !EvaluateInteger(E->getArg(1), Imm, Info) ||
- !EvaluateAsRValue(Info, E->getArg(2), MergeVec) ||
- !EvaluateInteger(E->getArg(3), MaskImm, Info))
- return false;
+ if (!EvaluateAsRValue(Info, E->getArg(0), SourceVec) ||
+ !EvaluateInteger(E->getArg(1), Imm, Info) ||
+ !EvaluateAsRValue(Info, E->getArg(2), MergeVec) ||
+ !EvaluateInteger(E->getArg(3), MaskImm, Info))
+ return false;
const auto *RetVT = E->getType()->castAs<VectorType>();
unsigned RetLen = RetVT->getNumElements();
- if (!SourceVec.isVector() || !MergeVec.isVector()) return false;
+ if (!SourceVec.isVector() || !MergeVec.isVector())
+ return false;
unsigned SrcLen = SourceVec.getVectorLength();
- if (!SrcLen || !RetLen || (SrcLen % RetLen) != 0) return false;
+ if (!SrcLen || !RetLen || (SrcLen % RetLen) != 0)
+ return false;
unsigned Lanes = SrcLen / RetLen;
unsigned Lane = static_cast<unsigned>(Imm.getZExtValue() % Lanes);
@@ -12099,11 +12102,10 @@ bool VectorExprEvaluator::VisitCallExpr(const CallExpr *E) {
if ((Mask >> I) & 1)
ResultElements.push_back(SourceVec.getVectorElt(Base + I));
else
- ResultElements.push_back(MergeVec.getVectorElt(I));
+ ResultElements.push_back(MergeVec.getVectorElt(I));
}
return Success(APValue(ResultElements.data(), ResultElements.size()), E);
}
-
case X86::BI__builtin_ia32_vpshldd128:
case X86::BI__builtin_ia32_vpshldd256:
diff --git a/clang/lib/Headers/avx512dqintrin.h b/clang/lib/Headers/avx512dqintrin.h
index 0ff776b36..b2fb02ab1 100644
--- a/clang/lib/Headers/avx512dqintrin.h
+++ b/clang/lib/Headers/avx512dqintrin.h
@@ -1212,10 +1212,10 @@ _mm512_maskz_broadcast_i64x2(__mmask8 __M, __m128i __A)
(__v8di)_mm512_setzero_si512());
}
-#define _mm512_extractf32x8_ps(A, imm) \
- ((__m256)__builtin_ia32_extractf32x8_mask((__v16sf)(__m512)(A), (int)(imm), \
- (__v8sf)_mm256_setzero_ps(), \
- (__mmask8)-1))
+#define _mm512_extractf32x8_ps(A, imm) \
+ ((__m256)__builtin_ia32_extractf32x8_mask((__v16sf)(__m512)(A), (int)(imm), \
+ (__v8sf)_mm256_setzero_ps(), \
+ (__mmask8) - 1))
#define _mm512_mask_extractf32x8_ps(W, U, A, imm) \
((__m256)__builtin_ia32_extractf32x8_mask((__v16sf)(__m512)(A), (int)(imm), \
@@ -1227,11 +1227,10 @@ _mm512_maskz_broadcast_i64x2(__mmask8 __M, __m128i __A)
(__v8sf)_mm256_setzero_ps(), \
(__mmask8)(U)))
-#define _mm512_extractf64x2_pd(A, imm) \
- ((__m128d)__builtin_ia32_extractf64x2_512_mask((__v8df)(__m512d)(A), \
- (int)(imm), \
- (__v2df)_mm_setzero_pd(), \
- (__mmask8)-1))
+#define _mm512_extractf64x2_pd(A, imm) \
+ ((__m128d)__builtin_ia32_extractf64x2_512_mask( \
+ (__v8df)(__m512d)(A), (int)(imm), (__v2df)_mm_setzero_pd(), \
+ (__mmask8) - 1))
#define _mm512_mask_extractf64x2_pd(W, U, A, imm) \
((__m128d)__builtin_ia32_extractf64x2_512_mask((__v8df)(__m512d)(A), \
@@ -1245,10 +1244,10 @@ _mm512_maskz_broadcast_i64x2(__mmask8 __M, __m128i __A)
(__v2df)_mm_setzero_pd(), \
(__mmask8)(U)))
-#define _mm512_extracti32x8_epi32(A, imm) \
- ((__m256i)__builtin_ia32_extracti32x8_mask((__v16si)(__m512i)(A), (int)(imm), \
- (__v8si)_mm256_setzero_si256(), \
- (__mmask8)-1))
+#define _mm512_extracti32x8_epi32(A, imm) \
+ ((__m256i)__builtin_ia32_extracti32x8_mask( \
+ (__v16si)(__m512i)(A), (int)(imm), (__v8si)_mm256_setzero_si256(), \
+ (__mmask8) - 1))
#define _mm512_mask_extracti32x8_epi32(W, U, A, imm) \
((__m256i)__builtin_ia32_extracti32x8_mask((__v16si)(__m512i)(A), (int)(imm), \
@@ -1260,11 +1259,10 @@ _mm512_maskz_broadcast_i64x2(__mmask8 __M, __m128i __A)
(__v8si)_mm256_setzero_si256(), \
(__mmask8)(U)))
-#define _mm512_extracti64x2_epi64(A, imm) \
- ((__m128i)__builtin_ia32_extracti64x2_512_mask((__v8di)(__m512i)(A), \
- (int)(imm), \
- (__v2di)_mm_setzero_si128(), \
- (__mmask8)-1))
+#define _mm512_extracti64x2_epi64(A, imm) \
+ ((__m128i)__builtin_ia32_extracti64x2_512_mask( \
+ (__v8di)(__m512i)(A), (int)(imm), (__v2di)_mm_setzero_si128(), \
+ (__mmask8) - 1))
#define _mm512_mask_extracti64x2_epi64(W, U, A, imm) \
((__m128i)__builtin_ia32_extracti64x2_512_mask((__v8di)(__m512i)(A), \
diff --git a/clang/lib/Headers/avx512fintrin.h b/clang/lib/Headers/avx512fintrin.h
index 2768a5bae..13370aaef 100644
--- a/clang/lib/Headers/avx512fintrin.h
+++ b/clang/lib/Headers/avx512fintrin.h
@@ -3164,10 +3164,10 @@ _mm512_maskz_permutex2var_epi64(__mmask8 __U, __m512i __A, __m512i __I,
(__v16si)_mm512_setzero_si512()))
/* Vector Extract */
-#define _mm512_extractf64x4_pd(A, I) \
- ((__m256d)__builtin_ia32_extractf64x4_mask((__v8df)(__m512d)(A), (int)(I), \
- (__v4df)_mm256_setzero_pd(), \
- (__mmask8)-1))
+#define _mm512_extractf64x4_pd(A, I) \
+ ((__m256d)__builtin_ia32_extractf64x4_mask((__v8df)(__m512d)(A), (int)(I), \
+ (__v4df)_mm256_setzero_pd(), \
+ (__mmask8) - 1))
#define _mm512_mask_extractf64x4_pd(W, U, A, imm) \
((__m256d)__builtin_ia32_extractf64x4_mask((__v8df)(__m512d)(A), (int)(imm), \
@@ -3179,10 +3179,10 @@ _mm512_maskz_permutex2var_epi64(__mmask8 __U, __m512i __A, __m512i __I,
(__v4df)_mm256_setzero_pd(), \
(__mmask8)(U)))
-#define _mm512_extractf32x4_ps(A, I) \
- ((__m128)__builtin_ia32_extractf32x4_mask((__v16sf)(__m512)(A), (int)(I), \
- (__v4sf)_mm_setzero_ps(), \
- (__mmask8)-1))
+#define _mm512_extractf32x4_ps(A, I) \
+ ((__m128)__builtin_ia32_extractf32x4_mask((__v16sf)(__m512)(A), (int)(I), \
+ (__v4sf)_mm_setzero_ps(), \
+ (__mmask8) - 1))
#define _mm512_mask_extractf32x4_ps(W, U, A, imm) \
((__m128)__builtin_ia32_extractf32x4_mask((__v16sf)(__m512)(A), (int)(imm), \
@@ -7105,10 +7105,10 @@ _mm512_mask_cvtepi64_storeu_epi16 (void *__P, __mmask8 __M, __m512i __A)
__builtin_ia32_pmovqw512mem_mask ((__v8hi *) __P, (__v8di) __A, __M);
}
-#define _mm512_extracti32x4_epi32(A, imm) \
- ((__m128i)__builtin_ia32_extracti32x4_mask((__v16si)(__m512i)(A), (int)(imm), \
- (__v4si)_mm_setzero_si128(), \
- (__mmask8)-1))
+#define _mm512_extracti32x4_epi32(A, imm) \
+ ((__m128i)__builtin_ia32_extracti32x4_mask( \
+ (__v16si)(__m512i)(A), (int)(imm), (__v4si)_mm_setzero_si128(), \
+ (__mmask8) - 1))
#define _mm512_mask_extracti32x4_epi32(W, U, A, imm) \
((__m128i)__builtin_ia32_extracti32x4_mask((__v16si)(__m512i)(A), (int)(imm), \
@@ -7120,10 +7120,10 @@ _mm512_mask_cvtepi64_storeu_epi16 (void *__P, __mmask8 __M, __m512i __A)
(__v4si)_mm_setzero_si128(), \
(__mmask8)(U)))
-#define _mm512_extracti64x4_epi64(A, imm) \
+#define _mm512_extracti64x4_epi64(A, imm) \
((__m256i)__builtin_ia32_extracti64x4_mask((__v8di)(__m512i)(A), (int)(imm), \
- (__v4di)_mm256_setzero_si256(), \
- (__mmask8)-1))
+ (__v4di)_mm256_setzero_si256(), \
+ (__mmask8) - 1))
#define _mm512_mask_extracti64x4_epi64(W, U, A, imm) \
((__m256i)__builtin_ia32_extracti64x4_mask((__v8di)(__m512i)(A), (int)(imm), \
diff --git a/clang/lib/Headers/avx512vldqintrin.h b/clang/lib/Headers/avx512vldqintrin.h
index 2d3c4b551..8aded1c47 100644
--- a/clang/lib/Headers/avx512vldqintrin.h
+++ b/clang/lib/Headers/avx512vldqintrin.h
@@ -1072,11 +1072,10 @@ _mm256_maskz_broadcast_i64x2 (__mmask8 __M, __m128i __A)
(__v4di)_mm256_setzero_si256());
}
-#define _mm256_extractf64x2_pd(A, imm) \
- ((__m128d)__builtin_ia32_extractf64x2_256_mask((__v4df)(__m256d)(A), \
- (int)(imm), \
- (__v2df)_mm_setzero_pd(), \
- (__mmask8)-1))
+#define _mm256_extractf64x2_pd(A, imm) \
+ ((__m128d)__builtin_ia32_extractf64x2_256_mask( \
+ (__v4df)(__m256d)(A), (int)(imm), (__v2df)_mm_setzero_pd(), \
+ (__mmask8) - 1))
#define _mm256_mask_extractf64x2_pd(W, U, A, imm) \
((__m128d)__builtin_ia32_extractf64x2_256_mask((__v4df)(__m256d)(A), \
@@ -1090,11 +1089,10 @@ _mm256_maskz_broadcast_i64x2 (__mmask8 __M, __m128i __A)
(__v2df)_mm_setzero_pd(), \
(__mmask8)(U)))
-#define _mm256_extracti64x2_epi64(A, imm) \
- ((__m128i)__builtin_ia32_extracti64x2_256_mask((__v4di)(__m256i)(A), \
- (int)(imm), \
- (__v2di)_mm_setzero_si128(), \
- (__mmask8)-1))
+#define _mm256_extracti64x2_epi64(A, imm) \
+ ((__m128i)__builtin_ia32_extracti64x2_256_mask( \
+ (__v4di)(__m256i)(A), (int)(imm), (__v2di)_mm_setzero_si128(), \
+ (__mmask8) - 1))
#define _mm256_mask_extracti64x2_epi64(W, U, A, imm) \
((__m128i)__builtin_ia32_extracti64x2_256_mask((__v4di)(__m256i)(A), \
diff --git a/clang/lib/Headers/avx512vlintrin.h b/clang/lib/Headers/avx512vlintrin.h
index 252fb1119..eefeb1dad 100644
--- a/clang/lib/Headers/avx512vlintrin.h
+++ b/clang/lib/Headers/avx512vlintrin.h
@@ -7606,11 +7606,10 @@ _mm256_mask_cvtepi64_storeu_epi16 (void * __P, __mmask8 __M, __m256i __A)
__builtin_ia32_pmovqw256mem_mask ((__v8hi *) __P, (__v4di) __A, __M);
}
-#define _mm256_extractf32x4_ps(A, imm) \
- ((__m128)__builtin_ia32_extractf32x4_256_mask((__v8sf)(__m256)(A), \
- (int)(imm), \
- (__v4sf)_mm_setzero_ps(), \
- (__mmask8)-1))
+#define _mm256_extractf32x4_ps(A, imm) \
+ ((__m128)__builtin_ia32_extractf32x4_256_mask( \
+ (__v8sf)(__m256)(A), (int)(imm), (__v4sf)_mm_setzero_ps(), \
+ (__mmask8) - 1))
#define _mm256_mask_extractf32x4_ps(W, U, A, imm) \
((__m128)__builtin_ia32_extractf32x4_256_mask((__v8sf)(__m256)(A), \
@@ -7624,11 +7623,10 @@ _mm256_mask_cvtepi64_storeu_epi16 (void * __P, __mmask8 __M, __m256i __A)
(__v4sf)_mm_setzero_ps(), \
(__mmask8)(U)))
-#define _mm256_extracti32x4_epi32(A, imm) \
- ((__m128i)__builtin_ia32_extracti32x4_256_mask((__v8si)(__m256i)(A), \
- (int)(imm), \
- (__v4si)_mm_setzero_si128(), \
- (__mmask8)-1))
+#define _mm256_extracti32x4_epi32(A, imm) \
+ ((__m128i)__builtin_ia32_extracti32x4_256_mask( \
+ (__v8si)(__m256i)(A), (int)(imm), (__v4si)_mm_setzero_si128(), \
+ (__mmask8) - 1))
#define _mm256_mask_extracti32x4_epi32(W, U, A, imm) \
((__m128i)__builtin_ia32_extracti32x4_256_mask((__v8si)(__m256i)(A), \
It looks like a merge has gone wrong at some point, as you've altered a lot of recent code changes made by other commits. You might have to consider a rebase or start a new PR.
APSInt Val = popToAPSInt(S, Call->getArg(0));
pushInteger(S, Val.reverseBits(), Call->getType());
return true;
}
Remove these. They have been removed in a recent patch.
Ah, I see — that makes sense. I must have messed up the merge at some point.
I’m still getting used to the rebase workflow, so mistakes like this happen sometimes.
I’ll fix it and clean up the patch soon.
Out of curiosity, when merge conflicts like this happen, what’s the usual approach that people take?
Do most contributors usually rebase onto the latest upstream, or do they start a new PR from upstream/main and cherry-pick their commits?
Fixing the merge is the best approach. Rebasing usually means a lot of the review comments get lost, which can be terrible for continuing the review of some patches. Often at that point it's better not to rebase, but to create a new PR and refer to the old one for reference.
Thanks for the guidance and your patience. This branch has become difficult to manage, so I'll open a fresh PR and try again.
Implements constexpr evaluation for:
- _mm256_extracti128_si256 (AVX2, VEXTRACTI128)
- _mm256_extractf128_ps
- _mm256_extractf128_pd
- _mm256_extractf128_si256

These now work correctly in constant expressions by extracting the appropriate 128-bit lane from a 256-bit vector.
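For readers who want to see the lane selection in isolation, here is a minimal sketch of the two-argument extract written in plain C++ rather than against Clang's evaluator. It mirrors the logic visible in the ExprConstant.cpp hunk (the result has half the source's element count, and only the low bit of the immediate picks the half). The `Vec` aggregate and the `extract_half` name are hypothetical stand-ins for illustration and do not exist in Clang.

```cpp
#include <cstddef>

// Hypothetical stand-in for a vector value; this is not a Clang type.
template <typename T, std::size_t N>
struct Vec {
  T elts[N];
};

// Models the two-argument extract: the result holds half as many elements as
// the source, and only bit 0 of the immediate selects which half is copied.
template <typename T, std::size_t SrcLen>
constexpr Vec<T, SrcLen / 2> extract_half(const Vec<T, SrcLen> &src,
                                          unsigned imm) {
  static_assert(SrcLen % 2 == 0, "source must split into two equal halves");
  constexpr std::size_t RetLen = SrcLen / 2;
  const std::size_t idx = imm & 1; // same masking as `getZExtValue() & 1`
  Vec<T, RetLen> result{};
  for (std::size_t i = 0; i < RetLen; ++i)
    result.elts[i] = src.elts[idx * RetLen + i];
  return result;
}

// Compile-time usage: eight ints stand in for a 256-bit vector of 32-bit lanes.
constexpr Vec<int, 8> kSrc{{0, 1, 2, 3, 4, 5, 6, 7}};
static_assert(extract_half(kSrc, 1).elts[0] == 4, "upper half starts at 4");
static_assert(extract_half(kSrc, 0).elts[3] == 3, "lower half ends at 3");
```

Because the helper is `constexpr`, the `static_assert` lines play the same role as the constexpr tests the PR is meant to enable for the real intrinsics.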
std::realloc is declared there
…uiltins in InterpBuiltin

- Route AVX/AVX2 vextractf128/extract128i256 to 2-arg extract helper.
- Route all AVX-512(VL/DQ) extract builtins to unified 4-arg masked helper:
  * extractf32x4_{256,_}
  * extractf32x8_
  * extractf64x2_{256,512}
  * extractf64x4_
  * extracti32x4_{256,_}
  * extracti32x8_
  * extracti64x2_{256,512}
  * extracti64x4_
- Implement mask/merge/all-ones(mask=plain)/maskz semantics.
- Initialize all elements in the destination vector.

NOTE: Tests are not included yet. This patch wires up InterpBuiltin support only. A follow-up patch will add constexpr tests under clang/test/AST/Interp/.
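The mask/merge/maskz behaviour listed in this commit message can be illustrated with a small standalone model. This is a sketch under assumptions, not the code from the patch: `Vec` and `extract_masked` are hypothetical names, and the base offset is assumed to be the selected lane index times the result length, which is consistent with the `Base + I` indexing shown in the diff but is not spelled out there. The plain form passes an all-ones mask, the merge form passes a caller-supplied vector, and the maskz form passes a zero vector.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a vector value; this is not a Clang type.
template <typename T, std::size_t N>
struct Vec {
  T elts[N];
};

// Models the four-argument masked extract. For each result element, a set mask
// bit takes the element from the selected source lane and a clear bit takes it
// from `merge`. The 8-bit mask matches the __mmask8 used by the headers.
template <typename T, std::size_t SrcLen, std::size_t RetLen>
constexpr Vec<T, RetLen> extract_masked(const Vec<T, SrcLen> &src, unsigned imm,
                                        const Vec<T, RetLen> &merge,
                                        std::uint8_t mask) {
  static_assert(RetLen != 0 && RetLen <= 8 && SrcLen % RetLen == 0,
                "result must evenly divide the source and fit an 8-bit mask");
  constexpr std::size_t Lanes = SrcLen / RetLen;
  // Assumption: the base offset is the wrapped lane index times RetLen.
  const std::size_t base = (imm % Lanes) * RetLen;
  Vec<T, RetLen> result{};
  for (std::size_t i = 0; i < RetLen; ++i)
    result.elts[i] = ((mask >> i) & 1) ? src.elts[base + i] : merge.elts[i];
  return result;
}

// Sixteen ints stand in for a 512-bit vector of 32-bit lanes.
constexpr Vec<int, 16> kWide{
    {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}};
constexpr Vec<int, 4> kMerge{{-1, -2, -3, -4}};
constexpr Vec<int, 4> kZero{};

// Plain form: all-ones mask, so the merge operand is never read.
static_assert(extract_masked(kWide, 2, kZero, 0xFF).elts[0] == 8, "plain");
// Merge form: bit 1 is clear, so element 1 comes from kMerge.
static_assert(extract_masked(kWide, 2, kMerge, 0x05).elts[1] == -2, "merge");
// Maskz form: clear bits produce zero.
static_assert(extract_masked(kWide, 1, kZero, 0x02).elts[0] == 0, "maskz");
```

Writing every destination element in the loop, rather than only the masked-in ones, corresponds to the "Initialize all elements in the destination vector" item above.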
21f366b to 24e06be
InterpBuiltin.cpp is still really broken
… AVX/AVX512 subvector extraction intrinsics to be used in constexpr #157712 (#162836)

**This PR supersedes and replaces PR #158853**

The original branch diverged too far from the main branch, resulting in significant merge conflicts that were difficult to resolve cleanly. To provide a clean and reviewable history, this new PR was created by cherry-picking the necessary commits onto a fresh branch based on the latest `main`.

---

*(Original Description)*

This patch enables the use of AVX/AVX512 subvector extraction intrinsics within `constexpr` functions. This is achieved by implementing the evaluation logic for these intrinsics in `VectorExprEvaluator::VisitCallExpr` and `InterpretBuiltin`.

The original discussion and review comments can be found in the previous pull request for context: #158853

Fixes #157712
AVX/AVX512 extract intrinsics now support constexpr evaluation in both the AST evaluator and bytecode interpreter paths.
Fixes #157712