Skip to content

Conversation

SeongjaeP
Copy link
Contributor

@SeongjaeP SeongjaeP commented Sep 16, 2025

AVX/AVX512 extract intrinsics now support constexpr evaluation in both the AST evaluator and bytecode interpreter paths.

Fixes #157712

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@SeongjaeP SeongjaeP force-pushed the issue157712-avx-extract branch from e9b71e5 to 3028c5f Compare September 30, 2025 02:04
@SeongjaeP SeongjaeP changed the title Issue157712 avx extract [Clang] VectorExprEvaluator::VisitCallExpr / InterpretBuiltin - allow AVX/AVX512 subvector extraction intrinsics to be used in constexpr #157712 Sep 30, 2025
@SeongjaeP SeongjaeP marked this pull request as ready for review September 30, 2025 06:34
@SeongjaeP SeongjaeP force-pushed the issue157712-avx-extract branch from 9412543 to 1292451 Compare October 1, 2025 03:47
@SeongjaeP
Copy link
Contributor Author

Hi @RKSimon, could you please take a look at this PR when you have time?
Thanks!

@RKSimon RKSimon requested review from RKSimon and tbaederr October 4, 2025 17:35
@RKSimon
Copy link
Collaborator

RKSimon commented Oct 6, 2025

@SeongjaeP Sorry but I missed this in my notifications - please can you resolves the merge conflicts?

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove test.o and fold-nested-max.ll

@SeongjaeP SeongjaeP requested a review from RKSimon October 8, 2025 17:19
pushInteger(S, N == Value.getBitWidth() ? 0 : N + 1, Call->getType());
return true;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those two functions were removed upstream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a merge went wrong somewhere, causing these issues.
I’ll resolve the conflicts and fix it soon — thanks for the review!

@tbaederr tbaederr added clang:frontend Language frontend issues, e.g. anything involving "Sema" constexpr Anything related to constant evaluation clang:bytecode Issues for the clang bytecode constexpr interpreter labels Oct 9, 2025
Copy link

github-actions bot commented Oct 9, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions h,cpp,c -- clang/lib/AST/ByteCode/InterpBuiltin.cpp clang/lib/AST/ExprConstant.cpp clang/lib/Headers/avx512dqintrin.h clang/lib/Headers/avx512fintrin.h clang/lib/Headers/avx512vldqintrin.h clang/lib/Headers/avx512vlintrin.h clang/test/CodeGen/X86/avx-builtins.c clang/test/CodeGen/X86/avx2-builtins.c clang/test/CodeGen/X86/avx512dq-builtins.c clang/test/CodeGen/X86/avx512f-builtins.c clang/test/CodeGen/X86/avx512vl-builtins.c clang/test/CodeGen/X86/avx512vldq-builtins.c

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.
diff --git a/clang/lib/AST/ByteCode/InterpBuiltin.cpp b/clang/lib/AST/ByteCode/InterpBuiltin.cpp
index c8479b9b0..da5cd25ec 100644
--- a/clang/lib/AST/ByteCode/InterpBuiltin.cpp
+++ b/clang/lib/AST/ByteCode/InterpBuiltin.cpp
@@ -736,7 +736,6 @@ static bool interp__builtin_expect(InterpState &S, CodePtr OpPC,
   return true;
 }
 
-
 /// rotateleft(value, amount)
 static bool interp__builtin_rotate(InterpState &S, CodePtr OpPC,
                                    const InterpFrame *Frame,
@@ -2839,8 +2838,8 @@ static bool interp__builtin_elementwise_triop(
 
 //_builtin_extract
 static bool interp__builtin_x86_extract_vector(InterpState &S, CodePtr OpPC,
-                                                 const CallExpr *Call,
-                                                 unsigned ID) {
+                                               const CallExpr *Call,
+                                               unsigned ID) {
   assert(Call->getNumArgs() == 2);
 
   APSInt ImmAPS = popToAPSInt(S, Call->getArg(1));
@@ -2878,7 +2877,8 @@ static bool interp__builtin_x86_extract_vector(InterpState &S, CodePtr OpPC,
   return true;
 }
 
-static bool interp__builtin_x86_extract_vector_masked(InterpState &S, CodePtr OpPC,
+static bool interp__builtin_x86_extract_vector_masked(InterpState &S,
+                                                      CodePtr OpPC,
                                                       const CallExpr *Call,
                                                       unsigned ID) {
   assert(Call->getNumArgs() == 4);
@@ -2888,7 +2888,8 @@ static bool interp__builtin_x86_extract_vector_masked(InterpState &S, CodePtr Op
   APSInt ImmAPS = popToAPSInt(S, Call->getArg(1));
   const Pointer &Src = S.Stk.pop<Pointer>();
 
-  if (!Src.getFieldDesc()->isPrimitiveArray() || !Merge.getFieldDesc()->isPrimitiveArray())
+  if (!Src.getFieldDesc()->isPrimitiveArray() ||
+      !Merge.getFieldDesc()->isPrimitiveArray())
     return false;
 
   const Pointer &Dst = S.Stk.peek<Pointer>();
@@ -2916,7 +2917,7 @@ static bool interp__builtin_x86_extract_vector_masked(InterpState &S, CodePtr Op
       if ((Mask >> I) & 1)
         Dst.elem<T>(I) = Src.elem<T>(Base + I);
       else
-        Dst.elem<T>(I) = Merge.elem<T>(I);   
+        Dst.elem<T>(I) = Merge.elem<T>(I);
     }
   });
 
@@ -2924,7 +2925,6 @@ static bool interp__builtin_x86_extract_vector_masked(InterpState &S, CodePtr Op
   return true;
 }
 
-
 static bool interp__builtin_x86_insert_subvector(InterpState &S, CodePtr OpPC,
                                                  const CallExpr *Call,
                                                  unsigned ID) {
@@ -3444,13 +3444,13 @@ bool InterpretBuiltin(InterpState &S, CodePtr OpPC, const CallExpr *Call,
           return LHS.isSigned() ? LHS.ssub_sat(RHS) : LHS.usub_sat(RHS);
         });
 
-  case X86::BI__builtin_ia32_extract128i256:       // _mm256_extracti128
-  case X86::BI__builtin_ia32_vextractf128_pd256:   // _mm256_extractf128_ps
-  case X86::BI__builtin_ia32_vextractf128_ps256:   // _mm256_extractf128_pd
-  case X86::BI__builtin_ia32_vextractf128_si256:   // _mm256_extracti128_si256
+  case X86::BI__builtin_ia32_extract128i256:     // _mm256_extracti128
+  case X86::BI__builtin_ia32_vextractf128_pd256: // _mm256_extractf128_ps
+  case X86::BI__builtin_ia32_vextractf128_ps256: // _mm256_extractf128_pd
+  case X86::BI__builtin_ia32_vextractf128_si256: // _mm256_extracti128_si256
     return interp__builtin_x86_extract_vector(S, OpPC, Call, BuiltinID);
 
-  // AVX-512 / AVX-512VL / AVX-512DQ 
+  // AVX-512 / AVX-512VL / AVX-512DQ
   case X86::BI__builtin_ia32_extractf32x4_256_mask:
   case X86::BI__builtin_ia32_extractf32x4_mask:
   case X86::BI__builtin_ia32_extractf32x8_mask:
@@ -3465,7 +3465,6 @@ bool InterpretBuiltin(InterpState &S, CodePtr OpPC, const CallExpr *Call,
   case X86::BI__builtin_ia32_extracti64x4_mask:
     return interp__builtin_x86_extract_vector_masked(S, OpPC, Call, BuiltinID);
 
-
   case clang::X86::BI__builtin_ia32_pavgb128:
   case clang::X86::BI__builtin_ia32_pavgw128:
   case clang::X86::BI__builtin_ia32_pavgb256:
diff --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp
index 281f9a360..510011d0f 100644
--- a/clang/lib/AST/ExprConstant.cpp
+++ b/clang/lib/AST/ExprConstant.cpp
@@ -12037,56 +12037,59 @@ bool VectorExprEvaluator::VisitCallExpr(const CallExpr *E) {
     if (!EvaluateAsRValue(Info, E->getArg(0), SourceVec) ||
         !EvaluateAsRValue(Info, E->getArg(1), SourceImm))
       return false;
-    
+
     if (!SourceVec.isVector())
       return false;
 
     const auto *RetVT = E->getType()->castAs<VectorType>();
-    if (!RetVT) return false;
+    if (!RetVT)
+      return false;
 
     unsigned RetLen = RetVT->getNumElements();
     unsigned SrcLen = SourceVec.getVectorLength();
-    if (SrcLen != RetLen * 2) 
+    if (SrcLen != RetLen * 2)
       return false;
 
     unsigned Idx = SourceImm.getInt().getZExtValue() & 1;
-    
+
     SmallVector<APValue, 32> ResultElements;
     ResultElements.reserve(RetLen);
 
     for (unsigned I = 0; I < RetLen; I++)
       ResultElements.push_back(SourceVec.getVectorElt(Idx * RetLen + I));
-    
+
     return Success(APValue(ResultElements.data(), RetLen), E);
   }
 
-  case X86::BI__builtin_ia32_extracti32x4_256_mask: 
+  case X86::BI__builtin_ia32_extracti32x4_256_mask:
   case X86::BI__builtin_ia32_extractf32x4_256_mask:
-  case X86::BI__builtin_ia32_extracti32x4_mask:     
-  case X86::BI__builtin_ia32_extractf32x4_mask:   
-  case X86::BI__builtin_ia32_extracti32x8_mask:      
-  case X86::BI__builtin_ia32_extractf32x8_mask:     
-  case X86::BI__builtin_ia32_extracti64x2_256_mask: 
-  case X86::BI__builtin_ia32_extractf64x2_256_mask: 
-  case X86::BI__builtin_ia32_extracti64x2_512_mask: 
+  case X86::BI__builtin_ia32_extracti32x4_mask:
+  case X86::BI__builtin_ia32_extractf32x4_mask:
+  case X86::BI__builtin_ia32_extracti32x8_mask:
+  case X86::BI__builtin_ia32_extractf32x8_mask:
+  case X86::BI__builtin_ia32_extracti64x2_256_mask:
+  case X86::BI__builtin_ia32_extractf64x2_256_mask:
+  case X86::BI__builtin_ia32_extracti64x2_512_mask:
   case X86::BI__builtin_ia32_extractf64x2_512_mask:
   case X86::BI__builtin_ia32_extracti64x4_mask:
-  case X86::BI__builtin_ia32_extractf64x4_mask:{ 
+  case X86::BI__builtin_ia32_extractf64x4_mask: {
     APValue SourceVec, MergeVec;
     APSInt Imm, MaskImm;
 
-    if (!EvaluateAsRValue(Info, E->getArg(0), SourceVec) ||  
-      !EvaluateInteger(E->getArg(1), Imm, Info) ||  
-      !EvaluateAsRValue(Info, E->getArg(2), MergeVec)  ||  
-      !EvaluateInteger(E->getArg(3), MaskImm, Info))      
-    return false;
+    if (!EvaluateAsRValue(Info, E->getArg(0), SourceVec) ||
+        !EvaluateInteger(E->getArg(1), Imm, Info) ||
+        !EvaluateAsRValue(Info, E->getArg(2), MergeVec) ||
+        !EvaluateInteger(E->getArg(3), MaskImm, Info))
+      return false;
 
     const auto *RetVT = E->getType()->castAs<VectorType>();
     unsigned RetLen = RetVT->getNumElements();
 
-    if (!SourceVec.isVector() || !MergeVec.isVector()) return false;
+    if (!SourceVec.isVector() || !MergeVec.isVector())
+      return false;
     unsigned SrcLen = SourceVec.getVectorLength();
-    if (!SrcLen || !RetLen || (SrcLen % RetLen) != 0) return false;
+    if (!SrcLen || !RetLen || (SrcLen % RetLen) != 0)
+      return false;
 
     unsigned Lanes = SrcLen / RetLen;
     unsigned Lane = static_cast<unsigned>(Imm.getZExtValue() % Lanes);
@@ -12099,11 +12102,10 @@ bool VectorExprEvaluator::VisitCallExpr(const CallExpr *E) {
       if ((Mask >> I) & 1)
         ResultElements.push_back(SourceVec.getVectorElt(Base + I));
       else
-        ResultElements.push_back(MergeVec.getVectorElt(I)); 
+        ResultElements.push_back(MergeVec.getVectorElt(I));
     }
     return Success(APValue(ResultElements.data(), ResultElements.size()), E);
   }
-  
 
   case X86::BI__builtin_ia32_vpshldd128:
   case X86::BI__builtin_ia32_vpshldd256:
diff --git a/clang/lib/Headers/avx512dqintrin.h b/clang/lib/Headers/avx512dqintrin.h
index 0ff776b36..b2fb02ab1 100644
--- a/clang/lib/Headers/avx512dqintrin.h
+++ b/clang/lib/Headers/avx512dqintrin.h
@@ -1212,10 +1212,10 @@ _mm512_maskz_broadcast_i64x2(__mmask8 __M, __m128i __A)
                                             (__v8di)_mm512_setzero_si512());
 }
 
-#define _mm512_extractf32x8_ps(A, imm) \
-  ((__m256)__builtin_ia32_extractf32x8_mask((__v16sf)(__m512)(A), (int)(imm), \
-                                            (__v8sf)_mm256_setzero_ps(), \
-                                            (__mmask8)-1))
+#define _mm512_extractf32x8_ps(A, imm)                                         \
+  ((__m256)__builtin_ia32_extractf32x8_mask((__v16sf)(__m512)(A), (int)(imm),  \
+                                            (__v8sf)_mm256_setzero_ps(),       \
+                                            (__mmask8) - 1))
 
 #define _mm512_mask_extractf32x8_ps(W, U, A, imm) \
   ((__m256)__builtin_ia32_extractf32x8_mask((__v16sf)(__m512)(A), (int)(imm), \
@@ -1227,11 +1227,10 @@ _mm512_maskz_broadcast_i64x2(__mmask8 __M, __m128i __A)
                                             (__v8sf)_mm256_setzero_ps(), \
                                             (__mmask8)(U)))
 
-#define _mm512_extractf64x2_pd(A, imm) \
-  ((__m128d)__builtin_ia32_extractf64x2_512_mask((__v8df)(__m512d)(A), \
-                                                 (int)(imm), \
-                                                 (__v2df)_mm_setzero_pd(), \
-                                                 (__mmask8)-1))
+#define _mm512_extractf64x2_pd(A, imm)                                         \
+  ((__m128d)__builtin_ia32_extractf64x2_512_mask(                              \
+      (__v8df)(__m512d)(A), (int)(imm), (__v2df)_mm_setzero_pd(),              \
+      (__mmask8) - 1))
 
 #define _mm512_mask_extractf64x2_pd(W, U, A, imm) \
   ((__m128d)__builtin_ia32_extractf64x2_512_mask((__v8df)(__m512d)(A), \
@@ -1245,10 +1244,10 @@ _mm512_maskz_broadcast_i64x2(__mmask8 __M, __m128i __A)
                                                  (__v2df)_mm_setzero_pd(), \
                                                  (__mmask8)(U)))
 
-#define _mm512_extracti32x8_epi32(A, imm) \
-  ((__m256i)__builtin_ia32_extracti32x8_mask((__v16si)(__m512i)(A), (int)(imm), \
-                                             (__v8si)_mm256_setzero_si256(), \
-                                             (__mmask8)-1))
+#define _mm512_extracti32x8_epi32(A, imm)                                      \
+  ((__m256i)__builtin_ia32_extracti32x8_mask(                                  \
+      (__v16si)(__m512i)(A), (int)(imm), (__v8si)_mm256_setzero_si256(),       \
+      (__mmask8) - 1))
 
 #define _mm512_mask_extracti32x8_epi32(W, U, A, imm) \
   ((__m256i)__builtin_ia32_extracti32x8_mask((__v16si)(__m512i)(A), (int)(imm), \
@@ -1260,11 +1259,10 @@ _mm512_maskz_broadcast_i64x2(__mmask8 __M, __m128i __A)
                                              (__v8si)_mm256_setzero_si256(), \
                                              (__mmask8)(U)))
 
-#define _mm512_extracti64x2_epi64(A, imm) \
-  ((__m128i)__builtin_ia32_extracti64x2_512_mask((__v8di)(__m512i)(A), \
-                                                (int)(imm), \
-                                                (__v2di)_mm_setzero_si128(), \
-                                                (__mmask8)-1))
+#define _mm512_extracti64x2_epi64(A, imm)                                      \
+  ((__m128i)__builtin_ia32_extracti64x2_512_mask(                              \
+      (__v8di)(__m512i)(A), (int)(imm), (__v2di)_mm_setzero_si128(),           \
+      (__mmask8) - 1))
 
 #define _mm512_mask_extracti64x2_epi64(W, U, A, imm) \
   ((__m128i)__builtin_ia32_extracti64x2_512_mask((__v8di)(__m512i)(A), \
diff --git a/clang/lib/Headers/avx512fintrin.h b/clang/lib/Headers/avx512fintrin.h
index 2768a5bae..13370aaef 100644
--- a/clang/lib/Headers/avx512fintrin.h
+++ b/clang/lib/Headers/avx512fintrin.h
@@ -3164,10 +3164,10 @@ _mm512_maskz_permutex2var_epi64(__mmask8 __U, __m512i __A, __m512i __I,
                                  (__v16si)_mm512_setzero_si512()))
 /* Vector Extract */
 
-#define _mm512_extractf64x4_pd(A, I) \
-  ((__m256d)__builtin_ia32_extractf64x4_mask((__v8df)(__m512d)(A), (int)(I), \
-                                             (__v4df)_mm256_setzero_pd(), \
-                                             (__mmask8)-1))
+#define _mm512_extractf64x4_pd(A, I)                                           \
+  ((__m256d)__builtin_ia32_extractf64x4_mask((__v8df)(__m512d)(A), (int)(I),   \
+                                             (__v4df)_mm256_setzero_pd(),      \
+                                             (__mmask8) - 1))
 
 #define _mm512_mask_extractf64x4_pd(W, U, A, imm) \
   ((__m256d)__builtin_ia32_extractf64x4_mask((__v8df)(__m512d)(A), (int)(imm), \
@@ -3179,10 +3179,10 @@ _mm512_maskz_permutex2var_epi64(__mmask8 __U, __m512i __A, __m512i __I,
                                              (__v4df)_mm256_setzero_pd(), \
                                              (__mmask8)(U)))
 
-#define _mm512_extractf32x4_ps(A, I) \
-  ((__m128)__builtin_ia32_extractf32x4_mask((__v16sf)(__m512)(A), (int)(I), \
-                                            (__v4sf)_mm_setzero_ps(), \
-                                            (__mmask8)-1))
+#define _mm512_extractf32x4_ps(A, I)                                           \
+  ((__m128)__builtin_ia32_extractf32x4_mask((__v16sf)(__m512)(A), (int)(I),    \
+                                            (__v4sf)_mm_setzero_ps(),          \
+                                            (__mmask8) - 1))
 
 #define _mm512_mask_extractf32x4_ps(W, U, A, imm) \
   ((__m128)__builtin_ia32_extractf32x4_mask((__v16sf)(__m512)(A), (int)(imm), \
@@ -7105,10 +7105,10 @@ _mm512_mask_cvtepi64_storeu_epi16 (void *__P, __mmask8 __M, __m512i __A)
   __builtin_ia32_pmovqw512mem_mask ((__v8hi *) __P, (__v8di) __A, __M);
 }
 
-#define _mm512_extracti32x4_epi32(A, imm) \
-  ((__m128i)__builtin_ia32_extracti32x4_mask((__v16si)(__m512i)(A), (int)(imm), \
-                                             (__v4si)_mm_setzero_si128(), \
-                                             (__mmask8)-1))
+#define _mm512_extracti32x4_epi32(A, imm)                                      \
+  ((__m128i)__builtin_ia32_extracti32x4_mask(                                  \
+      (__v16si)(__m512i)(A), (int)(imm), (__v4si)_mm_setzero_si128(),          \
+      (__mmask8) - 1))
 
 #define _mm512_mask_extracti32x4_epi32(W, U, A, imm) \
   ((__m128i)__builtin_ia32_extracti32x4_mask((__v16si)(__m512i)(A), (int)(imm), \
@@ -7120,10 +7120,10 @@ _mm512_mask_cvtepi64_storeu_epi16 (void *__P, __mmask8 __M, __m512i __A)
                                              (__v4si)_mm_setzero_si128(), \
                                              (__mmask8)(U)))
 
-#define _mm512_extracti64x4_epi64(A, imm) \
+#define _mm512_extracti64x4_epi64(A, imm)                                      \
   ((__m256i)__builtin_ia32_extracti64x4_mask((__v8di)(__m512i)(A), (int)(imm), \
-                                             (__v4di)_mm256_setzero_si256(), \
-                                             (__mmask8)-1))
+                                             (__v4di)_mm256_setzero_si256(),   \
+                                             (__mmask8) - 1))
 
 #define _mm512_mask_extracti64x4_epi64(W, U, A, imm) \
   ((__m256i)__builtin_ia32_extracti64x4_mask((__v8di)(__m512i)(A), (int)(imm), \
diff --git a/clang/lib/Headers/avx512vldqintrin.h b/clang/lib/Headers/avx512vldqintrin.h
index 2d3c4b551..8aded1c47 100644
--- a/clang/lib/Headers/avx512vldqintrin.h
+++ b/clang/lib/Headers/avx512vldqintrin.h
@@ -1072,11 +1072,10 @@ _mm256_maskz_broadcast_i64x2 (__mmask8 __M, __m128i __A)
                                             (__v4di)_mm256_setzero_si256());
 }
 
-#define _mm256_extractf64x2_pd(A, imm) \
-  ((__m128d)__builtin_ia32_extractf64x2_256_mask((__v4df)(__m256d)(A), \
-                                                 (int)(imm), \
-                                                 (__v2df)_mm_setzero_pd(), \
-                                                 (__mmask8)-1))
+#define _mm256_extractf64x2_pd(A, imm)                                         \
+  ((__m128d)__builtin_ia32_extractf64x2_256_mask(                              \
+      (__v4df)(__m256d)(A), (int)(imm), (__v2df)_mm_setzero_pd(),              \
+      (__mmask8) - 1))
 
 #define _mm256_mask_extractf64x2_pd(W, U, A, imm) \
   ((__m128d)__builtin_ia32_extractf64x2_256_mask((__v4df)(__m256d)(A), \
@@ -1090,11 +1089,10 @@ _mm256_maskz_broadcast_i64x2 (__mmask8 __M, __m128i __A)
                                                  (__v2df)_mm_setzero_pd(), \
                                                  (__mmask8)(U)))
 
-#define _mm256_extracti64x2_epi64(A, imm) \
-  ((__m128i)__builtin_ia32_extracti64x2_256_mask((__v4di)(__m256i)(A), \
-                                                (int)(imm), \
-                                                (__v2di)_mm_setzero_si128(), \
-                                                (__mmask8)-1))
+#define _mm256_extracti64x2_epi64(A, imm)                                      \
+  ((__m128i)__builtin_ia32_extracti64x2_256_mask(                              \
+      (__v4di)(__m256i)(A), (int)(imm), (__v2di)_mm_setzero_si128(),           \
+      (__mmask8) - 1))
 
 #define _mm256_mask_extracti64x2_epi64(W, U, A, imm) \
   ((__m128i)__builtin_ia32_extracti64x2_256_mask((__v4di)(__m256i)(A), \
diff --git a/clang/lib/Headers/avx512vlintrin.h b/clang/lib/Headers/avx512vlintrin.h
index 252fb1119..eefeb1dad 100644
--- a/clang/lib/Headers/avx512vlintrin.h
+++ b/clang/lib/Headers/avx512vlintrin.h
@@ -7606,11 +7606,10 @@ _mm256_mask_cvtepi64_storeu_epi16 (void * __P, __mmask8 __M, __m256i __A)
   __builtin_ia32_pmovqw256mem_mask ((__v8hi *) __P, (__v4di) __A, __M);
 }
 
-#define _mm256_extractf32x4_ps(A, imm) \
-  ((__m128)__builtin_ia32_extractf32x4_256_mask((__v8sf)(__m256)(A), \
-                                                (int)(imm), \
-                                                (__v4sf)_mm_setzero_ps(), \
-                                                (__mmask8)-1))
+#define _mm256_extractf32x4_ps(A, imm)                                         \
+  ((__m128)__builtin_ia32_extractf32x4_256_mask(                               \
+      (__v8sf)(__m256)(A), (int)(imm), (__v4sf)_mm_setzero_ps(),               \
+      (__mmask8) - 1))
 
 #define _mm256_mask_extractf32x4_ps(W, U, A, imm) \
   ((__m128)__builtin_ia32_extractf32x4_256_mask((__v8sf)(__m256)(A), \
@@ -7624,11 +7623,10 @@ _mm256_mask_cvtepi64_storeu_epi16 (void * __P, __mmask8 __M, __m256i __A)
                                                 (__v4sf)_mm_setzero_ps(), \
                                                 (__mmask8)(U)))
 
-#define _mm256_extracti32x4_epi32(A, imm) \
-  ((__m128i)__builtin_ia32_extracti32x4_256_mask((__v8si)(__m256i)(A), \
-                                                 (int)(imm), \
-                                                 (__v4si)_mm_setzero_si128(), \
-                                                 (__mmask8)-1))
+#define _mm256_extracti32x4_epi32(A, imm)                                      \
+  ((__m128i)__builtin_ia32_extracti32x4_256_mask(                              \
+      (__v8si)(__m256i)(A), (int)(imm), (__v4si)_mm_setzero_si128(),           \
+      (__mmask8) - 1))
 
 #define _mm256_mask_extracti32x4_epi32(W, U, A, imm) \
   ((__m128i)__builtin_ia32_extracti32x4_256_mask((__v8si)(__m256i)(A), \

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like a merge has gone wrong at some point - as you've altered a lot of recent code changes by other commits - you might have to consider a rebase or start a new PR

APSInt Val = popToAPSInt(S, Call->getArg(0));
pushInteger(S, Val.reverseBits(), Call->getType());
return true;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove these - they have been removed from a recent patch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see — that makes sense. I must have messed up the merge at some point.
I’m still getting used to the rebase workflow, so mistakes like this happen sometimes.
I’ll fix it and clean up the patch soon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, when merge conflicts like this happen, what’s the usual approach that people take?
Do most contributors usually rebase onto the latest upstream, or do they start a new PR from upstream/main and cherry-pick their commits?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixing the merge is the best approach, rebasing usually means a lot of the review comments get lost which can be terrible for continuing the review for some patches - often at the point its better not to rebase, create a new PR and refer to the old one for reference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the guidance and your patience. This branch has become difficult to manage, so I'll open a fresh PR and try again.

SeongjaeP and others added 12 commits October 10, 2025 10:55
Implements constexpr evaluation for:
- _mm256_extracti128_si256 (AVX2, VEXTRACTI128)
- _mm256_extractf128_ps
- _mm256_extractf128_pd
- _mm256_extractf128_si256

These now work correctly in constant expressions by extracting
the appropriate 128-bit lane from a 256-bit vector.
std::realloc is declared there
…uiltins in InterpBuiltin

- Route AVX/AVX2 vextractf128/ extract128i256 to 2-arg extract helper.
- Route all AVX-512(VL/DQ) extract builtins to unified 4-arg masked helper:
  * extractf32x4_{256,_}
  * extractf32x8_
  * extractf64x2_{256,512}
  * extractf64x4_
  * extracti32x4_{256,_}
  * extracti32x8_
  * extracti64x2_{256,512}
  * extracti64x4_
- Implement mask/merge/all-ones(mask=plain)/maskz semantics.
- Initialize all elements in the destination vector.

NOTE:
  Tests are not included yet. This patch wires up InterpBuiltin support only.
  A follow-up patch will add constexpr tests under clang/test/AST/Interp/.
@SeongjaeP SeongjaeP force-pushed the issue157712-avx-extract branch from 21f366b to 24e06be Compare October 10, 2025 01:56
Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InterpBuiltin.cpp is still really broken

@RKSimon RKSimon closed this Oct 12, 2025
RKSimon pushed a commit that referenced this pull request Oct 20, 2025
… AVX/AVX512 subvector extraction intrinsics to be used in constexpr #157712 (#162836)

**This PR supersedes and replaces PR #158853**

The original branch diverged too far from the main branch, resulting in
significant merge conflicts that were difficult to resolve cleanly. To
provide a clean and reviewable history, this new PR was created by
cherry-picking the necessary commits onto a fresh branch based on the
latest `main`.

---

*(Original Description)*

This patch enables the use of AVX/AVX512 subvector extraction intrinsics
within `constexpr` functions. This is achieved by implementing the
evaluation logic for these intrinsics in
`VectorExprEvaluator::VisitCallExpr` and `InterpretBuiltin`.

The original discussion and review comments can be found in the previous
pull request for context: #158853

Fixes #157712
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:bytecode Issues for the clang bytecode constexpr interpreter clang:frontend Language frontend issues, e.g. anything involving "Sema" constexpr Anything related to constant evaluation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Clang] VectorExprEvaluator::VisitCallExpr / InterpretBuiltin - allow AVX/AVX512 subvector extraction intrinsics to be used in constexpr

4 participants