Commit 634b18f

Do not use hadd_pd to implement reduce_add
It's generally slower than the sse2 version because hadd_pd has a latency of 5 cycles (!). Related to #1107
1 parent e4f0f96 commit 634b18f

1 file changed: +0 -6 lines changed

include/xsimd/arch/xsimd_sse3.hpp

Lines changed: 0 additions & 6 deletions
@@ -50,12 +50,6 @@ namespace xsimd
             __m128 tmp1 = _mm_hadd_ps(tmp0, tmp0);
             return _mm_cvtss_f32(tmp1);
         }
-        template <class A>
-        XSIMD_INLINE double reduce_add(batch<double, A> const& self, requires_arch<sse3>) noexcept
-        {
-            __m128d tmp0 = _mm_hadd_pd(self, self);
-            return _mm_cvtsd_f64(tmp0);
-        }
 
     }
 
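With the SSE3 overload removed, `reduce_add` for `double` falls back to the SSE2 kernel. For illustration, here is a minimal sketch of one common SSE2-only way to sum the two lanes of a `__m128d` without `_mm_hadd_pd`. The helper name `reduce_add_sse2_sketch` is hypothetical, and the body is an assumption about the general technique, not a copy of xsimd's actual sse2 implementation.

```cpp
#include <emmintrin.h> // SSE2 intrinsics

// Hypothetical helper (not part of xsimd): horizontal sum of the two
// double lanes of v using only SSE2 instructions.
inline double reduce_add_sse2_sketch(__m128d v) noexcept
{
    // Duplicate the high lane so it also sits in lane 0 of a temporary.
    __m128d hi = _mm_unpackhi_pd(v, v);
    // Scalar add on lane 0: v[0] + v[1]. The upper lane is ignored.
    __m128d sum = _mm_add_sd(v, hi);
    // Extract lane 0 as a double.
    return _mm_cvtsd_f64(sum);
}
```

The sketch uses one shuffle and one scalar add in place of the single horizontal add; the commit message reports that this form is generally faster in practice.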