You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Translate SIMD construction as insertelements and a single store.
This almost completely avoids GEPi's and pointer manipulation,
postponing it until the end with one big write of the whole vector. This
leads to a small speed-up in compilation, and makes it easier for LLVM
to work with the values, e.g. with `--opt-level=0`,
pub fn foo() -> f32x4 {
f32x4(0.,0.,0.,0.)
}
was previously compiled to
define <4 x float> @_ZN3foo20h74913e8b13d89666eaaE() unnamed_addr #0 {
entry-block:
%sret_slot = alloca <4 x float>
%0 = getelementptr inbounds <4 x float>* %sret_slot, i32 0, i32 0
store float 0.000000e+00, float* %0
%1 = getelementptr inbounds <4 x float>* %sret_slot, i32 0, i32 1
store float 0.000000e+00, float* %1
%2 = getelementptr inbounds <4 x float>* %sret_slot, i32 0, i32 2
store float 0.000000e+00, float* %2
%3 = getelementptr inbounds <4 x float>* %sret_slot, i32 0, i32 3
store float 0.000000e+00, float* %3
%4 = load <4 x float>* %sret_slot
ret <4 x float> %4
}
but now becomes
define <4 x float> @_ZN3foo20h74913e8b13d89666eaaE() unnamed_addr #0 {
entry-block:
ret <4 x float> zeroinitializer
}
0 commit comments