Description
Compiling the code
#include <immintrin.h>

template<class> struct Zero;

template<> struct Zero<__m256> {
	[[gnu::target("avx")]]
	[[gnu::always_inline]]
	inline __m256 operator()() const {
		return _mm256_setzero_ps();
	}
};

template<> struct Zero<__m256d> {
	[[gnu::target("avx2")]]
	[[gnu::always_inline]]
	inline __m256d operator()() const {
		return _mm256_setzero_pd();
	}
};

template<class T>
[[gnu::always_inline]]
static inline auto bar(T gen) {
	return gen();
}

[[gnu::target("avx")]]
void foo() {
	bar(Zero<__m256>());
}
fails with
<source>:29:2: error: AVX vector return of type '__m256' (vector of 8 'float' values) without 'avx' enabled changes the ABI
29 | bar(Zero<__m256>());
| ^
<source>:24:9: error: always_inline function 'operator()' requires target feature 'avx', but would be inlined into function 'bar' that is compiled without support for 'avx'
24 | return gen();
| ^
<source>:24:9: error: AVX vector return of type '__m256' (vector of 8 'float' values) without 'avx' enabled changes the ABI
but I don't see any reasonable annotation that could be placed on bar, nor why one should be necessary at all.
Fundamentally, the target attribute is only required during codegen, for functions whose target cannot be inferred. But bar is always_inline with internal linkage, so unless its address is taken, it should not care what the target architecture is -- determining that is only the concern of the first non-inline caller, for which codegen actually occurs.
Moreover, I think the same could apply for any function whose initial declaration is also a definition. When a function's body is always available, its target attribute could always be safely inferred as the union of the targets of all its callees, and thus making its specification mandatory only introduces friction.
I feel this is a severe design limitation -- every function, even a template, is required to know the targets of its callees, yet it has no reasonable way to tell what those might be. This makes it impossible to build abstractions over SIMD code without duplicating the abstraction once per target.
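To illustrate the duplication, here is a minimal sketch of the only workaround I can see: one copy of bar per target. The names bar_avx, bar_avx2, foo_avx and foo_avx2 are hypothetical and not part of the original reproducer, and the callers return a scalar only so the result can be checked. I would expect this to avoid the errors above, since each helper now carries the same target as its callee and its caller, but the helper bodies are identical.

```cpp
#include <immintrin.h>

template<class> struct Zero;

template<> struct Zero<__m256> {
    [[gnu::target("avx")]]
    [[gnu::always_inline]]
    inline __m256 operator()() const {
        return _mm256_setzero_ps();
    }
};

template<> struct Zero<__m256d> {
    [[gnu::target("avx2")]]
    [[gnu::always_inline]]
    inline __m256d operator()() const {
        return _mm256_setzero_pd();
    }
};

// Hypothetical copy 1 of bar, pinned to "avx".
template<class T>
[[gnu::target("avx")]]
[[gnu::always_inline]]
static inline auto bar_avx(T gen) {
    return gen();
}

// Hypothetical copy 2 of bar, pinned to "avx2". The body is identical;
// only the target attribute differs.
template<class T>
[[gnu::target("avx2")]]
[[gnu::always_inline]]
static inline auto bar_avx2(T gen) {
    return gen();
}

// Each caller has to pick the copy whose target matches its generator.
[[gnu::target("avx")]]
float foo_avx() {
    __m256 v = bar_avx(Zero<__m256>());
    // Extract lane 0 so the result can be inspected as a scalar.
    return _mm_cvtss_f32(_mm256_castps256_ps128(v));
}

[[gnu::target("avx2")]]
double foo_avx2() {
    __m256d v = bar_avx2(Zero<__m256d>());
    return _mm_cvtsd_f64(_mm256_castpd256_pd128(v));
}
```

Every additional target, and every additional generic helper layered on top of bar, multiplies the number of copies.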
Could the insistence on target therefore be relaxed?