Description
Describe what you are looking for
LASX provides 256-bit SIMD with 32 vector registers, analogous to AVX2. Its standout strength is widening integer multiply-accumulate via even/odd split (xvmaddwev_h_b / xvmaddwod_h_b for i8->i16, then xvmaddwev_w_h / xvmaddwod_w_h into i32). No native FP16 or BF16; only f32 and f64 hardware floats.
dot/ and dots/
The highest-value new kernels are nk_dot_i8_loongapx and nk_dot_u8_loongapx. The widening even/odd multiply-add pair processes 32 i8 elements per 256-bit iteration into i32 accumulators, matching AVX2 throughput. Sub-byte i4/u4 use xvandi_b + shift for nibble extraction, then the same widening chain.
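The even/odd split can be hard to picture from instruction names alone, so here is a minimal portable sketch of the arithmetic in scalar C. It is a model, not LASX code: the function names dot_i8_even_odd_model and unpack_u4_model are hypothetical, the loop assumes an even element count with no tail handling, and each i8 x i8 product is held in i16 before being added to an i32 accumulator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of an even/odd widening dot product for i8 inputs.
 * Products of even-indexed and odd-indexed elements go to separate i32
 * accumulators, mirroring the ev/od instruction pair; they merge at the end. */
static int32_t dot_i8_even_odd_model(const int8_t *a, const int8_t *b, size_t n) {
    assert(n % 2 == 0); /* assumption: no tail handling in this sketch */
    int32_t acc_even = 0, acc_odd = 0;
    for (size_t i = 0; i < n; i += 2) {
        /* i8 x i8 always fits in i16: the extreme is (-128) * (-128) = 16384. */
        int16_t p_even = (int16_t)((int16_t)a[i] * (int16_t)b[i]);
        int16_t p_odd = (int16_t)((int16_t)a[i + 1] * (int16_t)b[i + 1]);
        acc_even += p_even; /* widen i16 -> i32 on accumulation */
        acc_odd += p_odd;
    }
    return acc_even + acc_odd;
}

/* Nibble extraction for packed u4 values: mask for the low nibble,
 * shift for the high nibble, then feed both into the same widening chain. */
static void unpack_u4_model(uint8_t packed, uint8_t *lo, uint8_t *hi) {
    *lo = (uint8_t)(packed & 0x0F);
    *hi = (uint8_t)(packed >> 4);
}
```

In a vector version, one 256-bit iteration would cover 32 such i8 elements, 16 per accumulator half.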
nk_dot_f32_loongapx uses xvfmadd_s directly. nk_dot_bf16_loongapx needs manual upcast: unpack via xvilvl_h / xvilvh_h with zero, left-shift 16 via xvslli_w, reinterpret as f32. FP16 requires a more involved software conversion (rebias exponent, shift mantissa). The e4m3/e5m2 float8 types need LUT-based conversion to f32, similar to Haswell. Batched dots/ variants replicate accumulators across output lanes with the same arithmetic.
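The two upcasts described above differ a lot in cost, which a scalar sketch makes visible. The helpers below (bits_to_f32, bf16_to_f32_model, f16_to_f32_model) are hypothetical names for illustration, not functions from the library. The FP16 path assumes the usual IEEE 754 binary16 layout and moves the exponent bias from 15 to 127, with separate branches for zero, subnormal, and infinity/NaN inputs.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as f32 without violating aliasing rules. */
static float bits_to_f32(uint32_t bits) {
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* BF16 is the top half of an f32, so the upcast is a 16-bit left shift. */
static float bf16_to_f32_model(uint16_t bf16) {
    return bits_to_f32((uint32_t)bf16 << 16);
}

/* FP16 -> f32: move the sign, rebias the exponent (15 -> 127, i.e. +112),
 * and shift the 10-bit mantissa into the top of the 23-bit field. */
static float f16_to_f32_model(uint16_t f16) {
    uint32_t sign = (uint32_t)(f16 & 0x8000u) << 16;
    uint32_t exponent = (f16 >> 10) & 0x1Fu;
    uint32_t mantissa = f16 & 0x3FFu;
    if (exponent == 0x1Fu) /* infinity or NaN */
        return bits_to_f32(sign | 0x7F800000u | (mantissa << 13));
    if (exponent != 0) /* normal number */
        return bits_to_f32(sign | ((exponent + 112u) << 23) | (mantissa << 13));
    if (mantissa == 0) /* signed zero */
        return bits_to_f32(sign);
    /* Subnormal: shift until the implicit leading one appears. */
    exponent = 113;
    while ((mantissa & 0x400u) == 0) {
        mantissa <<= 1;
        exponent--;
    }
    return bits_to_f32(sign | (exponent << 23) | ((mantissa & 0x3FFu) << 13));
}
```

The BF16 path maps to a single shift per lane, while the FP16 path needs masks, an add, and special-case handling, which is why it is the more involved of the two.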
Complex dot products (f32c, bf16c) use xvfmul_s + xvxor_v + xvfadd_s for the delayed sign-flip pattern.
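As a rough illustration of the delayed sign-flip idea, the scalar sketch below accumulates all four partial products with plain multiply-adds and flips the sign of the imaginary-times-imaginary sum only once, after the loop, by XOR-ing its sign bit. The names complex_f32_model_t, flip_sign_f32, and dot_f32c_model are hypothetical, and the input layout is assumed to be interleaved (re, im) pairs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    float re, im;
} complex_f32_model_t;

/* Flip the sign of an f32 by XOR-ing its top bit, as a vector XOR would. */
static float flip_sign_f32(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits ^= 0x80000000u;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Unconjugated complex dot product over n_pairs interleaved (re, im) values. */
static complex_f32_model_t dot_f32c_model(const float *a, const float *b, size_t n_pairs) {
    float sum_rr = 0, sum_ii = 0, sum_ri = 0, sum_ir = 0;
    for (size_t i = 0; i < n_pairs; i++) {
        float a_re = a[2 * i], a_im = a[2 * i + 1];
        float b_re = b[2 * i], b_im = b[2 * i + 1];
        sum_rr += a_re * b_re; /* no sign handling inside the loop */
        sum_ii += a_im * b_im;
        sum_ri += a_re * b_im;
        sum_ir += a_im * b_re;
    }
    complex_f32_model_t result;
    /* Delayed sign flip: applied once, after all accumulation. */
    result.re = sum_rr + flip_sign_f32(sum_ii);
    result.im = sum_ri + sum_ir;
    return result;
}
```

Keeping the XOR out of the loop leaves only multiplies and adds on the hot path.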
spatial/ and spatials/
nk_euclidean_f32_loongapx uses xvfsub_s + xvfmul_s + xvfadd_s. The i8 Euclidean variant benefits most: subtract, then widen-multiply the difference with itself through the even/odd chain. Cosine kernels run three accumulators (ab, a^2, b^2) in parallel; LASX's 32 registers handle this comfortably.
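A scalar sketch of both patterns may help when reviewing the vector versions. The names sqeuclidean_i8_model, cosine_sums_model_t, and cosine_sums_f32_model are hypothetical. The i8 path assumes the difference is widened to i16 before squaring, since it can reach 255 in magnitude, and the cosine helper returns the three raw sums rather than the final similarity so the normalization step stays out of scope.

```c
#include <stddef.h>
#include <stdint.h>

/* Squared Euclidean distance for i8: widen, subtract, multiply the
 * difference with itself, and accumulate in i32. */
static int32_t sqeuclidean_i8_model(const int8_t *a, const int8_t *b, size_t n) {
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int16_t diff = (int16_t)((int16_t)a[i] - (int16_t)b[i]); /* in [-255, 255] */
        sum += (int32_t)diff * (int32_t)diff;                    /* up to 65025 */
    }
    return sum;
}

/* The three running sums a cosine kernel keeps in parallel. */
typedef struct {
    float ab, a2, b2;
} cosine_sums_model_t;

static cosine_sums_model_t cosine_sums_f32_model(const float *a, const float *b, size_t n) {
    cosine_sums_model_t sums = {0.0f, 0.0f, 0.0f};
    for (size_t i = 0; i < n; i++) {
        sums.ab += a[i] * b[i]; /* one load of a[i] and b[i] feeds all three */
        sums.a2 += a[i] * a[i];
        sums.b2 += b[i] * b[i];
    }
    return sums;
}
```

The similarity itself would then be ab / sqrt(a2 * b2), computed once after the loop.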
BF16/FP16 spatial kernels apply the same manual upcast as dot before the subtract-square-accumulate sequence. Batched spatials/ tiles fit well at 8 f32 lanes per accumulator, 4 accumulators per tile row.
set/ and sets/
nk_hamming_u1_loongapx and nk_jaccard_u1_loongapx use xvxor_v + xvpcnt_b + horizontal sum. Batched sets/ variants replicate accumulators with the same pattern.
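For reference, the bit-level arithmetic behind these kernels can be modeled byte by byte in scalar C. This is a sketch under assumptions: popcount_u8_model, hamming_u1_model, and jaccard_u1_model are hypothetical names, the Jaccard distance is taken as one minus intersection over union, and two empty sets are assumed to have distance zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-byte population count, the scalar analogue of a byte-wise popcount. */
static unsigned popcount_u8_model(uint8_t x) {
    unsigned count = 0;
    while (x) {
        x = (uint8_t)(x & (x - 1)); /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Hamming distance over packed bits: XOR, popcount, then sum. */
static unsigned hamming_u1_model(const uint8_t *a, const uint8_t *b, size_t n_bytes) {
    unsigned differences = 0;
    for (size_t i = 0; i < n_bytes; i++)
        differences += popcount_u8_model((uint8_t)(a[i] ^ b[i]));
    return differences;
}

/* Jaccard distance over packed bits: 1 - |A and B| / |A or B|. */
static float jaccard_u1_model(const uint8_t *a, const uint8_t *b, size_t n_bytes) {
    unsigned intersection = 0, set_union = 0;
    for (size_t i = 0; i < n_bytes; i++) {
        intersection += popcount_u8_model((uint8_t)(a[i] & b[i]));
        set_union += popcount_u8_model((uint8_t)(a[i] | b[i]));
    }
    if (set_union == 0) return 0.0f; /* assumption: empty sets are identical */
    return 1.0f - (float)intersection / (float)set_union;
}
```

A vector kernel would keep the per-byte counts in lanes and defer the horizontal sum to the end of the loop.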
Can you contribute to the implementation?
- I can contribute
Is your feature request specific to a certain interface?
It applies to everything
Contact Details
No response
Is there an existing issue for this?
- I have searched the existing issues
Code of Conduct
- I agree to follow this project's Code of Conduct