Feature: Loongson LASX backend #317

@ashvardanian

Description

Describe what you are looking for

LASX provides 256-bit SIMD with 32 vector registers, analogous to AVX2. Its standout strength is widening integer multiply-accumulate via even/odd split (xvmaddwev_h_b / xvmaddwod_h_b for i8->i16, then xvmaddwev_w_h / xvmaddwod_w_h into i32). There is no native FP16 or BF16 arithmetic: only f32 and f64 hardware floats.

dot/ and dots/

The highest-value new kernels are nk_dot_i8_loongapx and nk_dot_u8_loongapx. The widening even/odd multiply-add pair processes 32 i8 elements per 256-bit iteration into i32 accumulators, matching AVX2 throughput. Sub-byte i4/u4 use xvandi_b + shift for nibble extraction, then the same widening chain.
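To make the even/odd data flow concrete, here is a minimal scalar sketch in portable C. It is not LASX code and not the library's implementation: the function name `dot_i8_even_odd_model` is hypothetical, and it assumes one i8 x i8 product per i16 lane (which cannot overflow) before the products are widened into two i32 partial sums.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model; the name is illustrative, not a library symbol. */
int32_t dot_i8_even_odd_model(int8_t const *a, int8_t const *b, size_t n) {
    int32_t sum_even = 0, sum_odd = 0;
    size_t i = 0;
    /* One modeled 256-bit iteration: 32 i8 inputs -> 16 even + 16 odd products. */
    for (; i + 32 <= n; i += 32) {
        for (size_t j = 0; j < 32; j += 2) {
            /* i8 x i8 -> i16: a single product is at most 16384, so it fits. */
            int16_t even = (int16_t)(a[i + j] * b[i + j]);
            int16_t odd = (int16_t)(a[i + j + 1] * b[i + j + 1]);
            /* i16 -> i32 widening accumulate, kept as two partial sums. */
            sum_even += even;
            sum_odd += odd;
        }
    }
    /* Scalar tail for the remaining n % 32 elements. */
    for (; i < n; ++i) sum_even += a[i] * b[i];
    return sum_even + sum_odd;
}
```

Keeping the even and odd sums separate until the end mirrors how a vector kernel would hold two accumulator registers and combine them in one final horizontal reduction.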

nk_dot_f32_loongapx uses xvfmadd_s directly. nk_dot_bf16_loongapx needs manual upcast: unpack via xvilvl_h / xvilvh_h with zero, left-shift 16 via xvslli_w, reinterpret as f32. FP16 requires a more involved software conversion (rebias the exponent, shift the mantissa). The e4m3/e5m2 float8 types need LUT-based conversion to f32, similar to Haswell. Batched dots/ variants replicate accumulators across output lanes with the same arithmetic.
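The BF16 upcast works because a bf16 value is the upper 16 bits of the corresponding f32. A minimal scalar sketch of that step, in portable C rather than LASX intrinsics, with hypothetical helper names (`bf16_to_f32_model`, `dot_bf16_model`) chosen only for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widen to 32 bits, shift the payload into the high half, reinterpret as f32. */
float bf16_to_f32_model(uint16_t bits) {
    uint32_t widened = (uint32_t)bits << 16;
    float result;
    memcpy(&result, &widened, sizeof result); /* bit cast without aliasing issues */
    return result;
}

/* Dot product over raw bf16 bit patterns, accumulated in f32. */
float dot_bf16_model(uint16_t const *a, uint16_t const *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += bf16_to_f32_model(a[i]) * bf16_to_f32_model(b[i]);
    return sum;
}
```

The vector version would perform the same zero-extend and shift on eight lanes at once; no rounding is involved in this direction, so the conversion is exact.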

Complex dot products (f32c, bf16c) use xvfmul_s + xvxor_v + xvfadd_s for the delayed sign-flip pattern.
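One way to read the delayed sign-flip pattern: accumulate the partial products per lane with plain multiplies and adds, and negate the a_im * b_im contribution only once, after the loop, by XOR-ing its sign bit. The sketch below is a scalar model under that assumption, for interleaved [re, im] inputs and a non-conjugated product; `complex_f32_model_t` and `dot_f32c_model` are hypothetical names, not library symbols.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { float real, imag; } complex_f32_model_t;

complex_f32_model_t dot_f32c_model(float const *a, float const *b, size_t n_complex) {
    float real_lanes[2] = {0.0f, 0.0f}; /* [sum a_re*b_re, sum a_im*b_im] */
    float imag_lanes[2] = {0.0f, 0.0f}; /* [sum a_re*b_im, sum a_im*b_re] */
    for (size_t i = 0; i < n_complex; ++i) {
        float a_re = a[2 * i], a_im = a[2 * i + 1];
        float b_re = b[2 * i], b_im = b[2 * i + 1];
        real_lanes[0] += a_re * b_re;
        real_lanes[1] += a_im * b_im; /* sign is NOT flipped inside the loop */
        imag_lanes[0] += a_re * b_im;
        imag_lanes[1] += a_im * b_re;
    }
    /* Delayed sign flip: toggle the sign bit of the odd lane once. */
    uint32_t bits;
    memcpy(&bits, &real_lanes[1], sizeof bits);
    bits ^= 0x80000000u;
    memcpy(&real_lanes[1], &bits, sizeof bits);

    complex_f32_model_t result;
    result.real = real_lanes[0] + real_lanes[1];
    result.imag = imag_lanes[0] + imag_lanes[1];
    return result;
}
```

Moving the XOR out of the loop keeps the hot path to multiplies and adds, which is the point of delaying it.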

spatial/ and spatials/

nk_euclidean_f32_loongapx uses xvfsub_s + xvfmul_s + xvfadd_s. The i8 Euclidean variant benefits most: subtract, then widen-multiply the difference with itself through the even/odd chain. Cosine kernels run three accumulators (ab, a^2, b^2) in parallel; LASX's 32 registers handle this comfortably.
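For the i8 Euclidean case, the widths are worth spelling out. The sketch below is a scalar model in portable C with a hypothetical name (`sqeuclidean_i8_model`). It assumes the difference is widened to i16 before squaring, since a[i] - b[i] can reach ±255 and does not fit in i8, and that the square (up to 65025) is accumulated in 32 bits. It returns the squared distance and leaves the final square root to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Squared Euclidean distance over i8 inputs; hypothetical illustrative helper. */
uint32_t sqeuclidean_i8_model(int8_t const *a, int8_t const *b, size_t n) {
    uint32_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Widening subtract: range is [-255, 255], so i16 is required. */
        int16_t diff = (int16_t)(a[i] - b[i]);
        /* Widening multiply of the difference with itself: i16 x i16 -> i32. */
        sum += (uint32_t)((int32_t)diff * diff);
    }
    return sum;
}
```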

BF16/FP16 spatial kernels apply the same manual upcast as dot before the subtract-square-accumulate sequence. Batched spatials/ tiles fit well at 8 f32 lanes per accumulator, 4 accumulators per tile row.

set/ and sets/

nk_hamming_u1_loongapx and nk_jaccard_u1_loongapx use xvxor_v + xvpcnt_b + horizontal sum. Batched sets/ variants replicate accumulators with the same pattern.
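As a reference for what these kernels compute on bit-packed inputs, here is a scalar sketch in portable C. The names are hypothetical, and the Jaccard formulation (intersection via AND, union via OR) is an assumption about the intended definition rather than something stated above; the per-byte popcount stands in for xvpcnt_b.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-byte population count, the scalar counterpart of a byte-wise popcount. */
static unsigned popcount_u8_model(uint8_t x) {
    unsigned count = 0;
    while (x) {
        x &= (uint8_t)(x - 1); /* clear the lowest set bit */
        ++count;
    }
    return count;
}

/* Hamming distance: number of differing bits. */
uint32_t hamming_u1_model(uint8_t const *a, uint8_t const *b, size_t n_bytes) {
    uint32_t differences = 0;
    for (size_t i = 0; i < n_bytes; ++i)
        differences += popcount_u8_model((uint8_t)(a[i] ^ b[i]));
    return differences;
}

/* Jaccard distance: 1 - |a AND b| / |a OR b|, defined as 0 for two empty sets. */
float jaccard_u1_model(uint8_t const *a, uint8_t const *b, size_t n_bytes) {
    uint32_t intersection = 0, set_union = 0;
    for (size_t i = 0; i < n_bytes; ++i) {
        intersection += popcount_u8_model((uint8_t)(a[i] & b[i]));
        set_union += popcount_u8_model((uint8_t)(a[i] | b[i]));
    }
    if (set_union == 0) return 0.0f;
    return 1.0f - (float)intersection / (float)set_union;
}
```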

Can you contribute to the implementation?

  • I can contribute

Is your feature request specific to a certain interface?

It applies to everything

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct
