Pull requests: bugparty/cpu_math_kernels_pri
Pull requests list
⚡ Thunderbolt: max_v4 — 16x Unrolled AVX2 Max Reduction
#45
opened May 31, 2026 by
bugparty
Owner
⚡ Thunderbolt: Softmax — 8x Unroll Max and Norm Phases
#44
opened May 29, 2026 by
bugparty
Owner
⚡ Thunderbolt: Softmax — Combine FMA and asymmetric unroll
#43
opened May 26, 2026 by
bugparty
Owner
⚡ Thunderbolt: softmax_v6 — Single FMA for ln(2) range reduction
#42
opened May 25, 2026 by
bugparty
Owner
⚡ Thunderbolt: softmax_v6 — Single-FMA shift-invariant exp range reduction
#41
opened May 23, 2026 by
bugparty
Owner
⚡ Thunderbolt: softmax — Single FMA Range Reduction
#40
opened May 22, 2026 by
bugparty
Owner
⚡ Thunderbolt: softmax_v6 — Single-FMA range reduction and 8x unroll
#39
opened May 20, 2026 by
bugparty
Owner
⚡ Thunderbolt: Softmax — Single-FMA Range Reduction
#38
opened May 19, 2026 by
bugparty
Owner
⚡ Thunderbolt: softmax_v6 — FMA-fused exp range reduction and 8x max unroll
#37
opened May 17, 2026 by
bugparty
Owner
⚡ Thunderbolt: softmax_v6 — AVX-512 Vectorized Softmax
#36
opened May 8, 2026 by
bugparty
Owner
⚡ Thunderbolt: softmax_v6 — AVX2 explicit instruction interleaving
#35
opened May 7, 2026 by
bugparty
Owner
⚡ Thunderbolt: ReLU — Masked vector epilogue and 8x unroll
#34
opened May 6, 2026 by
bugparty
Owner