Much faster prompt processing for I-quants (ARM_NEON) #550

ikawrakow · 2025-06-23T13:42:48Z

It is time to give some attention to the ARM_NEON back-end, which has fallen behind quite a bit.

This PR corresponds to PRs #531, #533, #534, #546, #549, and applies the on-the-fly repacking technique to i-quants (IQ2_XXS, IQ2_XS, IQ2_S, IQ3_XXS, IQ3_S) for the ARM_NEON implementation.
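For readers unfamiliar with the idea, here is a minimal sketch of what a row-interleaving repack can look like. It assumes that "repacking" means rearranging several weight rows so that small groups of values from different rows sit next to each other in memory, in the spirit of the `_r4` row-interleaved types mentioned in the commit messages. The function `interleave_rows` and its parameters are hypothetical and are not taken from this PR; the real kernels operate on quantized blocks with NEON intrinsics.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only (not the PR's code): interleave equally
// sized rows in chunks of `group` values, so that chunk 0 of every row comes
// first, then chunk 1 of every row, and so on.
std::vector<int8_t> interleave_rows(const std::vector<std::vector<int8_t>>& rows,
                                    std::size_t group) {
    const std::size_t nrows = rows.size();
    const std::size_t ncols = rows.empty() ? 0 : rows[0].size();
    std::vector<int8_t> out;
    out.reserve(nrows * ncols);
    for (std::size_t c = 0; c < ncols; c += group) {      // walk the columns chunk by chunk
        for (std::size_t r = 0; r < nrows; ++r) {         // take the same chunk from each row
            for (std::size_t k = c; k < c + group && k < ncols; ++k) {
                out.push_back(rows[r][k]);
            }
        }
    }
    return out;
}
```

With such a layout a single contiguous load covers the same column chunk of several rows, which is the general reason interleaved formats tend to help matrix multiplications with many right-hand-side columns, such as prompt processing.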

Here is a PP-512 performance comparison between the main branch and this PR for LLaMA-3.1-8B-Instruct on an M2-Max:

| type | t/s (main) | t/s (PR) | Speedup |
|---|---:|---:|---:|
| IQ2_XXS | 55.79 | 167.55 | 3.003 |
| IQ2_XS | 46.40 | 166.65 | 3.592 |
| IQ2_S | 42.75 | 166.83 | 3.903 |
| IQ3_XXS | 51.84 | 165.56 | 3.194 |
| IQ3_S | 46.02 | 162.03 | 3.521 |

At this point, i-quants and IQK quants are the top-tier quants for prompt processing speed on ARM_NEON.

Iwan Kawrakow added 5 commits June 23, 2025 13:50

iq2_xxs

edb5f9c

55.8 -> 167.5 t/s. iq2_xxs is at 93.7 t/s

iq2_xs

8b33186

46.4 -> 166.6 t/s. iq2_xs_r4 is at 72.3 t/s.

iq2_s

c52f589

42.8 t/s -> 166.8 t/s. iq2_s_r4 is at 71.1 t/s.

iq3_xxs

2696567

51.8 t/s -> 165.6 t/s. iq3_xxs_r4 is at 84.6 t/s.

iq3_s

548a5f3

46.0 t/s -> 162.0 t/s. iq3_s_r4 is at 79.4 t/s

ikawrakow merged commit ddda4d9 into main Jun 23, 2025

This was referenced Jun 24, 2025

Much faster prompt processing for k-quants (ARM_NEON) #552

Merged

Much faster prompt processing for IQ1_S and IQ1_M on ARM_NEON #553

Merged
