@angt angt commented Dec 17, 2024

Same output and same performance as master:

| Model         |   Threads | Test   |   t/s master |   t/s ggml-cpu-replace-neon-asm-with-intrinsics-in-ggml_gemv_q4_0_4x8_q8_0 |   Speedup |
|:--------------|----------:|:-------|-------------:|---------------------------------------------------------------------------:|----------:|
| llama 1B Q4_0 |         2 | pp512  |       155.22 |                                                                     156.98 |      1.01 |
| llama 1B Q4_0 |         2 | tg128  |        33.96 |                                                                      33.65 |      0.99 |
| llama 1B Q4_0 |         4 | pp512  |       312.66 |                                                                     313.25 |      1.00 |
| llama 1B Q4_0 |         4 | tg128  |        60.70 |                                                                      60.31 |      0.99 |
| llama 1B Q4_0 |         8 | pp512  |       585.92 |                                                                     578.52 |      0.99 |
| llama 1B Q4_0 |         8 | tg128  |        93.77 |                                                                      94.56 |      1.01 |
| llama 1B Q4_0 |        16 | pp512  |       966.15 |                                                                     952.94 |      0.99 |
| llama 1B Q4_0 |        16 | tg128  |        67.38 |                                                                      68.80 |      1.02 |
| llama 3B Q4_0 |         2 | pp512  |        58.62 |                                                                      58.69 |      1.00 |
| llama 3B Q4_0 |         2 | tg128  |        15.16 |                                                                      14.72 |      0.97 |
| llama 3B Q4_0 |         4 | pp512  |       118.17 |                                                                     117.87 |      1.00 |
| llama 3B Q4_0 |         4 | tg128  |        26.93 |                                                                      27.22 |      1.01 |
| llama 3B Q4_0 |         8 | pp512  |       221.48 |                                                                     221.54 |      1.00 |
| llama 3B Q4_0 |         8 | tg128  |        42.32 |                                                                      44.05 |      1.04 |
| llama 3B Q4_0 |        16 | pp512  |       360.97 |                                                                     363.18 |      1.01 |
| llama 3B Q4_0 |        16 | tg128  |        32.99 |                                                                      32.78 |      0.99 |
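
The Speedup column looks like the branch throughput divided by the master throughput, rounded to two decimals. A minimal sketch to recheck it, assuming that definition (the helper name `speedup` and the `ROWS` layout are illustrative only, not part of llama.cpp or its benchmarking tools); the numbers are copied from the table above:

```python
# Recompute the "Speedup" column from the table above.
# Assumption: speedup = branch t/s divided by master t/s, rounded to 2 decimals.

# (model, threads, test, t/s master, t/s branch, reported speedup)
ROWS = [
    ("llama 1B Q4_0", 2, "pp512", 155.22, 156.98, 1.01),
    ("llama 1B Q4_0", 2, "tg128", 33.96, 33.65, 0.99),
    ("llama 1B Q4_0", 4, "pp512", 312.66, 313.25, 1.00),
    ("llama 1B Q4_0", 4, "tg128", 60.70, 60.31, 0.99),
    ("llama 1B Q4_0", 8, "pp512", 585.92, 578.52, 0.99),
    ("llama 1B Q4_0", 8, "tg128", 93.77, 94.56, 1.01),
    ("llama 1B Q4_0", 16, "pp512", 966.15, 952.94, 0.99),
    ("llama 1B Q4_0", 16, "tg128", 67.38, 68.80, 1.02),
    ("llama 3B Q4_0", 2, "pp512", 58.62, 58.69, 1.00),
    ("llama 3B Q4_0", 2, "tg128", 15.16, 14.72, 0.97),
    ("llama 3B Q4_0", 4, "pp512", 118.17, 117.87, 1.00),
    ("llama 3B Q4_0", 4, "tg128", 26.93, 27.22, 1.01),
    ("llama 3B Q4_0", 8, "pp512", 221.48, 221.54, 1.00),
    ("llama 3B Q4_0", 8, "tg128", 42.32, 44.05, 1.04),
    ("llama 3B Q4_0", 16, "pp512", 360.97, 363.18, 1.01),
    ("llama 3B Q4_0", 16, "tg128", 32.99, 32.78, 0.99),
]


def speedup(master_tps: float, branch_tps: float) -> float:
    """Ratio of branch to master throughput, rounded like the table."""
    return round(branch_tps / master_tps, 2)


# Collect every row whose recomputed ratio differs from the reported one.
mismatches = [
    (model, threads, test)
    for model, threads, test, master, branch, reported in ROWS
    if speedup(master, branch) != reported
]

if __name__ == "__main__":
    for model, threads, test, master, branch, _ in ROWS:
        print(f"{model} t={threads:<2} {test}: {speedup(master, branch):.2f}")
    print("rows that disagree with the table:", len(mismatches))
```

Values above 1.00 mean the intrinsics branch was faster in that run, values below 1.00 mean master was faster.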

@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) label on Dec 17, 2024
@angt angt force-pushed the ggml-cpu-replace-neon-asm-with-intrinsics-in-ggml_gemv_q4_0_4x8_q8_0 branch from b283fc3 to 2cfbccd on December 19, 2024 09:44
@slaren slaren merged commit e34c5af into ggml-org:master on Dec 20, 2024
48 checks passed
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025
…() (ggml-org#10874)

* ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0()

Signed-off-by: Adrien Gallouët <[email protected]>

* ggml-cpu: format code

Signed-off-by: Adrien Gallouët <[email protected]>

---------

Signed-off-by: Adrien Gallouët <[email protected]>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025