Commit af7747c
ggml-cpu: Support s390x SIMD Instruction Set (ggml-org#12019)
* ggml: add s390x ARCH_FLAGS for compilation
Signed-off-by: Aaron Teo <[email protected]>
* ggml: add SIMD for s390x using vector intrinsics
  SIMD is activated for:
    * ggml_vec_dot_f32
    * ggml_vec_dot_f16
    * ggml_vec_mad_f32
    * ggml_vec_mad_f16
    * ggml_vec_mad_f32_unroll
    * ggml_vec_scale_f32
    * ggml_vec_scale_f16
  SIMD is NOT activated for:
    * ggml_vec_dot_f16_unroll (pending bugfix)
Signed-off-by: Aaron Teo <[email protected]>
* ggml: fix missing escape character in GGML_F32x4_REDUCE
Signed-off-by: Aaron Teo <[email protected]>
* ggml: add temporary patch for GGML_F32_ARR and GGML_F16_ARR
Signed-off-by: Aaron Teo <[email protected]>
* ggml: fix s390x GGML_F32x4_REDUCE
Signed-off-by: Aaron Teo <[email protected]>
* ggml: full SIMD activation for F32,F16 s390x
Signed-off-by: Aaron Teo <[email protected]>
* ggml: add option to disable s390x VXE/VXE2
Signed-off-by: Aaron Teo <[email protected]>
* ggml: change vecintrin.h include to ggml-cpu-impl
  * add __VXE__ and __VXE2__ macros
Signed-off-by: Aaron Teo <[email protected]>
* cmake: add s390x target detection for VX/VXE/VXE2
Signed-off-by: Aaron Teo <[email protected]>
* ggml: move s390x vector intrinsics to ggml-cpu-impl.h
Signed-off-by: Aaron Teo <[email protected]>
* ggml: s390x Q8_0 SIMD
Signed-off-by: Aaron Teo <[email protected]>
* ggml: correct documentation for Q8_0
Signed-off-by: Aaron Teo <[email protected]>
* ggml: s390x reduce code complexity Q8_0
Signed-off-by: Aaron Teo <[email protected]>
* ggml: s390x bugfix typo Q8_0
Signed-off-by: Aaron Teo <[email protected]>
* ggml: s390x SIMD activated for Q4_1
Signed-off-by: Aaron Teo <[email protected]>
* ggml: s390x inline vec_reve
Signed-off-by: Aaron Teo <[email protected]>
* ggml: s390x SIMD activation for Q4_0
Signed-off-by: Aaron Teo <[email protected]>
* ggml: add VXE backend feature
Signed-off-by: Aaron Teo <[email protected]>
* ggml: remove test.py
Signed-off-by: Aaron Teo <[email protected]>
* ggml: s390x SIMD activation for quantize_row_q8_0
Signed-off-by: Aaron Teo <[email protected]>
* ggml: s390x SIMD activation for quantize_row_q8_1
Signed-off-by: Aaron Teo <[email protected]>
* ggml: s390x SIMD activation for iq4_xs
Signed-off-by: Aaron Teo <[email protected]>
* ggml: bugfix iq4_xs
Signed-off-by: Aaron Teo <[email protected]>
* ggml: s390x SIMD activation for iq4_nl
Signed-off-by: Aaron Teo <[email protected]>
* ggml: add float, double, and long vector data type
Signed-off-by: Aaron Teo <[email protected]>
* ggml: clean up iq4_xs SIMD
Signed-off-by: Aaron Teo <[email protected]>
* ggml: fix improper use of restrict keyword
Signed-off-by: Aaron Teo <[email protected]>
* ggml: update warning message for ggml_vec_tbl
Signed-off-by: Aaron Teo <[email protected]>
* ggml: untested implementation of ggml_vec_dot_iq2_xxs_q8_K
Signed-off-by: Aaron Teo <[email protected]>
* ggml: update ggml_vec_dot_q4_1_q8_1 to use typedefs
Signed-off-by: Aaron Teo <[email protected]>
* ggml: switch to restrict for iq4_nl
Signed-off-by: Aaron Teo <[email protected]>
* ggml: slight dot product speed improvement for q4_1_q8_1
Signed-off-by: Aaron Teo <[email protected]>
* ggml: s390x SIMD activation for q6_K
Signed-off-by: Aaron Teo <[email protected]>
* ggml: add missing `_t` to ggml_int8x16x4_t
Signed-off-by: Aaron Teo <[email protected]>
* ggml: fix missing `_t` for ggml_vec_xl_s8x4
Signed-off-by: Aaron Teo <[email protected]>
* ggml: fix more missing `_t`
Signed-off-by: Aaron Teo <[email protected]>
* ggml: add unroll and prefetch to Q8_0
increase of 3.86% for prompt processing and 32.22% for token generation
Signed-off-by: Aaron Teo <[email protected]>
* ggml: patch Q8_0 to use proper vector sizes
Signed-off-by: Aaron Teo <[email protected]>
* ggml: optimise Q8_0 dot prod compute kernel further
Signed-off-by: Aaron Teo <[email protected]>
* ggml: add unroll and prefetch to Q4_1
Signed-off-by: Aaron Teo <[email protected]>
* ggml: refactor Q6_K variable naming for readability
Signed-off-by: Aaron Teo <[email protected]>
* ggml: fix Q6_K typos
Signed-off-by: Aaron Teo <[email protected]>
* ggml: s390x SIMD activation for Q5_K
Signed-off-by: Aaron Teo <[email protected]>
* ggml: fix wrong char*x16_t naming
Signed-off-by: Aaron Teo <[email protected]>
* ggml: Q5_K y0 wrong signedness
Signed-off-by: Aaron Teo <[email protected]>
* ggml: fix Q5_K invalid uchar type
Signed-off-by: Aaron Teo <[email protected]>
* ggml: fix Q5_K invalid uchar type
Signed-off-by: Aaron Teo <[email protected]>
* ggml: s390x SIMD activation for Q4_K
Signed-off-by: Aaron Teo <[email protected]>
* ggml: fix Q4_K invalid vector intrinsics
Signed-off-by: Aaron Teo <[email protected]>
* ggml: simplify ggml_padd_s16 compute kernel
Signed-off-by: Aaron Teo <[email protected]>
* ggml: correct ggml-cpu vxe wording
Signed-off-by: Aaron Teo <[email protected]>
* ggml: change ggml_aligned_malloc alignment to 256
256 is the cache line size for s390x platforms
Signed-off-by: Aaron Teo <[email protected]>
* ggml: resolve pr merge via cherry-pick 225bbbf
Signed-off-by: Aaron Teo <[email protected]>
* ggml : fix LoongArch compile error with 128-bit SIMD (ggml-org#11701)
* ggml: resolve pr merge via cherry-pick 4571953
Signed-off-by: Aaron Teo <[email protected]>
* ggml: cmake remove fork when determining s390x machine type
thank you @ericcurtin
Signed-off-by: Aaron Teo <[email protected]>
---------
Signed-off-by: Aaron Teo <[email protected]>
Co-authored-by: Jinyang He <[email protected]>
Co-authored-by: junchao-zhao <[email protected]>
1 parent a28e0d5 commit af7747c
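For readers unfamiliar with the Q8_0 kernels named in the commit list, the following is a minimal, portable sketch of the scalar computation that such SIMD kernels vectorize. It is not the s390x code from this commit and uses no vector intrinsics. The names `block_q8_0_sketch`, `quantize_row_q8_0_sketch` and `vec_dot_q8_0_sketch` are hypothetical, and the scale is kept as a `float` for simplicity (an assumption; ggml stores it as fp16). The four independent accumulators only illustrate the kind of unrolling the commit mentions; prefetching is omitted.

```c
#include <stdint.h>

#define QK8_0 32  /* elements per Q8_0 block, as in ggml */

/* Simplified block layout (assumption: float scale instead of fp16). */
typedef struct {
    float  d;          /* per-block scale */
    int8_t qs[QK8_0];  /* quantized values in [-127, 127] */
} block_q8_0_sketch;

/* Quantize n floats (n must be a multiple of QK8_0) into Q8_0-style blocks. */
void quantize_row_q8_0_sketch(const float *x, block_q8_0_sketch *y, int n) {
    const int nb = n / QK8_0;
    for (int i = 0; i < nb; i++) {
        /* Largest absolute value in the block determines the scale. */
        float amax = 0.0f;
        for (int j = 0; j < QK8_0; j++) {
            const float v = x[i * QK8_0 + j];
            const float a = v < 0.0f ? -v : v;
            if (a > amax) amax = a;
        }
        const float d  = amax / 127.0f;
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        y[i].d = d;
        for (int j = 0; j < QK8_0; j++) {
            /* Round half away from zero without needing libm. */
            const float v = x[i * QK8_0 + j] * id;
            y[i].qs[j] = (int8_t)(int)(v >= 0.0f ? v + 0.5f : v - 0.5f);
        }
    }
}

/* Dot product of two quantized rows of n elements each. */
float vec_dot_q8_0_sketch(int n, const block_q8_0_sketch *x,
                          const block_q8_0_sketch *y) {
    const int nb = n / QK8_0;
    float sumf = 0.0f;
    for (int i = 0; i < nb; i++) {
        /* Four independent integer accumulators: a scalar stand-in for
         * the lanes of a vector register. */
        int32_t acc[4] = {0, 0, 0, 0};
        for (int j = 0; j < QK8_0; j += 4) {
            acc[0] += x[i].qs[j + 0] * y[i].qs[j + 0];
            acc[1] += x[i].qs[j + 1] * y[i].qs[j + 1];
            acc[2] += x[i].qs[j + 2] * y[i].qs[j + 2];
            acc[3] += x[i].qs[j + 3] * y[i].qs[j + 3];
        }
        /* Integer sum is rescaled once per block by both scales. */
        sumf += x[i].d * y[i].d * (float)(acc[0] + acc[1] + acc[2] + acc[3]);
    }
    return sumf;
}
```

With a 32-element input of all ones, every quantized value is 127, and the dot product of that row with itself comes out close to 32. Keeping the inner sum in integers and applying the two scales once per block is what makes this layout a natural fit for integer multiply-add vector instructions.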
File tree
8 files changed, +826 -1 lines changed
- ggml
  - include
  - src
    - ggml-cpu