Conversation

@VyoJ (Contributor) commented Jul 31, 2025

Summary

This PR adds support for automatically converting models to the platform's native endianness at runtime, along with support for several z/OS-specific architectural features.

Changes Made

Build Configuration (buildenv):

  • Added SIMD support via flags in ZOPEN_CONFIGURE_OPTS
  • Fixed CURL/SSL-certificate issues by adding a build-time environment variable
  • Fixed errors in the regex that parses the test log file, so test results are now reported accurately
  • Fixed several corrupted patches
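To illustrate the kind of log parsing the regex fix concerns (the actual regex used by the zopen build framework is not shown in this PR), here is a hypothetical sketch that extracts the pass percentage and failure count from a ctest-style summary line:

```shell
# Hypothetical sketch: parse a ctest summary line such as
# "92% tests passed, 3 tests failed out of 38".
# The real zopen parsing logic is not shown in this PR.
summary="92% tests passed, 3 tests failed out of 38"
passed_pct=$(printf '%s\n' "$summary" | sed -n 's/^\([0-9][0-9]*\)% tests passed.*/\1/p')
failed=$(printf '%s\n' "$summary" | sed -n 's/.*, \([0-9][0-9]*\) tests failed.*/\1/p')
echo "passed=${passed_pct}% failed=${failed}"
```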

Patches

Endianness Changes

  • Followed the fork and PR by Aleksei Nikiforov (IBM) that enables s390x (specifically Linux on IBM Z) to load little-endian models unmodified
  • Ported all of his changes (patches across 15-20 files) onto the latest llama.cpp commits, with some z/OS-specific adjustments such as using the compiler's __builtin_bswap intrinsics
  • Verified working with most quantized model types on both little-endian and big-endian platforms
  • Hit issues byteswapping layers in Q4_K-quantized models due to a newer llama.cpp feature called repacking
  • Temporarily disabled repacking for now (affects only the Q4 series), since the performance degradation on standard CPU-only machines was minimal (~2%)

SIMD Changes

  • The SIMD code relies on the vec_xl, vec_xst, vec_reve, vec_extract, vec_insert and vec_splat intrinsics to manipulate vectorized data.
  • Accelerated tensor operations, dot products and activation functions, yielding a ~9% improvement in prompt processing.

MASS changes

  • Used the z/OS MASS scalar library to speed up mathematical operations for faster outputs
  • Replacing built-in math functions such as sin, cos, exp and log with their MASS equivalents increased inference throughput by 0.5-0.7 tokens/sec.

Testing

Ran llama.cpp's test suite; 92.11% of tests now pass:

  • The test-chat case was already failing before these changes
  • test-tokenizers-ggml-vocabs and test-thread-safety are new test cases
  • test-thread-safety isn't currently relevant for us since, from what I can see, it tests with a GPU

buildenv Outdated
# rm -f "llama"
# ln -s "llama.cpp" "llama"
# ln -s "llama.cpp" $ZOPEN_NAME
export ZOPEN_SKIP_ZOSLIB_ENV_HOOK=1
Contributor
Do you have more info on why this was needed?

Contributor Author

Charan tried this to restrict clang cmake flags while integrating SIMD / MASS.
It's not required; I verified this by removing it, rebuilding from scratch, and testing all functionality.
I have removed it in the latest commit.

disabled={isTyping}
>
➤
➤
Contributor
Is this change intentional?

Contributor Author

No, it's an encoding issue with the send symbol on the demo website (I have attached a picture of it).

We weren't able to git restore the file to its original state, and it seemed to be a one-time issue with encoding that character, so we committed it as-is for now.


// instance for IQ4
static const ggml::cpu::repack::tensor_traits<block_iq4_nl, 4, 4, GGML_TYPE_Q8_0> iq4_nl_4x4_q8_0;

- if (cur->type == GGML_TYPE_Q4_0) {
Contributor

Perhaps we can guard this with #ifndef __MVS__ instead of commenting it?

Contributor Author

Yep, that's a much better approach, I have integrated this change in the latest commit. Thanks!

Contributor

@IgorTodorovskiIBM left a comment

LGTM

@IgorTodorovskiIBM IgorTodorovskiIBM merged commit b73e203 into zopencommunity:main Aug 6, 2025