Conversation

@VyoJ (Contributor) commented Jul 31, 2025

Summary

This PR adds support for automatically converting models to the platform's native endianness at runtime, along with support for several z/OS-specific architectural features.

Changes Made

Build Configuration (buildenv):

  • Added SIMD support via flags in ZOPEN_CONFIGURE_OPTS
  • Fixed CURL/SSL-certificate issues by adding a build-time environment variable
  • Fixed errors in the regex that parses the test log file, so test results are now reported accurately
  • Fixed several corrupted patches
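To illustrate the kind of log parsing the regex fix concerns (the actual regex used by the zopen build framework is not shown in this PR), here is a hypothetical sketch that extracts the pass percentage and failure count from a ctest-style summary line:

```shell
# Hypothetical sketch: parse a ctest summary line such as
# "92% tests passed, 3 tests failed out of 38".
# The real zopen parsing logic is not shown in this PR.
summary="92% tests passed, 3 tests failed out of 38"
passed_pct=$(printf '%s\n' "$summary" | sed -n 's/^\([0-9][0-9]*\)% tests passed.*/\1/p')
failed=$(printf '%s\n' "$summary" | sed -n 's/.*, \([0-9][0-9]*\) tests failed.*/\1/p')
echo "passed=${passed_pct}% failed=${failed}"
```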

Patches

Endianness Changes

  • Followed the fork and PR by Aleksei Nikiforov (IBM) that enables s390x (specifically Linux on IBM Z) to load little-endian models unmodified
  • Ported all of his changes (patches across 15-20 files) onto the latest llama.cpp commits, with some z/OS-specific adjustments such as using the compiler's __builtin_bswap intrinsics
  • Verified working with most quantized model types on both little-endian and big-endian platforms
  • Hit issues byteswapping layers in Q4_K-quantized models due to a newer llama.cpp feature called repacking
  • Temporarily disabled repacking for now (affects only the Q4 series), since the performance degradation on standard CPU-only machines was minimal (~2%)

SIMD Changes

  • The SIMD code relies on the vec_xl, vec_xst, vec_reve, vec_extract, vec_insert and vec_splat intrinsics to manipulate vectorized data.
  • Accelerated tensor operations, dot products and activation functions, yielding a ~9% improvement in prompt processing.

MASS changes

  • Used the z/OS MASS scalar library to speed up mathematical operations for faster outputs
  • Replacing built-in math functions such as sin, cos, exp and log with their MASS equivalents increased inference throughput by 0.5-0.7 tokens/sec.

Testing

Ran llama.cpp's test suite; 92.11% of tests now pass:

  • The test-chat case was already failing before these changes
  • test-tokenizers-ggml-vocabs and test-thread-safety are new test cases
  • test-thread-safety isn't currently relevant for us since, from what I can see, it tests with a GPU

buildenv Outdated
# rm -f "llama"
# ln -s "llama.cpp" "llama"
# ln -s "llama.cpp" $ZOPEN_NAME
export ZOPEN_SKIP_ZOSLIB_ENV_HOOK=1
Contributor
Do you have more info on why this was needed?

Contributor Author

Charan tried this to restrict clang cmake flags while integrating SIMD / MASS.
It's not required; I verified this by removing it, rebuilding from scratch, and testing all functionality.
I have removed it in the latest commit.

disabled={isTyping}
>
➤
➤
Contributor
Is this change intentional?

Contributor Author

No, it's an encoding issue with the send symbol on the demo website (I have attached a picture of it).

We weren't able to git restore the file to its original state, and it seemed to be a one-time issue with encoding that character, so we committed it as-is for now.


// instance for IQ4
static const ggml::cpu::repack::tensor_traits<block_iq4_nl, 4, 4, GGML_TYPE_Q8_0> iq4_nl_4x4_q8_0;

- if (cur->type == GGML_TYPE_Q4_0) {
Contributor

Perhaps we can guard this with #ifndef __MVS__ instead of commenting it?

Contributor Author

Yep, that's a much better approach, I have integrated this change in the latest commit. Thanks!

Contributor

@IgorTodorovskiIBM left a comment

LGTM

@IgorTodorovskiIBM IgorTodorovskiIBM merged commit b73e203 into zopencommunity:main Aug 6, 2025