
Commit ebd78cc

Merge branch 'master' into smallthinker
2 parents e28d2c5 + 446595b commit ebd78cc


52 files changed: +16953 −14031 lines changed

.devops/musa.Dockerfile

Lines changed: 3 additions & 3 deletions
@@ -1,10 +1,10 @@
 ARG UBUNTU_VERSION=22.04
 # This needs to generally match the container host's environment.
-ARG MUSA_VERSION=rc4.0.1
+ARG MUSA_VERSION=rc4.2.0
 # Target the MUSA build image
-ARG BASE_MUSA_DEV_CONTAINER=mthreads/musa:${MUSA_VERSION}-mudnn-devel-ubuntu${UBUNTU_VERSION}
+ARG BASE_MUSA_DEV_CONTAINER=mthreads/musa:${MUSA_VERSION}-devel-ubuntu${UBUNTU_VERSION}-amd64

-ARG BASE_MUSA_RUN_CONTAINER=mthreads/musa:${MUSA_VERSION}-mudnn-runtime-ubuntu${UBUNTU_VERSION}
+ARG BASE_MUSA_RUN_CONTAINER=mthreads/musa:${MUSA_VERSION}-runtime-ubuntu${UBUNTU_VERSION}-amd64

 FROM ${BASE_MUSA_DEV_CONTAINER} AS build

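The updated defaults can still be overridden at build time without editing the Dockerfile; a minimal sketch (the `llama-musa` tag is only an illustrative name):

```bash
# build the MUSA image from the updated Dockerfile, pinning the toolkit release
docker build -f .devops/musa.Dockerfile \
    --build-arg MUSA_VERSION=rc4.2.0 \
    -t llama-musa .
```
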
.devops/rocm.Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 ARG UBUNTU_VERSION=24.04

 # This needs to generally match the container host's environment.
-ARG ROCM_VERSION=6.3
-ARG AMDGPU_VERSION=6.3
+ARG ROCM_VERSION=6.4
+ARG AMDGPU_VERSION=6.4

 # Target the CUDA build image
 ARG BASE_ROCM_DEV_CONTAINER=rocm/dev-ubuntu-${UBUNTU_VERSION}:${ROCM_VERSION}-complete

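Likewise for the ROCm image, the bumped versions are ordinary build arguments; a sketch (the `llama-rocm` tag is illustrative):

```bash
# build the ROCm image, overriding the ROCm/AMDGPU releases explicitly
docker build -f .devops/rocm.Dockerfile \
    --build-arg ROCM_VERSION=6.4 \
    --build-arg AMDGPU_VERSION=6.4 \
    -t llama-rocm .
```
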
.github/workflows/build.yml

Lines changed: 1 addition & 1 deletion
@@ -515,7 +515,7 @@ jobs:

   ubuntu-22-cmake-musa:
     runs-on: ubuntu-22.04
-    container: mthreads/musa:rc4.0.1-mudnn-devel-ubuntu22.04
+    container: mthreads/musa:rc4.2.0-devel-ubuntu22.04-amd64

     steps:
       - name: Clone

ci/README.md

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ docker run --privileged -it \
     -v $HOME/llama.cpp/ci-cache:/ci-cache \
     -v $HOME/llama.cpp/ci-results:/ci-results \
     -v $PWD:/ws -w /ws \
-    mthreads/musa:rc4.0.1-mudnn-devel-ubuntu22.04
+    mthreads/musa:rc4.2.0-devel-ubuntu22.04-amd64
 ```

 Inside the container, execute the following commands:

docs/build-s390x.md

Lines changed: 38 additions & 8 deletions
@@ -42,14 +42,14 @@ cmake --build build --config Release -j $(nproc)
 cmake --build build --config Release -j $(nproc)
 ```

-- By default, NNPA is enabled when available. To disable it (not recommended):
+- By default, NNPA is disabled by default. To enable it:

 ```bash
 cmake -S . -B build \
     -DCMAKE_BUILD_TYPE=Release \
     -DGGML_BLAS=ON \
     -DGGML_BLAS_VENDOR=OpenBLAS \
-    -DGGML_NNPA=OFF
+    -DGGML_NNPA=ON

 cmake --build build --config Release -j $(nproc)
 ```
@@ -84,16 +84,24 @@ All models need to be converted to Big-Endian. You can achieve this in three cas

 ![File Type - gguf](https://img.shields.io/badge/File_Type-gguf-fff)

-You can find popular models pre-converted and verified at [s390x Ready Models](https://huggingface.co/collections/taronaeo/s390x-ready-models-672765393af438d0ccb72a08).
+You can find popular models pre-converted and verified at [s390x Verified Models](https://huggingface.co/collections/taronaeo/s390x-verified-models-672765393af438d0ccb72a08) or [s390x Runnable Models](https://huggingface.co/collections/taronaeo/s390x-runnable-models-686e951824198df12416017e).

-These models have already been converted from `safetensors` to `GGUF Big-Endian` and their respective tokenizers verified to run correctly on IBM z15 and later system.
+These models have already been converted from `safetensors` to `GGUF` Big-Endian and their respective tokenizers verified to run correctly on IBM z15 and later system.

 2. **Convert safetensors model to GGUF Big-Endian directly (recommended)**

 ![File Type - safetensors](https://img.shields.io/badge/File_Type-safetensors-da1e28)

 The model you are trying to convert must be in `safetensors` file format (for example [IBM Granite 3.3 2B](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct)). Make sure you have downloaded the model repository for this case.

+Ensure that you have installed the required packages in advance
+
+```bash
+pip3 install -r requirements.txt
+```
+
+Convert the `safetensors` model to `GGUF`
+
 ```bash
 python3 convert_hf_to_gguf.py \
 --outfile model-name-be.f16.gguf \
@@ -116,7 +124,7 @@ All models need to be converted to Big-Endian. You can achieve this in three cas

 ![File Type - gguf](https://img.shields.io/badge/File_Type-gguf-fff)

-The model you are trying to convert must be in `gguf` file format (for example [IBM Granite 3.3 2B](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct-GGUF)). Make sure you have downloaded the model file for this case.
+The model you are trying to convert must be in `gguf` file format (for example [IBM Granite 3.3 2B GGUF](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct-GGUF)). Make sure you have downloaded the model file for this case.

 ```bash
 python3 gguf-py/gguf/scripts/gguf_convert_endian.py model-name.f16.gguf BIG
@@ -141,15 +149,15 @@ Only available in IBM z15 or later system with the `-DGGML_VXE=ON` (turned on by

 ### 2. NNPA Vector Intrinsics Acceleration

-Only available in IBM z16 or later system with the `-DGGML_NNPA=ON` (turned on when available) compile flag. No hardware acceleration is possible with llama.cpp with older systems, such as IBM z15/arch13. In such systems, the APIs can still run but will use a scalar implementation.
+Only available in IBM z16 or later system with the `-DGGML_NNPA=ON` (turned off by default) compile flag. No hardware acceleration is possible with llama.cpp with older systems, such as IBM z15/arch13. In such systems, the APIs can still run but will use a scalar implementation.

 ### 3. zDNN Accelerator

-_Only available in IBM z16 or later system. No direction at the moment._
+_Only available in IBM z16 / LinuxONE 4 or later system. No support currently available._

 ### 4. Spyre Accelerator

-_No direction at the moment._
+_Only available with IBM z17 / LinuxONE 5 or later system. No support currently available._

 ## Performance Tuning

@@ -189,6 +197,26 @@ IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongl

 Answer: Please ensure that your GCC compiler is of minimum GCC 15.1.0 version, and have `binutils` updated to the latest version. If this does not fix the problem, kindly open an issue.

+4. Failing to install the `sentencepiece` package using GCC 15+
+
+Answer: The `sentencepiece` team are aware of this as seen in [this issue](https://github.com/google/sentencepiece/issues/1108).
+
+As a temporary workaround, please run the installation command with the following environment variables.
+
+```bash
+export CXXFLAGS="-include cstdint"
+```
+
+For example,
+
+```bash
+CXXFLAGS="-include cstdint" pip3 install -r requirements.txt
+```
+
+5. `-DGGML_NNPA=ON` generates gibberish output
+
+Answer: We are aware of this as detailed in [this issue](https://github.com/ggml-org/llama.cpp/issues/14877). Please either try reducing the number of threads, or disable the compile option using `-DGGML_NNPA=OFF`.
+
 ## Getting Help on IBM Z & LinuxONE

 1. **Bugs, Feature Requests**
@@ -244,3 +272,5 @@ IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongl
 - ✅ - acceleration available
 - 🚫 - acceleration unavailable, will still run using scalar implementation
 - ❓ - acceleration unknown, please contribute if you can test it yourself
+
+Last Updated by **Aaron Teo ([email protected])** on July 25, 2025.

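Troubleshooting item 5 above falls back to building without NNPA; a minimal configure sketch of that workaround, reusing only flags already shown in this document:

```bash
# configure with NNPA disabled, keeping the recommended OpenBLAS settings
cmake -S . -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_BLAS=ON \
    -DGGML_BLAS_VENDOR=OpenBLAS \
    -DGGML_NNPA=OFF

cmake --build build --config Release -j $(nproc)
```
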
docs/build.md

Lines changed: 3 additions & 0 deletions
@@ -68,6 +68,9 @@ cmake --build build --config Release
 cmake --build build-x64-windows-llvm-release
 ```
 - Curl usage is enabled by default and can be turned off with `-DLLAMA_CURL=OFF`. Otherwise you need to install development libraries for libcurl.
+  - **Debian / Ubuntu:** `sudo apt-get install libcurl4-openssl-dev` # (or `libcurl4-gnutls-dev` if you prefer GnuTLS)
+  - **Fedora / RHEL / Rocky / Alma:** `sudo dnf install libcurl-devel`
+  - **Arch / Manjaro:** `sudo pacman -S curl` # includes libcurl headers

 ## BLAS Build

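For example, a default curl-enabled build on a Debian-based system would combine the new install hint with the usual CMake invocation; a sketch:

```bash
# libcurl development headers, per the new hint above
sudo apt-get install libcurl4-openssl-dev

# curl support is on by default; add -DLLAMA_CURL=OFF to opt out
cmake -B build
cmake --build build --config Release
```
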
docs/development/HOWTO-add-model.md

Lines changed: 15 additions & 6 deletions
@@ -23,11 +23,19 @@ The convert script reads the model configuration, tokenizer, tensor names+data a

 The required steps to implement for an HF model are:

-1. Define the model `Model.register` annotation in a new `Model` subclass, example:
+1. Define the model `ModelBase.register` annotation in a new `TextModel` or `MmprojModel` subclass, example:

 ```python
-@Model.register("MyModelForCausalLM")
-class MyModel(Model):
+@ModelBase.register("MyModelForCausalLM")
+class MyModel(TextModel):
+    model_arch = gguf.MODEL_ARCH.MYMODEL
+```
+
+or
+
+```python
+@ModelBase.register("MyModelForConditionalGeneration")
+class MyModel(MmprojModel):
     model_arch = gguf.MODEL_ARCH.MYMODEL
 ```

@@ -75,9 +83,10 @@ block_mappings_cfg: dict[MODEL_TENSOR, tuple[str, ...]] = {
 `transformer.blocks.{bid}.norm_1` will be mapped to `blk.{bid}.attn_norm` in GGUF.

 Depending on the model configuration, tokenizer, code and tensors layout, you will have to override:
-- `Model#set_gguf_parameters`
-- `Model#set_vocab`
-- `Model#write_tensors`
+- `TextModel#set_gguf_parameters`
+- `MmprojModel#set_gguf_parameters`
+- `ModelBase#set_vocab`
+- `ModelBase#modify_tensors`

 NOTE: Tensor names must end with `.weight` or `.bias` suffixes, that is the convention and several tools like `quantize` expect this to proceed the weights.

docs/docker.md

Lines changed: 1 addition & 1 deletion
@@ -110,7 +110,7 @@ You may want to pass in some different `ARGS`, depending on the MUSA environment

 The defaults are:

-- `MUSA_VERSION` set to `rc4.0.1`
+- `MUSA_VERSION` set to `rc4.2.0`

 The resulting images, are essentially the same as the non-MUSA images:

docs/ops.md

Lines changed: 12 additions & 4 deletions
@@ -2,6 +2,11 @@

 List of GGML operations and backend support status.

+## How to add a backend to this table:
+
+1. Run `test-backend-ops support --output csv` with your backend name and redirect output to a csv file in `docs/ops/` (e.g., `docs/ops/CUDA.csv`)
+2. Regenerate `/docs/ops.md` via `./scripts/create_ops_docs.py`
+
 Legend:
 - ✅ Fully supported by this backend
 - 🟡 Partially supported by this backend
@@ -18,7 +23,8 @@ Legend:
 | ARGSORT |||||
 | CLAMP |||| 🟡 |
 | CONCAT ||| 🟡 ||
-| CONT ||| 🟡 ||
+| CONT |||||
+| CONV_2D |||||
 | CONV_2D_DW |||||
 | CONV_TRANSPOSE_1D |||||
 | CONV_TRANSPOSE_2D |||||
@@ -30,7 +36,7 @@ Legend:
 | DIAG_MASK_INF |||| 🟡 |
 | DIV |||| 🟡 |
 | DUP ||| 🟡 | 🟡 |
-| ELU ||| | 🟡 |
+| ELU ||| 🟡 | 🟡 |
 | EXP ||| 🟡 ||
 | FLASH_ATTN_EXT ||| 🟡 | 🟡 |
 | GATED_LINEAR_ATTN |||||
@@ -66,14 +72,16 @@ Legend:
 | REPEAT_BACK |||||
 | RMS_NORM |||| 🟡 |
 | RMS_NORM_BACK |||||
-| RMS_NORM_MUL |||||
+| RMS_NORM_MUL |||||
+| RMS_NORM_MUL_ADD |||||
+| ROLL |||||
 | ROPE |||||
 | ROPE_BACK |||||
 | RWKV_WKV6 |||||
 | RWKV_WKV7 |||||
 | SCALE |||||
 | SET |||||
-| SET_ROWS || 🟡 | | 🟡 |
+| SET_ROWS || 🟡 | 🟡 | 🟡 |
 | SGN ||| 🟡 ||
 | SIGMOID ||| 🟡 | 🟡 |
 | SILU ||| 🟡 | 🟡 |

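The two regeneration steps in the new ops.md preamble would look roughly like this on the command line (a sketch: the binary path assumes a CMake build under `build/`, and the exact way the backend is selected for the CSV dump may differ):

```bash
# 1. dump the support matrix for your backend into docs/ops/
./build/bin/test-backend-ops support --output csv > docs/ops/CUDA.csv

# 2. regenerate docs/ops.md from the CSV files
./scripts/create_ops_docs.py
```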