Merged

Changes from all commits (104 commits)
fd7855f
doc: [MUSA] minor changes (#12583)
yeahdongcn Mar 26, 2025
5ed38b6
ggml : fix MUL_MAT_ID repack with Q8_K (#12544)
ggerganov Mar 26, 2025
df4d20c
convert : fix squeeze for ssm_conv tensors (#12573)
ggerganov Mar 26, 2025
02082f1
clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA…
Ivy233 Mar 26, 2025
2447ad8
upgrade to llguidance 0.7.10 (#12576)
mmoskal Mar 26, 2025
b3298fa
metal : refactor mat-vec code (#12569)
ggerganov Mar 26, 2025
bd40678
HIP: Add support for RDNA4 targets (#12372)
slojosic-amd Mar 26, 2025
f17a3bb
SYCL: implement memset ggml backend buffer interface (#12580)
qnixsynapse Mar 27, 2025
f28bc4c
llama : make loras compatible with repacking (#12593)
ggerganov Mar 27, 2025
24feaec
ggml : riscv: add 128-bit RVV support (#12530)
xctan Mar 27, 2025
c7b43ab
llamafile : ppc64le MMA implementation for Q4_0. (#12489)
amritahs-ibm Mar 27, 2025
0306aad
cmake : sync/merge PowerPC build commands (#0)
ggerganov Mar 27, 2025
df0665a
sync : ggml
ggerganov Mar 27, 2025
771d843
scripts : update sync + fix cmake merge
ggerganov Mar 27, 2025
029c693
sync : ggml
ggerganov Mar 27, 2025
d5c6309
convert : Support Qwen2_5_VLForConditionalGeneration (#12595)
csabakecskemeti Mar 27, 2025
953c2a6
model : restore support for T5Encoder (#12590)
HighDoping Mar 27, 2025
f125b8d
llama : add PLM GGUF Conversion & Inference Support (#12457)
Si1w Mar 27, 2025
5dec47d
opencl: add multi and vision rope, `gelu_quick` and `im2col` (#12600)
lhez Mar 27, 2025
2969019
media : add SVG logo [no ci] (#12616)
ggerganov Mar 27, 2025
2099a9d
server : Support listening on a unix socket (#12613)
p1-0tr Mar 27, 2025
ab6ab8f
rpc : send hash when tensor data is above some fixed threshold (#12496)
rgerganov Mar 28, 2025
1373176
llamafile : ppc64le GEMV forwarding for FP32. (#12594)
amritahs-ibm Mar 28, 2025
ef03229
rpc : update README for cache usage (#12620)
rgerganov Mar 28, 2025
5d01670
server : include speculative decoding stats when timings_per_token is…
mostlygeek Mar 28, 2025
dd373dd
llama: fix error on bad grammar (#12628)
JohannesGaessler Mar 28, 2025
b86f600
vulkan: fix coopmat shader generation when cross-compiling (#12272)
Icenowy Mar 28, 2025
b4ae508
metal : improve FA + improve MoE (#12612)
ggerganov Mar 28, 2025
3714c3e
llama : fix incorrect Qwen2Moe ffn_moe_out graph callback (#12631)
CISC Mar 28, 2025
d07a0d7
CANN : remove clang-format in ggml-cann (#12607)
hipudding Mar 29, 2025
a69f846
cmake : fix ccache conflict (#12522)
BusyJay Mar 29, 2025
0bb2919
llama : change cpu_buft_list order: ACCEL -> GPU host -> CPU extra ->…
Djip007 Mar 29, 2025
af6ae1e
llama : fix non-causal mask for gemma 3 (#12615)
ngxson Mar 29, 2025
3891e18
examples : command.wasm updates (whisper/2904)
danbev Mar 20, 2025
e408d43
ggml : add logging for native build options/vars (whisper/2935)
danbev Mar 24, 2025
a62d7fa
cpu: de-duplicate some of the operators and refactor (ggml/1144)
cmdr2 Mar 29, 2025
360dc22
cpu : rm unused variable (ggml/1166)
ngxson Mar 29, 2025
d3f1f0a
sync : ggml
ggerganov Mar 29, 2025
492d7f1
musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci a…
yeahdongcn Mar 30, 2025
7242dd9
llama-chat : Add Yandex instruct model template support (#12621)
vorobyov01 Mar 30, 2025
b3de7ca
llama : add Trillion 7B model support (#12556)
juyoung-trl Mar 30, 2025
4663bd3
metal : use constexpr in FA kernels + fix typedef (#12659)
ggerganov Mar 30, 2025
2c3f8b8
llama : support BailingMoE (Ling) (#12634)
CISC Mar 30, 2025
52de2e5
tts : remove printfs (#12640)
marcoStocchi Mar 31, 2025
f52d59d
llava : fix clip loading GGUFs with missing description (#12660)
CISC Mar 31, 2025
6c02a03
SYCL: Remove misleading ggml_sycl_op_flatten function (#12387)
qnixsynapse Mar 31, 2025
1a85949
llava : proper description fix (#12668)
CISC Mar 31, 2025
a772448
cmake: improve Vulkan cooperative matrix support checks (whisper/2966)
sandrohanea Mar 31, 2025
0114a32
sync : ggml
ggerganov Mar 31, 2025
1790e73
cmake : fix whitespace (#0)
ggerganov Mar 31, 2025
a8a1f33
Vulkan: Add DP4A MMQ and Q8_1 quantization shader (#12135)
0cc4m Mar 31, 2025
403fbac
convert : Qwerky : use lora_rank_tokenshift and lora_rank_decay if pr…
CISC Mar 31, 2025
250d795
ggml : faster ssm scan (#10558)
A3shTnT Mar 31, 2025
c80a775
vocab : add special infill tokens for CodeLlama (#11850)
danbev Mar 31, 2025
35782ae
convert : BailingMoE : avoid setting rope_dim to 0 (#12678)
CISC Mar 31, 2025
8bbf260
SYCL: switch to SYCL namespace (#12674)
qnixsynapse Apr 1, 2025
8293970
SYCL: Rename oneMKL to oneMath (#12192)
Rbiessy Apr 1, 2025
2bb3597
vulkan: fix build when glslc doesn't support coopmat (#12683)
wbruna Apr 1, 2025
a6f32f0
Fix clang warning in gguf_check_reserved_keys (#12686)
yeahdongcn Apr 1, 2025
3fd072a
metal : use F32 prec in FA kernels (#12688)
ggerganov Apr 1, 2025
5936a61
convert : BailingMoE : fix qkv split when head_dim is 0 (#12687)
CISC Apr 1, 2025
e39e727
llama : use LLM_KV_GENERAL_FILE_TYPE instead of gguf_find_key (#12672)
jklincn Apr 1, 2025
f423981
opencl : fix memory allocation size (#12649)
sparkleholic Apr 1, 2025
267c139
common : refactor downloading system, handle mmproj with -hf option (…
ngxson Apr 1, 2025
9bacd6b
[CANN] get_rows and dup optimization (#12671)
noemotiovon Apr 2, 2025
42eb248
common : remove json.hpp from common.cpp (#12697)
ngxson Apr 2, 2025
83a88bd
vocab : BailingMoE : change possessive quantifiers to greedy (#12677)
CISC Apr 2, 2025
a10b36c
llama : refactor kv cache guard (#12695)
ggerganov Apr 2, 2025
e0e912f
llama : add option to override model tensor buffers (#11397)
slaren Apr 2, 2025
833e2b7
model : print tensor size during load (#12711)
ggerganov Apr 2, 2025
92e3006
Vulkan: Fix mmq int dot float cache size (#12722)
0cc4m Apr 2, 2025
be0a0f8
vulkan: Implement grouped query attention in the coopmat2 FA shader (…
jeffbolznv Apr 2, 2025
6f3bd38
cmake: remove caching from vulkan coopmat checks (#12719)
bandoti Apr 2, 2025
f01bd02
vulkan: Implement split_k for coopmat2 flash attention. (#12627)
jeffbolznv Apr 2, 2025
97a20c0
opencl: use `max_alloc_size` in backend ctx instead of querying again…
lhez Apr 3, 2025
2a0dc97
CANN: Fix failed test cases (#12708)
hipudding Apr 3, 2025
3f9da22
Simplify and improve CUDA graphs through use of indirect copy pointer…
agray3 Apr 3, 2025
65cfe13
CANN: Support operator SIN COS ARGMAX (#12709)
noemotiovon Apr 3, 2025
193c3e0
fix MUSA compiler warning (#12704)
A3shTnT Apr 3, 2025
5f696e8
sync : minja (inclusionAI/Ling) and update tests (#12699)
yeahdongcn Apr 3, 2025
2004644
ci : add env variable in ggml-ci and document the same in SYCL.md (#1…
AD2605 Apr 3, 2025
1c05999
vulkan: Fix missing cmake logic for dot product extension (#12721)
jeffbolznv Apr 3, 2025
5dd5d1a
vocab : use string_view::find() to avoid unnecessary looking up beyon…
yumeyao Apr 3, 2025
c262bed
CUDA: Prefer vector flash decoding kernel for Gemma models (#12738)
gaugarg-nv Apr 3, 2025
7d7b1ba
opencl: update doc for OpenCL (#12702)
lhez Apr 4, 2025
35e592e
vulkan: set cmake minimum and project name in vulkan-shaders (#12744)
jeffbolznv Apr 4, 2025
74d4f5b
vulkan: Hybrid waitForFences/getFenceStatus to reduce fence latency (…
jeffbolznv Apr 4, 2025
348888e
docs : add XCFramework section to README.md [no ci] (#12746)
danbev Apr 4, 2025
9ac4d61
cmake: fix ggml-shaders-gen compiler paths containing spaces (#12747)
hydroo Apr 4, 2025
94148ba
sycl: allow ggml-sycl configuration and compilation using Visual Stud…
s-Nick Apr 4, 2025
23106f9
gguf-split : --merge now respects --dry-run option (#12681)
nickhuang99 Apr 4, 2025
b772394
server : webui : Upgrade daisyui, tailwindcss. (#12735)
nauful Apr 4, 2025
1be76e4
ci: add Linux cross-compile build (#12428)
bandoti Apr 4, 2025
3e1d293
kv-cache : simplify + fix warning for recurrent models (#12756)
ggerganov Apr 4, 2025
7a84777
sync: minja (#12739)
ochafik Apr 4, 2025
c6ff5d2
common: custom hf endpoint support (#12769)
eternaphia Apr 5, 2025
0364178
clip : refactor clip_init, add tests (#12757)
ngxson Apr 5, 2025
f1e3eb4
common : fix includes in arg.cpp and gemma3-cli.cpp (#12766)
barracuda156 Apr 5, 2025
6bf28f0
Vulkan: Tune Vulkan mmq int dot shader for performance (#12767)
0cc4m Apr 5, 2025
80b717d
vulkan: Use unclamped loads for flash attention mask (#12720)
jeffbolznv Apr 6, 2025
0c74b04
vulkan: fix NaN issue in flash attention shader (#12776)
jeffbolznv Apr 6, 2025
916c83b
musa: fix compilation warnings in mp_22/31 (#12780)
yeahdongcn Apr 6, 2025
d0d5b22
CANN: Refactor to reduce duplicate code (#12731)
hipudding Apr 7, 2025
3b06a95
Merge branch 'layla-build' into merge
l3utterfly Apr 7, 2025
121 changes: 121 additions & 0 deletions .github/workflows/build-linux-cross.yml
@@ -0,0 +1,121 @@
name: Build on Linux using cross-compiler
on:
workflow_dispatch:
workflow_call:

jobs:
ubuntu-latest-riscv64-cpu-cross:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v4
- name: Setup Riscv
run: |
sudo dpkg --add-architecture riscv64
sudo sed -i 's|http://azure.archive.ubuntu.com/ubuntu|http://ports.ubuntu.com/ubuntu-ports|g' \
/etc/apt/sources.list /etc/apt/apt-mirrors.txt
sudo apt-get clean
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
build-essential \
gcc-14-riscv64-linux-gnu \
g++-14-riscv64-linux-gnu

- name: Build
run: |
cmake -B build -DCMAKE_BUILD_TYPE=Release \
-DGGML_OPENMP=OFF \
-DLLAMA_BUILD_EXAMPLES=ON \
-DLLAMA_BUILD_TESTS=OFF \
-DCMAKE_SYSTEM_NAME=Linux \
-DCMAKE_SYSTEM_PROCESSOR=riscv64 \
-DCMAKE_C_COMPILER=riscv64-linux-gnu-gcc-14 \
-DCMAKE_CXX_COMPILER=riscv64-linux-gnu-g++-14 \
-DCMAKE_POSITION_INDEPENDENT_CODE=ON \
-DCMAKE_FIND_ROOT_PATH=/usr/lib/riscv64-linux-gnu \
-DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
-DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
-DCMAKE_FIND_ROOT_PATH_MODE_INCLUDE=BOTH

cmake --build build --config Release -j $(nproc)

ubuntu-latest-riscv64-vulkan-cross:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0

- name: Setup Riscv
run: |
sudo dpkg --add-architecture riscv64
sudo sed -i 's|http://azure.archive.ubuntu.com/ubuntu|http://ports.ubuntu.com/ubuntu-ports|g' \
/etc/apt/sources.list /etc/apt/apt-mirrors.txt
sudo apt-get clean
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
build-essential \
glslc \
gcc-14-riscv64-linux-gnu \
g++-14-riscv64-linux-gnu \
libvulkan-dev:riscv64

- name: Build
run: |
cmake -B build -DCMAKE_BUILD_TYPE=Release \
-DGGML_VULKAN=ON \
-DGGML_OPENMP=OFF \
-DLLAMA_BUILD_EXAMPLES=ON \
-DLLAMA_BUILD_TESTS=OFF \
-DCMAKE_SYSTEM_NAME=Linux \
-DCMAKE_SYSTEM_PROCESSOR=riscv64 \
-DCMAKE_C_COMPILER=riscv64-linux-gnu-gcc-14 \
-DCMAKE_CXX_COMPILER=riscv64-linux-gnu-g++-14 \
-DCMAKE_POSITION_INDEPENDENT_CODE=ON \
-DCMAKE_FIND_ROOT_PATH=/usr/lib/riscv64-linux-gnu \
-DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
-DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
-DCMAKE_FIND_ROOT_PATH_MODE_INCLUDE=BOTH

cmake --build build --config Release -j $(nproc)

ubuntu-latest-arm64-vulkan-cross:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0

- name: Setup Arm64
run: |
sudo dpkg --add-architecture arm64
sudo sed -i 's|http://azure.archive.ubuntu.com/ubuntu|http://ports.ubuntu.com/ubuntu-ports|g' \
/etc/apt/sources.list /etc/apt/apt-mirrors.txt
sudo apt-get clean
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
build-essential \
glslc \
crossbuild-essential-arm64 \
libvulkan-dev:arm64

- name: Build
run: |
cmake -B build -DCMAKE_BUILD_TYPE=Release \
-DGGML_VULKAN=ON \
-DGGML_OPENMP=OFF \
-DLLAMA_BUILD_EXAMPLES=ON \
-DLLAMA_BUILD_TESTS=OFF \
-DCMAKE_SYSTEM_NAME=Linux \
-DCMAKE_SYSTEM_PROCESSOR=aarch64 \
-DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
-DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
-DCMAKE_POSITION_INDEPENDENT_CODE=ON \
-DCMAKE_FIND_ROOT_PATH=/usr/lib/aarch64-linux-gnu \
-DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
-DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
-DCMAKE_FIND_ROOT_PATH_MODE_INCLUDE=BOTH

cmake --build build --config Release -j $(nproc)
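
The workflow above only verifies that the cross-builds compile. As a local smoke test (not part of the workflow; a sketch assuming the `qemu-user` package and the same Ubuntu cross toolchain used above), the riscv64 binaries can be run through user-mode emulation:

```bash
# Run a cross-compiled riscv64 binary via qemu user-mode emulation,
# pointing -L at the cross toolchain's library root
sudo apt-get install -y qemu-user
qemu-riscv64 -L /usr/riscv64-linux-gnu ./build/bin/llama-cli --version
```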
7 changes: 5 additions & 2 deletions .github/workflows/build.yml
@@ -10,7 +10,7 @@ on:
push:
branches:
- master
-paths: ['.github/workflows/build.yml', '**/CMakeLists.txt', '**/Makefile', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.cuh', '**/*.swift', '**/*.m', '**/*.metal', '**/*.comp']
+paths: ['.github/workflows/build.yml', '.github/workflows/build-linux-cross.yml', '**/CMakeLists.txt', '**/Makefile', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.cuh', '**/*.swift', '**/*.m', '**/*.metal', '**/*.comp']
pull_request:
types: [opened, synchronize, reopened]
paths: ['.github/workflows/build.yml', '**/CMakeLists.txt', '**/Makefile', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.cuh', '**/*.swift', '**/*.m', '**/*.metal', '**/*.comp']
@@ -606,6 +606,9 @@ jobs:
-DGGML_SYCL_F16=ON
cmake --build build --config Release -j $(nproc)

build-linux-cross:
uses: ./.github/workflows/build-linux-cross.yml

macOS-latest-cmake-ios:
runs-on: macos-latest

@@ -803,7 +806,7 @@ jobs:
env:
OPENBLAS_VERSION: 0.3.23
SDE_VERSION: 9.33.0-2024-01-07
-VULKAN_VERSION: 1.4.304.1
+VULKAN_VERSION: 1.4.309.0

strategy:
matrix:
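The new `build-linux-cross` job is invoked from build.yml via `workflow_call`, and the `workflow_dispatch` trigger also allows starting it by hand. A minimal sketch of a manual run, assuming an authenticated GitHub CLI on a checkout of the repository:

```bash
# Manually dispatch the cross-compile workflow on the current branch
gh workflow run build-linux-cross.yml

# List recent runs of that workflow to check the result
gh run list --workflow=build-linux-cross.yml
```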
31 changes: 31 additions & 0 deletions README.md
@@ -112,6 +112,8 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
- [x] [RWKV-6](https://github.com/BlinkDL/RWKV-LM)
- [x] [QRWKV-6](https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1)
- [x] [GigaChat-20B-A3B](https://huggingface.co/ai-sage/GigaChat-20B-A3B-instruct)
- [X] [Trillion-7B-preview](https://huggingface.co/trillionlabs/Trillion-7B-preview)
- [x] [Ling models](https://huggingface.co/collections/inclusionAI/ling-67c51c85b34a7ea0aba94c32)

#### Multimodal

@@ -528,6 +530,35 @@ If your issue is with model generation quality, then please at least scan the fo
- [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
- [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)

## XCFramework
The XCFramework is a precompiled version of the library for iOS, visionOS, tvOS,
and macOS. It can be used in Swift projects without the need to compile the
library from source. For example:
```swift
// swift-tools-version: 5.10
// The swift-tools-version declares the minimum version of Swift required to build this package.

import PackageDescription

let package = Package(
name: "MyLlamaPackage",
targets: [
.executableTarget(
name: "MyLlamaPackage",
dependencies: [
"LlamaFramework"
]),
.binaryTarget(
name: "LlamaFramework",
url: "https://github.com/ggml-org/llama.cpp/releases/download/b5046/llama-b5046-xcframework.zip",
checksum: "c19be78b5f00d8d29a25da41042cb7afa094cbf6280a225abe614b03b20029ab"
)
]
)
```
The above example uses the intermediate build `b5046` of the library; to use a different version, change the URL and checksum accordingly.
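
When pinning a different release, the checksum for the new archive can be computed locally; a sketch using SwiftPM's built-in subcommand (the `b5046` tag here is just the example above's):

```bash
# Download a release archive and compute the SwiftPM checksum for Package.swift
curl -LO https://github.com/ggml-org/llama.cpp/releases/download/b5046/llama-b5046-xcframework.zip
swift package compute-checksum llama-b5046-xcframework.zip
```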

## Completions
Command-line completion is available for some environments.
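
As a hedged illustration (assuming the `--completion-bash` option this README section documents upstream), bash completion can be generated and loaded like so:

```bash
# Generate a bash completion script from the CLI and source it in the current shell
build/bin/llama-cli --completion-bash > ~/.llama-completion.bash
source ~/.llama-completion.bash
```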

2 changes: 1 addition & 1 deletion ci/README.md
@@ -60,7 +60,7 @@ docker run --privileged -it \
Inside the container, execute the following commands:

```bash
-apt update -y && apt install -y cmake git python3.10-venv wget
+apt update -y && apt install -y bc cmake ccache git python3.10-venv time unzip wget
git config --global --add safe.directory /ws
GG_BUILD_MUSA=1 bash ./ci/run.sh /ci-results /ci-cache
```
4 changes: 3 additions & 1 deletion ci/run.sh
@@ -59,6 +59,8 @@ if [ ! -z ${GG_BUILD_SYCL} ]; then
export ONEAPI_DEVICE_SELECTOR="level_zero:0"
# Enable sysman for correct memory reporting
export ZES_ENABLE_SYSMAN=1
# to circumvent precision issues on CPY operations
export SYCL_PROGRAM_COMPILE_OPTIONS="-cl-fp32-correctly-rounded-divide-sqrt"
CMAKE_EXTRA="${CMAKE_EXTRA} -DGGML_SYCL=1 -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON"
fi

@@ -69,7 +71,7 @@
if [ ! -z ${GG_BUILD_MUSA} ]; then
# Use qy1 by default (MTT S80)
MUSA_ARCH=${MUSA_ARCH:-21}
CMAKE_EXTRA="-DGGML_MUSA=ON -DMUSA_ARCHITECTURES=${MUSA_ARCH}"
CMAKE_EXTRA="${CMAKE_EXTRA} -DGGML_MUSA=ON -DMUSA_ARCHITECTURES=${MUSA_ARCH}"
fi
## helpers

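Appending to `CMAKE_EXTRA` (rather than overwriting it) keeps flags set by earlier blocks intact, and `MUSA_ARCH` remains overridable from the environment. A sketch of a run targeting a different architecture (the value `22` is illustrative, not from the source):

```bash
# Run the CI script with MUSA enabled, overriding the default qy1/MTT S80 arch (21)
MUSA_ARCH=22 GG_BUILD_MUSA=1 bash ./ci/run.sh /ci-results /ci-cache
```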
4 changes: 2 additions & 2 deletions common/CMakeLists.txt
@@ -114,8 +114,8 @@ if (LLAMA_LLGUIDANCE)

ExternalProject_Add(llguidance_ext
GIT_REPOSITORY https://github.com/guidance-ai/llguidance
-# v0.6.12:
-GIT_TAG ced1c9023d47ec194fa977932d35ce65c2ebfc09
+# v0.7.10:
+GIT_TAG 0309d2a6bf40abda35344a362edc71e06d5009f8
PREFIX ${CMAKE_BINARY_DIR}/llguidance
SOURCE_DIR ${LLGUIDANCE_SRC}
BUILD_IN_SOURCE TRUE