Merged
144 commits
c4a3e97
metal : copy kernels for quant to F32/F16 conversions (#12017)
gcp Feb 25, 2025
e920a42
llama : expose llama_model_n_head_kv in the API (#11997)
vlovich Feb 25, 2025
fc55585
Add Doc for Converting Granite Vision -> GGUF (#12006)
alex-jw-brooks Feb 25, 2025
fcdd8fa
server: support add_generation_prompt query param (#12062)
ochafik Feb 25, 2025
da1a1e2
vulkan: implement more backpropagation operators (#11914)
remyoudompheng Feb 25, 2025
87d637e
ggml-cpu: Fix build with sve (#12059)
MollySophia Feb 25, 2025
0ff95db
add OP sigmoid (#12056)
foldl Feb 25, 2025
dea4756
server: handle echo=false on /v1/completions (#12060)
rhjdvsgsgks Feb 25, 2025
e1fd56d
vulkan: fix assertion when qy_needs_dequant (#12068)
jeffbolznv Feb 25, 2025
fe93d46
docs: add docs/function-calling.md to lighten server/README.md's plig…
ochafik Feb 25, 2025
89d1d41
readme : update infra list (#9096)
kerthcet Feb 26, 2025
777e781
gguf-py: enable reading non-native endian files (#12081)
AlekseiNikiforovIBM Feb 26, 2025
fc06fbb
Refactor gguf scripts to improve metadata handling (#11909)
CISC Feb 26, 2025
db4aebe
llava : add struct for FFI bindgen (#12079)
tinglou Feb 26, 2025
d30f073
cmake: Fix ggml backend dependencies and installation (#11818)
vvuksanovic Feb 27, 2025
02df4c5
vulkan: improve im2col (#11826)
daniandtheweb Feb 28, 2025
e34d633
vulkan: matmul dequantization improvements (#12015)
netrunnereve Feb 28, 2025
5f61b24
CANN: Fix build error with GCC 13 (#11990)
hipudding Feb 28, 2025
23f6cd6
ggml: aarch64: implement SVE kernels for q2_k_q8_k vector dot (#12064)
Vithulep Feb 28, 2025
eab6c76
CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (#12098)
JohannesGaessler Feb 28, 2025
297c73c
vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizatio…
remyoudompheng Feb 28, 2025
d851f59
Update granite vision docs for 3.2 model (#12105)
alex-jw-brooks Feb 28, 2025
5319053
llama : add Phi-4-mini support (supersede #12099) (#12108)
ngxson Feb 28, 2025
50463a7
ggml : upgrade init_tensor API to return a ggml_status (#11854)
WilliamTambellini Feb 28, 2025
7b616fc
convert : fix Norway problem when parsing YAML (#12114)
ngxson Feb 28, 2025
2c1f2eb
webui : minor typo fixes (#12116)
vynride Mar 1, 2025
02fe919
CUDA: compress mode option and default to size (#12029)
Green-Sky Mar 1, 2025
11d9f00
common : add --system-prompt parameter, replace behavior of -p in con…
CISC Mar 1, 2025
556a32b
main: update outdated system prompt message (followup to #12131) (#12…
CISC Mar 1, 2025
863955a
main: use jinja chat template system prompt by default (#12118)
CISC Mar 2, 2025
d393566
ggml-backend : keep paths in native string type when possible (#12144)
slaren Mar 2, 2025
deb2d2a
SYCL: Move CPY kernels to a separate file and add few missing kernels…
qnixsynapse Mar 3, 2025
837810e
webui : add ?m=... and ?q=... params (#12148)
ngxson Mar 3, 2025
f358861
Adding UTF-8 support to llama.cpp (#12111)
ericcurtin Mar 3, 2025
238f61a
ggml : fix kleidiai build (#12159)
ag2s20150909 Mar 3, 2025
3049c9e
test-backend-ops : add option -p to filter by op params (#12155)
slaren Mar 3, 2025
cebadb2
tts: add speaker file support (#12048)
dm4 Mar 3, 2025
d34328c
ci : set GITHUB_ACTION env var for server tests (#12162)
danbev Mar 3, 2025
b828c7b
scripts : sync-ggml-am.sh fix
ggerganov Feb 28, 2025
36e9e14
Support pure float16 add/sub/mul/div operations in the CUDA (and CPU)…
cmdr2 Feb 25, 2025
80a8d34
Told cmake to install ggml-cpp.h as a public header file. (ggml/1126)
petterreinholdtsen Feb 26, 2025
7270863
cmake : fix compile assumptions for power9/etc (whisper/2777)
midnightmagic Feb 5, 2025
13b0c3d
whisper : support GGML_BACKEND_DL (whisper/2843)
slaren Feb 27, 2025
ae95668
cuda/cpu: Increase support for fp16 unary operations (ggml/1125)
cmdr2 Feb 28, 2025
98460bb
sync : ggml
ggerganov Feb 28, 2025
bac8244
cuda/vulkan: specify fp32-only support for some operations in support…
cmdr2 Feb 28, 2025
d937246
sync : ggml
ggerganov Feb 28, 2025
39bcace
cuda: unary ops as float + de-duplicate (ggml/1130)
cmdr2 Mar 3, 2025
87a8ed8
sync : ggml
ggerganov Mar 3, 2025
ab86017
HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (#12032)
hjc4869 Mar 3, 2025
2d555c3
`server`: fix deadly typo in response_format.json_schema.schema handl…
ochafik Mar 4, 2025
08ea450
main: allow preloading conversation with -p and add -st / --single-tu…
CISC Mar 4, 2025
5839cb1
readme : fix roadmap link (#12185)
ggerganov Mar 4, 2025
5dccdee
ggml : portability fixes for VS 2017 (#12150)
mgroeber9110 Mar 4, 2025
4f2160f
llama : add xcframework build script (#11996)
danbev Mar 5, 2025
248d5b5
server : fix cache reuse logic (#12161)
Clauszy Mar 5, 2025
3d8707b
ci : remove xframework upload (#12190)
danbev Mar 5, 2025
12b29d9
ci : fix xcframework artifact tag (#12191)
danbev Mar 5, 2025
372254f
`tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, suppor…
ochafik Mar 5, 2025
aca4bb8
ci : add fetch-depth to xcframework upload (#12195)
danbev Mar 5, 2025
3993cbd
ggml : fix GGMLMetalClass ODR (#12200)
pminev Mar 5, 2025
c9f2cfc
ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (#12…
remyoudompheng Mar 6, 2025
daab580
opencl : fix profile-related errors (#12095)
simon886212 Mar 6, 2025
feaca24
opencl : fix `ulong` kernel args were set from `int` variables (#12174)
linehill Mar 6, 2025
e8bfee5
opencl : fix buffer alignment (#12197)
linehill Mar 6, 2025
a77ce0c
android : fix KV cache log message condition (#12212)
hanyin-arm Mar 6, 2025
26a1dd8
HIP/CUDA: set the parameter value in maintain_cuda_graph instead of …
IMbackK Mar 6, 2025
d6da140
llava: add big-endian conversion for image encoder (#12218)
taronaeo Mar 6, 2025
8b4cccb
update function-calling.md w/ template override for functionary-small…
ochafik Mar 6, 2025
de6fff6
HIP: rocWMMA documentation and enabling in workflow builds (#12179)
hjc4869 Mar 6, 2025
288caf8
CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (#12222)
JohannesGaessler Mar 6, 2025
d769e26
readme : update bindings (#12229)
lmbelo Mar 6, 2025
92f11ef
cmake : fix undefined reference errors for std::filesystem in ggml (#…
hbuxiaofei Mar 6, 2025
f3ca2b1
opencl: Noncontiguous `norm`, `rms_norm`, disable `fp16` for some ops…
lhez Mar 7, 2025
5d61aaa
metal : fix default.metallib build (#12224)
danbev Mar 7, 2025
2af92fa
HIP: fix rocWMMA build flags under Windows (#12230)
hjc4869 Mar 7, 2025
0882890
metal : simplify kernel arguments using a struct (#3229) (#12194)
BB-fat Mar 7, 2025
fd414d2
sync: minja - support QwQ-32B (#12235)
ochafik Mar 7, 2025
205aa64
server : Log original chat template parsing error (#12233)
CISC Mar 7, 2025
3aa6bd9
ci : fix save-load test invocations (#12245)
ggerganov Mar 7, 2025
c3e4a76
ggml-cpu: faster AVX2 variant for IQ1_M (#12216)
remyoudompheng Mar 7, 2025
50c9389
ggml : ggml_compute_forward_concat() for arbitrary tensor type (ggml/…
vmobilis Mar 7, 2025
9846ac3
sync : ggml
ggerganov Mar 7, 2025
64048b7
ggml : skip intermediate .air file when compiling .metallib (#12247)
danbev Mar 7, 2025
22164bb
server : infill gen ends on new line (#12254)
ggerganov Mar 7, 2025
11a32ee
ggml-backend : make path_str compatible with C++20 (#12269)
ctrysbita Mar 8, 2025
0bac404
authors : update (#12271)
ggerganov Mar 8, 2025
17345ac
server : add speculative decoding presets for FIM (#12287)
ggerganov Mar 9, 2025
f384f48
llava : fix bug in minicpm-v code (#11513)
tc-mb Mar 10, 2025
80cb71d
`sampler`: fixes trigger tokens + lazy grammars (fix typo cast from t…
ochafik Mar 10, 2025
8d4728c
allow missing content in message if tool_calls provided (#12293)
ochafik Mar 10, 2025
85adb37
`tool-call`: ensure there's always a non-empty tool call id (#12292)
ochafik Mar 10, 2025
c384ddf
`server`: extract <think> tags from qwq outputs (#12297)
ochafik Mar 10, 2025
09888a9
common : refactor '-o' option (#12278)
marcoStocchi Mar 10, 2025
4eaf9de
tests : fix test-quantize-fns to init the CPU backend (#12306)
ggerganov Mar 10, 2025
82db278
readme: added Sidekick to available UIs (#12311)
johnbean393 Mar 10, 2025
40a5a60
opencl: use OpenCL C standard supported by the device (#12221)
Mar 10, 2025
e800662
musa: support new arch mp_31 and update doc (#12296)
yeahdongcn Mar 10, 2025
1989578
mat vec double buffer (#12188)
netrunnereve Mar 10, 2025
b8b61f0
clip : bring back GPU support (#12322)
ngxson Mar 11, 2025
715a993
metal : Cache the Metal library at the device context level (#12265)
BB-fat Mar 11, 2025
d6f4764
ggml-backend : fix backend search path (#12330)
Mar 11, 2025
d0bebcc
CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows …
IMbackK Mar 11, 2025
7957c21
vulkan: fix bug in coopmat1 mul_mat_id (#12316)
jeffbolznv Mar 12, 2025
3ea6abf
llama : Add Gemma 3 support (+ experimental vision capability) (#12343)
ngxson Mar 12, 2025
b5bcf44
CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (#12315)
IMbackK Mar 12, 2025
118c301
llama.swiftui : fix xcframework dir in README [no ci] (#12353)
danbev Mar 12, 2025
9bfb2dc
Update build.yml for Windows Vulkan builder to use Vulkan 1.4.304 SDK…
oscarbg Mar 12, 2025
a573718
server : fix crash when using verbose output with input tokens that a…
ishaangandhi Mar 13, 2025
de6e65f
llama : refactor llama_context, llama_kv_cache, llm_build_context (#1…
ggerganov Mar 13, 2025
354376f
arg : no n_predict = -2 for examples except for main and infill (#12364)
ngxson Mar 13, 2025
3428d14
llama : fix Gemma3 SWA KV cache shift (#12373)
ggerganov Mar 13, 2025
9288b9b
hparams : add SWA rope parameters (#12374)
ggerganov Mar 14, 2025
e1e8915
graph : simplify attn input build for unified KV cache (#12381)
ggerganov Mar 14, 2025
5689212
server: fix "--grammar-file" parameter (#12285)
dodekapod Mar 14, 2025
236c9b2
Load all MoE experts during warmup (#11571)
fairydreaming Mar 14, 2025
6aa3adf
main : add -sysf / --system-prompt-file (#12249) (#12250)
CISC Mar 14, 2025
3cd6f1f
Add CLI arg to llama-run to adjust the number of threads used (#12370)
ericcurtin Mar 14, 2025
cb1de04
[CANN]MUL_MAT optimization (#12382)
noemotiovon Mar 15, 2025
822a6d3
SYCL : support non-contiguous tensors in binary ops (add, sub, etc) (…
fairydreaming Mar 15, 2025
874af05
SYCL: Delete redundant plus sign and space (#12391)
aubreyli Mar 15, 2025
90993c7
llama-tts : add '-o' option (#12398)
marcoStocchi Mar 15, 2025
d63d47e
ci : add --symlinks to xcframework zip command (#12409)
danbev Mar 16, 2025
12d7a79
context : fix init of n_outputs (#12397)
ggerganov Mar 16, 2025
7e6f894
llama : fix OLMo-2-0325-32B-Instruct K-norm size (#12400)
CISC Mar 16, 2025
104d8bd
SYCL: set extras only on GGML_TYPE_Q4_0 (#12366)
qnixsynapse Mar 17, 2025
e917e57
cmake : enable building llama.cpp using system libggml (#12321)
ckastner Mar 17, 2025
19aab34
vulkan: Adjust coopmat2 tile sizes and selection heuristic (#12258)
jeffbolznv Mar 17, 2025
423e999
vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bound…
jeffbolznv Mar 17, 2025
d7dc779
vulkan: use fp32 in coopmat2 q4_k dequant function (#12309)
jeffbolznv Mar 17, 2025
e9030cc
vulkan: subgroup size tuning (#12087)
daniandtheweb Mar 17, 2025
1e33f90
vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (#12312)
jeffbolznv Mar 17, 2025
0897424
ggml-vulkan: remove unused find_program(glslc) (#12416)
guusw Mar 17, 2025
d985ea4
cuda : enable CUDA Graph on CUDA Toolkit < 12.x (#12394)
gaugarg-nv Mar 17, 2025
24c26e4
docs : bring llama-cli conversation/template docs up-to-date (#12426)
CISC Mar 17, 2025
348bc09
llama: Add support for RWKV v7 architecture (#12412)
MollySophia Mar 17, 2025
b1972f5
fixed compilation warnings in ggml-sycl (#12424)
lslusarczyk Mar 18, 2025
2402c3d
Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentat…
0cc4m Mar 18, 2025
e866ff8
ggml : add SVE support for q6_K_q8_K (#12361)
fj-y-saito Mar 18, 2025
568e027
cmake : fix PowerPC build (#12241)
mehendarkarprajwal Mar 18, 2025
b7417a3
server : fix warmup draft cache type (#12446)
ggerganov Mar 18, 2025
4f03d40
context : always use non-causal attention for encoder graphs (#12447)
ggerganov Mar 18, 2025
784783c
rm other backend ci, keep cpu, rpc, sycl
arthw Mar 19, 2025
7e411f9
rm tail space
arthw Mar 19, 2025
1,262 changes: 35 additions & 1,227 deletions .github/workflows/build.yml

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions .github/workflows/server.yml
@@ -161,6 +161,8 @@ jobs:
       - name: Tests
         id: server_integration_tests
         if: ${{ matrix.sanitizer == '' }}
+        env:
+          GITHUB_ACTIONS: "true"
         run: |
           cd examples/server/tests
           ./tests.sh
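
For reference, the effect of the new `env:` block can be reproduced when running the server test suite by hand; a minimal sketch of the equivalent local invocation (illustrative, not part of the diff):

```sh
# Mirror the new env: block locally so tests that branch on the
# GITHUB_ACTIONS variable behave the same way as in CI.
cd examples/server/tests
GITHUB_ACTIONS=true ./tests.sh
```
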
2 changes: 2 additions & 0 deletions .gitignore
@@ -45,6 +45,8 @@ lcov-report/
 tags
 .build/
 build*
+release
+debug
 !build-info.cmake
 !build-info.cpp.in
 !build-info.sh
61 changes: 60 additions & 1 deletion AUTHORS

Large diffs are not rendered by default.

10 changes: 9 additions & 1 deletion CMakeLists.txt
@@ -29,6 +29,8 @@ else()
     set(LLAMA_STANDALONE OFF)
 endif()
 
+option(LLAMA_USE_SYSTEM_GGML "Use system libggml" OFF)
+
 if (EMSCRIPTEN)
     set(BUILD_SHARED_LIBS_DEFAULT OFF)
 
@@ -145,7 +147,13 @@ endif()
 # 3rd-party
 #
 
-if (NOT TARGET ggml)
+if (LLAMA_USE_SYSTEM_GGML)
+    message(STATUS "Using system-provided libggml, skipping ggml build")
+    find_package(ggml REQUIRED)
+    add_library(ggml ALIAS ggml::ggml)
+endif()
+
+if (NOT TARGET ggml AND NOT LLAMA_USE_SYSTEM_GGML)
     add_subdirectory(ggml)
     # ... otherwise assume ggml is added by a parent CMakeLists.txt
 endif()
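
A minimal sketch of how the new option would be consumed, assuming a system-wide ggml installation that ships CMake package config files (so `find_package(ggml REQUIRED)` can resolve the `ggml::ggml` target):

```sh
# Configure against the system libggml instead of the bundled ggml subproject.
cmake -B build -DLLAMA_USE_SYSTEM_GGML=ON
cmake --build build --config Release
```
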
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -39,7 +39,7 @@
 _(NOTE: this guideline is yet to be applied to the `llama.cpp` codebase. New code should follow this guideline.)_
-- Try to follow the existing patterns in the code (indentation, spaces, etc.). In case of doubt use `clang-format` to format the added code
+- Try to follow the existing patterns in the code (indentation, spaces, etc.). In case of doubt use `clang-format` (from clang-tools v15+) to format the added code
 - For anything not covered in the current guidelines, refer to the [C++ Core Guidelines](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines)
 - Tensors store data in row-major order. We refer to dimension 0 as columns, 1 as rows, 2 as matrices
 - Matrix multiplication is unconventional: [`C = ggml_mul_mat(ctx, A, B)`](https://github.com/ggml-org/llama.cpp/blob/880e352277fc017df4d5794f0c21c44e1eae2b84/ggml.h#L1058-L1064) means $C^T = A B^T \Leftrightarrow C = B A^T.$
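
As a quick sanity check of that convention (an illustrative dimension count, not part of the diff): with $A \in \mathbb{R}^{m \times k}$ and $B \in \mathbb{R}^{n \times k}$,

$$C^T = A B^T \in \mathbb{R}^{m \times n} \quad\Leftrightarrow\quad C = B A^T \in \mathbb{R}^{n \times m},$$

i.e. both operands share the inner dimension $k$ along dimension 0 (columns), and the result has $m$ columns and $n$ rows.
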
2 changes: 1 addition & 1 deletion Makefile
@@ -836,7 +836,7 @@ ifdef GGML_MUSA
 else
 MUSA_PATH ?= /opt/musa
 endif
-MUSA_ARCHITECTURES ?= 21;22
+MUSA_ARCHITECTURES ?= 21;22;31
 
 MK_CPPFLAGS += -DGGML_USE_MUSA -DGGML_USE_CUDA
 MK_LDFLAGS += -L$(MUSA_PATH)/lib -Wl,-rpath=$(MUSA_PATH)/lib
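
Since `MUSA_ARCHITECTURES` is assigned with `?=`, the new default (which now includes `31`, matching the `mp_31` support added earlier in this PR) can still be overridden at build time; a hedged usage sketch, assuming the MUSA toolchain under the default `/opt/musa` prefix:

```sh
# Build the MUSA backend with the new default architecture list (21;22;31).
make GGML_MUSA=1 -j"$(nproc)"

# Or compile for a single target architecture only:
make GGML_MUSA=1 MUSA_ARCHITECTURES="31" -j"$(nproc)"
```
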
19 changes: 0 additions & 19 deletions Package.swift

This file was deleted.

8 changes: 5 additions & 3 deletions README.md
@@ -39,7 +39,7 @@ During sync the commits, I will test and update the commits impact SYCL backend.
 [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)
 [![Server](https://github.com/ggml-org/llama.cpp/actions/workflows/server.yml/badge.svg)](https://github.com/ggml-org/llama.cpp/actions/workflows/server.yml)
 
-[Roadmap](https://github.com/users/ggml-org/projects/7) / [Project status](https://github.com/ggml-org/llama.cpp/discussions/3471) / [Manifesto](https://github.com/ggml-org/llama.cpp/discussions/205) / [ggml](https://github.com/ggml-org/ggml)
+[Roadmap](https://github.com/users/ggerganov/projects/7) / [Project status](https://github.com/ggml-org/llama.cpp/discussions/3471) / [Manifesto](https://github.com/ggml-org/llama.cpp/discussions/205) / [ggml](https://github.com/ggml-org/ggml)
 
 Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others) in pure C/C++
 
@@ -59,7 +59,7 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)
 
 - **How to use [MTLResidencySet](https://developer.apple.com/documentation/metal/mtlresidencyset?language=objc) to keep the GPU memory active?** https://github.com/ggml-org/llama.cpp/pull/11427
 - **VS Code extension for FIM completions:** https://github.com/ggml-org/llama.vscode
-- Universal tool call support in `llama-server`: https://github.com/ggml-org/llama.cpp/pull/9639
+- Universal [tool call support](./docs/function-calling.md) in `llama-server` https://github.com/ggml-org/llama.cpp/pull/9639
 - Vim/Neovim plugin for FIM completions: https://github.com/ggml-org/llama.vim
 - Introducing GGUF-my-LoRA https://github.com/ggml-org/llama.cpp/discussions/10123
 - Hugging Face Inference Endpoints now support GGUF out of the box! https://github.com/ggml-org/llama.cpp/discussions/9669
@@ -191,6 +191,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 - Guile Scheme: [guile_llama_cpp](https://savannah.nongnu.org/projects/guile-llama-cpp)
 - Swift [srgtuszy/llama-cpp-swift](https://github.com/srgtuszy/llama-cpp-swift)
 - Swift [ShenghaiWang/SwiftLlama](https://github.com/ShenghaiWang/SwiftLlama)
+- Delphi [Embarcadero/llama-cpp-delphi](https://github.com/Embarcadero/llama-cpp-delphi)
 
 </details>
 
@@ -205,6 +206,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 - [eva](https://github.com/ylsdamxssjxxdd/eva) (MIT)
 - [iohub/collama](https://github.com/iohub/coLLaMA) (Apache-2.0)
 - [janhq/jan](https://github.com/janhq/jan) (AGPL)
+- [johnbean393/Sidekick](https://github.com/johnbean393/Sidekick) (MIT)
 - [KanTV](https://github.com/zhouwg/kantv?tab=readme-ov-file) (Apache-2.0)
 - [KodiBot](https://github.com/firatkiral/kodibot) (GPL)
 - [llama.vim](https://github.com/ggml-org/llama.vim) (MIT)
@@ -253,7 +255,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 - [llama_cpp_canister](https://github.com/onicai/llama_cpp_canister) - llama.cpp as a smart contract on the Internet Computer, using WebAssembly
 - [llama-swap](https://github.com/mostlygeek/llama-swap) - transparent proxy that adds automatic model switching with llama-server
 - [Kalavai](https://github.com/kalavai-net/kalavai-client) - Crowdsource end to end LLM deployment at any scale
-
+- [llmaz](https://github.com/InftyAI/llmaz) - ☸️ Easy, advanced inference platform for large language models on Kubernetes.
 </details>
 
 <details>
4 changes: 0 additions & 4 deletions Sources/llama/llama.h

This file was deleted.

5 changes: 0 additions & 5 deletions Sources/llama/module.modulemap

This file was deleted.
