Merged
Changes from all commits (373 commits)
85286f3
model : add OLMo3 support (#16015)
2015aroras Sep 17, 2025
1cbd80f
examples : support encoder-decoder models in the simple example (#16002)
DamonFool Sep 17, 2025
745cbcf
llama-quant : fix the verification of attention layers for encoder-de…
DamonFool Sep 17, 2025
a91d035
ci : revert back to macos-13 for macOS-latest-cmake-x64 (#16040)
danbev Sep 17, 2025
cb5bb6c
vulkan: automatically remove unsupported devices (#15976)
netrunnereve Sep 17, 2025
cd08fc3
common : Fix corrupted memory error on json grammar initialization (#…
dralves Sep 17, 2025
c959b67
CUDA: fix FA occupancy, optimize tile kernel (#15982)
JohannesGaessler Sep 17, 2025
8f8f227
convert : add Llama4ForCausalLM (#16042)
ngxson Sep 17, 2025
a7a98e0
SvelteKit-based WebUI (#14839)
allozaur Sep 17, 2025
0320ac5
metal : refactor + optimize v2 (#15995)
ggerganov Sep 17, 2025
d304f45
GGML WebGPU: Support for ADD, MUL, RMS_NORM, GET_ROWS operators (#16018)
reeselevine Sep 17, 2025
62c3b64
CANN: Remove print (#16044)
noemotiovon Sep 18, 2025
f2f2838
metal : handle nil cv during pipeline creation (#16065)
ggerganov Sep 18, 2025
e00f3fd
metal : avoid call free for non-owned buffer (#16067)
jhen0409 Sep 18, 2025
b213fce
metal : improve F32, F16 and BF16 mat-vec multiplication (#16057)
ggerganov Sep 18, 2025
e58174c
llama : bump max seq limit from 64 to 256 (#15916)
ggerganov Sep 18, 2025
2b6b55a
server : include usage statistics only when user request them (#16052)
rgerganov Sep 18, 2025
ad6bd90
cuda : add missing F32<->I32 entries in ggml_cuda_cpy_fn (#16060)
CISC Sep 18, 2025
703f9e3
metal : use function constants for mul_mv_ext kernels (#16074)
ggerganov Sep 18, 2025
4ca088b
Add resumable downloads for llama-server model loading (#15963)
ericcurtin Sep 18, 2025
368560a
CUDA: fix compilation on CC 6.0 (#16091)
JohannesGaessler Sep 18, 2025
38dbdf4
CUDA: Optimize PAD_REFLECT_1D (#15957)
bugparty Sep 18, 2025
c0b4509
rename optimize_graph to graph_optimize (#16082)
jeffbolznv Sep 18, 2025
3edd87c
opencl: optimize mxfp4 kernels (#16037)
shawngu-quic Sep 18, 2025
246c0d9
cmake : fix static linking for OpenMP on Unix-like systems (#16031)
angt Sep 18, 2025
69ffd89
ggml-amx : fix ggml_amx_init() on generic Linux (#16049)
angt Sep 18, 2025
0dd58b6
ggml : refactor forward_dup for cpu backend (#16062)
ngxson Sep 19, 2025
4b8560a
chat : fix build on arm64 (#16101)
ngxson Sep 19, 2025
4067f07
feat: Improve mobile UI for Settings Dialog (#16084)
allozaur Sep 19, 2025
f432d8d
chat: Fix streaming parser for granite models (#15682)
shun095 Sep 19, 2025
be79d9f
llama-bench: add --devices and --list-devices support (#16039)
ssweens Sep 19, 2025
459c0c2
server: fix SSE and OpenAI compatibility for error messages when stre…
BenjaminBruenau Sep 20, 2025
803dac2
vulkan: use vec dot for matrix matrix multiplications (#16056)
0cc4m Sep 20, 2025
fa6383c
CUDA : conditionally add cuda architectures (ggml/1341)
gjasny Sep 10, 2025
405921d
ggml : introduce semantic versioning (ggml/1336)
danbev Sep 16, 2025
7f76692
sync : ggml
ggerganov Sep 20, 2025
5bb4a3e
vulkan: fix validation error about VK_PIPELINE_CREATE_CAPTURE_STATIST…
jeffbolznv Sep 21, 2025
1eeb523
vulkan: optimize UMA buffer operations and fix driver hangs (#16059)
giuseppe Sep 21, 2025
28baac9
ci : migrate ggml ci to self-hosted runners (#16116)
ggerganov Sep 21, 2025
da30ab5
ci : add label for the RISC-V runner (#16150)
ggerganov Sep 21, 2025
c4510dc
opencl: initial `q8_0` mv support (#15732)
lhez Sep 21, 2025
51f5a45
opencl: fix concat crash on win arm64 with Adreno (#15944)
lhez Sep 21, 2025
9073a73
vulkan: vec dot matrix multiplication fix (#16151)
0cc4m Sep 22, 2025
4d0a7cb
ci : adjust params for less runtime (#16167)
ggerganov Sep 22, 2025
a20d810
vulkan: add RTE variants of exp shader (#16165)
jeffbolznv Sep 22, 2025
1d660d2
ci : use smaller model (#16168)
ggerganov Sep 22, 2025
ec65fb5
ci : remove vulkaninfo calls (#16169)
ggerganov Sep 22, 2025
5c6106a
contrib : update roles (#16113)
ggerganov Sep 22, 2025
b2d980f
codeowners : claim responsibility for ci, models, gguf-py and convert…
CISC Sep 22, 2025
96fdca0
Vulkan: add conv_transpose_2d operation (#16022)
relent95 Sep 22, 2025
05a2458
codeowners : update ownership for @ngxson and @allozuar (#16128)
ngxson Sep 22, 2025
a71ae3b
ggml : add ggml_op_is_empty (#16122)
ggerganov Sep 22, 2025
4f324a5
ggml : extend ggml_can_fuse to work with non-sequential nodes (#16123)
ggerganov Sep 22, 2025
d05affb
common : remove unused local variables (#16140)
haiyuewa Sep 22, 2025
c6db9a1
embedding : fix typos in README (#16171)
GideonSerf Sep 22, 2025
138c87c
webui : fix handling incomplete chunks (#16107)
Bramas Sep 22, 2025
37a23c1
common : enable `--offline` mode without curl support (#16137)
angt Sep 22, 2025
432cf43
codeowners : update + cleanup (#16174)
ggerganov Sep 22, 2025
3ecb2f6
ggml : implement set_rows with i32 index (#16159)
CISC Sep 22, 2025
351f3da
clang-tidy : disable warning about performance enum size (#16127)
haiyuewa Sep 22, 2025
1d0125b
feat: Add conversion support in GraniteHybrid for non-hybrid (all att…
gabe-l-hart Sep 22, 2025
85e7227
ggml-cpu : fix typo in gemm comments [no ci] (#16189)
danbev Sep 23, 2025
4b9f4cb
devops: add s390x containers (#15915)
taronaeo Sep 23, 2025
0bc7cc7
codeowners : add @danbev to model-conversion example [no ci] (#16190)
danbev Sep 23, 2025
264f1b5
zdnn: refactor codebase + add docs (#16178)
taronaeo Sep 23, 2025
f6b4af3
ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (#15…
CISC Sep 23, 2025
4e29084
ggml-cpu: Respect cpumask settings (#16164)
wishstudio Sep 23, 2025
0889589
ci : enable Vulkan workflow on Mac (#16194)
ggerganov Sep 23, 2025
f505bd8
ci : disable AMD workflows + update NVIDIA workflows (#16200)
ggerganov Sep 23, 2025
8ba548d
model-conversion : fix the make targets in the README.md (#16209)
DamonFool Sep 24, 2025
4d9ea03
codeowners : use slash prefix for root files [no ci] (#16210)
danbev Sep 24, 2025
7735706
model-conversion : run-org-model.py fails to run on mac m1 (#16213)
DamonFool Sep 24, 2025
c0c59c1
codeowners : match all requirements files (#16214)
CISC Sep 24, 2025
152729f
common : add missing chrono header for common.cpp (#16211)
uilianries Sep 24, 2025
63b54c8
model-conversion : make causal-verify-logits fails with model names c…
DamonFool Sep 24, 2025
3a59971
model : add label for LiquidAI LFM2-2.6B model (#16204)
tdakhran Sep 24, 2025
f2a789e
ggml : split graph allocations according to backend max buffer size (…
Acly Sep 24, 2025
e789095
llama: print memory breakdown on exit (#15860)
JohannesGaessler Sep 24, 2025
4ae88d0
codeowners: add ownership of zdnn backend [no ci] (#16229)
taronaeo Sep 24, 2025
5fb5576
devops: fix s390x docker release failure (#16231)
taronaeo Sep 25, 2025
bee378e
ci: run the x64 and arm ci on the github machines instead (#16183)
netrunnereve Sep 25, 2025
e7a5130
codeowners: add ownership of zdnn backend [no ci] (#16232)
taronaeo Sep 25, 2025
c498fc8
rpc : use ggml logging facilities
rgerganov Sep 25, 2025
02a6a82
metal : restore im2col perf (#16219)
ggerganov Sep 25, 2025
4ea0079
metal : relax reorder conditions (#16216)
ggerganov Sep 25, 2025
dfcd53f
metal : fuse NORM + MUL + ADD, support non-multiples of 4 (#16220)
ggerganov Sep 25, 2025
b5bd037
llama : add support for qwen3 reranker (#15824)
iamlemec Sep 25, 2025
4cdd0bb
docs: fix typo [no ci] (#16244)
JohannesGaessler Sep 25, 2025
aa719c2
ggml : fix loongarch lsx compilation error (#15864)
junchao-loongson Sep 25, 2025
d0991da
server : add support for external server for tests (#16243)
danbev Sep 25, 2025
aa3ee0e
model-conversion : add embedding prompt file support (#15871)
danbev Sep 25, 2025
077c94d
CUDA: add a fused top-K MoE kernel (#16130)
am17an Sep 25, 2025
2705297
readme : update bindings (#16144)
romantal Sep 25, 2025
b05a9d6
vendors: update miniaudio version (#16212)
taronaeo Sep 25, 2025
835b2b9
model : add GroveMoE support (#15510)
CISC Sep 25, 2025
0f7c696
musa: fix build warnings (#15611)
yeahdongcn Sep 26, 2025
a86a580
musa: upgrade musa sdk to 4.3.0 (#16240)
yeahdongcn Sep 26, 2025
3b337b0
codeowners : add danbev as owner of build-xcframework.sh [no ci] (#16…
danbev Sep 26, 2025
00217cd
ci : create git tags for released docker images (#16008)
rgerganov Sep 26, 2025
9b26511
ggml-cpu: implement MXFP4 SIMD for s390x (#16193)
taronaeo Sep 26, 2025
4710dd3
build : fix build-ios-device (#16257)
angt Sep 26, 2025
b995a10
common : use cpp-httplib as a cURL alternative for downloads (#16185)
angt Sep 26, 2025
54dbc37
metal : report OOM errors (#16274)
ggerganov Sep 26, 2025
cc1cfa2
mtmd : fix uninitialized variable in bicubic_resize (#16275)
AlekseiNikiforovIBM Sep 26, 2025
d12a983
codeowners : add rgerganov as owner of RPC [no ci] (#16279)
rgerganov Sep 26, 2025
5d0a40f
Always show message actions for mobile UI + improvements for user mes…
allozaur Sep 26, 2025
e0539eb
webui: switch to hash-based routing (alternative of #16079) (#16157)
isaac-mcfadyen Sep 26, 2025
1a18927
Allow viewing conversations even when llama server is down (#16255)
allozaur Sep 26, 2025
807e8c6
Enhance text file detection logic for file attachments (#16199)
allozaur Sep 26, 2025
624207e
devops: add s390x & ppc64le CI (#15925)
taronaeo Sep 26, 2025
72b24d9
model : make minicpm embedding_scale, residual_scale and logit_scale …
vinkal-chudgar Sep 26, 2025
ace6a54
build : add LLAMA_OPENSSL option (#16287)
angt Sep 27, 2025
3f81b4e
vulkan: support GET_ROWS for k-quants (#16235)
jeffbolznv Sep 27, 2025
234e2ff
server : remove old LLAMA_SERVER_SSL (#16290)
angt Sep 27, 2025
0499b29
vulkan: throw system error instead of SIGABRT during init on older de…
DmyMi Sep 27, 2025
75a3a6c
CUDA: refactor and deduplicate vector FA kernels (#16208)
JohannesGaessler Sep 27, 2025
c0bfc57
CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32 (#…
am17an Sep 27, 2025
4807e8f
Show message actions by default (#16289)
allozaur Sep 27, 2025
8656f5d
vulkan : make the vulkan.hpp dynamic dispatcher instance private (#16…
Acly Sep 27, 2025
e6d65fb
vulkan: support arbitrary KV dimension in flash attention (#16160)
jeffbolznv Sep 27, 2025
1384abf
vulkan: handle mat_mul with A matrix > 4GB (#16176)
jeffbolznv Sep 28, 2025
3b53634
metal : fuse non-sequential nodes (#16102)
ggerganov Sep 28, 2025
6a2c614
metal : extend mat-mat multiplication support (#16225)
ggerganov Sep 28, 2025
d8359f5
vulkan: 64-bit im2col (#16135)
jeffbolznv Sep 28, 2025
2811c65
Fixed a few typos in the README of the LLaMA.cpp HTTP Server [no ci] …
ImadSaddik Sep 28, 2025
0124ac9
devops: switch to using ubuntu-22.04-s390x image (#16302)
taronaeo Sep 28, 2025
d9e0e7c
ci : fix musa docker build (#16306)
yeahdongcn Sep 28, 2025
bd0af02
common : fix reasoning before forced tool call via tool_choice = requ…
crat0z Sep 28, 2025
b887d2f
ggml : fix GGML_F32_VEC_FMA argument order in ggml_vec_mad1_f32 (#16307)
CISC Sep 28, 2025
92cd103
vulkan: Fix validation failure in quantized flash attention (#16292)
jeffbolznv Sep 29, 2025
a4a0aa5
ggml : fix dependencies for ggml_set_rows (#16318)
ggerganov Sep 29, 2025
3ffd0fa
perplexity : show more kl-divergence data (#16321)
ddh0 Sep 29, 2025
2f61c0f
llama-cli: prevent spurious assistant token (#16202)
vinkal-chudgar Sep 29, 2025
66bb798
fix: preserved zero values in chat settings inputs and textareas by s…
ServeurpersoCom Sep 29, 2025
3a2bdcd
Improve Mobile UI for dialogs and action dropdowns (#16222)
allozaur Sep 29, 2025
adc7634
ggml : check cuda and metal argsort limits and add test (#16323)
CISC Sep 29, 2025
02463ab
ggml-backend : add root cause in error message if loading backend lib…
rlewczuk Sep 29, 2025
2db78c7
ggml : bump version to 0.9.1
ggerganov Sep 20, 2025
b6dff20
ggml : prepare for development of 0.9.2-dev
ggerganov Sep 20, 2025
b6ae75a
ggml : bump version to 0.9.3 (ggml/1353)
danbev Sep 25, 2025
c9b1c06
ggml : remove -dev suffix from release version (ggml/1355)
danbev Sep 26, 2025
4d3d455
sync : whisper.cpp (ggml/1359)
ggerganov Sep 29, 2025
2ddd3f2
sync : ggml
ggerganov Sep 29, 2025
b77e6c1
ggml: riscv: add riscv spacemit backend (#15288)
alex-spacemit Sep 29, 2025
d72f5f7
ci : add AMD runners and workflows (#16249)
ggerganov Sep 29, 2025
5f7e166
Fix thinking blocks with quotes + add handling `[THINK]...[/THINK]` b…
ServeurpersoCom Sep 29, 2025
a74a0d6
tests: override test_set_rows::max_nmse_err to allow for occasional r…
jeffbolznv Sep 30, 2025
de41f2b
codeowners: add codeowners for opencl backend (#16344)
lhez Sep 30, 2025
f1eb1cb
kleidiai : fix work size and threads sync for fp16 (#16246)
chaxu01 Sep 30, 2025
3c62aed
common : simplify etag tracking by removing json (#16342)
angt Sep 30, 2025
35fb824
metal : dynamic simdgroups for MV kernels (#16340)
ggerganov Sep 30, 2025
a014310
cuda : Enable CUDA Graph usage for Nemotron Nano v2 (NemotronH) (#16328)
anavp-nvidia Sep 30, 2025
075c015
ggml : bump version to 0.9.4 (ggml/1363)
ggerganov Sep 30, 2025
2df5bcf
ci : disable ccache for android (#16348)
CISC Sep 30, 2025
364a7a6
common : remove common_has_curl() (#16351)
angt Sep 30, 2025
d1c84a6
opencl: support ne3 in get_rows (#15866)
lhez Sep 30, 2025
8d78cd2
ggml webgpu: support for rope,div,sub,glu,scale,cont operators (#16187)
reeselevine Sep 30, 2025
16b0ca0
Chatapi ignore empty sampling (#16330)
ServeurpersoCom Sep 30, 2025
7c156df
opencl: support pad_ext (#15888)
lhez Sep 30, 2025
bf6f3b3
common : disable progress bar without a tty (#16352)
angt Sep 30, 2025
b2ba81d
ci : fix ccache key for ubuntu-cpu-cmake (#16355)
CISC Sep 30, 2025
e74c92e
model : support GLM 4.6 (make a few NextN/MTP tensors not required) (…
bartowski1182 Sep 30, 2025
aa9538a
webui: Remove running `llama-server` within WebUI `dev.sh` script (#1…
allozaur Oct 1, 2025
132d673
vulkan: make ggml_vk_default_dispatcher support older vulkan headers …
netrunnereve Oct 1, 2025
4f15759
Add optional setting for showing "Model used:" information (#16337)
allozaur Oct 1, 2025
1104ca1
ci : use registry cache for docker builds (#16366)
CISC Oct 1, 2025
2a9b633
Improve code block color theming (#16325)
allozaur Oct 1, 2025
7647992
Conversation action dialogs as singletons from Chat Sidebar + apply c…
allozaur Oct 1, 2025
4201dea
common: introduce http.h for httplib-based client (#16373)
angt Oct 1, 2025
1fe4e38
ci: Properly install rocwmma for hip builds (#16305)
IMbackK Oct 1, 2025
ded67b9
llama : parameter conversion and loading fixes for PLaMo2 variants (#…
mitmul Oct 1, 2025
e95fec6
HIP: Disable ROCWMMA fattn on CDNA when compiled against ROCWMMA 2.0.…
IMbackK Oct 1, 2025
c8dedc9
CI: reenable cdna in rocm docker builds (#16376)
IMbackK Oct 1, 2025
95ce098
HIP: add IMbackK to codeowner (#16375)
IMbackK Oct 2, 2025
2be72c2
SYCL: Update to oneAPI 2025.2 (#16371)
NeoZhangJianyu Oct 2, 2025
bbd32bc
ci : fix clean-up of old logs (#16381)
ggerganov Oct 2, 2025
f09aefa
ci: update vulkan ci (#16294)
netrunnereve Oct 2, 2025
72ee736
ci : fix ubuntu-latest-cmake-rpc (disable ccache) (#16388)
CISC Oct 2, 2025
91a2a56
musa: update compile flags (#16265)
yeahdongcn Oct 2, 2025
34fcc5a
model : Apertus model implementation (#15852)
pwilkin Oct 2, 2025
ef07a40
ggml webgpu: add support for soft_max, optimize rms_norm (#16357)
reeselevine Oct 2, 2025
d64c810
test-barrier : do not use more threads than physically available (#16…
CISC Oct 2, 2025
5113efd
fix: track viewportHeight via window.innerHeight to avoid unwanted sc…
ServeurpersoCom Oct 3, 2025
136bda7
webui : Fix messages payload sent to chat completions (#16402)
allozaur Oct 3, 2025
e308efd
vulkan: in flash attention, bounds check against nem1 (don't rely on …
jeffbolznv Oct 3, 2025
7723327
Capture model name only after first token (streaming) or completed re…
allozaur Oct 3, 2025
ad12647
ci : change macos-13 to macos-15-intel (#16401)
danbev Oct 3, 2025
0e1f838
vulkan: Fix FA coopmat1 invalid array indexing (#16365)
jeffbolznv Oct 3, 2025
2aaf0a2
vulkan: Replace uses of maxMemoryAllocationSize and VK_WHOLE_SIZE (#1…
jeffbolznv Oct 3, 2025
84c8e30
Fix missing messages on sibling navigation (#16408)
allozaur Oct 3, 2025
638d330
ggml : fix graph reallocation with multiple chunks (#16396)
Acly Oct 3, 2025
946f71e
llama : fix shapes for bert/mpt q/k norm (#16409)
CISC Oct 3, 2025
606a73f
metal : fix loop bound in ggml_mem_ranges (#16412)
ggerganov Oct 3, 2025
f6dcda3
server : context checkpointing for hybrid and recurrent models (#16382)
ddh0 Oct 3, 2025
128d522
chat : support Magistral thinking (#16413)
ServeurpersoCom Oct 3, 2025
e29acf7
vulkan : incremental shader builds (#16341)
Acly Oct 4, 2025
898acba
rpc : add support for multiple devices (#16276)
rgerganov Oct 4, 2025
f392839
rpc : check src buffer when copying tensor (#16421)
rgerganov Oct 4, 2025
86df2c9
vulkan: use a more appropriate amount of threads when generating shad…
netrunnereve Oct 4, 2025
3526657
ggml webgpu: actually add softmax, fix rms_norm offset (#16400)
reeselevine Oct 5, 2025
ca71fb9
model : Granite docling + Idefics3 preprocessing (SmolVLM) (#16206)
gabe-l-hart Oct 5, 2025
c5fef0f
server: update readme to mention n_past_max metric (#16436)
okuvshynov Oct 6, 2025
1d49ca3
nix : removed metal for nix (#16118)
yuannan Oct 6, 2025
a80ff18
ggml-cpu : fix leftover handling in ggml_vec_scale_f32 for SVE (#16443)
danbev Oct 6, 2025
04e632a
ci : remove missing reranker model files (#16444)
danbev Oct 6, 2025
a23b9bd
ggml : fix unaligned access in AMX code (#16315)
ggerganov Oct 6, 2025
3a002af
ci : refactor sdk caching to minimize storage (#16414)
CISC Oct 6, 2025
c08002a
chat : Granite Docling stopping (#16438)
gabe-l-hart Oct 6, 2025
3df2244
llama : add --no-host to disable host buffers (#16310)
Gadflyii Oct 6, 2025
8ae32dc
metal : various optimizations + refactoring (#16446)
ggerganov Oct 7, 2025
1d6092f
tests : add -INF blocks to the KQ mask in the FA tests (#16380)
ggerganov Oct 7, 2025
0a319bb
metal : add support for non-padded FA KV (#16148)
ggerganov Oct 7, 2025
0123ff3
memory : use sequential equal splits for recurrent modules (#16442)
ggerganov Oct 7, 2025
c61ae20
rpc : update documentation (#16441)
rgerganov Oct 7, 2025
ef4c5b8
presets : fix pooling param for embedding models (#16455)
ggerganov Oct 7, 2025
4e0388a
webui : added download action (#13552) (#16282)
srogmann Oct 7, 2025
df1b612
server : add `/v1/health` endpoint (#16461)
ggerganov Oct 7, 2025
aeaf8a3
llama : support LiquidAI LFM2-MoE hybrid model (#16464)
tdakhran Oct 7, 2025
74b8fc1
ggml webgpu: profiling, CI updates, reworking of command submission (…
reeselevine Oct 7, 2025
7fdd16b
server : improve context checkpoint logic (#16440)
ggerganov Oct 8, 2025
b2c08c9
metal : mark FA blocks (#16372)
ggerganov Oct 8, 2025
d2ee056
server : fix cancel pending task (#16467)
issixx Oct 8, 2025
9d08828
Disable CUDA host buffers on integrated GPUs (#16308)
ai-fonsi Oct 8, 2025
12bbc3f
refactor: centralize CoT parsing in backend for streaming mode (#16394)
ServeurpersoCom Oct 8, 2025
e08db42
model: EmbeddingGemma Adding Support for SentenceTransformers Dense M…
sfallah Oct 9, 2025
b260213
[SYCL] refactor soft_max, add soft_max_back (#16472)
NeoZhangJianyu Oct 9, 2025
d80d6d2
kleidiai: kernel interface refactoring (#16460)
chaxu01 Oct 9, 2025
aa4711d
CANN: Improve ACL graph matching (#16166)
noemotiovon Oct 9, 2025
2c0d875
ci: add ARM64 Kleidiai build and test support (#16462)
sudhiarm Oct 9, 2025
56b4795
model-conversion : add support for SentenceTransformers (#16387)
danbev Oct 9, 2025
8328fd4
No markdown in cot (#16483)
ServeurpersoCom Oct 9, 2025
d00cbea
server : host-memory prompt caching (#16391)
ggerganov Oct 9, 2025
1deee0f
cpu : optimize the ggml NORM operation (#15953)
duduta Oct 9, 2025
1faa13a
webui: updated the chat service to only include max_tokens in the req…
ServeurpersoCom Oct 9, 2025
6d69ab3
cmake : Dont define XOPENSOURCE on AIX (#16481)
mehendarkarprajwal Oct 10, 2025
cdb6da4
server : log requests to /v1/completions (#16495)
rgerganov Oct 10, 2025
68ee98a
server : return HTTP 400 if prompt exceeds context length (#16486)
rgerganov Oct 10, 2025
81086cd
vocab : mark EOT token for Granite models (#16499)
ggerganov Oct 10, 2025
e60f01d
server : fix division by zero when reporting stats (#16501)
ggerganov Oct 10, 2025
477a66b
convert : correctly handle LLaMA tokenizer for Jamba (#16470)
amirai21 Oct 11, 2025
97870e6
cuda : avoid initializing unused devices (#16510)
slaren Oct 11, 2025
31d0ff1
server / ranking : add sorting and management of top_n (#16403)
YannFollet Oct 11, 2025
4a8fbe0
feat: render user content as markdown option (#16358)
ServeurpersoCom Oct 11, 2025
a3cb047
metal : fix mul-mm condition + fix mul-mv permuted kernels (#16494)
ggerganov Oct 11, 2025
11f0af5
CUDA: faster tile FA, add oob checks, more HSs (#16492)
JohannesGaessler Oct 11, 2025
20cc625
ggml: Correct SVE implementation in ggml_vec_dot_f16_unroll (#16518)
sirus20x6 Oct 12, 2025
a2fba89
hparams : add check for layer index in is_recurrent (#16511)
danbev Oct 12, 2025
41aac5c
ggml : Fix FP16 ELU positive branch (#16519)
sirus20x6 Oct 12, 2025
4b2dae3
common : update presets (#16504)
ggerganov Oct 12, 2025
a406da3
Merge branch 'layla-build' into merge
l3utterfly Oct 12, 2025
7 changes: 7 additions & 0 deletions .clang-format
@@ -22,6 +22,13 @@ AllowShortIfStatementsOnASingleLine: Never
AllowShortLambdasOnASingleLine: Inline
AllowShortLoopsOnASingleLine: false
AlwaysBreakBeforeMultilineStrings: true
# Treat CUDA keywords/attributes as "attribute macros" and avoid breaking lines inside them
AttributeMacros:
- __host__
- __device__
- __global__
- __forceinline__
- __launch_bounds__
BinPackArguments: true
BinPackParameters: false # OnePerLine
BitFieldColonSpacing: Both
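As a quick illustration (hypothetical kernel, not part of this diff), registering these CUDA qualifiers as attribute macros keeps a declaration like the following on a single line instead of wrapping after each keyword:

```cuda
// __global__ and __launch_bounds__ are now treated as attribute macros,
// so clang-format keeps them attached to the function declaration.
__global__ void __launch_bounds__(256) scale_f32(float * dst, const float * src, float v, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        dst[i] = src[i] * v;
    }
}
```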
1 change: 1 addition & 0 deletions .clang-tidy
@@ -17,6 +17,7 @@ Checks: >
clang-analyzer-*,
-clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling,
performance-*,
-performance-enum-size,
portability-*,
-portability-simd-intrinsics,
misc-*,
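For context (illustrative snippet, not from this PR), `performance-enum-size` flags enums whose values would fit a narrower base type and suggests shrinking the underlying type; with the check disabled, small enums can keep the default `int` base:

```cpp
#include <cstdint>

// clang-tidy's performance-enum-size would suggest rewriting this as
// `enum class backend_kind : std::uint8_t`; with the check disabled,
// the default underlying type (int) is acceptable.
enum class backend_kind { cpu, cuda, metal, vulkan };
```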
6 changes: 3 additions & 3 deletions .devops/intel.Dockerfile
@@ -1,8 +1,8 @@
ARG ONEAPI_VERSION=2025.1.1-0-devel-ubuntu24.04
ARG ONEAPI_VERSION=2025.2.2-0-devel-ubuntu24.04

## Build Image

FROM intel/oneapi-basekit:$ONEAPI_VERSION AS build
FROM intel/deep-learning-essentials:$ONEAPI_VERSION AS build

ARG GGML_SYCL_F16=OFF
RUN apt-get update && \
@@ -31,7 +31,7 @@ RUN mkdir -p /app/full \
&& cp requirements.txt /app/full \
&& cp .devops/tools.sh /app/full/tools.sh

FROM intel/oneapi-basekit:$ONEAPI_VERSION AS base
FROM intel/deep-learning-essentials:$ONEAPI_VERSION AS base

RUN apt-get update \
&& apt-get install -y libgomp1 curl\
2 changes: 1 addition & 1 deletion .devops/musa.Dockerfile
@@ -1,6 +1,6 @@
ARG UBUNTU_VERSION=22.04
# This needs to generally match the container host's environment.
ARG MUSA_VERSION=rc4.2.0
ARG MUSA_VERSION=rc4.3.0
# Target the MUSA build image
ARG BASE_MUSA_DEV_CONTAINER=mthreads/musa:${MUSA_VERSION}-devel-ubuntu${UBUNTU_VERSION}-amd64

4 changes: 0 additions & 4 deletions .devops/nix/package.nix
@@ -128,10 +128,6 @@ effectiveStdenv.mkDerivation (finalAttrs: {
};

postPatch = ''
substituteInPlace ./ggml/src/ggml-metal/ggml-metal.m \
--replace '[bundle pathForResource:@"ggml-metal" ofType:@"metal"];' "@\"$out/bin/ggml-metal.metal\";"
substituteInPlace ./ggml/src/ggml-metal/ggml-metal.m \
--replace '[bundle pathForResource:@"default" ofType:@"metallib"];' "@\"$out/bin/default.metallib\";"
'';

# With PR#6015 https://github.com/ggml-org/llama.cpp/pull/6015,
27 changes: 14 additions & 13 deletions .devops/rocm.Dockerfile
@@ -1,10 +1,10 @@
ARG UBUNTU_VERSION=24.04

# This needs to generally match the container host's environment.
ARG ROCM_VERSION=6.4
ARG AMDGPU_VERSION=6.4
ARG ROCM_VERSION=7.0
ARG AMDGPU_VERSION=7.0

# Target the CUDA build image
# Target the ROCm build image
ARG BASE_ROCM_DEV_CONTAINER=rocm/dev-ubuntu-${UBUNTU_VERSION}:${ROCM_VERSION}-complete

### Build image
@@ -13,18 +13,14 @@ FROM ${BASE_ROCM_DEV_CONTAINER} AS build
# Unless otherwise specified, we make a fat build.
# List from https://github.com/ggml-org/llama.cpp/pull/1087#issuecomment-1682807878
# This is mostly tied to rocBLAS supported archs.
# gfx803, gfx900, gfx1032, gfx1101, gfx1102,not officialy supported
# gfx906 is deprecated
#check https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.4/reference/system-requirements.html
# gfx803, gfx900, gfx906, gfx1032, gfx1101, gfx1102,not officialy supported
# check https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.1/reference/system-requirements.html

ARG ROCM_DOCKER_ARCH='gfx803,gfx900,gfx906,gfx908,gfx90a,gfx942,gfx1010,gfx1030,gfx1032,gfx1100,gfx1101,gfx1102'
#ARG ROCM_DOCKER_ARCH=gfx1100
ARG ROCM_DOCKER_ARCH='gfx803;gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1010;gfx1030;gfx1032;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201;gfx1151'
#ARG ROCM_DOCKER_ARCH='gfx1151'

# Set nvcc architectured
# Set ROCm architectures
ENV AMDGPU_TARGETS=${ROCM_DOCKER_ARCH}
# Enable ROCm
# ENV CC=/opt/rocm/llvm/bin/clang
# ENV CXX=/opt/rocm/llvm/bin/clang++

RUN apt-get update \
&& apt-get install -y \
@@ -40,7 +36,12 @@ WORKDIR /app
COPY . .

RUN HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=$ROCM_DOCKER_ARCH -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DCMAKE_BUILD_TYPE=Release -DLLAMA_BUILD_TESTS=OFF \
cmake -S . -B build \
-DGGML_HIP=ON \
-DGGML_HIP_ROCWMMA_FATTN=ON \
-DAMDGPU_TARGETS="$ROCM_DOCKER_ARCH" \
-DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON \
-DCMAKE_BUILD_TYPE=Release -DLLAMA_BUILD_TESTS=OFF \
&& cmake --build build --config Release -j$(nproc)

RUN mkdir -p /app/lib \
123 changes: 123 additions & 0 deletions .devops/s390x.Dockerfile
@@ -0,0 +1,123 @@
ARG GCC_VERSION=15.2.0
ARG UBUNTU_VERSION=24.04

### Build Llama.cpp stage
FROM gcc:${GCC_VERSION} AS build

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/var/lib/apt/lists,sharing=locked \
apt update -y && \
apt upgrade -y && \
apt install -y --no-install-recommends \
git cmake ccache ninja-build \
# WARNING: Do not use libopenblas-openmp-dev. libopenblas-dev is faster.
libopenblas-dev libcurl4-openssl-dev && \
rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY . .

RUN --mount=type=cache,target=/root/.ccache \
--mount=type=cache,target=/app/build \
cmake -S . -B build -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER_LAUNCHER=ccache \
-DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
-DLLAMA_BUILD_TESTS=OFF \
-DGGML_BACKEND_DL=OFF \
-DGGML_NATIVE=OFF \
-DGGML_BLAS=ON \
-DGGML_BLAS_VENDOR=OpenBLAS && \
cmake --build build --config Release -j $(nproc) && \
cmake --install build --prefix /opt/llama.cpp

COPY *.py /opt/llama.cpp/bin
COPY .devops/tools.sh /opt/llama.cpp/bin

COPY gguf-py /opt/llama.cpp/gguf-py
COPY requirements.txt /opt/llama.cpp/gguf-py
COPY requirements /opt/llama.cpp/gguf-py/requirements


### Collect all llama.cpp binaries, libraries and distro libraries
FROM scratch AS collector

# Copy llama.cpp binaries and libraries
COPY --from=build /opt/llama.cpp/bin /llama.cpp/bin
COPY --from=build /opt/llama.cpp/lib /llama.cpp/lib
COPY --from=build /opt/llama.cpp/gguf-py /llama.cpp/gguf-py


### Base image
FROM ubuntu:${UBUNTU_VERSION} AS base

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/var/lib/apt/lists,sharing=locked \
apt update -y && \
apt install -y --no-install-recommends \
# WARNING: Do not use libopenblas-openmp-dev. libopenblas-dev is faster.
# See: https://github.com/ggml-org/llama.cpp/pull/15915#issuecomment-3317166506
curl libgomp1 libopenblas-dev && \
apt autoremove -y && \
apt clean -y && \
rm -rf /tmp/* /var/tmp/* && \
find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete && \
find /var/cache -type f -delete

# Copy llama.cpp libraries
COPY --from=collector /llama.cpp/lib /usr/lib/s390x-linux-gnu


### Full
FROM base AS full

ENV PATH="/root/.cargo/bin:${PATH}"
WORKDIR /app

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/var/lib/apt/lists,sharing=locked \
apt update -y && \
apt install -y \
git cmake libjpeg-dev \
python3 python3-pip python3-dev && \
apt autoremove -y && \
apt clean -y && \
rm -rf /tmp/* /var/tmp/* && \
find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete && \
find /var/cache -type f -delete

RUN curl https://sh.rustup.rs -sSf | bash -s -- -y

COPY --from=collector /llama.cpp/bin /app
COPY --from=collector /llama.cpp/gguf-py /app/gguf-py

RUN pip install --no-cache-dir --break-system-packages \
-r /app/gguf-py/requirements.txt

ENTRYPOINT [ "/app/tools.sh" ]


### CLI Only
FROM base AS light

WORKDIR /llama.cpp/bin

# Copy llama.cpp binaries and libraries
COPY --from=collector /llama.cpp/bin/llama-cli /llama.cpp/bin

ENTRYPOINT [ "/llama.cpp/bin/llama-cli" ]


### Server
FROM base AS server

ENV LLAMA_ARG_HOST=0.0.0.0

WORKDIR /llama.cpp/bin

# Copy llama.cpp binaries and libraries
COPY --from=collector /llama.cpp/bin/llama-server /llama.cpp/bin

EXPOSE 8080

ENTRYPOINT [ "/llama.cpp/bin/llama-server" ]
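Assuming standard BuildKit usage (the image tag below is illustrative), each final stage can be built directly, e.g. `docker build -f .devops/s390x.Dockerfile --target server -t llamacpp-s390x-server .`, with `full`, `light` and `server` selecting the corresponding image.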
8 changes: 8 additions & 0 deletions .editorconfig
@@ -52,3 +52,11 @@ insert_final_newline = unset
[vendor/miniaudio/miniaudio.h]
trim_trailing_whitespace = unset
insert_final_newline = unset

[tools/server/webui/**]
indent_style = unset
indent_size = unset
end_of_line = unset
charset = unset
trim_trailing_whitespace = unset
insert_final_newline = unset
36 changes: 36 additions & 0 deletions .github/actions/install-exe/action.yml
@@ -0,0 +1,36 @@
name: "Install exe"
description: "Download and install exe"
inputs:
url:
description: "URL of the exe installer"
required: true
args:
description: "Installer arguments"
required: true
timeout:
description: "Timeout (in ms)"
required: false
default: "600000"

runs:
using: "composite"
steps:
- name: Install EXE
shell: pwsh
run: |
$ErrorActionPreference = "Stop"
write-host "Downloading Installer EXE"
Invoke-WebRequest -Uri "${{ inputs.url }}" -OutFile "${env:RUNNER_TEMP}\temp-install.exe"
write-host "Installing"
$proc = Start-Process "${env:RUNNER_TEMP}\temp-install.exe" -ArgumentList '${{ inputs.args }}' -NoNewWindow -PassThru
$completed = $proc.WaitForExit(${{ inputs.timeout }})
if (-not $completed) {
Write-Error "Installer timed out. Killing the process"
$proc.Kill()
exit 1
}
if ($proc.ExitCode -ne 0) {
Write-Error "Installer failed with exit code $($proc.ExitCode)"
exit 1
}
write-host "Completed installation"
20 changes: 20 additions & 0 deletions .github/actions/linux-setup-spacemit/action.yml
@@ -0,0 +1,20 @@
name: "Linux - Setup SpacemiT Toolchain"
description: "Setup SpacemiT Toolchain for Linux"
inputs:
path:
description: "Installation path"
required: true
version:
description: "SpacemiT toolchain version"
required: true

runs:
using: "composite"
steps:
- name: Setup SpacemiT Toolchain
id: setup
uses: ./.github/actions/unarchive-tar
with:
url: https://archive.spacemit.com/toolchain/spacemit-toolchain-linux-glibc-x86_64-v${{ inputs.version }}.tar.xz
path: ${{ inputs.path }}
strip: 1
20 changes: 20 additions & 0 deletions .github/actions/linux-setup-vulkan/action.yml
@@ -0,0 +1,20 @@
name: "Linux - Setup Vulkan SDK"
description: "Setup Vulkan SDK for Linux"
inputs:
path:
description: "Installation path"
required: true
version:
description: "Vulkan SDK version"
required: true

runs:
using: "composite"
steps:
- name: Setup Vulkan SDK
id: setup
uses: ./.github/actions/unarchive-tar
with:
url: https://sdk.lunarg.com/sdk/download/${{ inputs.version }}/linux/vulkan_sdk.tar.xz
path: ${{ inputs.path }}
strip: 1
27 changes: 27 additions & 0 deletions .github/actions/unarchive-tar/action.yml
@@ -0,0 +1,27 @@
name: "Unarchive tar"
description: "Download and unarchive tar into directory"
inputs:
url:
description: "URL of the tar archive"
required: true
path:
description: "Directory to unarchive into"
required: true
type:
description: "Compression type (tar option)"
required: false
default: "J"
strip:
description: "Strip components"
required: false
default: "0"

runs:
using: "composite"
steps:
- name: Unarchive into directory
shell: bash
run: |
mkdir -p ${{ inputs.path }}
cd ${{ inputs.path }}
curl --no-progress-meter ${{ inputs.url }} | tar -${{ inputs.type }}x --strip-components=${{ inputs.strip }}
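With the defaults (`type: J`, `strip: 0`), the pipeline expands to `curl --no-progress-meter <url> | tar -Jx --strip-components=0`, i.e. an xz-compressed tarball unpacked in place; the Vulkan and SpacemiT setup actions above override `strip` to 1 to drop the archive's top-level directory.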
15 changes: 15 additions & 0 deletions .github/actions/windows-setup-rocm/action.yml
@@ -0,0 +1,15 @@
name: "Windows - Setup ROCm"
description: "Setup ROCm for Windows"
inputs:
version:
description: "ROCm version"
required: true

runs:
using: "composite"
steps:
- name: Setup ROCm
uses: ./.github/actions/install-exe
with:
url: https://download.amd.com/developer/eula/rocm-hub/AMD-Software-PRO-Edition-${{ inputs.version }}-WinSvr2022-For-HIP.exe
args: -install