Closed · 227 commits
5c147cf
CANN: Enable labeler for Ascend NPU (#13914)
shink Jun 9, 2025
0535f71
add geglu activation function (#14074)
huydt84 Jun 9, 2025
33ff3a0
sycl: Add reorder to Q6_K mmvq implementation (#13885)
s-Nick Jun 9, 2025
745c9e3
server : fix LRU check (#14079)
ggerganov Jun 9, 2025
3990806
webui: fix sidebar being covered by main content (#14082)
yeahdongcn Jun 9, 2025
7a97e66
CANN: Simplify the environment variable setting (#13104)
bachelor-dou Jun 9, 2025
abd6aa0
graph : fix geglu (#14077)
ggerganov Jun 9, 2025
080372b
cuda : fix device sync on buffer clear (#14033)
slaren Jun 9, 2025
a3a9821
ggml-cpu : split arch-specific implementations (#13892)
xctan Jun 9, 2025
a0a150e
llama : allow building all tests on windows when not using shared lib…
slaren Jun 9, 2025
f13dc05
kv-cache : fix shift and defrag logic (#14081)
ggerganov Jun 9, 2025
e363039
metal : use less stack memory in FA kernel (#14088)
ggerganov Jun 9, 2025
d8f2854
Add in-build ggml::ggml ALIAS library (ggml/1260)
dg0yt Jun 3, 2025
3aef2eb
sync : ggml
ggerganov Jun 10, 2025
5efd2b4
rpc : nicer error messages for RPC server crash (#14076)
isaac-mcfadyen Jun 10, 2025
8ec1000
Vulkan: Don't default to CPU device (like llvmpipe), even if no other…
0cc4m Jun 10, 2025
ffb4b81
ggml : fix weak alias win32 (whisper/0)
ggerganov Jun 10, 2025
bad4e67
sync : ggml
ggerganov Jun 10, 2025
91ecaab
Fixed spec timings to: accepted/tested instead of accepted/drafted (#…
jukofyork Jun 10, 2025
7a1b212
vulkan: force device 0 in CI (#14106)
jeffbolznv Jun 10, 2025
91c1e91
llama : support GEGLU for jina-bert-v2 (#14090)
CISC Jun 10, 2025
663365d
convert : fix duplicate key DeepSeek-R1 conversion error (#14103)
CISC Jun 10, 2025
67ff796
kv-cache : avoid modifying recurrent cells when setting inputs (#13834)
compilade Jun 10, 2025
cdd3390
opencl: add `mul_mv_id_q4_0_f32_8x_flat` (#14003)
lhez Jun 10, 2025
3c1cac1
vulkan: Track descriptor pools/sets per-context (#14109)
jeffbolznv Jun 11, 2025
ad6387f
kv-cache : add LLAMA_KV_CACHE_DEBUG environment variable (#14121)
ggerganov Jun 11, 2025
3951eee
server : pass default --keep argument (#14120)
MightyAlex200 Jun 11, 2025
36e58b6
kv-cache : relax SWA masking condition (#14119)
ggerganov Jun 11, 2025
6b90817
webui: Wrap long numbers instead of infinite horizontal scroll (#14062)
am17an Jun 11, 2025
cc32d37
vulkan: Better thread-safety for command pools/buffers (#14116)
jeffbolznv Jun 11, 2025
e480d33
tests : add test-tokenizers-repo (#14017)
CISC Jun 11, 2025
af3e114
chore : clean up relative source dir paths (#14128)
CISC Jun 11, 2025
88316d5
Implement GGML_CPU_ALL_VARIANTS for ARM (#14080)
ckastner Jun 11, 2025
8461fc9
common: fix issue with regex_escape routine on windows (#14133)
bandoti Jun 11, 2025
2d58e56
context : round n_tokens to next multiple of n_seqs when reserving (#…
compilade Jun 12, 2025
ea43f52
kv-cache : fix split_equal handling in unified implementation (#14130)
ggerganov Jun 12, 2025
4aba4c7
cmake : handle whitespaces in path during metal build (#14126)
ggerganov Jun 12, 2025
d525491
batch : remove logits_all flag (#14141)
ggerganov Jun 12, 2025
fb333bc
context : simplify output counting logic during decode (#14142)
ggerganov Jun 12, 2025
485b6f7
server : re-enable SWA speculative decoding (#14131)
ggerganov Jun 12, 2025
e2788b0
readme : remove project status link (#14149)
ggerganov Jun 12, 2025
880c476
sycl: Remove not needed copy f16->f32 for dnnl mul mat (#14125)
ShanoToni Jun 12, 2025
69a7933
vocab : prevent heap overflow when vocab is too small (#14145)
ggerganov Jun 13, 2025
e7e1707
cmake : Improve build-info.cpp generation (#14156)
ckastner Jun 13, 2025
0ebcbdf
SYCL: Bump oneMath commit (#14152)
Jun 13, 2025
25a13dc
sycl: Adding additional cpy dbg print output (#14034)
ShanoToni Jun 13, 2025
fc1c921
server : fix SWA condition for full context reprocess (#14163)
ggerganov Jun 13, 2025
5463a47
pooling : make cls_b and cls_out_b optional (#14165)
huydt84 Jun 13, 2025
6f5e438
cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167)
ckastner Jun 13, 2025
c08cb3e
readme : remove survey link (#14168)
ggerganov Jun 13, 2025
25f56d3
batch : rework llama_batch_allocr (#14153)
ggerganov Jun 13, 2025
6cbc6f0
docs : Update multimodal.md (#14122)
ddpasa Jun 13, 2025
1eb3912
batch : add LLAMA_BATCH_DEBUG environment variable (#14172)
ggerganov Jun 13, 2025
9b6193a
Merge commit from fork
GuyGoldenberg Jun 13, 2025
e2d7df7
sycl: fix docker image (#14144)
sgeor255 Jun 13, 2025
05f8578
vocab : fix build (#14175)
ggerganov Jun 13, 2025
c492b26
compare-llama-bench: add option to plot (#14169)
am17an Jun 14, 2025
6aa92a9
llama-chat : Do not throw when tool parsing fails (#14012)
p1-0tr Jun 14, 2025
c9d5c9a
docs : remove WIP since PR has been merged (#13912)
pepijndevos Jun 15, 2025
70f7779
batch : auto-gen positions + verify multi-sequence input (#14177)
ggerganov Jun 15, 2025
9f7fd09
cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188)
ggerganov Jun 15, 2025
2424aad
model : add dots.llm1 architecture support (#14044) (#14118)
Noeda Jun 15, 2025
c09bc3d
kv-cache : fix use-after-move of defrag info (#14189)
ggerganov Jun 15, 2025
f3afe74
HIP: Replace usage of deprecated preprocessor macro __AMDGCN_WAVEFRON…
IMbackK Jun 15, 2025
478eb71
CUDA/HIP: fix ssm_scan on devices where warp size is not 32 (#14196)
IMbackK Jun 15, 2025
422ddad
quantize : change int to unsigned int for KV overrides (#14197)
EAddario Jun 15, 2025
fe81c61
server : When listening on a unix domain socket don't print http:// a…
ericcurtin Jun 15, 2025
445838b
model : Add support for Arcee AI's upcoming AFM model (#14185)
bartowski1182 Jun 15, 2025
9e70ab8
ggml-cpu : rework weak alias on apple targets (#14146)
xctan Jun 16, 2025
7a84d54
vulkan: mutex around vkQueueSubmit (#14127)
jeffbolznv Jun 16, 2025
7a89645
gguf-py : allow key override when adding value to GGUFWriter (#14194)
huydt84 Jun 16, 2025
60764cf
convert : remove arcee change in convert_hf_to_gguf_update.py (#14207)
bartowski1182 Jun 16, 2025
343b6b5
ggml: Add Android support for GGML_CPU_ALL_VARIANTS (#14206)
chaxu01 Jun 16, 2025
d6718d4
llama : rework embeddings logic (#14208)
ggerganov Jun 16, 2025
c319e78
HIP: disable rocwmma on gfx12 by default until rocm 7.0 (#14202)
IMbackK Jun 16, 2025
6399950
model : add NeoBERT (#14164)
huydt84 Jun 16, 2025
b34cf4d
cmake: clean up external project logic for vulkan-shaders-gen (#14179)
bandoti Jun 16, 2025
d52835d
llama : add thread safety test (#14035)
slaren Jun 16, 2025
a98d749
server : fix incorrect usage of llama_get_embeddings() (#14225)
ggerganov Jun 16, 2025
d17a0df
common : suggest --jinja when autodetection fails (#14222)
CISC Jun 16, 2025
663fee0
musa: fix build warning (unused variable) (#14231)
yeahdongcn Jun 17, 2025
318ed3e
ggml-cpu : remove the weak alias trick (#14221)
xctan Jun 17, 2025
1b7d0b3
cmake: remove shader-gen step-targets from ggml-vulkan (#14226)
bandoti Jun 17, 2025
e5a8216
examples : include examples in msvc disable warn (ggml/1270)
danbev Jun 12, 2025
eb182fa
ggml : remove unused ggml_context_container (ggml/1272)
danbev Jun 13, 2025
a0a25a7
ggml : disable warnings for tests when using MSVC (ggml/1273)
danbev Jun 13, 2025
b8ebb3f
sync : ggml
ggerganov Jun 18, 2025
13a6f9c
convert : fix null head_dim AutoConfig regression (#14248)
CISC Jun 18, 2025
578b7eb
llama-chat : fix multiple system message for gemma, orion (#14246)
ngxson Jun 18, 2025
ce2175d
mtmd : refactor llava-uhd preprocessing logic (#14247)
ngxson Jun 18, 2025
0ae12dd
ggml: Add Apple support for GGML_CPU_ALL_VARIANTS (#14258)
chaxu01 Jun 18, 2025
3c6e1be
ggml-cpu: fix uncaught underscore terminators (#14023)
taronaeo Jun 18, 2025
b1d39c2
ggml-cpu: reduce asm calls for hsum (#14037)
taronaeo Jun 18, 2025
3abd479
docs: add s390x build documentation (#14264)
taronaeo Jun 18, 2025
35283a8
metal : add mean kernel (#14267)
ggerganov Jun 19, 2025
d8a62c6
memory : Hybrid recurrent cache (#13979)
gabe-l-hart Jun 19, 2025
b722c21
Vulkan: Set device max size for host memory to avoid OOM warning and …
0cc4m Jun 19, 2025
87b027c
llamafile : support s390x SIMD instruction set (#14273)
taronaeo Jun 19, 2025
33e9d33
convert : fix remote option in Windows (#14100)
pqnet Jun 19, 2025
1d8f457
llama-bench : add --no-warmup flag (#14224) (#14270)
s2010 Jun 19, 2025
a04513c
sycl: Cleanup codepaths in Get Rows in sycl backend (#14215)
ShanoToni Jun 19, 2025
ad27987
build : suppress gcc15 compile warnings (#14261)
fanyang89 Jun 19, 2025
beb522f
server : add server parameters for draft model cache type (#13782)
aa956 Jun 19, 2025
2d49576
gguf-py : make sentencepiece optional (#14200)
Ahajha Jun 19, 2025
d042e21
ggml-cpu : remove unnecessary arm feature detection (#14281)
slaren Jun 19, 2025
f4b5149
CUDA: add conv_2d_dw (#14265)
am17an Jun 20, 2025
9acf123
ubatch : new splitting logic (#14217)
ggerganov Jun 20, 2025
46309cd
model : more uniform output id handling (#14275)
ggerganov Jun 20, 2025
a8c60ca
ggml: Update KleidiAI to v1.9.0 (#14277)
chaxu01 Jun 20, 2025
ebe39a1
ggml : fix repack work size for mul_mat_id (#14292)
ggerganov Jun 20, 2025
7916d58
cuda : synchronize graph capture and cublas handle destruction (#14288)
slaren Jun 20, 2025
6ff1b1c
llama : improve sep token handling (#14272)
CISC Jun 20, 2025
fea42e5
Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286)
ckastner Jun 20, 2025
a01ea44
sycl: add usage of enqueue_functions extension (#14244)
s-Nick Jun 20, 2025
e8831ca
vocab : prevent tokenizer overflow (#14301)
retr0reg Jun 20, 2025
7ce69f9
lint : remove trailing whitespace (#14304)
CISC Jun 20, 2025
139d6aa
CUDA: add conv_2d_transpose (#14287)
am17an Jun 20, 2025
8d1a2d0
docs : fix the link to llama.h (#14293)
david20571015 Jun 20, 2025
36266ce
Add `ggml_roll` (ggml/1274)
Acly Jun 18, 2025
414961a
sync : ggml
ggerganov Jun 20, 2025
f90afed
convert : fix Llama 4 conversion (#14311)
danielhanchen Jun 21, 2025
392125a
memory : rename interface to llama_memory_context_i (#14296)
ggerganov Jun 21, 2025
0d9cbe8
metal : fix thread-safety (#14300)
ggerganov Jun 21, 2025
8fdef27
gguf-py : fix TemplateProcessing pair when bos/eos is missing (#14312)
CISC Jun 21, 2025
01879a9
Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (…
mtavenrath Jun 21, 2025
c9e237b
gguf-py : fix Qwen3-Embedding eos token (#14314)
CISC Jun 21, 2025
a914433
CUDA: add mean operation (#14313)
am17an Jun 22, 2025
c6c74af
common : use std::string_view now that we target c++17 (#14319)
CISC Jun 22, 2025
3ecf9cd
mtmd : fix Pixtral OOM with large images by capping image_size to 102…
yuiseki Jun 22, 2025
aee9a59
HIP: enable vec fattn on RDNA4 (#14323)
IMbackK Jun 22, 2025
15651b4
examples : fix is_first logic for tokenization (#14329)
ggerganov Jun 22, 2025
51c1780
run : avoid double tokenization (#14327)
retr0reg Jun 22, 2025
2c9516c
gguf-py : fix SpecialVocab parsing when post_processor is null (#14330)
CISC Jun 22, 2025
4cf51e0
quantize : handle user-defined pruning of whole layers (blocks) (#13037)
EAddario Jun 22, 2025
b4ecefe
vulkan: update windows SDK in CI (#14334)
jeffbolznv Jun 23, 2025
3a24929
kv-cells : fix tracking of seq_pos (#14339)
ggerganov Jun 23, 2025
f654881
CUDA: mul_mat_v support for batch sizes > 1 (#14262)
JohannesGaessler Jun 23, 2025
770d746
llama : better rwkv chat template and add missing `inputs.use_jinja` …
MollySophia Jun 23, 2025
02419e0
vulkan: update windows SDK in release.yml (#14344)
jeffbolznv Jun 23, 2025
1e3cef3
ci: add workflow for relocatable cmake package (#14346)
bandoti Jun 23, 2025
ef1f24e
CUDA/HIP: optimize mmv paths taken for HIP devices (#14324)
IMbackK Jun 23, 2025
ecbf7d3
jinja : Add Mistral-Small-3.2-24B-Instruct-2506.jinja (#14349)
bartowski1182 Jun 24, 2025
ce9b75d
main : honor --verbose-prompt on interactive prompts (#14350)
CISC Jun 24, 2025
131258f
server : move no API key doc to /health (#14352)
pnb Jun 24, 2025
4b1527f
cmake : use LLAMA_BUILD_NUMBER when defining LLAMA_INSTALL_VERSION (#…
mbaudier Jun 24, 2025
389097d
batch : fix check for empty sequences in memory (#14364)
ggerganov Jun 24, 2025
de69d88
opencl: ref count `ggml_backend_opencl_context` and refactor profilin…
lhez Jun 24, 2025
8f349d6
sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (#13973)
ShanoToni Jun 25, 2025
972f7f7
ggml : do not output unprintable characters on GGUF load failure (#14…
CISC Jun 25, 2025
cbf9ca7
ggml-cpu: enable IBM NNPA Vector Intrinsics (#14317)
taronaeo Jun 25, 2025
99bfc31
musa: enable fp16 mma (all) and cublas on qy2 (#13842)
yeahdongcn Jun 26, 2025
05cc102
docs: update s390x documentation + add faq (#14389)
taronaeo Jun 26, 2025
48c9ad9
metal : batch rows copy in a single threadgroup (#14384)
ggerganov Jun 26, 2025
fc4378c
metal : add special-case mat-vec mul for ne00 == 4 (#14385)
ggerganov Jun 26, 2025
4aea364
llama : return mistral-v7-tekken as default template only (#14390)
CISC Jun 26, 2025
5e9b3f5
cmake: regen vulkan shaders when shaders-gen sources change (#14398)
bandoti Jun 26, 2025
61021d9
model : gemma3n text-only (#14400)
ngxson Jun 26, 2025
a898c4c
convert : fix broken sentencepiece vocab (#14416)
CISC Jun 27, 2025
69fbcc4
ggml : add ggml_set_rows (#14274)
rgerganov Jun 27, 2025
3ddcee8
recurrent : call balloc split_reset() in init_batch() (#14414)
ggerganov Jun 27, 2025
e825dab
graph : make llm_graph_context destructor virtual (#14410)
ggerganov Jun 27, 2025
29f2e4b
vulkan: Fix GGML_VULKAN_SHADER_DEBUG_INFO (#14427)
jeffbolznv Jun 28, 2025
3be8245
ci : fix windows build and release (#14431)
CISC Jun 28, 2025
68af1ad
fix async_mode bug (#14432)
bachelor-dou Jun 28, 2025
603eb45
model : add support for ERNIE 4.5 0.3B model (#14408)
ownia Jun 28, 2025
ffaa00e
vulkan: lock accesses of pinned_memory vector (#14333)
jeffbolznv Jun 28, 2025
6c7c22d
vulkan: handle noncontig in the final case of ggml_vk_get_cpy_pipelin…
jeffbolznv Jun 28, 2025
809eaa3
CUDA: add bf16 and f32 support to cublas_mul_mat_batched (#14361)
am17an Jun 28, 2025
e806405
vulkan: Add fusion support for RMS_NORM+MUL (#14366)
jeffbolznv Jun 29, 2025
94ef6d4
ggml : implement REGLU/GEGLU/SWIGLU ops (#14158)
CISC Jun 29, 2025
6955d77
ggml : fix unmerged GGML_FPxx_TO_FPxx refactoring (#14443)
CISC Jun 29, 2025
fda5efa
SYCL: disable faulty fp16 exp kernel (#14395)
qnixsynapse Jun 29, 2025
cd2b0d6
server : fix appearance of the chats list context menu for Safari (#1…
rntk Jun 29, 2025
4d25fa3
server : support jinja extra template kwargs (Qwen3 enable_thinking f…
matteoserva Jun 29, 2025
895abfa
scripts : make the shell scripts cross-platform (#14341)
vedranmiletic Jun 30, 2025
27610fa
cmake : Remove redundant include path in CMakeLists.txt (#14452)
xiaobing318 Jun 30, 2025
64ada03
test-backend-ops : disable llama test (#14461)
slaren Jun 30, 2025
fc93c6f
ggml-cpu: sycl: Re-enable exp f16 (#14462)
Rbiessy Jun 30, 2025
87775bf
metal : disable fast-math for some cpy kernels (#14460)
ggerganov Jun 30, 2025
f562814
memory : correctly handle failure in apply() (#14438)
ggerganov Jun 30, 2025
9f8ecbb
Add Conv2d for CPU (#14388)
am17an Jun 30, 2025
bee4c9f
opencl : add GEGLU, REGLU, SWIGLU (#14456)
lhez Jul 1, 2025
1ee1b18
ggml-quants : rename best_mad to best_error (ggml/1283)
danbev Jun 24, 2025
179d8ff
ggml-cpu : "align corners" for bilinear upscale/downscale (ggml/1285)
Acly Jul 1, 2025
bdac540
sync : ggml
ggerganov Jul 1, 2025
86f13cc
ggml : remove trailing whitespace (#0)
ggerganov Jul 1, 2025
f2517c7
add GELU_ERF (#14455)
CISC Jul 1, 2025
5486078
vulkan: Split large mul_mat_id to fit in shared memory (#14451)
jeffbolznv Jul 1, 2025
87bcb90
CANN: update aclnnGroupedMatmulV2 to aclnnGroupedMatmulV3 (#14411)
noemotiovon Jul 1, 2025
d72abf5
Add Vulkan images to docker.md (#14472)
xek Jul 1, 2025
cc30577
ci : disable fast-math for Metal GHA CI (#14478)
ggerganov Jul 1, 2025
6f5a56e
ggml : Callback before abort (#14481)
ScaledLizard Jul 2, 2025
157f2a1
github : add OpenCL backend to issue templates (#14492)
EZForever Jul 2, 2025
05d5672
ci : add OpenCL to labeler workflow (#14496)
CISC Jul 2, 2025
49630b2
opencl : update upscale to support align corners (#14488)
lhez Jul 2, 2025
e9228dd
opencl : skip empty nodes on cgraph compute (#14491)
EZForever Jul 2, 2025
928a6bb
simple-chat : fix context-exceeded condition (#14494)
ggerganov Jul 2, 2025
cfe473f
opencl : fix possible buffer overflow in dump_tensor (#14490)
jeffzhou2000 Jul 2, 2025
05bc507
ggml : support bcast ggml_soft_max_ext, ggml_flash_attn_ext (#14435)
ggerganov Jun 27, 2025
cecbeed
vulkan: support softmax/FA batch and broadcast (#14449)
jeffbolznv Jul 1, 2025
c850702
CUDA: broadcasting for FlashAttention mask (#14500)
JohannesGaessler Jul 2, 2025
8545200
CUDA: add softmax broadcast (#14475)
am17an Jul 2, 2025
72c4484
Set RPATH to "@loader_path" / "$ORIGIN" to ensure executables and dyn…
rotemdan Jul 2, 2025
e7044b8
ggml : add version function to get lib version (ggml/1286)
danbev Jul 2, 2025
559ccc5
sync : ggml
ggerganov Jul 2, 2025
32a438c
llama : initial Mamba-2 support (#9126)
compilade Jul 2, 2025
0035c8c
gguf-py : add support for chat template jinja files (#14508)
CISC Jul 2, 2025
3c769d7
CUDA: add dynamic shared mem to softmax, refactor general usage (#14497)
am17an Jul 2, 2025
ca3ce83
ggml : remove kompute backend (#14501)
ggerganov Jul 3, 2025
14d1b75
ggml : fix FA mask dim 2 and 3 (#14505)
ggerganov Jul 3, 2025
584e606
kv-cache : use ggml_set_rows (#14285)
ggerganov Jul 3, 2025
584030a
convert : correct gemma 3n conversion (#14450)
ngxson Jul 3, 2025
94338b5
Fix conditional enabling following arch checks for ggml-sycl (#14504)
s-Nick Jul 3, 2025
3514d69
ggml: backward pass for split swiglu (#14483)
JohannesGaessler Jul 3, 2025
1e5ea60
vulkan: support mixed/deepseekR1 FA head sizes (#14509)
jeffbolznv Jul 3, 2025
1bab2e5
opencl : broadcast for soft_max (#14510)
lhez Jul 3, 2025
7bf30a6
ggml : implement GEGLU_ERF and GEGLU_QUICK ops (#14445)
CISC Jul 3, 2025
e4bdc4f
CANN: Replace aclrtMemsetSync with aclnnInplaceZero operator (#14002)
luyhcsu Jul 4, 2025
6dac1b4
batch : add n_used count (#14512)
ggerganov Jul 4, 2025
8fbe003
graph : prepare for 4D mask (#14515)
ggerganov Jul 4, 2025
826bc77
batch : add optional for sequential equal split (#14511)
ggerganov Jul 4, 2025
c5d7e5e
metal : disable fast math in all quantize kernels (#14528)
ggerganov Jul 4, 2025
a9618e0
test-backend-ops: add support for specifying output format (#14368)
yeahdongcn Jul 5, 2025
11aa9ca
eval-callback : check for empty input (#14539)
ggerganov Jul 5, 2025
83a3fa9
opencl: add GELU_ERF (#14476)
CISC Jul 5, 2025
0a2c839
server : fix assistant prefilling when content is an array (#14360)
CISC Jul 5, 2025
172ab38
vulkan: Handle updated FA dim2/3 definition (#14518)
jeffbolznv Jul 5, 2025
30 changes: 17 additions & 13 deletions .devops/intel.Dockerfile
@@ -49,19 +49,23 @@ COPY --from=build /app/full /app

WORKDIR /app

RUN apt-get update \
&& apt-get install -y \
git \
python3 \
python3-pip \
&& pip install --upgrade pip setuptools wheel \
&& pip install -r requirements.txt \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
&& find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
&& find /var/cache -type f -delete

RUN apt-get update && \
apt-get install -y \
git \
python3 \
python3-pip \
python3-venv && \
python3 -m venv /opt/venv && \
. /opt/venv/bin/activate && \
pip install --upgrade pip setuptools wheel && \
pip install -r requirements.txt && \
apt autoremove -y && \
apt clean -y && \
rm -rf /tmp/* /var/tmp/* && \
find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete && \
find /var/cache -type f -delete

ENV PATH="/opt/venv/bin:$PATH"

ENTRYPOINT ["/app/tools.sh"]

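The new Dockerfile layer above installs the Python requirements into a virtual environment, then relies on `ENV PATH="/opt/venv/bin:$PATH"` instead of sourcing `activate` at container runtime. A minimal sketch of why that works (`/tmp/fake-venv` and the stub `pip` are illustrative, not part of the image):

```shell
# PATH lookup is first-match-wins, so putting the venv's bin directory first
# makes its tools the defaults without needing `. /opt/venv/bin/activate`.
mkdir -p /tmp/fake-venv/bin                          # stands in for /opt/venv
printf '#!/bin/sh\necho "pip from venv"\n' > /tmp/fake-venv/bin/pip
chmod +x /tmp/fake-venv/bin/pip
PATH="/tmp/fake-venv/bin:$PATH"                      # what the ENV line does
pip                                                  # prints: pip from venv
```

This is also why the `RUN` step still activates the venv explicitly: the `ENV PATH` instruction comes after it, so only later layers and the running container see the modified PATH.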
2 changes: 1 addition & 1 deletion .devops/tools.sh
@@ -1,4 +1,4 @@
#!/bin/bash
#!/usr/bin/env bash
set -e

# Read the first argument into a variable
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/010-bug-compilation.yml
@@ -40,7 +40,7 @@ body:
attributes:
label: GGML backends
description: Which GGML backends do you know to be affected?
options: [AMX, BLAS, CPU, CUDA, HIP, Kompute, Metal, Musa, RPC, SYCL, Vulkan]
options: [AMX, BLAS, CPU, CUDA, HIP, Metal, Musa, RPC, SYCL, Vulkan, OpenCL]
multiple: true
validations:
required: true
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/011-bug-results.yml
@@ -42,7 +42,7 @@ body:
attributes:
label: GGML backends
description: Which GGML backends do you know to be affected?
options: [AMX, BLAS, CPU, CUDA, HIP, Kompute, Metal, Musa, RPC, SYCL, Vulkan]
options: [AMX, BLAS, CPU, CUDA, HIP, Metal, Musa, RPC, SYCL, Vulkan, OpenCL]
multiple: true
validations:
required: true
18 changes: 12 additions & 6 deletions .github/labeler.yml
@@ -1,10 +1,4 @@
# https://github.com/actions/labeler
Kompute:
- changed-files:
- any-glob-to-any-file:
- ggml/include/ggml-kompute.h
- ggml/src/ggml-kompute/**
- README-kompute.md
Apple Metal:
- changed-files:
- any-glob-to-any-file:
@@ -86,3 +80,15 @@ nix:
embedding:
- changed-files:
- any-glob-to-any-file: examples/embedding/

Ascend NPU:
- changed-files:
- any-glob-to-any-file:
- ggml/include/ggml-cann.h
- ggml/src/ggml-cann/**
- docs/backend/CANN.md
OpenCL:
- changed-files:
- any-glob-to-any-file:
- ggml/include/ggml-opencl.h
- ggml/src/ggml-opencl/**
51 changes: 51 additions & 0 deletions .github/workflows/build-cmake-pkg.yml
@@ -0,0 +1,51 @@
name: Build relocatable cmake package
on:
workflow_dispatch:
workflow_call:

jobs:
linux:
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0

- name: Install dependencies
run: |
sudo apt update
sudo apt install -y build-essential tcl

- name: Build
run: |
PREFIX="$(pwd)"/inst
cmake -S . -B build -DCMAKE_PREFIX_PATH="$PREFIX" \
-DLLAMA_CURL=OFF -DLLAMA_BUILD_TESTS=OFF -DLLAMA_BUILD_TOOLS=OFF \
-DLLAMA_BUILD_EXAMPLES=OFF -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
cmake --install build --prefix "$PREFIX" --config Release

export LLAMA_CONFIG="$PREFIX"/lib/cmake/llama/llama-config.cmake
tclsh <<'EOF'
set build(commit) [string trim [exec git rev-parse --short HEAD]]
set build(number) [string trim [exec git rev-list --count HEAD]]
set build(version) "0.0.$build(number)"

set llamaconfig [read [open "$env(LLAMA_CONFIG)" r]]
set checks [list "set\\(LLAMA_VERSION \\s+$build(version)\\)" \
"set\\(LLAMA_BUILD_COMMIT\\s+$build(commit)\\)" \
"set\\(LLAMA_BUILD_NUMBER\\s+$build(number)\\)"]

puts -nonewline "Checking llama-config.cmake version... "
foreach check $checks {
if {![regexp -expanded -- $check $llamaconfig]} {
puts "\"$check\" failed!"
exit 1
}
}
puts "success."
EOF

cd examples/simple-cmake-pkg
cmake -S . -B build -DCMAKE_PREFIX_PATH="$PREFIX"/lib/cmake
cmake --build build
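The Tcl heredoc in this new workflow cross-checks the installed `llama-config.cmake` against the git-derived commit and build number. A rough shell equivalent of the same check, for readers unfamiliar with Tcl (the inlined config content is illustrative; the workflow reads the real installed file):

```shell
# Recreate the three set(...) lines the workflow expects, then verify each
# with a whitespace-tolerant pattern, mirroring the Tcl regexp loop.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
set(LLAMA_VERSION      0.0.1234)
set(LLAMA_BUILD_COMMIT abc1234)
set(LLAMA_BUILD_NUMBER 1234)
EOF
for pattern in 'set\(LLAMA_VERSION +0\.0\.1234\)' \
               'set\(LLAMA_BUILD_COMMIT +abc1234\)' \
               'set\(LLAMA_BUILD_NUMBER +1234\)'; do
    grep -Eq "$pattern" "$cfg" || { echo "\"$pattern\" failed!"; exit 1; }
done
echo "success."
```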
82 changes: 60 additions & 22 deletions .github/workflows/build.yml
@@ -5,10 +5,43 @@ on:
push:
branches:
- master
paths: ['.github/workflows/build.yml', '.github/workflows/build-linux-cross.yml', '**/CMakeLists.txt', '**/.cmake', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.cuh', '**/*.swift', '**/*.m', '**/*.metal', '**/*.comp']
paths: [
'.github/workflows/build.yml',
'.github/workflows/build-linux-cross.yml',
'.github/workflows/build-cmake-pkg.yml',
'**/CMakeLists.txt',
'**/.cmake',
'**/*.h',
'**/*.hpp',
'**/*.c',
'**/*.cpp',
'**/*.cu',
'**/*.cuh',
'**/*.swift',
'**/*.m',
'**/*.metal',
'**/*.comp'
]

pull_request:
types: [opened, synchronize, reopened]
paths: ['.github/workflows/build.yml', '.github/workflows/build-linux-cross.yml', '**/CMakeLists.txt', '**/.cmake', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.cuh', '**/*.swift', '**/*.m', '**/*.metal', '**/*.comp']
paths: [
'.github/workflows/build.yml',
'.github/workflows/build-linux-cross.yml',
'.github/workflows/build-cmake-pkg.yml',
'**/CMakeLists.txt',
'**/.cmake',
'**/*.h',
'**/*.hpp',
'**/*.c',
'**/*.cpp',
'**/*.cu',
'**/*.cuh',
'**/*.swift',
'**/*.m',
'**/*.metal',
'**/*.comp'
]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
@@ -51,7 +84,8 @@ jobs:
-DCMAKE_BUILD_RPATH="@loader_path" \
-DLLAMA_FATAL_WARNINGS=ON \
-DGGML_METAL_USE_BF16=ON \
-DGGML_METAL_EMBED_LIBRARY=ON \
-DGGML_METAL_EMBED_LIBRARY=OFF \
-DGGML_METAL_SHADER_DEBUG=ON \
-DGGML_RPC=ON
cmake --build build --config Release -j $(sysctl -n hw.logicalcpu)

@@ -306,6 +340,7 @@ jobs:
id: cmake_test
run: |
cd build
export GGML_VK_VISIBLE_DEVICES=0
# This is using llvmpipe and runs slower than other backends
ctest -L main --verbose --timeout 3600

@@ -477,6 +512,9 @@ jobs:
build-linux-cross:
uses: ./.github/workflows/build-linux-cross.yml

build-cmake-pkg:
uses: ./.github/workflows/build-cmake-pkg.yml

macOS-latest-cmake-ios:
runs-on: macos-latest

@@ -627,7 +665,7 @@ jobs:
./build-xcframework.sh

windows-msys2:
runs-on: windows-latest
runs-on: windows-2025

strategy:
fail-fast: false
@@ -677,28 +715,31 @@ jobs:
cmake --build build --config ${{ matrix.build }} -j $(nproc)

windows-latest-cmake:
runs-on: windows-latest
runs-on: windows-2025

env:
OPENBLAS_VERSION: 0.3.23
SDE_VERSION: 9.33.0-2024-01-07
VULKAN_VERSION: 1.4.309.0
VULKAN_VERSION: 1.4.313.2

strategy:
matrix:
include:
- build: 'cpu-x64'
defines: '-G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_OPENMP=OFF'
- build: 'cpu-x64 (static)'
arch: 'x64'
defines: '-G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DBUILD_SHARED_LIBS=OFF'
- build: 'openblas-x64'
arch: 'x64'
defines: '-G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_OPENMP=OFF -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS -DBLAS_INCLUDE_DIRS="$env:RUNNER_TEMP/openblas/include" -DBLAS_LIBRARIES="$env:RUNNER_TEMP/openblas/lib/openblas.lib"'
- build: 'vulkan-x64'
defines: '-DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_VULKAN=ON'
arch: 'x64'
defines: '-DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_VULKAN=ON'
- build: 'llvm-arm64'
arch: 'arm64'
defines: '-G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON'
- build: 'llvm-arm64-opencl-adreno'
arch: 'arm64'
defines: '-G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-llvm.cmake -DCMAKE_PREFIX_PATH="$env:RUNNER_TEMP/opencl-arm64-release" -DGGML_OPENCL=ON -DGGML_OPENCL_USE_ADRENO_KERNELS=ON'
# - build: 'kompute-x64'
# defines: '-G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_OPENMP=OFF -DGGML_KOMPUTE=ON -DKOMPUTE_OPT_DISABLE_VULKAN_VERSION_CHECK=ON'

steps:
- name: Clone
@@ -712,12 +753,6 @@ jobs:
variant: ccache
evict-old-files: 1d

- name: Clone Kompute submodule
id: clone_kompute
if: ${{ matrix.build == 'kompute-x64' }}
run: |
git submodule update --init ggml/src/ggml-kompute/kompute

- name: Download OpenBLAS
id: get_openblas
if: ${{ matrix.build == 'openblas-x64' }}
@@ -733,9 +768,9 @@

- name: Install Vulkan SDK
id: get_vulkan
if: ${{ matrix.build == 'kompute-x64' || matrix.build == 'vulkan-x64' }}
if: ${{ matrix.build == 'vulkan-x64' }}
run: |
curl.exe -o $env:RUNNER_TEMP/VulkanSDK-Installer.exe -L "https://sdk.lunarg.com/sdk/download/${env:VULKAN_VERSION}/windows/VulkanSDK-${env:VULKAN_VERSION}-Installer.exe"
curl.exe -o $env:RUNNER_TEMP/VulkanSDK-Installer.exe -L "https://sdk.lunarg.com/sdk/download/${env:VULKAN_VERSION}/windows/vulkansdk-windows-X64-${env:VULKAN_VERSION}.exe"
& "$env:RUNNER_TEMP\VulkanSDK-Installer.exe" --accept-licenses --default-answer --confirm-command install
Add-Content $env:GITHUB_ENV "VULKAN_SDK=C:\VulkanSDK\${env:VULKAN_VERSION}"
Add-Content $env:GITHUB_PATH "C:\VulkanSDK\${env:VULKAN_VERSION}\bin"
@@ -768,6 +803,8 @@ jobs:
- name: libCURL
id: get_libcurl
uses: ./.github/actions/windows-setup-curl
with:
architecture: ${{ matrix.arch == 'x64' && 'win64' || 'win64a' }}

- name: Build
id: cmake_build
@@ -777,6 +814,7 @@
cmake -S . -B build ${{ matrix.defines }} `
-DCURL_LIBRARY="$env:CURL_PATH/lib/libcurl.dll.a" -DCURL_INCLUDE_DIR="$env:CURL_PATH/include"
cmake --build build --config Release -j ${env:NUMBER_OF_PROCESSORS}
cp $env:CURL_PATH/bin/libcurl-*.dll build/bin/Release

- name: Add libopenblas.dll
id: add_libopenblas_dll
@@ -787,7 +825,7 @@

- name: Test
id: cmake_test
if: ${{ matrix.build != 'llvm-arm64' && matrix.build != 'llvm-arm64-opencl-adreno' }}
if: ${{ matrix.arch == 'x64' }}
run: |
cd build
ctest -L main -C Release --verbose --timeout 900
@@ -892,7 +930,7 @@ jobs:
cmake --build build --config Release

windows-latest-cmake-sycl:
runs-on: windows-latest
runs-on: windows-2022

defaults:
run:
@@ -926,7 +964,7 @@

windows-latest-cmake-hip:
if: ${{ github.event.inputs.create_release != 'true' }}
runs-on: windows-latest
runs-on: windows-2022

steps:
- name: Clone