Merged
Changes from all commits (197 commits)
cdf7658
CUDA: fix non-cont. inputs for batched mat mul (#13155)
JohannesGaessler Apr 29, 2025
5a63980
llama-bench: fixed size of fields to correctly map to values (#13183)
Alcpz Apr 29, 2025
d9d398f
sampling : when top-k <= 0 -> noop (#13173)
ggerganov Apr 29, 2025
e2e1ddb
server : Prefilling assistant message in openai compatible API (#13174)
matteoserva Apr 29, 2025
19e899c
scripts: n_depth for compare-llama-bench [no ci] (#13201)
JohannesGaessler Apr 29, 2025
a0f7016
rpc : fix cache directory initialization (#13188)
hbuxiaofei Apr 30, 2025
da84c04
docker : do not build tests (#13204)
ngxson Apr 30, 2025
5933e6f
arg : allow using -hf offline (#13202)
ngxson Apr 30, 2025
44cd8d9
feat(ggml-cpu): enable z17 compile (#13182)
taronaeo Apr 30, 2025
07c2e2f
convert : correct typo image_mean --> image_std (#13208)
ngxson Apr 30, 2025
4163137
ggml : fix ppc64le build (#13176)
shalinib-ibm Apr 30, 2025
e5007a5
vulkan: use uint array index to avoid glslang bug (#13193)
jeffbolznv Apr 30, 2025
3b127c7
common : add -jf / --json-schema-file flag (#12011)
ochafik Apr 30, 2025
ceda28e
llava : remove duplicate include (#13207)
tattn Apr 30, 2025
3e168be
convert : improve model arch handling (#13122)
ngxson Apr 30, 2025
16a457f
fix typo: `n_ctx_pre_seq` -> `n_ctx_per_seq` (#13221)
ddh0 Apr 30, 2025
6f67cf1
arg : -hf do not fail if url mismatch (#13219)
ngxson Apr 30, 2025
e1e8e09
CUDA: batched+noncont MMQ, refactor bs>1 MoE code (#13199)
JohannesGaessler Apr 30, 2025
9998540
cuda : fix unused variable compile warning (whisper/0)
ggerganov Apr 24, 2025
4254bb4
ggml : fix ggml_gallocr_ptr type (ggml/1205)
slaren Apr 30, 2025
8d33d74
sync : ggml
ggerganov May 1, 2025
a70183e
llama-model : fix the reported size class for nomic-embed-text-v2-moe…
cebtenzzre May 1, 2025
13c9a33
arg : remove CURLINFO_EFFECTIVE_METHOD (#13228)
ngxson May 1, 2025
8936784
mtmd : add **vision** support for Mistral Small 3.1 (#13231)
ngxson May 1, 2025
b5769d9
ggml : suppress Windows compiler warnings (whisper/3075)
danbev Apr 29, 2025
99881f7
whisper : add check that target name exists (whisper/3103)
danbev May 1, 2025
b1dd4d0
sync : ggml
ggerganov May 1, 2025
b0ecbd4
test: non-cont. b in test-backend-ops -o MUL_MAT (#13187)
JohannesGaessler May 1, 2025
fc727bc
vulkan: Handle src1 batch dimension in non-contiguous mat-vec-mul sha…
jeffbolznv May 1, 2025
79f26e9
vulkan: Add bfloat16 support (#12554)
jeffbolznv May 1, 2025
e0f572c
llama-chat : update GLM4 chat template (#13238)
matteoserva May 1, 2025
b6e4ff6
clip : (minicpmv) Re-enable upscaling of images smaller than the CLIP…
lcarrere May 1, 2025
d7a14c4
build : fix build info on windows (#13239)
slaren May 1, 2025
f057808
ggml: Don't assert fail when tensor data changes (#13222)
jessegross May 1, 2025
8efbdad
rpc : avoid uninitialized memory in serialize_tensor (#13210)
justinsb May 1, 2025
d24d592
ci: fix cross-compile sync issues (#12804)
bandoti May 1, 2025
dcf8860
convert : explicitly disable trust_remote_code for AutoConfig (#13246)
ngxson May 2, 2025
fab647e
server : add cache reuse card link to help (#13230)
ggerganov May 2, 2025
e84773a
mtmd-cli : fix out_of_range when input image path is empty (#13244)
ahmedshakill May 2, 2025
2af6880
llama-chat : reset glmedge chat template (#13253)
piDack May 2, 2025
626083f
llama : plamo rope type is neox (#13260)
CISC May 2, 2025
cb06a3c
llama : orion rope type is neox (#13261)
CISC May 2, 2025
c642bc0
kv-cache : separate recurrent vs non-recurrent impl (#12799)
ggerganov May 2, 2025
074e42a
convert : converting mmproj for Qwen2/2.5VL from convert_hf_to_gguf (…
ngxson May 2, 2025
7d21234
convert : use correct context length for nomic-embed-text-v2 (#13216)
cebtenzzre May 2, 2025
2f56761
llama-model : support Qwen2 embedding models and pooling_mode_lasttok…
cebtenzzre May 2, 2025
3f3769b
ggml : Enable MMA for BF16 in llamafile_sgemm (#13148)
shalinib-ibm May 2, 2025
a75cb30
context : fix reorder logic (#13267)
ggerganov May 2, 2025
b344439
sync : ggml (#13268)
ggerganov May 2, 2025
1d36b36
llama : move end-user examples to tools directory (#13249)
slaren May 2, 2025
3bf785f
llama : Llama-3_1-Nemotron-Ultra-253B-v1 support (#12843)
ymcki May 3, 2025
36667c8
clip : revert the change of BOI/EOI token for GLM-edge (⚠️ breaking c…
ngxson May 3, 2025
3e959f0
imatrix: fix oob writes if src1 is not contiguous (#13286)
JohannesGaessler May 3, 2025
8ae5ebc
vulkan: Additional type support for unary, binary, and copy (#13266)
jeffbolznv May 4, 2025
8afbd96
CUDA: fix race condition in MMQ ids_dst (#13294)
JohannesGaessler May 4, 2025
93c4e23
CUDA: fix race condition in MMQ stream-k fixup (#13299)
JohannesGaessler May 4, 2025
9f2da58
llama : build windows releases with dl backends (#13220)
slaren May 4, 2025
86bd60d
llava/mtmd : fixes to fully support dl backends (#13303)
slaren May 4, 2025
6eb7d25
ggml : activate s390x simd for Q3_K (#13301)
taronaeo May 4, 2025
9fdfcda
rpc : use backend registry, support dl backends (#13304)
slaren May 4, 2025
27aa259
mtmd : add C public API (#13184)
ngxson May 4, 2025
66645a5
SYCL: Disable mul_mat kernels for noncontiguous tensor b (#13308)
qnixsynapse May 5, 2025
ae803bf
convert : bailingmoe : set yarn metadata if present (#13312)
CISC May 5, 2025
5215b91
clip : fix confused naming ffn_up and ffn_down (#13290)
ngxson May 5, 2025
9b61acf
mtmd : rename llava directory to mtmd (#13311)
ngxson May 5, 2025
b34c859
server : Webui - change setText command from parent window to also se…
igardev May 5, 2025
233461f
sampling : Integrate Top-nσ into main sampling chain (and add it to t…
oobabooga May 5, 2025
9070365
CUDA: fix logic for clearing padding with -ngl 0 (#13320)
JohannesGaessler May 5, 2025
a7366fa
gguf-py : avoid requiring pyside6 for other scripts (#13036)
compilade May 6, 2025
15a28ec
CUDA: fix --split-mode row for MMQ (#13323)
JohannesGaessler May 6, 2025
764b856
convert : qwen2/3moe : set yarn metadata if present (#13331)
CISC May 6, 2025
2356fb1
CUDA: fix bad asserts for partial offload (#13337)
JohannesGaessler May 6, 2025
2f54e34
llama : fix build_ffn without gate (#13336)
ngxson May 6, 2025
1e333d5
SYCL: Disable reorder optimize by default and stop setting tensor ext…
qnixsynapse May 6, 2025
f4ed10b
cmake : remove arm64 msvc presets (#13342)
slaren May 6, 2025
91a86a6
sampling : don't consider -infinity values in top_n_sigma (#13344)
oobabooga May 6, 2025
ffc7272
sampling : make top_n_sigma no-op at <=0 or a single candidate (#13345)
DocShotgun May 6, 2025
32916a4
clip : refactor graph builder (#13321)
ngxson May 6, 2025
141a908
CUDA: mix virt/real CUDA archs for GGML_NATIVE=OFF (#13135)
JohannesGaessler May 6, 2025
6c7fd67
llama : support tie embedding for chatglm models (#13328)
piDack May 7, 2025
4773d7a
examples : remove infill (#13283)
ggerganov May 7, 2025
1f73301
cuda : remove nrows_x in mul_mat_q_process_tile (#13325)
yeahdongcn May 7, 2025
39e73ae
common : Add a warning when we can't match samplers from a string or …
ycros May 7, 2025
bc4e112
llama : deci : support ffn-free with attention (#13296)
CISC May 7, 2025
bba9d94
cmake : removed stdc++fs (whisper/3097)
JaredTweed May 2, 2025
13b0a04
whisper: remove MSVC warnings pragmas (whisper/3090)
danbev May 5, 2025
d879433
sync : ggml
ggerganov May 7, 2025
814f795
docker : disable arm64 and intel images (#13356)
slaren May 7, 2025
8733e0c
sycl: addressing non-contiguous src1 mul_mats (nc and batched) (#13343)
Alcpz May 8, 2025
f061021
llama : print size and type of overridden tensors (#13364)
slaren May 8, 2025
70a6991
ci : move release workflow to a separate file (#13362)
slaren May 8, 2025
51fb96b
context : remove logits_all flag (#13284)
ggerganov May 8, 2025
6562e5a
context : allow cache-less context for embeddings (#13108)
ggerganov May 8, 2025
0ccc121
mtmd : fix the calculation of n_tokens for smolvlm (#13381)
awkrail May 8, 2025
1a844be
convert : support rope_scaling type and rope_type (#13349)
CISC May 8, 2025
8c83449
server : (webui) revamp the input area, plus many small UI improvemen…
ngxson May 8, 2025
ee01d71
server : (webui) fix a very small misalignment (#13387)
ngxson May 8, 2025
f05a6d7
mtmd : Expose helper_decode_image_chunk (#13366)
mattjcly May 8, 2025
15e0328
ci : limit write permission to only the release step + fixes (#13392)
slaren May 8, 2025
d9c4acc
server : (webui) rename has_multimodal --> modalities (#13393)
ngxson May 9, 2025
02115dc
vulkan: Allow up to 4096 elements for mul_mat_id row_ids (#13326)
jeffbolznv May 9, 2025
b486ba0
rpc : add rpc_msg_set_tensor_hash_req (#13353)
rgerganov May 9, 2025
3f96aef
llama : one-off chat template fix for Mistral-Small-2503 (#13398)
ngxson May 9, 2025
2189fd3
mtmd : fix batch_view for m-rope (#13397)
ngxson May 9, 2025
0527771
llama-run: add support for downloading models from ModelScope (#13370)
yeahdongcn May 9, 2025
efb8b47
imatrix : Add --parse-special for enabling parsing of special tokens …
bartowski1182 May 9, 2025
5c86c9e
CUDA: fix crash on large batch size for MoE models (#13384)
JohannesGaessler May 9, 2025
27ebfca
llama : do not crash if there is no CPU backend (#13395)
slaren May 9, 2025
0cf6725
CUDA: FA support for Deepseek (Ampere or newer) (#13306)
JohannesGaessler May 9, 2025
611aa91
metal : optimize MoE for large batches (#13388)
ggerganov May 9, 2025
17512a9
sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858)
Alcpz May 9, 2025
33eff40
server : vision support via libmtmd (#12898)
ngxson May 9, 2025
7c28a74
chore(llguidance): use tagged version that does not break the build (…
HRKings May 9, 2025
dc1d2ad
vulkan: scalar flash attention implementation (#13324)
jeffbolznv May 10, 2025
7fef117
arg : add env var to control mmproj (#13416)
ngxson May 10, 2025
d891942
CUDA: fix FlashAttention on Turing (#13415)
JohannesGaessler May 10, 2025
053367d
mtmd : support InternVL 2.5 and 3 (#13422)
ngxson May 10, 2025
b064a51
ci: free_disk_space flag enabled for intel variant (#13426)
Thammachart May 10, 2025
43dfd74
llguidance : set tokenizer slices to default (#13424)
CISC May 10, 2025
3b24d26
server : update docs (#13432)
ngxson May 10, 2025
15e6125
mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl (#1…
ngxson May 10, 2025
d2a4ef0
vocab : add ByteDance-Seed/Seed-Coder (#13423)
CISC May 10, 2025
0208355
CUDA: fix race conditions FlashAttention kernels (#13438)
JohannesGaessler May 10, 2025
62d4250
docs : Fix typo in InternVL3 model name (#13440)
99991 May 10, 2025
a634d75
mtmd : move helpers to dedicated file (#13442)
ngxson May 11, 2025
3eac209
mtmd : support InternVL 3 38B and 78B mmproj (#13443)
city96 May 11, 2025
7f323a5
Add `--no-op-offload` to improve `-ot` pp perf in MoE models like lla…
hjc4869 May 11, 2025
7474e00
CUDA: fix crash with partial offloading of MoE (#13439)
JohannesGaessler May 11, 2025
0923237
scripts : exit compare-llama-bench.py gracefully when there's nothing…
CISC May 11, 2025
9a390c4
tools : fix uninitialized llama_batch in server (#13436)
aumfer May 11, 2025
c104023
mtmd : Use RMS norm for InternVL 3 38B and 78B mmproj (#13459)
city96 May 11, 2025
1449214
enable dpcpp nightly builds with libraries (#13406)
AD2605 May 12, 2025
df84919
ggml : add mrope kernel for metal (#13457)
ngxson May 12, 2025
95e1888
CUDA: fix misaligned synchronization in FA (#13469)
JohannesGaessler May 12, 2025
a71a407
ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (#13053)
eddnjjn May 12, 2025
22cdab3
llama-bench : accept ranges for integer parameters (#13410)
slaren May 12, 2025
91159ee
server : allow content to be null in oaicompat_completion_params_pars…
anudit May 12, 2025
064cc59
context : fix state io for memory-less contexts (#13470)
ggerganov May 12, 2025
10d2af0
llama/ggml: add LLM training support (#10544)
JohannesGaessler May 12, 2025
de4c07f
clip : cap max image size 1024 for qwen vl model (#13478)
ngxson May 12, 2025
f0d46ef
opencl: remove unnecessary assert for `add` (#13257)
lhez May 12, 2025
cf0a43b
llama-bench : add defrag-thold, check for invalid ranges (#13487)
slaren May 12, 2025
1e2809b
sync : ggml
ggerganov May 13, 2025
d590cd4
model : Granite MoE shared (#13269)
gabe-l-hart May 13, 2025
bf79371
scripts : support arbitrary input file formats in compare-llama-bench…
CISC May 13, 2025
b472634
mtmd : remove libllava, remove clip-quantize-cli (⚠️ breaking change)…
ngxson May 13, 2025
b89d605
batched-bench : fix pp batch contents (#13492)
ggerganov May 13, 2025
4f711af
ggml-cpu: Update KleidiAI to v1.6 and fix include directives (#13509)
eddnjjn May 13, 2025
c252e0c
metal : optimize multi-sequence FA vec kernel (#13493)
ggerganov May 13, 2025
f0995d2
metal : use FA-vec kernel up to batch size 20 (#13496)
ggerganov May 13, 2025
71bdbdb
clip : clip.h become private API (⚠️ breaking change) (#13510)
ngxson May 13, 2025
e5c834f
quantize : improve tensor-type pattern matching (#13033)
EAddario May 13, 2025
ab3971f
vulkan: workaround FA compile failures on macos (#13517)
jeffbolznv May 14, 2025
be1d4a1
scripts : fix compare-llama-bench.py show parameter (#13514)
CISC May 14, 2025
21ca987
docs: Update link to ggml-org in multimodal.md (#13513)
ddpasa May 14, 2025
d486dd3
webui: Allow pasting file from clipboard (#13526)
luca020400 May 14, 2025
bb1681f
webui : use fflate for more deterministic gzip compress (#13525)
ngxson May 14, 2025
24e86ca
vulkan: KHR_coopmat flash attention (#13506)
jeffbolznv May 14, 2025
09d13d9
cmake: simplify vulkan shader test logic (#13263)
bandoti May 14, 2025
360a9c9
server : fix cache_tokens bug with no cache_prompt (#13533)
ngxson May 14, 2025
0531744
server : passthrough the /models endpoint during loading (#13535)
ggerganov May 14, 2025
5e7d95e
fix: Move build_inp_pos to the top of the graph section for build_gra…
gabe-l-hart May 14, 2025
6da34fa
CUDA: faster Deepseek FA, add Turing support (#13435)
JohannesGaessler May 14, 2025
b7d2672
llama : fix quantize with dl backends (#13539)
slaren May 14, 2025
4696d56
CUDA: fix crash on large batch size for quant. MoE (#13537)
JohannesGaessler May 14, 2025
017f10b
fix: crash when calling `llama_state_get_size` on a context without a…
giladgd May 14, 2025
f5170c1
editorconfig : fix trailing whitespace from #13542 (#13546)
CISC May 14, 2025
3198405
`common`: add partial regex support (#12808)
ochafik May 14, 2025
5ab5d5f
arm64: optimize q6_k_q8_k kernel with i8mm (#13519)
cyb70289 May 14, 2025
e3a9421
kv-cache : fix out-of-bounds view during reserve graph (#13547)
ggerganov May 14, 2025
aa48e37
`server`: inject date_string in llama 3.x template + fix date for fir…
ochafik May 15, 2025
b283804
bench : handle decode errors (#13548)
ggerganov May 15, 2025
c753d7b
server : proper error handling for missing elements in messages array…
pwilkin May 15, 2025
3cc1f1f
webui : handle PDF input (as text or image) + convert pasted long con…
ngxson May 15, 2025
6c8b915
llama-bench : fix -ot with dl backends (#13563)
slaren May 15, 2025
9c404ed
sycl: use oneDNN for matrices multiplication (#12972)
lslusarczyk May 15, 2025
64bb51c
sycl: reordered Q4_K MMVQ (#13109)
sgeor255 May 15, 2025
02cdd2d
sycl: simplify bin_bcast_kernel (#13383)
AD2605 May 15, 2025
c531edf
convert : fix conversion for llama 4 (#13567)
ngxson May 15, 2025
07ad2b6
gguf-py : fix disconnect-before-connect in editor-gui (#13569)
danielzgtg May 15, 2025
c6a2c9e
gguf : use ggml log system (#13571)
slaren May 15, 2025
bc098c3
minja: sync (qwen3) (#13573)
ochafik May 15, 2025
0a338ed
sycl : fixed compilation warnings (#13582)
lslusarczyk May 16, 2025
7c07ac2
ci : add ppc64el to build-linux-cross (#13575)
CISC May 16, 2025
5364ae4
llama : print hint when loading a model when no backends are loaded (…
slaren May 16, 2025
654a677
metal : add FA-vec kernel for head size 64 (#13583)
ggerganov May 16, 2025
415e40a
releases : use arm version of curl for arm releases (#13592)
slaren May 16, 2025
06c1e4a
readme : add list of dependencies and their license (#13591)
ngxson May 16, 2025
aea9f8b
webui : improve accessibility for visually impaired people (#13551)
ngxson May 16, 2025
6aa892e
server : do not return error out of context (with ctx shift disabled)…
ngxson May 16, 2025
3e0be1c
llguidance : official v0.7.20 release (no actual changes) [noci] (#13…
CoffeeVampir3 May 16, 2025
4f41ee1
vulkan: use scalar FA rather than coopmat2 when N==1 (#13554)
jeffbolznv May 17, 2025
2f5a4e1
vulkan: move common FA code to flash_attn_base.comp (#13556)
jeffbolznv May 17, 2025
518329b
parallel : add option for non-shared and larger prompts (#13598)
ggerganov May 17, 2025
e3a7cf6
cmake: use the current build config for vulkan-shaders-gen (#13595)
giladgd May 17, 2025
6a2bc8b
server : added --no-prefill-assistant flag (#13608)
isaac-mcfadyen May 17, 2025
33d7aed
CANN: Support MOE Model MUL_MAT_ID (#13042)
noemotiovon May 19, 2025
4 changes: 2 additions & 2 deletions .devops/cpu.Dockerfile
@@ -14,9 +14,9 @@ WORKDIR /app
 COPY . .
 
 RUN if [ "$TARGETARCH" = "amd64" ]; then \
-cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON; \
+cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD_TESTS=OFF -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON; \
 elif [ "$TARGETARCH" = "arm64" ]; then \
-cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DGGML_CPU_ARM_ARCH=${GGML_CPU_ARM_ARCH}; \
+cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD_TESTS=OFF -DGGML_CPU_ARM_ARCH=${GGML_CPU_ARM_ARCH}; \
 else \
 echo "Unsupported architecture"; \
 exit 1; \
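The same change (adding -DLLAMA_BUILD_TESTS=OFF) is applied to every image below. As a rough local sanity check, not part of this PR, the CPU image can be rebuilt with buildx; the platform, tag, and step name are illustrative:

```yaml
# Hypothetical verification step; the platform, tag, and step name are
# illustrative only. buildx supplies TARGETARCH to the Dockerfile.
- name: Build CPU image with tests disabled
  run: docker buildx build --platform linux/amd64 -f .devops/cpu.Dockerfile -t llama-cpp:cpu-local .
```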
2 changes: 1 addition & 1 deletion .devops/cuda.Dockerfile
@@ -21,7 +21,7 @@ COPY . .
 RUN if [ "${CUDA_DOCKER_ARCH}" != "default" ]; then \
 export CMAKE_ARGS="-DCMAKE_CUDA_ARCHITECTURES=${CUDA_DOCKER_ARCH}"; \
 fi && \
-cmake -B build -DGGML_NATIVE=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
+cmake -B build -DGGML_NATIVE=OFF -DGGML_CUDA=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DLLAMA_BUILD_TESTS=OFF ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
 cmake --build build --config Release -j$(nproc)
 
 RUN mkdir -p /app/lib && \
2 changes: 1 addition & 1 deletion .devops/intel.Dockerfile
@@ -17,7 +17,7 @@ RUN if [ "${GGML_SYCL_F16}" = "ON" ]; then \
 && export OPT_SYCL_F16="-DGGML_SYCL_F16=ON"; \
 fi && \
 echo "Building with dynamic libs" && \
-cmake -B build -DGGML_NATIVE=OFF -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_CURL=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON ${OPT_SYCL_F16} && \
+cmake -B build -DGGML_NATIVE=OFF -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DLLAMA_BUILD_TESTS=OFF ${OPT_SYCL_F16} && \
 cmake --build build --config Release -j$(nproc)
 
 RUN mkdir -p /app/lib && \
2 changes: 1 addition & 1 deletion .devops/llama-cli-cann.Dockerfile
@@ -22,7 +22,7 @@ ENV LD_LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/runtime/lib64/stub:$LD_LIBRARY_PATH
 
 RUN echo "Building with static libs" && \
 source /usr/local/Ascend/ascend-toolkit/set_env.sh --force && \
-cmake -B build -DGGML_NATIVE=OFF -DGGML_CANN=ON -DBUILD_SHARED_LIBS=OFF && \
+cmake -B build -DGGML_NATIVE=OFF -DGGML_CANN=ON -DBUILD_SHARED_LIBS=OFF -DLLAMA_BUILD_TESTS=OFF && \
 cmake --build build --config Release --target llama-cli
 
 # TODO: use image with NNRT
2 changes: 1 addition & 1 deletion .devops/musa.Dockerfile
@@ -35,7 +35,7 @@ COPY . .
 RUN if [ "${MUSA_DOCKER_ARCH}" != "default" ]; then \
 export CMAKE_ARGS="-DMUSA_ARCHITECTURES=${MUSA_DOCKER_ARCH}"; \
 fi && \
-cmake -B build -DGGML_NATIVE=OFF -DGGML_MUSA=ON -DLLAMA_CURL=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
+cmake -B build -DGGML_NATIVE=OFF -DGGML_MUSA=ON -DLLAMA_BUILD_TESTS=OFF -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
 cmake --build build --config Release -j$(nproc)
 
 RUN mkdir -p /app/lib && \
2 changes: 1 addition & 1 deletion .devops/rocm.Dockerfile
@@ -40,7 +40,7 @@ WORKDIR /app
 COPY . .
 
 RUN HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
-cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=$ROCM_DOCKER_ARCH -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=ON \
+cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=$ROCM_DOCKER_ARCH -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DCMAKE_BUILD_TYPE=Release -DLLAMA_BUILD_TESTS=OFF \
 && cmake --build build --config Release -j$(nproc)
 
 RUN mkdir -p /app/lib \
2 changes: 1 addition & 1 deletion .devops/vulkan.Dockerfile
@@ -16,7 +16,7 @@ WORKDIR /app
 
 COPY . .
 
-RUN cmake -B build -DGGML_NATIVE=OFF -DGGML_VULKAN=1 -DLLAMA_CURL=1 -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON && \
+RUN cmake -B build -DGGML_NATIVE=OFF -DGGML_VULKAN=1 -DLLAMA_BUILD_TESTS=OFF -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON && \
 cmake --build build --config Release -j$(nproc)
 
 RUN mkdir -p /app/lib && \
8 changes: 4 additions & 4 deletions .editorconfig
@@ -21,23 +21,23 @@ indent_style = tab
 [prompts/*.txt]
 insert_final_newline = unset
 
-[examples/server/public/*]
+[tools/server/public/*]
 indent_size = 2
 
-[examples/server/public/deps_*]
+[tools/server/public/deps_*]
 trim_trailing_whitespace = unset
 indent_style = unset
 indent_size = unset
 
-[examples/server/deps_*]
+[tools/server/deps_*]
 trim_trailing_whitespace = unset
 indent_style = unset
 indent_size = unset
 
 [examples/llama.swiftui/llama.swiftui.xcodeproj/*]
 indent_style = tab
 
-[examples/cvector-generator/*.txt]
+[tools/cvector-generator/*.txt]
 trim_trailing_whitespace = unset
 insert_final_newline = unset
 
3 changes: 2 additions & 1 deletion .flake8
@@ -2,8 +2,9 @@
 max-line-length = 125
 ignore = E203,E211,E221,E225,E231,E241,E251,E261,E266,E501,E701,E704,W503
 exclude =
-    # Do not traverse examples
+    # Do not traverse examples and tools
     examples,
+    tools,
     # Do not include package initializers
     __init__.py,
     # No need to traverse our git directory
22 changes: 22 additions & 0 deletions .github/actions/get-tag-name/action.yml
@@ -0,0 +1,22 @@
name: "Determine tag name"
description: "Determine the tag name to use for a release"
outputs:
name:
description: "The name of the tag"
value: ${{ steps.tag.outputs.name }}

runs:
using: "composite"
steps:
- name: Determine tag name
id: tag
shell: bash
run: |
BUILD_NUMBER="$(git rev-list --count HEAD)"
SHORT_HASH="$(git rev-parse --short=7 HEAD)"
if [[ "${{ env.BRANCH_NAME }}" == "master" ]]; then
echo "name=b${BUILD_NUMBER}" >> $GITHUB_OUTPUT
else
SAFE_NAME=$(echo "${{ env.BRANCH_NAME }}" | tr '/' '-')
echo "name=${SAFE_NAME}-b${BUILD_NUMBER}-${SHORT_HASH}" >> $GITHUB_OUTPUT
fi
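A minimal usage sketch for this composite action, assuming it is checked in at .github/actions/get-tag-name and that BRANCH_NAME is set in the workflow environment; the surrounding job is illustrative, not part of this PR:

```yaml
# Hypothetical caller; only the action path and its `name` output come from
# the action above, everything else is an assumption.
jobs:
  release:
    runs-on: ubuntu-latest
    env:
      BRANCH_NAME: ${{ github.head_ref || github.ref_name }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history so `git rev-list --count HEAD` is meaningful
      - name: Determine tag name
        id: tag
        uses: ./.github/actions/get-tag-name
      - name: Use the tag
        run: echo "Tag name is ${{ steps.tag.outputs.name }}"
```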
67 changes: 67 additions & 0 deletions .github/actions/windows-setup-cuda/action.yml
@@ -0,0 +1,67 @@
name: "Windows - Setup CUDA Toolkit"
description: "Setup CUDA Toolkit for Windows"
inputs:
cuda_version:
description: "CUDA toolkit version"
required: true

runs:
using: "composite"
steps:
- name: Install Cuda Toolkit 11.7
if: ${{ inputs.cuda_version == '11.7' }}
shell: pwsh
run: |
mkdir -p "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7"
choco install unzip -y
curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_cudart/windows-x86_64/cuda_cudart-windows-x86_64-11.7.99-archive.zip"
curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvcc/windows-x86_64/cuda_nvcc-windows-x86_64-11.7.99-archive.zip"
curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvrtc/windows-x86_64/cuda_nvrtc-windows-x86_64-11.7.99-archive.zip"
curl -O "https://developer.download.nvidia.com/compute/cuda/redist/libcublas/windows-x86_64/libcublas-windows-x86_64-11.7.4.6-archive.zip"
curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvtx/windows-x86_64/cuda_nvtx-windows-x86_64-11.7.91-archive.zip"
curl -O "https://developer.download.nvidia.com/compute/cuda/redist/visual_studio_integration/windows-x86_64/visual_studio_integration-windows-x86_64-11.7.91-archive.zip"
curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvprof/windows-x86_64/cuda_nvprof-windows-x86_64-11.7.101-archive.zip"
curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_cccl/windows-x86_64/cuda_cccl-windows-x86_64-11.7.91-archive.zip"
unzip '*.zip' -d "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7"
xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\cuda_cudart-windows-x86_64-11.7.99-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7" /E /I /H /Y
xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\cuda_nvcc-windows-x86_64-11.7.99-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7" /E /I /H /Y
xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\cuda_nvrtc-windows-x86_64-11.7.99-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7" /E /I /H /Y
xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\libcublas-windows-x86_64-11.7.4.6-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7" /E /I /H /Y
xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\cuda_nvtx-windows-x86_64-11.7.91-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7" /E /I /H /Y
xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\visual_studio_integration-windows-x86_64-11.7.91-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7" /E /I /H /Y
xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\cuda_nvprof-windows-x86_64-11.7.101-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7" /E /I /H /Y
xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\cuda_cccl-windows-x86_64-11.7.91-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7" /E /I /H /Y
echo "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
echo "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\libnvvp" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
echo "CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7" | Out-File -FilePath $env:GITHUB_ENV -Append -Encoding utf8
echo "CUDA_PATH_V11_7=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7" | Out-File -FilePath $env:GITHUB_ENV -Append -Encoding utf8
- name: Install Cuda Toolkit 12.4
if: ${{ inputs.cuda_version == '12.4' }}
shell: pwsh
run: |
mkdir -p "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4"
choco install unzip -y
curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_cudart/windows-x86_64/cuda_cudart-windows-x86_64-12.4.127-archive.zip"
curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvcc/windows-x86_64/cuda_nvcc-windows-x86_64-12.4.131-archive.zip"
curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvrtc/windows-x86_64/cuda_nvrtc-windows-x86_64-12.4.127-archive.zip"
curl -O "https://developer.download.nvidia.com/compute/cuda/redist/libcublas/windows-x86_64/libcublas-windows-x86_64-12.4.5.8-archive.zip"
curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvtx/windows-x86_64/cuda_nvtx-windows-x86_64-12.4.127-archive.zip"
curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_profiler_api/windows-x86_64/cuda_profiler_api-windows-x86_64-12.4.127-archive.zip"
curl -O "https://developer.download.nvidia.com/compute/cuda/redist/visual_studio_integration/windows-x86_64/visual_studio_integration-windows-x86_64-12.4.127-archive.zip"
curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvprof/windows-x86_64/cuda_nvprof-windows-x86_64-12.4.127-archive.zip"
curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_cccl/windows-x86_64/cuda_cccl-windows-x86_64-12.4.127-archive.zip"
unzip '*.zip' -d "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4"
xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\cuda_cudart-windows-x86_64-12.4.127-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" /E /I /H /Y
xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\cuda_nvcc-windows-x86_64-12.4.131-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" /E /I /H /Y
xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\cuda_nvrtc-windows-x86_64-12.4.127-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" /E /I /H /Y
xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\libcublas-windows-x86_64-12.4.5.8-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" /E /I /H /Y
xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\cuda_nvtx-windows-x86_64-12.4.127-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" /E /I /H /Y
xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\cuda_profiler_api-windows-x86_64-12.4.127-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" /E /I /H /Y
xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\visual_studio_integration-windows-x86_64-12.4.127-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" /E /I /H /Y
xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\cuda_nvprof-windows-x86_64-12.4.127-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" /E /I /H /Y
xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\cuda_cccl-windows-x86_64-12.4.127-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" /E /I /H /Y
echo "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
echo "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\libnvvp" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
echo "CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" | Out-File -FilePath $env:GITHUB_ENV -Append -Encoding utf8
echo "CUDA_PATH_V12_4=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" | Out-File -FilePath $env:GITHUB_ENV -Append -Encoding utf8
7 changes: 6 additions & 1 deletion .github/actions/windows-setup-curl/action.yml
@@ -5,6 +5,10 @@ inputs:
 description: 'CURL version'
 required: false
 default: '8.6.0_6'
+architecture:
+description: 'Architecture of the libcurl to download'
+required: false
+default: 'win64'
 outputs:
 curl_path:
 description: "Path to the downloaded libcurl"
@@ -18,8 +22,9 @@ runs:
 shell: powershell
 env:
 CURL_VERSION: ${{ inputs.curl_version }}
+ARCHITECTURE: ${{ inputs.architecture }}
 run: |
-curl.exe -o $env:RUNNER_TEMP/curl.zip -L "https://curl.se/windows/dl-${env:CURL_VERSION}/curl-${env:CURL_VERSION}-win64-mingw.zip"
+curl.exe -o $env:RUNNER_TEMP/curl.zip -L "https://curl.se/windows/dl-${env:CURL_VERSION}/curl-${env:CURL_VERSION}-${env:ARCHITECTURE}-mingw.zip"
 mkdir $env:RUNNER_TEMP/libcurl
 tar.exe -xvf $env:RUNNER_TEMP/curl.zip --strip-components=1 -C $env:RUNNER_TEMP/libcurl
 echo "curl_path=$env:RUNNER_TEMP/libcurl" >> $env:GITHUB_OUTPUT
6 changes: 4 additions & 2 deletions .github/labeler.yml
@@ -45,7 +45,9 @@ build:
   - CMakePresets.json
 examples:
   - changed-files:
-    - any-glob-to-any-file: examples/**
+    - any-glob-to-any-file:
+      - examples/**
+      - tools/**
 devops:
   - changed-files:
     - any-glob-to-any-file:
@@ -70,7 +72,7 @@ android:
 server:
   - changed-files:
     - any-glob-to-any-file:
-      - examples/server/**
+      - tools/server/**
 ggml:
   - changed-files:
     - any-glob-to-any-file:
30 changes: 15 additions & 15 deletions .github/workflows/bench.yml.disabled
@@ -27,10 +27,10 @@ on:
 push:
 branches:
 - master
-paths: ['llama.cpp', 'ggml.c', 'ggml-backend.cpp', 'ggml-quants.c', '**/*.cu', 'examples/server/*.h*', 'examples/server/*.cpp']
+paths: ['llama.cpp', 'ggml.c', 'ggml-backend.cpp', 'ggml-quants.c', '**/*.cu', 'tools/server/*.h*', 'tools/server/*.cpp']
 pull_request_target:
 types: [opened, synchronize, reopened]
-paths: ['llama.cpp', 'ggml.c', 'ggml-backend.cpp', 'ggml-quants.c', '**/*.cu', 'examples/server/*.h*', 'examples/server/*.cpp']
+paths: ['llama.cpp', 'ggml.c', 'ggml-backend.cpp', 'ggml-quants.c', '**/*.cu', 'tools/server/*.h*', 'tools/server/*.cpp']
 schedule:
 - cron: '04 2 * * *'
 
@@ -69,7 +69,7 @@ jobs:
 - name: Install python env
 id: pipenv
 run: |
-cd examples/server/bench
+cd tools/server/bench
 python3 -m venv venv
 source venv/bin/activate
 pip install -r requirements.txt
@@ -79,7 +79,7 @@
 run: |
 wget --quiet https://github.com/prometheus/prometheus/releases/download/v2.51.0/prometheus-2.51.0.linux-amd64.tar.gz
 tar xzf prometheus*.tar.gz --strip-components=1
-./prometheus --config.file=examples/server/bench/prometheus.yml &
+./prometheus --config.file=tools/server/bench/prometheus.yml &
 while ! nc -z localhost 9090; do
 sleep 0.1
 done
@@ -92,7 +92,7 @@
 - name: Install k6 and xk6-sse
 id: k6_installation
 run: |
-cd examples/server/bench
+cd tools/server/bench
 go install go.k6.io/xk6/cmd/xk6@latest
 xk6 build master \
 --with github.com/phymbert/xk6-sse
@@ -116,7 +116,7 @@
 - name: Download the dataset
 id: download_dataset
 run: |
-cd examples/server/bench
+cd tools/server/bench
 wget --quiet https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
 
 - name: Server bench
@@ -126,7 +126,7 @@
 run: |
 set -eux
 
-cd examples/server/bench
+cd tools/server/bench
 source venv/bin/activate
 python bench.py \
 --runner-label ${{ env.RUNNER_LABEL }} \
@@ -157,9 +157,9 @@
 name: bench-server-${{ github.job }}-${{ env.RUNNER_LABEL }}-${{ matrix.model }}-${{ matrix.ftype }}
 compression-level: 9
 path: |
-examples/server/bench/*.jpg
-examples/server/bench/*.json
-examples/server/bench/*.log
+tools/server/bench/*.jpg
+tools/server/bench/*.json
+tools/server/bench/*.log
 
 - name: Commit status
 uses: Sibz/github-status-action@v1
@@ -178,17 +178,17 @@
 with:
 client_id: ${{secrets.IMGUR_CLIENT_ID}}
 path: |
-examples/server/bench/prompt_tokens_seconds.jpg
-examples/server/bench/predicted_tokens_seconds.jpg
-examples/server/bench/kv_cache_usage_ratio.jpg
-examples/server/bench/requests_processing.jpg
+tools/server/bench/prompt_tokens_seconds.jpg
+tools/server/bench/predicted_tokens_seconds.jpg
+tools/server/bench/kv_cache_usage_ratio.jpg
+tools/server/bench/requests_processing.jpg
 
 - name: Extract mermaid
 id: set_mermaid
 run: |
 set -eux
 
-cd examples/server/bench
+cd tools/server/bench
 PROMPT_TOKENS_SECONDS=$(cat prompt_tokens_seconds.mermaid)
 echo "PROMPT_TOKENS_SECONDS<<EOF" >> $GITHUB_ENV
 echo "$PROMPT_TOKENS_SECONDS" >> $GITHUB_ENV