Closed
385 commits
3865cff
convert : fix null head_dim AutoConfig regression (#14248)
CISC Jun 18, 2025
9540255
llama-chat : fix multiple system message for gemma, orion (#14246)
ngxson Jun 18, 2025
413977d
mtmd : refactor llava-uhd preprocessing logic (#14247)
ngxson Jun 18, 2025
ef03580
ggml: Add Apple support for GGML_CPU_ALL_VARIANTS (#14258)
chaxu01 Jun 18, 2025
6231c5c
ggml-cpu: fix uncaught underscore terminators (#14023)
taronaeo Jun 18, 2025
50d2227
ggml-cpu: reduce asm calls for hsum (#14037)
taronaeo Jun 18, 2025
8d94713
docs: add s390x build documentation (#14264)
taronaeo Jun 18, 2025
ed3290a
metal : add mean kernel (#14267)
ggerganov Jun 19, 2025
edc4a29
memory : Hybrid recurrent cache (#13979)
gabe-l-hart Jun 19, 2025
10bb545
Vulkan: Set device max size for host memory to avoid OOM warning and …
0cc4m Jun 19, 2025
faed5a5
llamafile : support s390x SIMD instruction set (#14273)
taronaeo Jun 19, 2025
5fc7856
convert : fix remote option in Windows (#14100)
pqnet Jun 19, 2025
fffcce5
llama-bench : add --no-warmup flag (#14224) (#14270)
s2010 Jun 19, 2025
600e3e9
sycl: Cleanup codepaths in Get Rows in sycl backend (#14215)
ShanoToni Jun 19, 2025
456af35
build : suppress gcc15 compile warnings (#14261)
fanyang89 Jun 19, 2025
d67341d
server : add server parameters for draft model cache type (#13782)
aa956 Jun 19, 2025
381174b
gguf-py : make sentencepiece optional (#14200)
Ahajha Jun 19, 2025
8f71d0f
ggml-cpu : remove unnecesary arm feature detection (#14281)
slaren Jun 19, 2025
9eaa51e
CUDA: add conv_2d_dw (#14265)
am17an Jun 20, 2025
4c9fdfb
ubatch : new splitting logic (#14217)
ggerganov Jun 20, 2025
812939a
model : more uniform output id handling (#14275)
ggerganov Jun 20, 2025
9230dbe
ggml: Update KleidiAI to v1.9.0 (#14277)
chaxu01 Jun 20, 2025
d27b3ca
ggml : fix repack work size for mul_mat_id (#14292)
ggerganov Jun 20, 2025
e28c1b9
cuda : synchronize graph capture and cublas handle destruction (#14288)
slaren Jun 20, 2025
88fc854
llama : improve sep token handling (#14272)
CISC Jun 20, 2025
6369be0
Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286)
ckastner Jun 20, 2025
8308f98
sycl: add usage of enqueue_functions extension (#14244)
s-Nick Jun 20, 2025
dd6e6d0
vocab : prevent tokenizer overflow (#14301)
retr0reg Jun 20, 2025
22015b2
lint : remove trailing whitepace (#14304)
CISC Jun 20, 2025
c959f46
CUDA: add conv_2d_transpose (#14287)
am17an Jun 20, 2025
d860dd9
docs : fix the link to llama.h (#14293)
david20571015 Jun 20, 2025
b714767
Add `ggml_roll` (ggml/1274)
Acly Jun 18, 2025
06cbedf
sync : ggml
ggerganov Jun 20, 2025
b23fa0b
convert : fix Llama 4 conversion (#14311)
danielhanchen Jun 21, 2025
692e3cd
memory : rename interface to llama_memory_context_i (#14296)
ggerganov Jun 21, 2025
67ae531
metal : fix thread-safety (#14300)
ggerganov Jun 21, 2025
58cba76
gguf-py : fix TemplateProcessing pair when bos/eos is missing (#14312)
CISC Jun 21, 2025
bb16041
Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (…
mtavenrath Jun 21, 2025
aa0ef5c
gguf-py : fix Qwen3-Embedding eos token (#14314)
CISC Jun 21, 2025
aa064b2
CUDA: add mean operation (#14313)
am17an Jun 22, 2025
40bfa04
common : use std::string_view now that we target c++17 (#14319)
CISC Jun 22, 2025
5d5c066
mtmd : fix Pixtral OOM with large images by capping image_size to 102…
yuiseki Jun 22, 2025
af3373f
HIP: enable vec fattn on RDNA4 (#14323)
IMbackK Jun 22, 2025
f1f5e82
examples : fix is_first logic for tokenization (#14329)
ggerganov Jun 22, 2025
66aba7a
run : avoid double tokenization (#14327)
retr0reg Jun 22, 2025
238005c
gguf-py : fix SpecialVocab parsing when post_processor is null (#14330)
CISC Jun 22, 2025
fa4a9f2
quantize : handle user-defined pruning of whole layers (blocks) (#13037)
EAddario Jun 22, 2025
3a9457d
vulkan: update windows SDK in CI (#14334)
jeffbolznv Jun 23, 2025
7b50d58
kv-cells : fix tracking of seq_pos (#14339)
ggerganov Jun 23, 2025
defe215
CUDA: mul_mat_v support for batch sizes > 1 (#14262)
JohannesGaessler Jun 23, 2025
72c6bc3
llama : better rwkv chat template and add missing `inputs.use_jinja` …
MollySophia Jun 23, 2025
bf2a99e
vulkan: update windows SDK in release.yml (#14344)
jeffbolznv Jun 23, 2025
ce82bd0
ci: add workflow for relocatable cmake package (#14346)
bandoti Jun 23, 2025
0142961
CUDA/HIP: optimize mmv paths taken for HIP devices (#14324)
IMbackK Jun 23, 2025
901e20b
jinja : Add Mistral-Small-3.2-24B-Instruct-2506.jinja (#14349)
bartowski1182 Jun 24, 2025
abf2410
main : honor --verbose-prompt on interactive prompts (#14350)
CISC Jun 24, 2025
1b809ce
server : move no API key doc to /health (#14352)
pnb Jun 24, 2025
c148cf1
cmake : use LLAMA_BUILD_NUMBER when defining LLAMA_INSTALL_VERSION (#…
mbaudier Jun 24, 2025
62af464
batch : fix check for empty sequences in memory (#14364)
ggerganov Jun 24, 2025
73e53dc
opencl: ref count `ggml_backend_opencl_context` and refactor profilin…
lhez Jun 24, 2025
2bf9d53
sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (#13973)
ShanoToni Jun 25, 2025
b193d53
ggml : do not output unprintable characters on GGUF load failure (#14…
CISC Jun 25, 2025
60ef23d
ggml-cpu: enable IBM NNPA Vector Intrinsics (#14317)
taronaeo Jun 25, 2025
716301d
musa: enable fp16 mma (all) and cublas on qy2 (#13842)
yeahdongcn Jun 26, 2025
bf5bcd0
docs: update s390x documentation + add faq (#14389)
taronaeo Jun 26, 2025
5783ae4
metal : batch rows copy in a single threadgroup (#14384)
ggerganov Jun 26, 2025
e8215db
metal : add special-case mat-vec mul for ne00 == 4 (#14385)
ggerganov Jun 26, 2025
b253462
llama : return mistral-v7-tekken as default template only (#14390)
CISC Jun 26, 2025
a01047b
cmake: regen vulkan shaders when shaders-gen sources change (#14398)
bandoti Jun 26, 2025
8846aac
model : gemma3n text-only (#14400)
ngxson Jun 26, 2025
e4cba4b
ggml-qnn: add Qualcomm QNN backend for GGML
jeffzhou2000 Feb 14, 2025
87576fb
ggml-qnn: santiy check
jeffzhou2000 Feb 15, 2025
99bf835
ggml-qnn: update script build-run-android.sh to compare peformance of…
jeffzhou2000 Feb 16, 2025
00bb357
ggml-qnn: fix minor issue in test-backend-ops.cpp
jeffzhou2000 Feb 17, 2025
009e331
ggml-qnn: merge QNN RPC feature from https://github.com/zhouwg/kantv/…
jeffzhou2000 Feb 18, 2025
4eab401
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 18, 2025
9fcbe28
ggml-qnn: a concise approach to offload mulmat to QNN backend(sync fr…
jeffzhou2000 Feb 19, 2025
89df437
ggml-qnn: remove redundant codes
jeffzhou2000 Feb 20, 2025
f0bbdae
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 20, 2025
bfcc330
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 20, 2025
79eee0f
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 21, 2025
dedeb29
ggml-qnn: add Qualcomm QNN backend for GGML
jeffzhou2000 Feb 14, 2025
d0287c5
ggml-qnn: merge QNN RPC feature from https://github.com/zhouwg/kantv/…
jeffzhou2000 Feb 18, 2025
91bc856
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 18, 2025
a4b3b04
ggml-qnn: a concise approach to offload mulmat to QNN backend(sync fr…
jeffzhou2000 Feb 19, 2025
212de15
ggml-qnn: remove redundant codes
jeffzhou2000 Feb 20, 2025
fe25351
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 20, 2025
e25b68d
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 20, 2025
b2c8318
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 21, 2025
1b19cfa
ggml-qnn: fix a minior typo in internal doc
jeffzhou2000 Feb 23, 2025
f620c83
ggml-qnn: refine function ggml_qnn_create_general_tensor() to avoid c…
jeffzhou2000 Feb 23, 2025
db76f92
ggml-qnn: fix a minor typo in source code
jeffzhou2000 Feb 24, 2025
7fe0909
build: avoid ggml-qnn backend breaking other backend's builds
jeffzhou2000 Feb 24, 2025
164d37f
ggml-qnn: remove redundant codes to make PR reviewers happy
jeffzhou2000 Feb 25, 2025
aea68b8
ggml-qnn: refine code format
jeffzhou2000 Feb 25, 2025
cb72c9e
ggml-qnn: offload quantized type mulmat to QNN backend
jeffzhou2000 Feb 26, 2025
785f7fc
ggml-qnn: refine source code structure to make code more clearly
jeffzhou2000 Feb 27, 2025
ee76679
ggml-qnn: enable release build with necessary logs to make reviewers …
jeffzhou2000 Feb 27, 2025
9356fa5
ggml-qnn: enable all quantize type with 2d mulmat
jeffzhou2000 Feb 27, 2025
dadacdf
ggml-qnn: enable log output of GGMLQNN_LOG_INFO in command line mode …
jeffzhou2000 Feb 28, 2025
47a31c6
ggml-qnn: Windows port --- step2
jeffzhou2000 Feb 28, 2025
66c33f6
ggml-qnn: merge UT code and corresponding script from local dev branc…
jeffzhou2000 Mar 2, 2025
3b3fd99
ggml-qnn: merge ggml_qnn_mul_mat_4d from local dev branch to make wor…
jeffzhou2000 Mar 2, 2025
5fb3236
ggml-qnn: submit AI-assisted ggml_qnn_mul_mat_4d(not worked currently…
jeffzhou2000 Mar 2, 2025
afc816e
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step2
jeffzhou2000 Mar 2, 2025
0a7da4a
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step3
jeffzhou2000 Mar 2, 2025
28e973e
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step4
jeffzhou2000 Mar 2, 2025
4ce4bda
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step5
jeffzhou2000 Mar 2, 2025
e4d6260
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step6
jeffzhou2000 Mar 2, 2025
ea9ade1
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step7
jeffzhou2000 Mar 2, 2025
c951e84
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step8
jeffzhou2000 Mar 2, 2025
755dc04
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- good in step9
jeffzhou2000 Mar 2, 2025
75700d4
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- narrow down t…
jeffzhou2000 Mar 2, 2025
ff33c24
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step10
jeffzhou2000 Mar 2, 2025
deff331
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- narrow down t…
jeffzhou2000 Mar 2, 2025
974ec64
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step11
jeffzhou2000 Mar 2, 2025
dd662de
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- both ok in st…
jeffzhou2000 Mar 2, 2025
7f19f64
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 ---finalizing ver…
jeffzhou2000 Mar 2, 2025
d2d0a47
ggml-qnn: refine ggml_qnn_mul_mat and ggml_qnn_general_node according…
jeffzhou2000 Mar 2, 2025
bd4b603
ggml-qnn: remove no-needed comments
jeffzhou2000 Mar 2, 2025
536ff65
ggml-qnn: Windows port --- step3
jeffzhou2000 Mar 3, 2025
c43dad4
ggml-qnn: remove un-needed function
jeffzhou2000 Mar 4, 2025
d58515f
ggml-qnn:rebase to upstream
jeffzhou2000 Mar 4, 2025
aa44c5b
ggml-qnn: fix a minior issue during rebase to upstream
jeffzhou2000 Mar 4, 2025
9d922ae
ggml-qnn: update script according to https://github.com/ggml-org/llam…
jeffzhou2000 Mar 4, 2025
6f9747e
ggml-qnn: fix a minior issue in ggmlqnn_create_general_tensor()
jeffzhou2000 Mar 4, 2025
7c10d6a
ggml-qnn: active member variable _device_id in class qnn_instance
jeffzhou2000 Mar 4, 2025
c2c5d6b
ggml-qnn: refine ggml_qnn_general_node and ggml_qnn_mul_mat to make c…
jeffzhou2000 Mar 4, 2025
3ac6cc4
ggml-qnn: Windows port --- step4
jeffzhou2000 Mar 6, 2025
cea523c
ggml-qnn: Windows port -- step5
jeffzhou2000 Mar 7, 2025
78a6e21
ggml-qnn: WoA(Windows on ARM) -- step6
jeffzhou2000 Mar 8, 2025
52583f3
ggml-qnn: rebase to upstream
jeffzhou2000 Mar 9, 2025
62d74b6
ggml-qnn: pr to upstream
jeffzhou2000 Mar 11, 2025
d1009cf
ggml-qnn: rebase to upstream
jeffzhou2000 Mar 18, 2025
2a04d7b
ggml-qnn: self code-review
jeffzhou2000 Mar 18, 2025
9cb36a3
ggml-qnn: rebase upstream
jeffzhou2000 Mar 19, 2025
bead0c8
ggml-qnn: add approach through Hexagon cDSP
jeffzhou2000 Mar 22, 2025
532e9b7
ggml-qnn: refine general approach through Hexagon cDSP
jeffzhou2000 Mar 23, 2025
1be08cb
ggml-qnn: refine the entire ggml-qnn.cpp to make code more clear
jeffzhou2000 Mar 24, 2025
8bf5f63
ggml-qnn: refine the entire ggml-qnn.cpp to make code more clear
jeffzhou2000 Mar 24, 2025
1ebc959
ggml-qnn: add build script for libggmlop_skel.so
jeffzhou2000 Mar 24, 2025
68709ba
ggml-qnn: remove redundant functions in this PR and make codes more c…
jeffzhou2000 Mar 25, 2025
429540a
ggml-qnn: original ggml_compute_forward_add and ggml_compute_forward_…
jeffzhou2000 Mar 25, 2025
f6aa641
ggml-qnn: modify build-run-android.sh to verify mulmat and validate m…
jeffzhou2000 Mar 25, 2025
d35440e
ggml-qnn: make host code(ggml-qnn.cpp) more clear and more stable
jeffzhou2000 Mar 26, 2025
6b84d83
ggml-qnn: refine code according to self code-review and make code mor…
jeffzhou2000 Mar 26, 2025
2563229
ggml-qnn: offload more ggml op to Hexagon cDSP
jeffzhou2000 Mar 27, 2025
174879a
ggml-hexagon: code on AP(arm-cpu) side is stable now
jeffzhou2000 Mar 28, 2025
351d44c
ggml-hexagon: optimize GGML_OP_ADD on cDSP side
jeffzhou2000 Mar 28, 2025
ba8d4bb
ggml-hexagon: simplify hexagon-kernel build logic in CMakeLists.txt
jeffzhou2000 Mar 29, 2025
6b02bb9
ggml-hexagon: release ggml-hexagon v0.98
jeffzhou2000 Mar 29, 2025
2c509b1
ggml-hexagon: release ggml-hexagon v0.99
jeffzhou2000 Mar 29, 2025
6874a95
ggml-hexagon: try to offload q6_k mulmat to cDSP
jeffzhou2000 Mar 29, 2025
b826ffc
ggml-hexagon: fix minior issue in ggml-hexagon.cpp after self code-re…
jeffzhou2000 Mar 29, 2025
e359c78
ggml-hexagon: check validation of ggml-hexagon.cfg before create appr…
jeffzhou2000 Mar 30, 2025
c715c2a
ggml-hexagon: fix all compiler warnings in ggml-hexagon.cpp
jeffzhou2000 Mar 30, 2025
db92dbc
ggml-hexagon: enable only one backend device for HWACCEL_CDSP and ena…
jeffzhou2000 Mar 31, 2025
3522c7c
ggml-hexagon: rpc ion memory pool and test-backend-ops works fine in …
jeffzhou2000 Mar 31, 2025
513e75b
ggml-hexagon: make comprision of mulmat performance between HWACCEL_Q…
jeffzhou2000 Mar 31, 2025
b42f350
ggml-hexagon: release ggml-hexagon v1.00
jeffzhou2000 Mar 31, 2025
27b97ff
ggml-hexagon: rebase to upstream
jeffzhou2000 Apr 1, 2025
1fb2866
ggml-hexagon: check configuration of enable_rpc_dma_mempool in functi…
jeffzhou2000 Apr 1, 2025
0d44865
ggml-hexagon: uniform rpc_ion_memsize and rpc_ion_usage between HWACC…
jeffzhou2000 Apr 1, 2025
a04dd94
ggml-hexagon: make buffer mechanism more clear in HWACCEL_CDSP approach
jeffzhou2000 Apr 1, 2025
7fd494e
ggml-hexagon: add perf function in hexagon kernerls on cDSP side
jeffzhou2000 Apr 2, 2025
9189c1f
ggml-hexagon: fix a stupid issue of why set rpc latency failure and i…
jeffzhou2000 Apr 2, 2025
b46e759
ggml-hexagon: make helper function ggmlhexagon_get_timestring() threa…
jeffzhou2000 Apr 2, 2025
6b1185d
ggml-hexagon: fix a typo in ggml-hexagon.cpp
jeffzhou2000 Apr 2, 2025
ade8c06
ggml-hexagon: list all known todo and fixme tasks in ggml-hexagon.cpp
jeffzhou2000 Apr 2, 2025
04c6952
ggml-hexagon: fix units MB -> MiB
jeffzhou2000 Apr 2, 2025
5f57b58
ggml-hexagon: try to make ggml-hexagon backend works fine in a standa…
jeffzhou2000 Apr 3, 2025
bfd46f1
ggml-hexagon: remove reduament code and make debug log more clear
jeffzhou2000 Apr 3, 2025
1441fc8
ggml-hexagon: add gemma-3-4b-it-Q8_0.gguf to verify q8_0 mulmat on cDSP
jeffzhou2000 Apr 3, 2025
b661c68
ggml-hexagon:add skeleton code of offload GGML_OP_SOFT_MAX/GGML_OP_RM…
jeffzhou2000 Apr 3, 2025
a0944ba
ggml-hexagon: release ggml-dsp v0.60 on cDSP side
jeffzhou2000 Apr 4, 2025
a1ceab3
ggml-hexagon: merge build logic in kernels/Makefile to ggml-hexagon/C…
jeffzhou2000 Apr 5, 2025
f3e67b0
ggml-hexagon: fix a typo in ggml-hexagon.cpp
jeffzhou2000 Apr 5, 2025
3ce5299
ggml-hexagon: uniform NDEBUG usage in ggml-hexagon.cpp and ggml-dsp.c
jeffzhou2000 Apr 6, 2025
45f7a34
ggml-hexagon: add profiler feature for purpose of visualize NPU perfo…
jeffzhou2000 Apr 7, 2025
5563c36
ggml-hexagon: remove so-called dma memory pool to avoid confusion and…
jeffzhou2000 Apr 8, 2025
a2f197b
ggml-hexagon: make function ggmlhexagon_init_rpcmempool in ggml-hexag…
jeffzhou2000 Apr 8, 2025
e41e567
ggml-hexagon: fix potential resource leak in class hexagon_profiler
jeffzhou2000 Apr 8, 2025
56a3b99
ggml-hexagon: enable multi-threading feature on cDSP side
jeffzhou2000 Apr 8, 2025
3241796
ggml-hexagon: upgrade QNN SDK to v2.33.0.250327
jeffzhou2000 Apr 9, 2025
beebb2e
ggml-hexagon: fix typo in ggml-hexagon.cpp
jeffzhou2000 Apr 9, 2025
40fa43d
ggml-dsp: probe QuRT RTOS information in function ggmlop_dsp_open
jeffzhou2000 Apr 9, 2025
7a04fed
ggml-hexagon: setting enable_rpc_ion_mempool to 1 and make test-backe…
jeffzhou2000 Apr 10, 2025
8d99180
ggml-hexagon: check whether user's specified htp arch is valid in CMa…
jeffzhou2000 Apr 10, 2025
5256171
ggml-hexagon: sync with upstream
jeffzhou2000 Apr 11, 2025
3892766
ggml-hexagon: refine pinned-memory feature
jeffzhou2000 Apr 11, 2025
7e8a117
ggml-hexagon: refine build system in ggml-hexagon
jeffzhou2000 Apr 11, 2025
296529e
ggml-hexagon: remove redundant code in struct ggml_backend_hexagon_bu…
jeffzhou2000 Apr 11, 2025
38e529e
ggml-hexagon: upgrade Android NDK to android-ndk-r28
jeffzhou2000 Apr 11, 2025
97f9a35
ggml-dsp: split ggml-dsp.c into multiple files and cleanup
jeffzhou2000 Apr 11, 2025
88360a3
ggml-dsp: refine ggml-dsp and make ggml-dsp more clear
jeffzhou2000 Apr 12, 2025
f2d8995
ggml-hexagon: fix a minior issue in dev ops
jeffzhou2000 Apr 12, 2025
a7399e7
ggml-hexagon: fix a build issue in CI
jeffzhou2000 Apr 12, 2025
9aeac58
ggml-dsp: cleanup code
jeffzhou2000 Apr 15, 2025
0f0ced8
ggml-hexagon: sync with upstream
jeffzhou2000 Apr 15, 2025
d9ce4de
ggml-dsp: cleanup code
jeffzhou2000 Apr 16, 2025
31ec84e
ggml-dsp:refine ggmlhexagon_dsp_add_f32
jeffzhou2000 Apr 16, 2025
fc91098
ggml-dsp: refine logic of thread_counts
jeffzhou2000 Apr 17, 2025
26824ae
ggml-hexagon: release v1.06 and ready for code review
jeffzhou2000 Apr 17, 2025
0ce9336
ggml-dsp: make GGML_OP_ADD more faster on cDSP side
jeffzhou2000 Apr 19, 2025
16851d9
ggml-hexagon: sync from project kantv(make ggml-hexagon backend can w…
jeffzhou2000 Apr 24, 2025
a792a07
sync with upstream llama.cpp and sync ggml-hexagon.cpp from project k…
jeffzhou2000 Apr 29, 2025
1c46c71
sync with upstream
jeffzhou2000 May 7, 2025
e2aa242
sync with upstream
jeffzhou2000 May 10, 2025
03a3d40
ggml-hexagon: upgrade QNN SDK to v2.34.0.250424
jeffzhou2000 May 11, 2025
3fe5493
sync with upstream
jeffzhou2000 May 16, 2025
b55ced5
ggml-hexagon: sync from project kantv(fix a long-term issue which int…
jeffzhou2000 May 17, 2025
d8497ae
ggml-hexagon: sync with upstream llama.cpp
jeffzhou2000 May 23, 2025
71b797b
build: enable self-contained-build to simplify workflow
jeffzhou2000 May 23, 2025
a8dcfb5
sync with upstream
jeffzhou2000 May 23, 2025
e9c1316
add prebuilt binary libggmlop-skel.so
jeffzhou2000 May 31, 2025
e557620
refine ggml-hexagon.cfg for the prebuilt binary libggmlop-skel.so
jeffzhou2000 May 31, 2025
c9ecae0
refine scripts to avoid confusion
jeffzhou2000 Jun 1, 2025
2168193
ggml-hexagon: add set_hexagon_cfg(int new_hexagon_backend, int new_hw…
jeffzhou2000 Jun 3, 2025
346b571
project: rename libggmlop-skel.so to libggmldsp-skel.so and add ggmlh…
jeffzhou2000 Jun 7, 2025
f58aeff
project: release libggmldsp-skel.so v0.97
jeffzhou2000 Jun 9, 2025
a379995
ggml-hexagon: upgrade QNN SDK to v2.35.0.250530
jeffzhou2000 Jun 10, 2025
305351c
project: fix typo and build issue
jeffzhou2000 Jun 10, 2025
eeef21c
ggmlhexagon-benchmark: add running timestamp and enable ggmlhexagon-b…
jeffzhou2000 Jun 10, 2025
1099a08
ggml-hexagon: update ggml-hexagon.cpp to v1.11 and refine related cod…
jeffzhou2000 Jun 12, 2025
a5a1548
llama-bench: add running timestamp to analysis regression issue in ll…
jeffzhou2000 Jun 13, 2025
6fb8d8f
project: add prebuilt LLM models for compare inference peformance bet…
jeffzhou2000 Jun 14, 2025
06fd699
script: refine scripts/build-run-android.sh
jeffzhou2000 Jun 14, 2025
3ec9341
project: sync with upstream
jeffzhou2000 Jun 16, 2025
ba62457
troubleshooting: add ggml-20250531 to troubleshooting performance reg…
jeffzhou2000 Jun 16, 2025
595f8e7
script: simplify workflow
jeffzhou2000 Jun 16, 2025
a136a84
project: sync with upstream
jeffzhou2000 Jun 16, 2025
97d22f7
project: sync with upstream
jeffzhou2000 Jun 16, 2025
7ce85d3
project: sync with upstream
jeffzhou2000 Jun 17, 2025
07687ba
project: add prebuilt LLM model t5-277M-F32.gguf for compare inferenc…
jeffzhou2000 Jun 18, 2025
8edc5ce
script: refine scripts/build-run-android.sh
jeffzhou2000 Jun 18, 2025
75a37a2
project: adapt to thread safety test in upstream
jeffzhou2000 Jun 18, 2025
c7c5797
project: remove unused ggml-20250531 which added for troubleshooting …
jeffzhou2000 Jun 18, 2025
047b200
ggml-hexagon: fix issue which introduced by test-thread-safety in the…
jeffzhou2000 Jun 18, 2025
f5a892a
project: add codes for developers/experts's effort on cDSP side
jeffzhou2000 Jun 19, 2025
40502b6
build: refine script for developers/experts's effort on cDSP side
jeffzhou2000 Jun 19, 2025
775fda0
script: fix a minor issue in scripts/build-run-android.sh
jeffzhou2000 Jun 19, 2025
477c0a3
script: refine script according to https://github.com/quic/ai-hub-app…
jeffzhou2000 Jun 19, 2025
6051578
ggml-hexagon: add mulmat_algotype for further usage
jeffzhou2000 Jun 20, 2025
fdd0e75
project: sync with upstream
jeffzhou2000 Jun 23, 2025
8525353
ggml-dsp: fix typo
jeffzhou2000 Jun 23, 2025
3c72168
project: release libggmldsp-skel.so v0.98
jeffzhou2000 Jun 25, 2025
8f69861
project: sync with upstream
jeffzhou2000 Jun 26, 2025
26f96be
project: sync with upstream
jeffzhou2000 Jun 26, 2025
8c6db00
project: sync with upstream
jeffzhou2000 Jun 27, 2025
0956c3a
test: verify Google gemma-3n on Android phone
jeffzhou2000 Jun 27, 2025
30 changes: 17 additions & 13 deletions .devops/intel.Dockerfile
@@ -49,19 +49,23 @@ COPY --from=build /app/full /app

WORKDIR /app

RUN apt-get update \
&& apt-get install -y \
git \
python3 \
python3-pip \
&& pip install --upgrade pip setuptools wheel \
&& pip install -r requirements.txt \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
&& find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
&& find /var/cache -type f -delete

RUN apt-get update && \
apt-get install -y \
git \
python3 \
python3-pip \
python3-venv && \
python3 -m venv /opt/venv && \
. /opt/venv/bin/activate && \
pip install --upgrade pip setuptools wheel && \
pip install -r requirements.txt && \
apt autoremove -y && \
apt clean -y && \
rm -rf /tmp/* /var/tmp/* && \
find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete && \
find /var/cache -type f -delete

ENV PATH="/opt/venv/bin:$PATH"

ENTRYPOINT ["/app/tools.sh"]
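The replacement layer moves the pip installs into a virtualenv and then exports its `bin` directory onto `PATH`. A minimal sketch of why that final `ENV PATH="/opt/venv/bin:$PATH"` line is all the activation the image needs (the `/tmp/venvdemo` stub path is illustrative, not part of the image):

```shell
# The shell resolves commands left-to-right along PATH, so a venv's bin/
# placed first shadows the system interpreter. Demo with a throwaway stub.
mkdir -p /tmp/venvdemo/bin
printf '#!/bin/sh\necho venv-python\n' > /tmp/venvdemo/bin/python3
chmod +x /tmp/venvdemo/bin/python3
# With the stub dir prepended, "python3" now resolves to the stub:
PATH="/tmp/venvdemo/bin:$PATH" python3
```

The same mechanism is why no `. /opt/venv/bin/activate` is needed at container runtime: prepending the venv's `bin` to `PATH` is the essential part of what `activate` does.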

7 changes: 7 additions & 0 deletions .github/labeler.yml
@@ -86,3 +86,10 @@ nix:
embedding:
- changed-files:
- any-glob-to-any-file: examples/embedding/

Ascend NPU:
- changed-files:
- any-glob-to-any-file:
- ggml/include/ggml-cann.h
- ggml/src/ggml-cann/**
- docs/backend/CANN.md
51 changes: 51 additions & 0 deletions .github/workflows/build-cmake-pkg.yml
@@ -0,0 +1,51 @@
name: Build relocatable cmake package
on:
workflow_dispatch:
workflow_call:

jobs:
linux:
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0

- name: Install dependencies
run: |
sudo apt update
sudo apt install -y build-essential tcl
- name: Build
run: |
PREFIX="$(pwd)"/inst
cmake -S . -B build -DCMAKE_PREFIX_PATH="$PREFIX" \
-DLLAMA_CURL=OFF -DLLAMA_BUILD_TESTS=OFF -DLLAMA_BUILD_TOOLS=OFF \
-DLLAMA_BUILD_EXAMPLES=OFF -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
cmake --install build --prefix "$PREFIX" --config Release
export LLAMA_CONFIG="$PREFIX"/lib/cmake/llama/llama-config.cmake
tclsh <<'EOF'
set build(commit) [string trim [exec git rev-parse --short HEAD]]
set build(number) [string trim [exec git rev-list --count HEAD]]
set build(version) "0.0.$build(number)"
set llamaconfig [read [open "$env(LLAMA_CONFIG)" r]]
set checks [list "set\\(LLAMA_VERSION \\s+$build(version)\\)" \
"set\\(LLAMA_BUILD_COMMIT\\s+$build(commit)\\)" \
"set\\(LLAMA_BUILD_NUMBER\\s+$build(number)\\)"]
puts -nonewline "Checking llama-config.cmake version... "
foreach check $checks {
if {![regexp -expanded -- $check $llamaconfig]} {
puts "\"$check\" failed!"
exit 1
}
}
puts "success."
EOF
cd examples/simple-cmake-pkg
cmake -S . -B build -DCMAKE_PREFIX_PATH="$PREFIX"/lib/cmake
cmake --build build
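The Tcl step above verifies that the installed `llama-config.cmake` embeds the expected version, commit, and build number. A hedged shell sketch of the same check, using `grep -E` against a fabricated stand-in file (the values below are made up for illustration, not real build output):

```shell
# Fabricated llama-config.cmake fragment standing in for the installed file.
cat > /tmp/llama-config.cmake <<'EOF'
set(LLAMA_VERSION      0.0.123)
set(LLAMA_BUILD_COMMIT abc1234)
set(LLAMA_BUILD_NUMBER 123)
EOF
# Each pattern mirrors one of the Tcl regexp checks: literal parens, a run
# of spaces, then the expected value.
for re in 'set\(LLAMA_VERSION +0\.0\.123\)' \
          'set\(LLAMA_BUILD_COMMIT +abc1234\)' \
          'set\(LLAMA_BUILD_NUMBER +123\)'; do
    grep -Eq "$re" /tmp/llama-config.cmake || { echo "FAIL: $re"; exit 1; }
done
echo "all checks passed"
```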
113 changes: 113 additions & 0 deletions .github/workflows/build-linux-cross.yml
@@ -231,3 +231,116 @@ jobs:
-DCMAKE_FIND_ROOT_PATH_MODE_INCLUDE=BOTH

cmake --build build --config Release -j $(nproc)

debian-13-loongarch64-cpu-cross:
runs-on: ubuntu-24.04
container: debian@sha256:653dfb9f86c3782e8369d5f7d29bb8faba1f4bff9025db46e807fa4c22903671

steps:
- uses: actions/checkout@v4
- name: Setup LoongArch
run: |
rm -f /etc/apt/sources.list.d/*
cat << EOF | tee /etc/apt/sources.list.d/debian-ports.list
deb http://snapshot.debian.org/archive/debian/20250515T202920Z/ trixie main
EOF
( echo 'quiet "true";'; \
echo 'APT::Get::Assume-Yes "true";'; \
echo 'APT::Install-Recommends "false";'; \
echo 'Acquire::Check-Valid-Until "false";'; \
echo 'Acquire::Retries "5";'; \
) > /etc/apt/apt.conf.d/99snapshot-repos

apt-get update
apt-get install -y ca-certificates debian-ports-archive-keyring cmake git zip
dpkg --add-architecture loong64

# Add arch-specific repositories for non-amd64 architectures
cat << EOF | tee /etc/apt/sources.list.d/loong64-ports.list
deb [arch=loong64] http://snapshot.debian.org/archive/debian-ports/20250515T194251Z/ sid main
EOF

apt-get update || true ;# Prevent failure due to missing URLs.

apt-get install -y --no-install-recommends \
build-essential \
gcc-14-loongarch64-linux-gnu \
g++-14-loongarch64-linux-gnu

- name: Build
run: |
cmake -B build -DLLAMA_CURL=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DGGML_OPENMP=OFF \
-DLLAMA_BUILD_EXAMPLES=ON \
-DLLAMA_BUILD_TOOLS=ON \
-DLLAMA_BUILD_TESTS=OFF \
-DCMAKE_SYSTEM_NAME=Linux \
-DCMAKE_SYSTEM_PROCESSOR=loongarch64 \
-DCMAKE_C_COMPILER=loongarch64-linux-gnu-gcc-14 \
-DCMAKE_CXX_COMPILER=loongarch64-linux-gnu-g++-14 \
-DCMAKE_POSITION_INDEPENDENT_CODE=ON \
-DCMAKE_FIND_ROOT_PATH=/usr/lib/loongarch64-linux-gnu \
-DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
-DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
-DCMAKE_FIND_ROOT_PATH_MODE_INCLUDE=BOTH

cmake --build build --config Release -j $(nproc)

debian-13-loongarch64-vulkan-cross:
runs-on: ubuntu-24.04
container: debian@sha256:653dfb9f86c3782e8369d5f7d29bb8faba1f4bff9025db46e807fa4c22903671

steps:
- uses: actions/checkout@v4
- name: Setup LoongArch
run: |
rm -f /etc/apt/sources.list.d/*
cat << EOF | tee /etc/apt/sources.list.d/debian-ports.list
deb http://snapshot.debian.org/archive/debian/20250515T202920Z/ trixie main
EOF
( echo 'quiet "true";'; \
echo 'APT::Get::Assume-Yes "true";'; \
echo 'APT::Install-Recommends "false";'; \
echo 'Acquire::Check-Valid-Until "false";'; \
echo 'Acquire::Retries "5";'; \
) > /etc/apt/apt.conf.d/99snapshot-repos

apt-get update
apt-get install -y ca-certificates debian-ports-archive-keyring cmake git zip
dpkg --add-architecture loong64

# Add arch-specific repositories for non-amd64 architectures
cat << EOF | tee /etc/apt/sources.list.d/loong64-ports.list
deb [arch=loong64] http://snapshot.debian.org/archive/debian-ports/20250515T194251Z/ sid main
EOF

apt-get update || true ;# Prevent failure due to missing URLs.

apt-get install -y --no-install-recommends \
build-essential \
glslc \
gcc-14-loongarch64-linux-gnu \
g++-14-loongarch64-linux-gnu \
libvulkan-dev:loong64

- name: Build
run: |
cmake -B build -DLLAMA_CURL=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DGGML_VULKAN=ON \
-DGGML_OPENMP=OFF \
-DLLAMA_BUILD_EXAMPLES=ON \
-DLLAMA_BUILD_TOOLS=ON \
-DLLAMA_BUILD_TESTS=OFF \
-DCMAKE_SYSTEM_NAME=Linux \
-DCMAKE_SYSTEM_PROCESSOR=loongarch64 \
-DCMAKE_C_COMPILER=loongarch64-linux-gnu-gcc-14 \
-DCMAKE_CXX_COMPILER=loongarch64-linux-gnu-g++-14 \
-DCMAKE_POSITION_INDEPENDENT_CODE=ON \
-DCMAKE_FIND_ROOT_PATH=/usr/lib/loongarch64-linux-gnu \
-DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
-DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
-DCMAKE_FIND_ROOT_PATH_MODE_INCLUDE=BOTH

cmake --build build --config Release -j $(nproc)
60 changes: 49 additions & 11 deletions .github/workflows/build.yml
@@ -5,10 +5,43 @@ on:
push:
branches:
- master
paths: ['.github/workflows/build.yml', '.github/workflows/build-linux-cross.yml', '**/CMakeLists.txt', '**/.cmake', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.cuh', '**/*.swift', '**/*.m', '**/*.metal', '**/*.comp']
paths: [
'.github/workflows/build.yml',
'.github/workflows/build-linux-cross.yml',
'.github/workflows/build-cmake-pkg.yml',
'**/CMakeLists.txt',
'**/.cmake',
'**/*.h',
'**/*.hpp',
'**/*.c',
'**/*.cpp',
'**/*.cu',
'**/*.cuh',
'**/*.swift',
'**/*.m',
'**/*.metal',
'**/*.comp'
]

pull_request:
types: [opened, synchronize, reopened]
paths: ['.github/workflows/build.yml', '.github/workflows/build-linux-cross.yml', '**/CMakeLists.txt', '**/.cmake', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.cuh', '**/*.swift', '**/*.m', '**/*.metal', '**/*.comp']
paths: [
'.github/workflows/build.yml',
'.github/workflows/build-linux-cross.yml',
'.github/workflows/build-cmake-pkg.yml',
'**/CMakeLists.txt',
'**/.cmake',
'**/*.h',
'**/*.hpp',
'**/*.c',
'**/*.cpp',
'**/*.cu',
'**/*.cuh',
'**/*.swift',
'**/*.m',
'**/*.metal',
'**/*.comp'
]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
@@ -306,6 +339,7 @@ jobs:
id: cmake_test
run: |
cd build
export GGML_VK_VISIBLE_DEVICES=0
# This is using llvmpipe and runs slower than other backends
ctest -L main --verbose --timeout 3600
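The added `export GGML_VK_VISIBLE_DEVICES=0` pins the llvmpipe ctest run to a single Vulkan device. A small sketch of the convention such variables follow (assumption drawn from the variable's name, not verified against the backend source: the value is a comma-separated list of device indices):

```shell
# Parse a comma-separated device-index list, POSIX-shell style; with "0"
# (as in the workflow) only the first device would remain visible.
GGML_VK_VISIBLE_DEVICES="0,2"
for d in $(echo "$GGML_VK_VISIBLE_DEVICES" | tr ',' ' '); do
    echo "device $d visible"
done
```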

@@ -477,6 +511,9 @@ jobs:
build-linux-cross:
uses: ./.github/workflows/build-linux-cross.yml

build-cmake-pkg:
uses: ./.github/workflows/build-cmake-pkg.yml

macOS-latest-cmake-ios:
runs-on: macos-latest

@@ -682,17 +719,17 @@ jobs:
env:
OPENBLAS_VERSION: 0.3.23
SDE_VERSION: 9.33.0-2024-01-07
VULKAN_VERSION: 1.4.309.0
VULKAN_VERSION: 1.4.313.2

strategy:
matrix:
include:
- build: 'cpu-x64'
defines: '-G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_OPENMP=OFF'
- build: 'cpu-x64 (static)'
defines: '-G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DBUILD_SHARED_LIBS=OFF'
- build: 'openblas-x64'
defines: '-G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_OPENMP=OFF -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS -DBLAS_INCLUDE_DIRS="$env:RUNNER_TEMP/openblas/include" -DBLAS_LIBRARIES="$env:RUNNER_TEMP/openblas/lib/openblas.lib"'
- build: 'vulkan-x64'
defines: '-DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_VULKAN=ON'
defines: '-DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_VULKAN=ON'
- build: 'llvm-arm64'
defines: '-G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON'
- build: 'llvm-arm64-opencl-adreno'
@@ -735,7 +772,7 @@ jobs:
id: get_vulkan
if: ${{ matrix.build == 'kompute-x64' || matrix.build == 'vulkan-x64' }}
run: |
curl.exe -o $env:RUNNER_TEMP/VulkanSDK-Installer.exe -L "https://sdk.lunarg.com/sdk/download/${env:VULKAN_VERSION}/windows/VulkanSDK-${env:VULKAN_VERSION}-Installer.exe"
curl.exe -o $env:RUNNER_TEMP/VulkanSDK-Installer.exe -L "https://sdk.lunarg.com/sdk/download/${env:VULKAN_VERSION}/windows/vulkansdk-windows-X64-${env:VULKAN_VERSION}.exe"
& "$env:RUNNER_TEMP\VulkanSDK-Installer.exe" --accept-licenses --default-answer --confirm-command install
Add-Content $env:GITHUB_ENV "VULKAN_SDK=C:\VulkanSDK\${env:VULKAN_VERSION}"
Add-Content $env:GITHUB_PATH "C:\VulkanSDK\${env:VULKAN_VERSION}\bin"
@@ -777,6 +814,7 @@ jobs:
cmake -S . -B build ${{ matrix.defines }} `
-DCURL_LIBRARY="$env:CURL_PATH/lib/libcurl.dll.a" -DCURL_INCLUDE_DIR="$env:CURL_PATH/include"
cmake --build build --config Release -j ${env:NUMBER_OF_PROCESSORS}
cp $env:CURL_PATH/bin/libcurl-*.dll build/bin/Release

- name: Add libopenblas.dll
id: add_libopenblas_dll
@@ -839,12 +877,12 @@ jobs:
-DGGML_CUDA=ON
cmake --build build

windows-2019-cmake-cuda:
runs-on: windows-2019
windows-2022-cmake-cuda:
runs-on: windows-2022

strategy:
matrix:
cuda: ['12.4', '11.7']
cuda: ['12.4']

steps:
- name: Clone
@@ -878,7 +916,7 @@ env:
env:
CURL_PATH: ${{ steps.get_libcurl.outputs.curl_path }}
run: |
call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Auxiliary\Build\vcvars64.bat"
call "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" x64
cmake -S . -B build -G "Ninja Multi-Config" ^
-DLLAMA_BUILD_SERVER=ON ^
-DGGML_NATIVE=OFF ^