Merged
289 commits
cc74d5b
server : pad small embedding batches (#13692)
ggerganov May 22, 2025
ab86335
common: Include torch package for s390x (#13699)
taronaeo May 22, 2025
797990c
mtmd : add ultravox audio input (#13623)
ngxson May 22, 2025
8a1d206
tts : fix n_ubatch + make WavTokenizer cache-less (#13713)
ggerganov May 22, 2025
3079e9a
release : fix windows hip release (#13707)
slaren May 22, 2025
a127ff1
use LOG_WARN to replace `std::cerr` (#13657)
foldl May 23, 2025
c10ed6c
vulkan: Disable coopmat/coopmat2/bfloat extensions if glslc doesn't s…
jeffbolznv May 23, 2025
1dcd019
vulkan: support CPY from any type to itself (#13695)
jeffbolznv May 23, 2025
e16c473
ggml : fix the order of ggml_unary_op (#13718)
ngxson May 23, 2025
faaaff5
CANN: Support MUL_MAT_ID for q8_0 and q4_0 (#13705)
noemotiovon May 23, 2025
9ecf3e6
server : support audio input (#13714)
ngxson May 23, 2025
8a2afb7
llama : allow custom list of swa_layers (#13726)
ngxson May 23, 2025
d13d0f6
hparams : initialize arrays (#13728)
ggerganov May 23, 2025
a70a8a6
ci : add winget package updater (#13732)
slaren May 23, 2025
b775345
ci : enable winget package updates (#13734)
slaren May 23, 2025
ffd0eae
CUDA: fix race condition in FA vector kernels (#13742)
JohannesGaessler May 24, 2025
c3a2624
vocab : fix ugm tokenizer precision (#13743)
CISC May 24, 2025
4c32832
ggml : add ggml_gelu_erf() CUDA kernel (#13719)
ngxson May 24, 2025
259469c
Move GLM4 f32 attention fix to the correct function (#13750)
0cc4m May 24, 2025
2bd1b30
ggml-cpu : set openmp wait time if not set (#13758)
slaren May 24, 2025
17fc817
releases : enable openmp in windows cpu backend build (#13756)
slaren May 24, 2025
a2d02d5
releases : bundle llvm omp library in windows release (#13763)
slaren May 24, 2025
f5cd27b
`server`: streaming of tool calls and thoughts when `--jinja` is on (…
ochafik May 25, 2025
515fdbf
SYCL: revert "sycl: simplify bin_bcast_kernel (#13383)" (#13752)
qnixsynapse May 25, 2025
4032ca4
llama : add support for Qwen3 MoE tied word embeddings (#13768)
estibi May 25, 2025
d785f9c
server: fix/test add_generation_prompt (#13770)
ochafik May 25, 2025
a08c1d2
docs : add Moondream2 pre-quantized link (#13745)
ddpasa May 25, 2025
40aaa8a
mtmd : add support for Qwen2-Audio and SeaLLM-Audio (#13760)
ngxson May 25, 2025
c508256
rpc : Fix build on OpenBSD (#13541)
percypiper May 25, 2025
de2ef53
kv-cache : rework kv_cell (#13706)
ggerganov May 25, 2025
aa50ba4
tests : improve UGM tokenizer test coverage (#13773)
CISC May 25, 2025
2f099b5
webui : bump max upload file size to 500MB (#13779)
ngxson May 25, 2025
e121edc
`server`: add `--reasoning-budget 0` to disable thinking (incl. qwen3…
ochafik May 25, 2025
2d38b6e
CANN: Add the basic supports of Flash Attention kernel (#13627)
shibizhao May 26, 2025
fef693d
vulkan: mark IM2COL as supporting non-contig (#13783)
jeffbolznv May 26, 2025
9012eb9
sycl: Add more debug prints (#13640)
Rbiessy May 26, 2025
2222931
llama : clarify deprecation message (#13794)
ggerganov May 26, 2025
79c137f
examples : allow extracting embeddings from decoder contexts (#13797)
ggerganov May 26, 2025
f13847c
server: fix regression on streamed non-chat completion w/ stops (#13785)
ochafik May 26, 2025
d74e94c
`server`: fix format of streamed tool call deltas (diff name, fix id …
ochafik May 26, 2025
88c125f
examples/training: Fix file name in README (#13803)
standby24x7 May 26, 2025
03f582a
server: fix streaming crashes (#13786)
ochafik May 26, 2025
6f180b9
SYCL: Add non contiguous support in RMS_NORM and NORM kernels (#13611)
qnixsynapse May 26, 2025
4265a87
cuda : avoid cuGetErrorString (#13791)
ggerganov May 26, 2025
a26c4cc
scripts : add option to compare commits in Debug (#13806)
ggerganov May 26, 2025
cdf94a1
server: --offline mode (#13804)
ochafik May 26, 2025
4f81b33
llama : validate seq id batch input (#13809)
ggerganov May 27, 2025
f9cd683
sampling : make sure samplers return at least 1 token (#13822)
ggerganov May 27, 2025
8171312
kv-cells : track min/max used cells and per-sequence positions (#13808)
ggerganov May 27, 2025
952f395
ggml : allow CUDA graphs when using pipeline parallelism (#13814)
slaren May 27, 2025
7fe03e7
ggml-cpu: x86 feature detection is specific to x86 (#13811)
ckastner May 27, 2025
72b090d
docs: remove link for llama-cli function calling (#13810)
bandoti May 27, 2025
bc583e3
mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output) (#…
ngxson May 27, 2025
05f6ac6
ggml : riscv: add xtheadvector support (#13720)
xctan May 27, 2025
a8ea03d
ggml : add ggml_repeat_4d (#13824)
ngxson May 27, 2025
1c49c70
sync : ggml
ggerganov May 27, 2025
f3101a8
SYCL: add gelu_erf kernel (#13749)
qnixsynapse May 27, 2025
34b7c04
cmake : add llama-cparams.cpp to build (#13832)
ggerganov May 27, 2025
bef8176
vulkan: use timestamp queries for GGML_VULKAN_PERF (#13817)
jeffbolznv May 27, 2025
1701d4c
opencl: mark `mul_mat` `f32f32` as supporting non-contiguous tensors …
lhez May 27, 2025
a3c3084
opencl: add new ops - `argsort`, `div`, `sub`, `addrows`, `sigmoid`, …
lhez May 27, 2025
1e8659e
CANN: Add SOC TYPE printing in cmake configuration (#13837)
leo-pony May 28, 2025
26b79b6
convert : fix tensor naming conflict for llama 4 vision (#13836)
ngxson May 28, 2025
a682474
CUDA: fix FA tg at long context for CC >= 8.9 (#13852)
JohannesGaessler May 28, 2025
f7873fc
tests : change umlaut test (#11600)
n00b001 May 28, 2025
a3938fb
convert : fix qwen omni conversion (#13859)
ngxson May 28, 2025
c962ae3
server: fix remove 'image_url'/'input_audio' json-object effectlly fo…
flyinskyin2013 May 28, 2025
aa6dff0
convert: small addition to support LlamaModel (#13838)
huydt84 May 28, 2025
e0e3aa2
llama : add support for BertForSequenceClassification reranker (#13858)
huydt84 May 28, 2025
d98f2a3
ci: disable LLAMA_CURL for Linux cross-builds (#13871)
bandoti May 28, 2025
1096133
mtmd : move helpers to dedicated library (⚠️ breaking change) (#13866)
ngxson May 28, 2025
763d06e
llama : fix KV shift for qwen2vl (#13870)
ngxson May 28, 2025
53ae306
gguf-py : fix SafetensorRemote return on undefined size (< 0) (#13841)
Beinsezii May 28, 2025
1b8fb81
ggml: aarch64: Implement SVE F32 kernels for vector functions (#13843)
vineelabhinav May 29, 2025
6385b84
llama : add RobertaForSequenceClassification reranker support (#13875)
CISC May 29, 2025
5ca82fc
convert : workaround for AutoConfig dummy labels (#13881)
CISC May 29, 2025
66c9206
tests : remove json.hpp from a test (#13880)
ggerganov May 29, 2025
dd8ba93
ggml: aarch64: Implement SVE F32 kernels for Mamba Sequential Scan Al…
vineelabhinav May 29, 2025
21fcc21
cmake: Factor out CPU architecture detection (#13883)
ckastner May 29, 2025
54a2c7a
arm64: optimize q4_k_q8_k kernel with i8mm (#13886)
cyb70289 May 29, 2025
2b13162
gguf-py : add support for sub_type (in arrays) in GGUFWriter add_key_…
CISC May 29, 2025
e83ba3e
llama : add support for jina-reranker-v2 (#13900)
CISC May 29, 2025
ec9e030
cmake: Guard GGML_CPU_ALL_VARIANTS by architecture (#13890)
ckastner May 29, 2025
2c90da4
llama : use llm_build_granite for minicpm (#13911)
zkh2016 May 30, 2025
291f2b6
llama : add support for DistilBert (#13907)
huydt84 May 30, 2025
07e4351
convert : allow partial update to the chkhsh pre-tokenizer list (#13847)
ngxson May 30, 2025
db38704
convert : fix rwkv bos/eos token (#13844)
CISC May 30, 2025
53f9250
sync : vendor (#13901)
ggerganov May 30, 2025
b49a8ff
SYCL: Add mrope kernel (#13755)
qnixsynapse May 30, 2025
df0c0c7
cuda : prevent using split buffers with 3d/4d matrices (#13919)
slaren May 30, 2025
dd665cc
parallel : increase the variability of the prompt lengths (#13927)
ggerganov May 30, 2025
b47ab7b
sched : avoid changing cur_copy when a graph is already allocated (#1…
slaren May 30, 2025
e562eec
CUDA: fix typo in FlashAttention code (#13926)
JohannesGaessler May 30, 2025
eb39499
CUDA: add a prop in ggml_cuda_device_infor for distinguish iGPU or dG…
Yangxiaoz May 31, 2025
12d0188
kv-cache : refactor + add llama_memory_state_i (#13746)
ggerganov May 31, 2025
51fa76f
mtmd : drop `_shared` from `libmtmd` name, merge helpers into libmtmd…
ngxson May 31, 2025
3f55f78
llama : auto-batch preparation (#13845)
ggerganov May 31, 2025
c7e0a20
webui : Replace alert and confirm with custom modals. (#13711)
igardev May 31, 2025
3600cc2
llama : use n_swa + n_ubatch cells for SWA cache (#13833)
ggerganov May 31, 2025
803f8ba
llama : deprecate explicit kv_self defrag/update calls (#13921)
ggerganov May 31, 2025
e15898d
server: allow unclosed thinking tags (#13931)
ochafik May 31, 2025
b3a89c3
docs : Note about necessity of having libcurl installed for standard …
jpodivin May 31, 2025
053b153
threading: support for GGML_SCHED_PRIO_LOW, update thread info on Win…
max-krasnyansky May 31, 2025
70ba341
ggml-qnn: add Qualcomm QNN backend for GGML
jeffzhou2000 Feb 14, 2025
47d4c69
ggml-qnn: santiy check
jeffzhou2000 Feb 15, 2025
01583f7
ggml-qnn: update script build-run-android.sh to compare peformance of…
jeffzhou2000 Feb 16, 2025
9fb46a4
ggml-qnn: fix minor issue in test-backend-ops.cpp
jeffzhou2000 Feb 17, 2025
da6bd65
ggml-qnn: merge QNN RPC feature from https://github.com/zhouwg/kantv/…
jeffzhou2000 Feb 18, 2025
f2b5613
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 18, 2025
f928b50
ggml-qnn: a concise approach to offload mulmat to QNN backend(sync fr…
jeffzhou2000 Feb 19, 2025
d302fdb
ggml-qnn: remove redundant codes
jeffzhou2000 Feb 20, 2025
65ba194
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 20, 2025
fde7b44
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 20, 2025
0b46068
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 21, 2025
c771352
ggml-qnn: add Qualcomm QNN backend for GGML
jeffzhou2000 Feb 14, 2025
d08d87e
ggml-qnn: merge QNN RPC feature from https://github.com/zhouwg/kantv/…
jeffzhou2000 Feb 18, 2025
26f917d
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 18, 2025
06af3e3
ggml-qnn: a concise approach to offload mulmat to QNN backend(sync fr…
jeffzhou2000 Feb 19, 2025
7633a2e
ggml-qnn: remove redundant codes
jeffzhou2000 Feb 20, 2025
4ff0d6e
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 20, 2025
312ebf2
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 20, 2025
6d67f48
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 21, 2025
b38672b
ggml-qnn: fix a minior typo in internal doc
jeffzhou2000 Feb 23, 2025
ffa3325
ggml-qnn: refine function ggml_qnn_create_general_tensor() to avoid c…
jeffzhou2000 Feb 23, 2025
57a1853
ggml-qnn: fix a minor typo in source code
jeffzhou2000 Feb 24, 2025
81caea9
build: avoid ggml-qnn backend breaking other backend's builds
jeffzhou2000 Feb 24, 2025
60e93fa
ggml-qnn: remove redundant codes to make PR reviewers happy
jeffzhou2000 Feb 25, 2025
9c383ba
ggml-qnn: refine code format
jeffzhou2000 Feb 25, 2025
da6ddb9
ggml-qnn: offload quantized type mulmat to QNN backend
jeffzhou2000 Feb 26, 2025
e203378
ggml-qnn: refine source code structure to make code more clearly
jeffzhou2000 Feb 27, 2025
8c199ae
ggml-qnn: enable release build with necessary logs to make reviewers …
jeffzhou2000 Feb 27, 2025
5c237aa
ggml-qnn: enable all quantize type with 2d mulmat
jeffzhou2000 Feb 27, 2025
d433d0f
ggml-qnn: enable log output of GGMLQNN_LOG_INFO in command line mode …
jeffzhou2000 Feb 28, 2025
f7d5560
ggml-qnn: Windows port --- step2
jeffzhou2000 Feb 28, 2025
f26e62c
ggml-qnn: merge UT code and corresponding script from local dev branc…
jeffzhou2000 Mar 2, 2025
522a885
ggml-qnn: merge ggml_qnn_mul_mat_4d from local dev branch to make wor…
jeffzhou2000 Mar 2, 2025
161e70c
ggml-qnn: submit AI-assisted ggml_qnn_mul_mat_4d(not worked currently…
jeffzhou2000 Mar 2, 2025
1933b70
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step2
jeffzhou2000 Mar 2, 2025
a353225
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step3
jeffzhou2000 Mar 2, 2025
16bcee9
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step4
jeffzhou2000 Mar 2, 2025
57aa646
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step5
jeffzhou2000 Mar 2, 2025
bd39028
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step6
jeffzhou2000 Mar 2, 2025
cba53a6
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step7
jeffzhou2000 Mar 2, 2025
bccdf05
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step8
jeffzhou2000 Mar 2, 2025
628f5b4
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- good in step9
jeffzhou2000 Mar 2, 2025
66fb6b3
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- narrow down t…
jeffzhou2000 Mar 2, 2025
511595e
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step10
jeffzhou2000 Mar 2, 2025
8e6ee1b
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- narrow down t…
jeffzhou2000 Mar 2, 2025
09bce95
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step11
jeffzhou2000 Mar 2, 2025
2de285b
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- both ok in st…
jeffzhou2000 Mar 2, 2025
50df0fa
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 ---finalizing ver…
jeffzhou2000 Mar 2, 2025
2f33937
ggml-qnn: refine ggml_qnn_mul_mat and ggml_qnn_general_node according…
jeffzhou2000 Mar 2, 2025
f6ca139
ggml-qnn: remove no-needed comments
jeffzhou2000 Mar 2, 2025
992594d
ggml-qnn: Windows port --- step3
jeffzhou2000 Mar 3, 2025
566274a
ggml-qnn: remove un-needed function
jeffzhou2000 Mar 4, 2025
a4a2477
ggml-qnn:rebase to upstream
jeffzhou2000 Mar 4, 2025
8125d2f
ggml-qnn: fix a minior issue during rebase to upstream
jeffzhou2000 Mar 4, 2025
3917937
ggml-qnn: update script according to https://github.com/ggml-org/llam…
jeffzhou2000 Mar 4, 2025
1f90162
ggml-qnn: fix a minior issue in ggmlqnn_create_general_tensor()
jeffzhou2000 Mar 4, 2025
0495c33
ggml-qnn: active member variable _device_id in class qnn_instance
jeffzhou2000 Mar 4, 2025
19fffb7
ggml-qnn: refine ggml_qnn_general_node and ggml_qnn_mul_mat to make c…
jeffzhou2000 Mar 4, 2025
bee4bf5
ggml-qnn: Windows port --- step4
jeffzhou2000 Mar 6, 2025
6d50c9d
ggml-qnn: Windows port -- step5
jeffzhou2000 Mar 7, 2025
9ecebae
ggml-qnn: WoA(Windows on ARM) -- step6
jeffzhou2000 Mar 8, 2025
69a4f8c
ggml-qnn: rebase to upstream
jeffzhou2000 Mar 9, 2025
b1fe0fb
ggml-qnn: pr to upstream
jeffzhou2000 Mar 11, 2025
5bc8cca
ggml-qnn: rebase to upstream
jeffzhou2000 Mar 18, 2025
c72e422
ggml-qnn: self code-review
jeffzhou2000 Mar 18, 2025
5ba04ce
ggml-qnn: rebase upstream
jeffzhou2000 Mar 19, 2025
218317f
ggml-qnn: add approach through Hexagon cDSP
jeffzhou2000 Mar 22, 2025
e3dfce2
ggml-qnn: refine general approach through Hexagon cDSP
jeffzhou2000 Mar 23, 2025
49ad603
ggml-qnn: refine the entire ggml-qnn.cpp to make code more clear
jeffzhou2000 Mar 24, 2025
87ba8f4
ggml-qnn: refine the entire ggml-qnn.cpp to make code more clear
jeffzhou2000 Mar 24, 2025
b8e2eac
ggml-qnn: add build script for libggmlop_skel.so
jeffzhou2000 Mar 24, 2025
96540e3
ggml-qnn: remove redundant functions in this PR and make codes more c…
jeffzhou2000 Mar 25, 2025
3db5837
ggml-qnn: original ggml_compute_forward_add and ggml_compute_forward_…
jeffzhou2000 Mar 25, 2025
e4e90a0
ggml-qnn: modify build-run-android.sh to verify mulmat and validate m…
jeffzhou2000 Mar 25, 2025
461b316
ggml-qnn: make host code(ggml-qnn.cpp) more clear and more stable
jeffzhou2000 Mar 26, 2025
88d1382
ggml-qnn: refine code according to self code-review and make code mor…
jeffzhou2000 Mar 26, 2025
4b6377c
ggml-qnn: offload more ggml op to Hexagon cDSP
jeffzhou2000 Mar 27, 2025
1fb650b
ggml-hexagon: code on AP(arm-cpu) side is stable now
jeffzhou2000 Mar 28, 2025
1dcf86c
ggml-hexagon: optimize GGML_OP_ADD on cDSP side
jeffzhou2000 Mar 28, 2025
f7547c2
ggml-hexagon: simplify hexagon-kernel build logic in CMakeLists.txt
jeffzhou2000 Mar 29, 2025
bd2b26d
ggml-hexagon: release ggml-hexagon v0.98
jeffzhou2000 Mar 29, 2025
f8b2ff6
ggml-hexagon: release ggml-hexagon v0.99
jeffzhou2000 Mar 29, 2025
e36f734
ggml-hexagon: try to offload q6_k mulmat to cDSP
jeffzhou2000 Mar 29, 2025
0e36191
ggml-hexagon: fix minior issue in ggml-hexagon.cpp after self code-re…
jeffzhou2000 Mar 29, 2025
4f2921f
ggml-hexagon: check validation of ggml-hexagon.cfg before create appr…
jeffzhou2000 Mar 30, 2025
568f8ec
ggml-hexagon: fix all compiler warnings in ggml-hexagon.cpp
jeffzhou2000 Mar 30, 2025
8ad201a
ggml-hexagon: enable only one backend device for HWACCEL_CDSP and ena…
jeffzhou2000 Mar 31, 2025
7332991
ggml-hexagon: rpc ion memory pool and test-backend-ops works fine in …
jeffzhou2000 Mar 31, 2025
7a530b1
ggml-hexagon: make comprision of mulmat performance between HWACCEL_Q…
jeffzhou2000 Mar 31, 2025
e2dc356
ggml-hexagon: release ggml-hexagon v1.00
jeffzhou2000 Mar 31, 2025
e6c734a
ggml-hexagon: rebase to upstream
jeffzhou2000 Apr 1, 2025
b326de2
ggml-hexagon: check configuration of enable_rpc_dma_mempool in functi…
jeffzhou2000 Apr 1, 2025
983d5ac
ggml-hexagon: uniform rpc_ion_memsize and rpc_ion_usage between HWACC…
jeffzhou2000 Apr 1, 2025
bf48ee1
ggml-hexagon: make buffer mechanism more clear in HWACCEL_CDSP approach
jeffzhou2000 Apr 1, 2025
362ccb3
ggml-hexagon: add perf function in hexagon kernerls on cDSP side
jeffzhou2000 Apr 2, 2025
67dd62c
ggml-hexagon: fix a stupid issue of why set rpc latency failure and i…
jeffzhou2000 Apr 2, 2025
e693fb6
ggml-hexagon: make helper function ggmlhexagon_get_timestring() threa…
jeffzhou2000 Apr 2, 2025
82e23af
ggml-hexagon: fix a typo in ggml-hexagon.cpp
jeffzhou2000 Apr 2, 2025
adca18f
ggml-hexagon: list all known todo and fixme tasks in ggml-hexagon.cpp
jeffzhou2000 Apr 2, 2025
8937bf1
ggml-hexagon: fix units MB -> MiB
jeffzhou2000 Apr 2, 2025
f030dc5
ggml-hexagon: try to make ggml-hexagon backend works fine in a standa…
jeffzhou2000 Apr 3, 2025
88d83ca
ggml-hexagon: remove reduament code and make debug log more clear
jeffzhou2000 Apr 3, 2025
1ea9b72
ggml-hexagon: add gemma-3-4b-it-Q8_0.gguf to verify q8_0 mulmat on cDSP
jeffzhou2000 Apr 3, 2025
feedb26
ggml-hexagon:add skeleton code of offload GGML_OP_SOFT_MAX/GGML_OP_RM…
jeffzhou2000 Apr 3, 2025
650d287
ggml-hexagon: release ggml-dsp v0.60 on cDSP side
jeffzhou2000 Apr 4, 2025
601a122
ggml-hexagon: merge build logic in kernels/Makefile to ggml-hexagon/C…
jeffzhou2000 Apr 5, 2025
0b50560
ggml-hexagon: fix a typo in ggml-hexagon.cpp
jeffzhou2000 Apr 5, 2025
0be7947
ggml-hexagon: uniform NDEBUG usage in ggml-hexagon.cpp and ggml-dsp.c
jeffzhou2000 Apr 6, 2025
3f0fd15
ggml-hexagon: add profiler feature for purpose of visualize NPU perfo…
jeffzhou2000 Apr 7, 2025
fb3b179
ggml-hexagon: remove so-called dma memory pool to avoid confusion and…
jeffzhou2000 Apr 8, 2025
b5fa723
ggml-hexagon: make function ggmlhexagon_init_rpcmempool in ggml-hexag…
jeffzhou2000 Apr 8, 2025
377119d
ggml-hexagon: fix potential resource leak in class hexagon_profiler
jeffzhou2000 Apr 8, 2025
16e3701
ggml-hexagon: enable multi-threading feature on cDSP side
jeffzhou2000 Apr 8, 2025
a6525ac
ggml-hexagon: upgrade QNN SDK to v2.33.0.250327
jeffzhou2000 Apr 9, 2025
0760339
ggml-hexagon: fix typo in ggml-hexagon.cpp
jeffzhou2000 Apr 9, 2025
7281f74
ggml-dsp: probe QuRT RTOS information in function ggmlop_dsp_open
jeffzhou2000 Apr 9, 2025
ba00a33
ggml-hexagon: setting enable_rpc_ion_mempool to 1 and make test-backe…
jeffzhou2000 Apr 10, 2025
527985b
ggml-hexagon: check whether user's specified htp arch is valid in CMa…
jeffzhou2000 Apr 10, 2025
91cbc25
ggml-hexagon: sync with upstream
jeffzhou2000 Apr 11, 2025
ac215b7
ggml-hexagon: refine pinned-memory feature
jeffzhou2000 Apr 11, 2025
97d3eb4
ggml-hexagon: refine build system in ggml-hexagon
jeffzhou2000 Apr 11, 2025
eb49be9
ggml-hexagon: remove redundant code in struct ggml_backend_hexagon_bu…
jeffzhou2000 Apr 11, 2025
3d0e0f0
ggml-hexagon: upgrade Android NDK to android-ndk-r28
jeffzhou2000 Apr 11, 2025
299fb1f
ggml-dsp: split ggml-dsp.c into multiple files and cleanup
jeffzhou2000 Apr 11, 2025
5248864
ggml-dsp: refine ggml-dsp and make ggml-dsp more clear
jeffzhou2000 Apr 12, 2025
5991c2c
ggml-hexagon: fix a minior issue in dev ops
jeffzhou2000 Apr 12, 2025
4e3f281
ggml-hexagon: fix a build issue in CI
jeffzhou2000 Apr 12, 2025
3a1a0cc
ggml-dsp: cleanup code
jeffzhou2000 Apr 15, 2025
9d654e8
ggml-hexagon: sync with upstream
jeffzhou2000 Apr 15, 2025
3cfb702
ggml-dsp: cleanup code
jeffzhou2000 Apr 16, 2025
2773933
ggml-dsp:refine ggmlhexagon_dsp_add_f32
jeffzhou2000 Apr 16, 2025
e3a3d2c
ggml-dsp: refine logic of thread_counts
jeffzhou2000 Apr 17, 2025
c629118
ggml-hexagon: release v1.06 and ready for code review
jeffzhou2000 Apr 17, 2025
4f49c7a
ggml-dsp: make GGML_OP_ADD more faster on cDSP side
jeffzhou2000 Apr 19, 2025
00b5d44
ggml-hexagon: sync from project kantv(make ggml-hexagon backend can w…
jeffzhou2000 Apr 24, 2025
cbda1c8
sync with upstream llama.cpp and sync ggml-hexagon.cpp from project k…
jeffzhou2000 Apr 29, 2025
df64fef
sync with upstream
jeffzhou2000 May 7, 2025
6f6cd17
sync with upstream
jeffzhou2000 May 10, 2025
987e959
ggml-hexagon: upgrade QNN SDK to v2.34.0.250424
jeffzhou2000 May 11, 2025
2c50925
sync with upstream
jeffzhou2000 May 16, 2025
915e31e
ggml-hexagon: sync from project kantv(fix a long-term issue which int…
jeffzhou2000 May 17, 2025
26e27c9
ggml-hexagon: sync with upstream llama.cpp
jeffzhou2000 May 23, 2025
ebbdc41
build: enable self-contained-build to simplify workflow
jeffzhou2000 May 23, 2025
24a5e69
sync with upstream
jeffzhou2000 May 23, 2025
5b21435
add prebuilt binary libggmlop-skel.so
jeffzhou2000 May 31, 2025
6962ac6
refine ggml-hexagon.cfg for the prebuilt binary libggmlop-skel.so
jeffzhou2000 May 31, 2025
0c53100
refine scripts to avoid confusion
jeffzhou2000 Jun 1, 2025
2 changes: 1 addition & 1 deletion .devops/intel.Dockerfile
@@ -1,4 +1,4 @@
-ARG ONEAPI_VERSION=2025.0.0-0-devel-ubuntu22.04
+ARG ONEAPI_VERSION=2025.1.1-0-devel-ubuntu24.04

 ## Build Image

15 changes: 4 additions & 11 deletions .devops/musa.Dockerfile
@@ -1,10 +1,10 @@
 ARG UBUNTU_VERSION=22.04
 # This needs to generally match the container host's environment.
-ARG MUSA_VERSION=rc3.1.1
+ARG MUSA_VERSION=rc4.0.1
 # Target the MUSA build image
-ARG BASE_MUSA_DEV_CONTAINER=mthreads/musa:${MUSA_VERSION}-devel-ubuntu${UBUNTU_VERSION}
+ARG BASE_MUSA_DEV_CONTAINER=mthreads/musa:${MUSA_VERSION}-mudnn-devel-ubuntu${UBUNTU_VERSION}

-ARG BASE_MUSA_RUN_CONTAINER=mthreads/musa:${MUSA_VERSION}-runtime-ubuntu${UBUNTU_VERSION}
+ARG BASE_MUSA_RUN_CONTAINER=mthreads/musa:${MUSA_VERSION}-mudnn-runtime-ubuntu${UBUNTU_VERSION}

 FROM ${BASE_MUSA_DEV_CONTAINER} AS build

@@ -21,21 +21,14 @@ RUN apt-get update && \
libcurl4-openssl-dev \
libgomp1

COPY requirements.txt requirements.txt
COPY requirements requirements

RUN pip install --upgrade pip setuptools wheel \
&& pip install -r requirements.txt

WORKDIR /app

COPY . .

# Use the default MUSA archs if not specified
RUN if [ "${MUSA_DOCKER_ARCH}" != "default" ]; then \
export CMAKE_ARGS="-DMUSA_ARCHITECTURES=${MUSA_DOCKER_ARCH}"; \
fi && \
cmake -B build -DGGML_NATIVE=OFF -DGGML_MUSA=ON -DLLAMA_BUILD_TESTS=OFF -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
cmake -B build -DGGML_NATIVE=OFF -DGGML_MUSA=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DLLAMA_BUILD_TESTS=OFF ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
cmake --build build --config Release -j$(nproc)

RUN mkdir -p /app/lib && \
4 changes: 4 additions & 0 deletions .editorconfig
@@ -48,3 +48,7 @@ end_of_line = unset
 charset = unset
 trim_trailing_whitespace = unset
 insert_final_newline = unset
+
+[vendor/miniaudio/miniaudio.h]
+trim_trailing_whitespace = unset
+insert_final_newline = unset
30 changes: 15 additions & 15 deletions .github/workflows/build-linux-cross.yml
@@ -26,12 +26,12 @@ jobs:
 sudo apt-get install -y --no-install-recommends \
 build-essential \
 gcc-14-riscv64-linux-gnu \
-g++-14-riscv64-linux-gnu \
-libcurl4-openssl-dev:riscv64
+g++-14-riscv64-linux-gnu
 - name: Build
 run: |
-cmake -B build -DCMAKE_BUILD_TYPE=Release \
+cmake -B build -DLLAMA_CURL=OFF \
+-DCMAKE_BUILD_TYPE=Release \
 -DGGML_OPENMP=OFF \
 -DLLAMA_BUILD_EXAMPLES=ON \
 -DLLAMA_BUILD_TOOLS=ON \
@@ -72,12 +72,12 @@ jobs:
 glslc \
 gcc-14-riscv64-linux-gnu \
 g++-14-riscv64-linux-gnu \
-libvulkan-dev:riscv64 \
-libcurl4-openssl-dev:riscv64
+libvulkan-dev:riscv64
 - name: Build
 run: |
-cmake -B build -DCMAKE_BUILD_TYPE=Release \
+cmake -B build -DLLAMA_CURL=OFF \
+-DCMAKE_BUILD_TYPE=Release \
 -DGGML_VULKAN=ON \
 -DGGML_OPENMP=OFF \
 -DLLAMA_BUILD_EXAMPLES=ON \
@@ -118,12 +118,12 @@ jobs:
 build-essential \
 glslc \
 crossbuild-essential-arm64 \
-libvulkan-dev:arm64 \
-libcurl4-openssl-dev:arm64
+libvulkan-dev:arm64
 - name: Build
 run: |
-cmake -B build -DCMAKE_BUILD_TYPE=Release \
+cmake -B build -DLLAMA_CURL=OFF \
+-DCMAKE_BUILD_TYPE=Release \
 -DGGML_VULKAN=ON \
 -DGGML_OPENMP=OFF \
 -DLLAMA_BUILD_EXAMPLES=ON \
@@ -163,12 +163,12 @@ jobs:
 sudo apt-get install -y --no-install-recommends \
 build-essential \
 gcc-14-powerpc64le-linux-gnu \
-g++-14-powerpc64le-linux-gnu \
-libcurl4-openssl-dev:ppc64el
+g++-14-powerpc64le-linux-gnu
 - name: Build
 run: |
-cmake -B build -DCMAKE_BUILD_TYPE=Release \
+cmake -B build -DLLAMA_CURL=OFF \
+-DCMAKE_BUILD_TYPE=Release \
 -DGGML_OPENMP=OFF \
 -DLLAMA_BUILD_EXAMPLES=ON \
 -DLLAMA_BUILD_TOOLS=ON \
@@ -209,12 +209,12 @@ jobs:
 glslc \
 gcc-14-powerpc64le-linux-gnu \
 g++-14-powerpc64le-linux-gnu \
-libvulkan-dev:ppc64el \
-libcurl4-openssl-dev:ppc64el
+libvulkan-dev:ppc64el
 - name: Build
 run: |
-cmake -B build -DCMAKE_BUILD_TYPE=Release \
+cmake -B build -DLLAMA_CURL=OFF \
+-DCMAKE_BUILD_TYPE=Release \
 -DGGML_VULKAN=ON \
 -DGGML_OPENMP=OFF \
 -DLLAMA_BUILD_EXAMPLES=ON \
4 changes: 2 additions & 2 deletions .github/workflows/build.yml
@@ -351,7 +351,7 @@ jobs:

 ubuntu-22-cmake-musa:
 runs-on: ubuntu-22.04
-container: mthreads/musa:rc3.1.1-devel-ubuntu22.04
+container: mthreads/musa:rc4.0.1-mudnn-devel-ubuntu22.04

 steps:
 - name: Clone
@@ -899,7 +899,7 @@ jobs:
 shell: bash

 env:
-WINDOWS_BASEKIT_URL: https://registrationcenter-download.intel.com/akdlm/IRC_NAS/b380d914-366b-4b77-a74a-05e3c38b3514/intel-oneapi-base-toolkit-2025.0.0.882_offline.exe
+WINDOWS_BASEKIT_URL: https://registrationcenter-download.intel.com/akdlm/IRC_NAS/7cd9bba0-7aab-4e30-b3ae-2221006a4a05/intel-oneapi-base-toolkit-2025.1.1.34_offline.exe
 WINDOWS_DPCPP_MKL: intel.oneapi.win.cpp-dpcpp-common:intel.oneapi.win.mkl.devel:intel.oneapi.win.dnnl:intel.oneapi.win.tbb.devel
 ONEAPI_ROOT: "C:/Program Files (x86)/Intel/oneAPI"
 steps: