Conversation

taronaeo
Collaborator

Fixes #14877. Updates s390x build documentation as well.

taronaeo and others added 30 commits July 21, 2025 18:21
The tid is decomposed into "ow + ky*OW + kx*OW*KH". Change "ksize" to match.
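As a rough illustration of the decomposition described in that commit message, the sketch below packs and unpacks a flat thread index using the formula "ow + ky*OW + kx*OW*KH". The function names are hypothetical and this is not the kernel code from the commit; it only shows why the stride of `kx` (called `ksize` here) has to be `OW * KH` for the round trip to work.

```python
def compose_tid(ow, ky, kx, OW, KH):
    """Pack (ow, ky, kx) into a flat index: ow + ky*OW + kx*OW*KH."""
    return ow + ky * OW + kx * OW * KH


def decompose_tid(tid, OW, KH):
    """Recover (ow, ky, kx) from a flat index built by compose_tid.

    ksize is the stride of kx, so it must equal OW * KH to match
    the packing formula above (assumption: 0 <= ow < OW, 0 <= ky < KH).
    """
    ksize = OW * KH
    kx = tid // ksize
    ky = (tid - kx * ksize) // OW
    ow = tid % OW
    return ow, ky, kx


# Example: with OW=4 and KH=3, index 17 = 1 + 1*4 + 1*12
print(decompose_tid(17, OW=4, KH=3))
```

If `ksize` were computed from any other product of dimensions, `kx` and `ky` would be recovered incorrectly whenever that product differs from `OW * KH`.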
* kleidiai: add support for get_rows

* apply fixes based on code review

* apply more fixes based on code review
* add conv2d kernel

* fix trailing whitespace

* whitespace fixes

* handle f16 input and f16 kernel, more opt

* resolve conflicts

* use enqueue_ndrange_kernel
* implement bf16 cpy ops and enable bf16 cont

* deduplicate copy functions

* deduplicate checks
* Mtmd: add a way to select device for vision encoder

* simplify

* format

* Warn user if manual device selection failed

* initialize backend to nullptr
…n imatrix file (ggml-org#12718)

* Add --show-statistics option

* Add --show-statistics logic

* Add tensor name parsing

* Tidy output format

* Fix typo in title

* Improve tensor influence ranking

* Add better statistics

* Change statistics' sort order

* Add Cosine Similarity

* Add header search path

* Change header search path to private

* Add weighted statistics per layer

* Update report title

* Refactor compute_statistics out of main

* Refactor compute_cossim out of load_imatrix

* Refactor compute_statistics out of load_imatrix

* Move imatrix statistics calculation into its own functions

* Add checks and validations

* Remove unnecessary include directory

* Rename labels

* Add m_stats getter and refactor compute_statistics out of load_imatrix

* Refactor variable names

* Minor cosmetic change

* Retrigger checks (empty commit)

* Rerun checks (empty commit)

* Fix unnecessary type promotion

Co-authored-by: compilade <[email protected]>

* Reverting change to improve code readability

* Rerun checks (empty commit)

* Rerun checks (empty commit)

* Rerun checks - third time's the Charm 🤞 (empty commit)

* Minor cosmetic change

* Update README

* Fix typo

* Update README

* Rerun checks (empty commit)

* Re-implement changes on top of ggml-org#9400

* Update README.md

* Update README

* Update README.md

Co-authored-by: compilade <[email protected]>

* Update README.md

Co-authored-by: compilade <[email protected]>

* Update README.md

* Remove duplicate option in print_usage()

* Update README.md

* Update README.md

Co-authored-by: compilade <[email protected]>

* Update README.md

Co-authored-by: compilade <[email protected]>

* Remove input check

* Remove commented out code

---------

Co-authored-by: compilade <[email protected]>
* weight format to nz for 310p

* remove quant weight format to nz

* clean code

* fix

* make the conditions for converting weights to NZ format consistent

* clean code
…org#14675)

* Update llama-memory-recurrent.cpp

handle saving/loading null layers in recurrent memory

* fixed styling issues and updated comments

* fix styling issue

Co-authored-by: Sigbjørn Skjæret <[email protected]>

---------

Co-authored-by: Sigbjørn Skjæret <[email protected]>
* CUDA: fix quantized KV cache + multiple sequences

* Update ggml/src/ggml-cuda/fattn-common.cuh

Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>
* use language_model part only, ignore visual layers

* fix rope_dim calculation
* metal : fix fusion across different encoders

ggml-ci

* cont : add assertion

ggml-ci
* docs: add libcurl-dev install hint for Linux distros

Signed-off-by: PouyaGhahramanian <[email protected]>

* Update docs/build.md

---------

Signed-off-by: PouyaGhahramanian <[email protected]>
Co-authored-by: Xuan-Son Nguyen <[email protected]>
dg0yt and others added 10 commits July 25, 2025 21:24
* CMake config: Create target only once

Fix error on repeated find_package(ggml).
For simplicity, check only for the top-level ggml::ggml.

* CMake config: Add CUDA link libs

* CMake config: Add OpenCL link libs

* CMake config: Use canonical find_dependency

Use set and append to control link lib variables.
Apply more $<LINK_ONLY...>.

* CMake config: Wire OpenMP dependency
* musa: apply mublas API changes

Signed-off-by: Xiaodong Ye <[email protected]>

* musa: update musa version to 4.2.0

Signed-off-by: Xiaodong Ye <[email protected]>

* musa: restore MUSA graph settings in CMakeLists.txt

Signed-off-by: Xiaodong Ye <[email protected]>

* musa: disable mudnnMemcpyAsync by default

Signed-off-by: Xiaodong Ye <[email protected]>

* musa: switch back to non-mudnn images

Signed-off-by: Xiaodong Ye <[email protected]>

* minor changes

Signed-off-by: Xiaodong Ye <[email protected]>

* musa: restore rc in docker image tag

Signed-off-by: Xiaodong Ye <[email protected]>

---------

Signed-off-by: Xiaodong Ye <[email protected]>
…org#14503)

* [fix] Fix 32-bit narrowing issue in export-lora and mtmd clip

* Update export-lora.cpp

* Update clip.cpp

* Update export-lora.cpp

* format: use space to replace tab
Neither "g" nor "x" is a valid portPos specifier per the official
[Graphviz documentation](https://graphviz.org/docs/attr-types/portPos/):

> If a compass point is used, it must have the form "n","ne","e","se","s","sw","w","nw","c","_".

I verified locally that Graphviz falls back to the default portPos when an
invalid portPos is specified. As a consequence, we can remove the associated
code.
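A minimal sketch of the check implied by the quoted rule, assuming only the ten compass points listed in the quote above. The helper name `is_valid_compass_point` is hypothetical and is not part of Graphviz or of the code touched by this commit.

```python
# Compass points copied from the quoted Graphviz portPos rule.
VALID_COMPASS_POINTS = {"n", "ne", "e", "se", "s", "sw", "w", "nw", "c", "_"}


def is_valid_compass_point(value):
    """Return True if value is one of the compass points in the quoted rule."""
    return value in VALID_COMPASS_POINTS


for candidate in ("g", "x", "n", "_"):
    print(candidate, is_valid_compass_point(candidate))
```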
@github-actions bot added labels on Jul 25, 2025: documentation, script, testing, Nvidia GPU, Vulkan, examples, python, devops, server, ggml, SYCL, Apple Metal, Ascend NPU, OpenCL
@taronaeo
Collaborator Author

Oh my god. This was not expected. Will re-create PR, sorry.

@taronaeo taronaeo closed this Jul 25, 2025

Development

Successfully merging this pull request may close these issues.

Eval bug: s390x GGML_NNPA=ON Generates Gibberish Tokens at Different Thread Counts