forked from ggml-org/llama.cpp
merge from upstream #57
Merged
* fix typos and improve menu text clarity
* rename variable trimedValue to trimmedValue
* add updated index.html.gz
* rebuild
---------
Co-authored-by: Xuan Son Nguyen <[email protected]>
CUDA 12.8 added the option to specify stronger compression for binaries, so we now default to "size".
…versation mode (ggml-org#12131)
* Add --system-prompt parameter
* use user defined system prompt
* clarify
  Co-authored-by: Xuan-Son Nguyen <[email protected]>
* add warning
* clarify
  Co-authored-by: Xuan-Son Nguyen <[email protected]>
---------
Co-authored-by: Xuan-Son Nguyen <[email protected]>
…) (ggml-org#12132)
* Update outdated message
* wording
  Co-authored-by: Xuan-Son Nguyen <[email protected]>
---------
Co-authored-by: Xuan-Son Nguyen <[email protected]>
* Use jinja chat template system prompt by default
* faster conditional order
* remove nested ternary
---------
Co-authored-by: Xuan Son Nguyen <[email protected]>
…ggml-org#12133)
* SYCL: refactor and move cpy kernels to a separate file
* Add few missing cpy kernels
* refactor and add debug logs
* webui : add ?m=... and ?q=... params
* also clear prefilledMessage variable
* better approach
* fix comment
* test: bump timeout on GITHUB_ACTION
For emojis, non-alpha characters, etc.
Signed-off-by: Eric Curtin <[email protected]>
The libggml API has changed, but this has not been updated.
* tts: add speaker file support
  Signed-off-by: dm4 <[email protected]>
* tts: handle outetts-0.3
* tts : add new line in error message
---------
Signed-off-by: dm4 <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
This commit tries to address/improve an issue with the server tests
which are failing with a timeout. Looking at the logs it seems like
they are timing out after 12 seconds:
```
FAILED unit/test_chat_completion.py::test_completion_with_json_schema[False-json_schema0-6-"42"] - TimeoutError: Server did not start within 12 seconds
```
This is somewhat strange as in utils.py we have the following values:
```python
DEFAULT_HTTP_TIMEOUT = 12
if "LLAMA_SANITIZE" in os.environ or "GITHUB_ACTION" in os.environ:
    DEFAULT_HTTP_TIMEOUT = 30

def start(self, timeout_seconds: int | None = DEFAULT_HTTP_TIMEOUT) -> None:
```
So a test running in a GitHub Action should have a timeout of 30 seconds.
However, it seems like this is not the case.
Inspecting the logs from the CI job we can see the following environment
variables:
```console
Run cd examples/server/tests
  cd examples/server/tests
  ./tests.sh
  shell: /usr/bin/bash -e {0}
  env:
    LLAMA_LOG_COLORS: 1
    LLAMA_LOG_PREFIX: 1
    LLAMA_LOG_TIMESTAMPS: 1
    LLAMA_LOG_VERBOSITY: 10
    pythonLocation: /opt/hostedtoolcache/Python/3.11.11/x64
```
This probably does not address the underlying issue, which is that the servers
providing the models to be downloaded occasionally take longer to respond, but
it might improve the situation in some cases.
… backend (ggml/1121)
* Support float16-to-float16 add/sub/mul/div operations in the CUDA backend
* Add fp16 support for add/sub/mul/div on the CPU backend
* Add test cases for fp16 add/sub/mul/div
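As a rough illustration of what these ops look like from the user side, here is a minimal sketch of an fp16 element-wise add built through the public ggml API. It assumes the usual ggml_init/ggml_add/ggml_graph_compute_with_ctx entry points (header layout and exact signatures vary between ggml revisions) and is not the code added by this commit.

```c
// Minimal sketch (not from this commit): an fp16 + fp16 add evaluated on the CPU backend.
#include <stdio.h>
#include "ggml.h"
#include "ggml-cpu.h" // ggml_graph_compute_with_ctx is declared here in newer trees

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024, // rough guess, plenty for this toy graph
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // two fp16 inputs; per this change the add can be computed directly in fp16
    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F16, 4);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F16, 4);
    ggml_fp16_t * ad = (ggml_fp16_t *) a->data;
    ggml_fp16_t * bd = (ggml_fp16_t *) b->data;
    for (int i = 0; i < 4; ++i) {
        ad[i] = ggml_fp32_to_fp16(1.5f);
        bd[i] = ggml_fp32_to_fp16(2.0f);
    }

    struct ggml_tensor * c = ggml_add(ctx, a, b); // result tensor stays GGML_TYPE_F16

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/ 1);

    printf("c[0] = %f\n", ggml_fp16_to_fp32(((ggml_fp16_t *) c->data)[0])); // expected: 3.500000

    ggml_free(ctx);
    return 0;
}
```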
It is used by the Whisper talk-llama example.
Co-authored-by: Petter Reinholdtsen <[email protected]>
* Add small comment re: VSX to readme
Co-authored-by: midnight <[email protected]>
* whisper : support GGML_BACKEND_DL
* fix DTW crash
* whisper.objc : fix build - add ggml-cpp.h
---------
Co-authored-by: Georgi Gerganov <[email protected]>
* Support fp16 unary operations in the CUDA backend
* cpu: increase fp16 support for unary operators in the CPU backend
* cuda: increase fp16 support for unary operators in the CUDA backend
* Add test cases for fp16 unary operators
* metal: update supports_op for unary operators that don't support fp16, to prevent test-backend-ops from failing
* metal: fix PR comments for unary op support after fp16 unary tests
ggml-ci
…s_op (ggml/1129) ggml-ci
ggml-ci
ggml-ci
…rg#12032)
Adds GGML_HIP_ROCWMMA_FATTN and rocwmma header check
Adds rocWMMA support to fattn-wmma-f16
---
Signed-off-by: Carl Klemm <[email protected]>
Co-authored-by: Johannes Gäßler <[email protected]>
Co-authored-by: Ben Jackson <[email protected]>
…rn (ggml-org#12145)
* Add chat template formatting to -no-cnv
* only enable prompt formatting if explicitly enabled
* add -st / --single-turn
* add --single-turn and -p in conversation mode
* fix -sys + -p
* reword warning
* small readability change and fix (long) outdated example usage
* only activate single turn in conversation mode
* Add include files for std::min/max and std::toupper/tolower
* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined
* Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode
* win32: only use __restrict in MSVC if C11/C17 support is not enabled
---------
Co-authored-by: Marcus Groeber <[email protected]>
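The restrict handling described above roughly follows the pattern below. This is a hedged sketch of the idea only: GGML_RESTRICT is the project's macro name, but the definition and the copy_f32 example are illustrative and may differ from ggml's actual code.

```c
#include <stddef.h>

// Sketch of the portability pattern: C++ has no `restrict` keyword, MSVC in
// plain C mode (no /std:c11 or /std:c17, so __STDC_VERSION__ is undefined)
// only understands `__restrict`, and conforming C compilers take `restrict`.
#if defined(__cplusplus)
#    define GGML_RESTRICT
#elif defined(_MSC_VER) && !defined(__STDC_VERSION__)
#    define GGML_RESTRICT __restrict
#else
#    define GGML_RESTRICT restrict
#endif

// example use: the compiler may assume dst and src do not alias
static void copy_f32(float * GGML_RESTRICT dst, const float * GGML_RESTRICT src, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}

int main(void) {
    float src[4] = {1, 2, 3, 4}, dst[4];
    copy_f32(dst, src, 4);
    return (int) dst[0] - 1; // 0 on success
}
```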
* llama : add xcframework build script

  This commit adds a script to build an XCFramework for the Apple iOS, macOS, visionOS, and tvOS platforms. The generated XCFramework can then be added to a project and used in the same way as a regular framework. The llama.swiftui example project has been updated to use the XCFramework and can be started using the following command:

  ```console
  $ open examples/llama.swiftui/llama.swiftui.xcodeproj/
  ```

  Refs: ggml-org#10747

* examples : remove llama.cpp (source dir ref) from project.pbxproj

  This commit removes the reference to llama.cpp from the project.pbxproj file since Package.swift has been removed.

* ci : updated build.yml to use build-xcframework.sh

* ci : add xcframework build to github releases

  This commit adds the ability to create a GitHub release with the xcframework build artifact.

* scripts : add apple app validation scripts

  This commit adds scripts that can validate the iOS, macOS, tvOS, and visionOS applications. The scripts create a simple test app project, copy the llama.xcframework to the test project, build and archive the app, create an IPA from the archive, and validate the IPA using altool. The motivation for this is to provide some basic validation and hopefully avoid having to manually validate apps in Xcode.

* llama : remove Package.swift

  This commit removes the Package.swift file, as we are now building an XCFramework for the project.

* llama : remove Sources and spm-headers directories

* llama : use TargetConditionals.h for visionOS/tvOS
This patch nudges llama.cpp a bit so that it is supported on PoCL, which doesn't support OpenCL C 2.0. The issue is solved by querying the device for the supported OpenCL C versions and using the highest one available.
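For illustration only, this is roughly what such a device query can look like against the standard OpenCL API. The helper name pick_cl_std and the option string are made up for this sketch, and the actual llama.cpp change may query the full list of supported OpenCL C versions rather than the single-version property used here.

```c
#include <stdio.h>
#include <CL/cl.h>

// Hypothetical helper (not the actual llama.cpp code): ask the device which
// OpenCL C version it supports and build a matching -cl-std= compile option
// instead of hard-coding -cl-std=CL2.0.
void pick_cl_std(cl_device_id dev, char * opts, size_t opts_size) {
    char ver[128] = {0};
    // CL_DEVICE_OPENCL_C_VERSION returns a string such as "OpenCL C 1.2"
    clGetDeviceInfo(dev, CL_DEVICE_OPENCL_C_VERSION, sizeof(ver), ver, NULL);

    unsigned major = 1, minor = 2; // conservative fallback if parsing fails
    sscanf(ver, "OpenCL C %u.%u", &major, &minor);

    snprintf(opts, opts_size, "-cl-std=CL%u.%u", major, minor);
}
```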
Signed-off-by: Xiaodong Ye <[email protected]>
* clip : bring back GPU support
* use n_gpu_layers param
* fix double free
* ggml_backend_init_by_type
* clean up
* Fix backend search path
* replace .native() with '/'
* reverted .native()
…per block between host and device code. (ggml-org#12177)
refactor mmqv to unify the calculation of nwarps and rows per block between host and device code.
---------
Co-authored-by: Johannes Gäßler <[email protected]>
* tests: run mul_mat_id with a larger N
* vulkan: fix bug in coopmat1 mul_mat_id
…org#12343)
* llama : Add Gemma 3 text-only support
* fix python coding style
* fix compile on ubuntu
* python: fix style
* fix ubuntu compile
* fix build on ubuntu (again)
* fix ubuntu build, finally
* clip : Experimental support for Gemma 3 vision (ggml-org#12344)
* clip : Experimental support for Gemma 3 vision
* fix build
* PRId64
…2315)
When fattn-wmma was ported over to warp64, various bits that also touch fattn-vec were converted to a selectable warp size. However, the fattn-vec kernels don't work with 64-wide warps for now, so we need to avoid launching them with parameters for warp64.
This commit fixes the path to the xcframework in the README file which I had forgotten to change after renaming the build directory.
… for VK_NV_cooperative_matrix2 support (ggml-org#12301)
Labels
android
Apple Metal
devops
documentation
examples
ggml
Nvidia GPU
python
script
server
SYCL
testing
Vulkan