
Conversation

@l3utterfly
Owner

No description provided.

vynride and others added 30 commits March 1, 2025 11:15
* fix typos and improve menu text clarity

* rename variable trimedValue to trimmedValue

* add updated index.html.gz

* rebuild

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>
CUDA 12.8 added the option to specify stronger compression for binaries, so we now default to "size".
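For reference, a hypothetical way to opt into this in a local build, assuming the option in question is nvcc's `--compress-mode` flag from CUDA 12.8 (the exact CMake wiring in the commit may differ):
```console
# assumes a CUDA build of llama.cpp with CUDA >= 12.8; "size" favors smaller fatbins
$ cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_FLAGS="--compress-mode=size"
$ cmake --build build --config Release
```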
…versation mode (ggml-org#12131)

* Add --system-prompt parameter

* use user defined system prompt

* clarify

Co-authored-by: Xuan-Son Nguyen <[email protected]>

* add warning

* clarify

Co-authored-by: Xuan-Son Nguyen <[email protected]>

---------

Co-authored-by: Xuan-Son Nguyen <[email protected]>
…) (ggml-org#12132)

* Update outdated message

* wording

Co-authored-by: Xuan-Son Nguyen <[email protected]>

---------

Co-authored-by: Xuan-Son Nguyen <[email protected]>
* Use jinja chat template system prompt by default

* faster conditional order

* remove nested ternary

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>
…ggml-org#12133)

* SYCL: refactor and move cpy kernels to a separate file

* Add a few missing cpy kernels

* refactor and add debug logs
* webui : add ?m=... and ?q=... params

* also clear prefilledMessage variable

* better approach

* fix comment

* test: bump timeout on GITHUB_ACTION
For emojis, non-alpha characters, etc.

Signed-off-by: Eric Curtin <[email protected]>
The libggml API has changed, but this has not been updated.
* tts: add speaker file support

Signed-off-by: dm4 <[email protected]>

* tts: handle outetts-0.3

* tts : add new line in error message

---------

Signed-off-by: dm4 <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
This commit tries to address/improve an issue with the server tests,
which are failing with a timeout. Looking at the logs, it seems they
are timing out after 12 seconds:
```
FAILED unit/test_chat_completion.py::test_completion_with_json_schema[False-json_schema0-6-"42"] - TimeoutError: Server did not start within 12 seconds
```

This is somewhat strange as in utils.py we have the following values:
```python
DEFAULT_HTTP_TIMEOUT = 12

if "LLAMA_SANITIZE" in os.environ or "GITHUB_ACTION" in os.environ:
    DEFAULT_HTTP_TIMEOUT = 30

    def start(self, timeout_seconds: int | None = DEFAULT_HTTP_TIMEOUT) -> None:
```
A test running in a GitHub Action should therefore have a timeout of
30 seconds. However, it seems this is not the case.
Inspecting the logs from the CI job we can see the following environment
variables:
```console
Run cd examples/server/tests
2 cd examples/server/tests
3 ./tests.sh
4 shell: /usr/bin/bash -e {0}
5 env:
6 LLAMA_LOG_COLORS: 1
7 LLAMA_LOG_PREFIX: 1
8 LLAMA_LOG_TIMESTAMPS: 1
9 LLAMA_LOG_VERBOSITY: 10
10 pythonLocation: /opt/hostedtoolcache/Python/3.11.11/x64
```

This probably does not address the underlying issue, which is that the
servers providing the models to be downloaded occasionally take longer
to respond, but it might improve these situations in some cases.
… backend (ggml/1121)

* Support float16-to-float16 add/sub/mul/div operations in the CUDA backend

* Add fp16 support for add/sub/mul/div on the CPU backend

* Add test cases for fp16 add/sub/mul/div
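As a rough illustration of what such an fp16 elementwise op looks like at the ggml API level (a minimal sketch, not the actual test-backend-ops case; sizes and the compute step are placeholders):
```c
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16u*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // two F16 operands; with the new backend support the add stays in F16
    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F16, 64);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F16, 64);
    struct ggml_tensor * c = ggml_add(ctx, a, b); // result tensor is also GGML_TYPE_F16

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);
    // ... fill a/b and compute the graph on the CPU or CUDA backend ...

    ggml_free(ctx);
    return 0;
}
```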
It is used by the Whisper talk-llama example.

Co-authored-by: Petter Reinholdtsen <[email protected]>
* Add small comment re: VSX to readme

Co-authored-by: midnight <[email protected]>
* whisper : support GGML_BACKEND_DL

* fix DTW crash

* whisper.objc : fix build - add ggml-cpp.h

---------

Co-authored-by: Georgi Gerganov <[email protected]>
* Support fp16 unary operations in the CUDA backend

* cpu: increase fp16 support for unary operators in the CPU backend

* cuda: increase fp16 support for unary operators in the CUDA backend

* Add test cases for fp16 unary operators

* metal: update supports_op for unary operators that don't support fp16, to prevent test-backend-ops from failing

* metal: fix PR comments for unary op support after fp16 unary tests
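A hypothetical guard showing how a caller can use `ggml_backend_supports_op` around an fp16 unary op (sketch only; `ctx`, `backend`, and the F16 tensor `x` are assumed to exist, and the F32 fallback is just one option):
```c
struct ggml_tensor * y = ggml_gelu(ctx, x); // unary op on an F16 tensor
if (!ggml_backend_supports_op(backend, y)) {
    // backend (e.g. Metal) rejects the fp16 variant: cast to F32 and retry
    y = ggml_gelu(ctx, ggml_cast(ctx, x, GGML_TYPE_F32));
}
```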
ggml-ci
ggml-ci
ggml-ci
…rg#12032)

Adds GGML_HIP_ROCWMMA_FATTN and rocwmma header check
Adds rocWMMA support to fattn-wmma-f16

---

Signed-off-by: Carl Klemm <[email protected]>
Co-authored-by: Johannes Gäßler <[email protected]>
Co-authored-by: Ben Jackson <[email protected]>
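A hypothetical configure line for trying this out (assumes a ROCm/HIP build of llama.cpp with the rocWMMA headers installed; other options omitted):
```console
$ cmake -B build -DGGML_HIP=ON -DGGML_HIP_ROCWMMA_FATTN=ON
$ cmake --build build --config Release
```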
…rn (ggml-org#12145)

* Add chat template formatting to -no-cnv

* only enable prompt formatting if explicitly enabled

* add -st / --single-turn

* add --single-turn and -p in conversation mode

* fix -sys + -p

* reword warning

* small readability change and fix (long) outdated example usage

* only activate single turn in conversation mode
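A hypothetical invocation combining the flags added above (the model path is a placeholder):
```console
$ llama-cli -m model.gguf -sys "You are a terse assistant." -p "Explain what --single-turn does." --single-turn
```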
* Add include files for std::min/max and std::toupper/tolower

* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined

* Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode

* win32: only use __restrict in MSVC if C11/C17 support is not enabled

---------

Co-authored-by: Marcus Groeber <[email protected]>
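A sketch of the portability pattern described in the last two bullets (not the verbatim ggml definition; the MSVC detection shown here is an assumption):
```c
#if defined(_MSC_VER) && !defined(__STDC_VERSION__)
// MSVC in plain C mode (no /std:c11 or /std:c17) does not accept the `restrict` keyword
#define GGML_RESTRICT __restrict
#else
#define GGML_RESTRICT restrict
#endif

// the qualifier now compiles the same way under MSVC and conforming C compilers
void vec_scale(int n, float * GGML_RESTRICT dst, const float * GGML_RESTRICT src, float s);
```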
* llama : add xcframework build script

This commit adds a script to build an XCFramework for the Apple
iOS, macOS, visionOS, and tvOS platforms.

The generated XCFramework can then be added to a project and used in
the same way as a regular framework. The llama.swiftui example project
has been updated to use the XCFramework and can be started using the
following command:
```console
$ open examples/llama.swiftui/llama.swiftui.xcodeproj/
```

Refs: ggml-org#10747

* examples : remove llama.cpp (source dir ref) from project.pbxproj

This commit removes the reference to llama.cpp from the project.pbxproj
file since Package.swift has been removed.

* ci : updated build.yml to use build-xcframework.sh

* ci : add xcframework build to github releases

This commit adds the ability to create a GitHub release with the
xcframework build artifact.

* scripts : add apple app validation scripts

This commit adds scripts that can validate the iOS, macOS, tvOS, and
visionOS applications. The scripts create a simple test app project,
copy the llama.xcframework to the test project, build and archive the
app, create an IPA from the archive, and validate the IPA using altool.

The motivation for this is to provide some basic validation and
hopefully avoid having to manually validate apps in Xcode.

* llama : remove Package.swift

This commit removes the Package.swift file, as we are now building an
XCFramework for the project.

* llama : remove Sources and spm-headers directories

* llama : use TargetConditionals.h for visionOS/tvOS
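For illustration, the TargetConditionals.h approach mentioned in the last bullet typically looks something like this (the macro names come from Apple's SDK; the branches are placeholders):
```c
#include <TargetConditionals.h>

#if TARGET_OS_VISION
// visionOS-specific code path
#elif TARGET_OS_TV
// tvOS-specific code path
#elif TARGET_OS_IOS
// iOS-specific code path
#else
// macOS and other Apple platforms
#endif
```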
johnbean393 and others added 15 commits March 10, 2025 16:13
This patch nudges llama.cpp a bit so that it can run on PoCL, which
doesn't support OpenCL C 2.0. The issue is solved by querying the
device for the supported OpenCL C versions and using the highest one
available.
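One way to perform such a query (a minimal sketch, not the exact code from the patch; error handling omitted):
```c
#include <CL/cl.h>
#include <stdio.h>

// prints a string such as "OpenCL C 1.2" on devices (e.g. PoCL) without CL2.0 support
static void print_opencl_c_version(cl_device_id dev) {
    char ver[128] = {0};
    clGetDeviceInfo(dev, CL_DEVICE_OPENCL_C_VERSION, sizeof(ver), ver, NULL);
    printf("highest supported OpenCL C version: %s\n", ver);
}
```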
* clip : bring back GPU support

* use n_gpu_layers param

* fix double free

* ggml_backend_init_by_type

* clean up
* Fix backend search path

* replace .native() with '/'

* reverted .native()
…per block between host and device code. (ggml-org#12177)

refactor mmqv to unify the calculation of nwarps and rows per block between host and device code.

---------

Co-authored-by: Johannes Gäßler <[email protected]>
* tests: run mul_mat_id with a larger N

* vulkan: fix bug in coopmat1 mul_mat_id
…org#12343)

* llama : Add Gemma 3 text-only support

* fix python coding style

* fix compile on ubuntu

* python: fix style

* fix ubuntu compile

* fix build on ubuntu (again)

* fix ubuntu build, finally

* clip : Experimental support for Gemma 3 vision (ggml-org#12344)

* clip : Experimental support for Gemma 3 vision

* fix build

* PRId64
…2315)

When fattn-wmma was ported over to warp64, various bits that also touch fattn-vec were converted to
a selectable warp size. However, the fattn-vec kernels don't work with 64-wide warps for now, so we need
to avoid launching them with parameters for warp64.
This commit fixes the path to the xcframework in the README file, which I
had forgotten to change after renaming the build directory.
@github-actions github-actions bot added the documentation, SYCL, and Nvidia GPU labels on Mar 13, 2025
@l3utterfly l3utterfly merged commit 367fa9c into layla-build Mar 13, 2025
30 of 48 checks passed
@l3utterfly l3utterfly deleted the merge-1 branch March 13, 2025 09:29