Skip to content

Conversation

l3utterfly
Copy link
Owner

Make sure to read the contributing guidelines before submitting a PR

JohannesGaessler and others added 30 commits April 29, 2025 16:00
…org#13174)

* Prefilling assistant message in openai compatible API

* fixed indentation

* fixed code convention

* simplify method usage

* no more than one assistant message at end of messages

* merge checks into prefill code

* Update examples/server/utils.hpp

---------

Co-authored-by: matteo <[email protected]>
Co-authored-by: Xuan-Son Nguyen <[email protected]>
* docker : do not build tests

* include "ggml-cpu.h"
* arg : allow using -hf offline

* add more comments in code [no ci]
z17 compilation requires GCC 15.1.0 and onwards

Signed-off-by: Aaron Teo <[email protected]>
Build fails with compilation error on power pc.
This patch fixes the same.

Tested with unit tests run via
 --build <build_dir> && cd <build_dir> && make test

Signed-off-by: Shalini Salomi Bodapati <[email protected]>
* convert : improve model arch handling

* use AutoConfig

* rm trust_remote_code

* Update convert_hf_to_gguf.py

* fix self.block_count for vision

* fix NomicBertModel
* arg : -hf do not fail if url mismatch

* do not return if cannot parse metadata json
* convert ok

* load ok, missing patch merger

* ah sheet it works

* update llava/readme

* add test

* fix test
* whisper: suppress Windows compiler warnings

This commit disables compiler warnings on window using MSVC.

The motivation for these changes is that some compilers generate
warnings for these conversion, for example Windows MSVC, and
there are quite a few of them. This makes it a little difficult to
spot new warnings that may be introduced and also can be difficult
for users/embedders of ggml where these warnings are hard to separate
from their own warnings.

* squash! whisper: suppress Windows compiler warnings

Move ggml related warnings into ggml. This commit also fixes the
indentation and adds a missing whitespace to the if statement.
This commit adds a check to makes sure that the target exists before
trying to add compile options to ignore warnings when using MSVC.

The motivation for this is currently the build is broken depending on
the cmake options provided. With this fix it should be possible to build
even if the targets are not actually available.

Refs: ggml-org/whisper.cpp#3090 (comment)
ggml-ci
…der (ggml-org#13191)

* vulkan: Handle src1 batch dimension in non-contiguous mat-vec-mul shader
* vulkan: Add bfloat16 support

This adds bfloat16 matrix multiply support based on VK_KHR_shader_bfloat16.
The extension is required for coopmat multiply support, but matrix-vector
multiply trivially promotes bf16 to fp32 and doesn't require the extension.
The copy/get_rows shaders also don't require the extension.

It's probably possible to fall back to non-coopmat and promote to fp32 when
the extension isn't supported, but this change doesn't do that.

The coopmat support also requires a glslc that supports the extension, which
currently requires a custom build.

* vulkan: Support bf16 tensors without the bf16 extension or coopmat support

Compile a variant of the scalar mul_mm shader that will promote the bf16
values to float, and use that when either the bf16 extension or the coopmat
extensions aren't available.

* vulkan: bfloat16 fixes (really works without bfloat16 support now)

* vulkan: fix spirv-val failure and reenable -O
ochafik and others added 16 commits May 15, 2025 23:29
…13551)

* webui : improve accessibility for visually impaired people

* add a11y for extra contents

* fix some labels being read twice

* add skip to main content
* vulkan: move common FA code to flash_attn_base.comp

* vulkan: move common FA index/stride setup code to flash_attn_base.comp

* build fix
* parallel : add option for non-shared and larger prompts

* parallel : update readme [no ci]

* cont : add note about base models [no ci]

* parallel : better var name

ggml-ci
…13595)

* fix: use the current build config for `vulkan-shaders-gen`

* fix: only pass a valid build type to `--config`
* added no-prefill-assistant flag

* reworded documentation comment

* updated server README.md
@l3utterfly l3utterfly merged commit 140d3f1 into layla-build May 19, 2025
66 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.