Skip to content

Conversation

@l3utterfly
Copy link
Owner

No description provided.

ggerganov and others added 30 commits May 22, 2025 16:33
* common: update requirements.txt to include pytorch nightly for s390x

Signed-off-by: Aaron Teo <[email protected]>

* common: fix torch installation via pip for s390x

Signed-off-by: Aaron Teo <[email protected]>

---------

Signed-off-by: Aaron Teo <[email protected]>
* convert ok, load ok

* warmup ok

* test

* still does not work?

* fix padding

* temporary give up

* fix merge conflict

* build_ultravox()

* rm test

* fix merge conflict

* add necessary mtmd APIs

* first working version (only 4s of audio)

* will this monster compile?

* fix compile

* please compile

* fPIC

* fix windows

* various fixes

* clean up audio_helpers

* fix conversion

* add some debug stuff

* long audio input ok

* adapt the api

* add --audio arg

* final touch UX

* add miniaudio to readme

* fix typo

* refactor kv metadata

* mtmd_default_marker()
* release : fix windows hip release

* make single hip release with multiple targets
Reuse the f16/f32 copy shaders, and just scale the number of elements
according to the type size.
* [CANN]Support MUL_MAT_ID Q8 && Q4

Signed-off-by: noemotiovon <[email protected]>

* codestyle adjustment

Signed-off-by: noemotiovon <[email protected]>

---------

Signed-off-by: noemotiovon <[email protected]>
* server : support audio input

* add audio support on webui
* ggml : add ggml_gelu_erf() CUDA kernel

* missing semicolon
…gml-org#12379)

* add common_json w/ support for truncated json healing

* add common_chat_msg_diff

* partial common_chat_parse

* refactor parser w/ optionals

* server: wire chat diffs in stream mode

* fix trigger of thinking models (must happen after thoughts are closed)

* fix functionary v3.2 raw python!

* rename: common_chat_syntax (now contains format)

* rm common_regex.at_start

* don't return empty <think></think>

* accommodate yet another deepseek r1 distill fantasy syntax (`<|tool▁calls|>`)

* fix QwQ 32B tool call parsing after thoughts (hermes2)

* better logs for grammar triggers

* consume spaces after parse_json_tool_calls

* fix required tool calls w/ thinking models that have pre-opened thinking tags

* fix thinking model's initial trigger + test qwq's template

* run most test_tool_call tests in stream + non-stream modes

* make functionary v3.2 parsing more strict (differentiate first match from others)

* send final diff from server, to close off raw python arguments

* support partial content streaming in Generic mode

* tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5)

* Update function-calling.md

* Update tool_bench.py

* chat-parser: remove input from exception (llm output may contain PII)

---------

Co-authored-by: ochafik <[email protected]>
Co-authored-by: Olivier Chafik <[email protected]>
…-org#13752)

Temporarily reverted due to failing fp16 DIV operation

This reverts commit 02cdd2d.

ggml-ci
* Multimodal: Added Moondream2 model and fixed ggml.org link

* Apply suggestions from code review

---------

Co-authored-by: name <[email protected]>
Co-authored-by: Xuan-Son Nguyen <[email protected]>
* mtmd : add Qwen2-Audio support

* small clean up

* update discussion link

* clarify mtmd_get_output_embd

* clarification in multimodal.md

* fix ultravox bug

* ggml_cont
* kv-cache : rework kv_cell

ggml-ci

* kv-cells : use "shift" instead of "delta" consistently

ggml-ci

* llama : add llama_max_parallel_sequences()

ggml-ci

* kv-cells : update comments [no ci]

* context : fail upon construction if sequences exceed max value

ggml-ci

* kv-cells : get_pos() -> pos_get() + comments

ggml-ci

* kv-cells : fix tracking of "used" cells

ggml-ci
@l3utterfly l3utterfly merged commit fcb4b60 into l3utterfly:ggml-hexagon Jun 3, 2025
49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.