forked from ggml-org/llama.cpp
Sync master with upstream release b5083 #50
Closed
* CANN: Refactor to reduce duplicate code
* CANN: fix review comment
* cmake : enable curl by default
* no curl if no examples
* fix build
* fix build-linux-cross
* add windows-setup-curl
* fix
* shell
* fix path
* fix windows-latest-cmake*
* run: include_directories
* LLAMA_RUN_EXTRA_LIBS
* sycl: no llama_curl
* no test-arg-parser on windows
* clarification
* try riscv64 / arm64
* windows: include libcurl inside release binary
* add msg
* fix mac / ios / android build
* will this fix xcode?
* try clearing the cache
* add bunch of licenses
* revert clear cache
* fix xcode
* fix xcode (2)
* fix typo
… (ggml/1167)
* cpu: refactor SIMD mappings and vectorized op functions into separate files
* Fix warning for ggml_float to float
* Fix warnings
* cpu: move all the operations (except mul_mat) to a separate c++ file
* fix whitespace
* Update ggml/src/ggml-cpu/vec.h
  Co-authored-by: Diego Devesa <[email protected]>
* Fix PR comments - use GGML_UNUSED, use cassert in ops.cpp
* Reverse the order of import for ops.h and vec.h, to match what was present in ggml-cpu.c previously
---------
Co-authored-by: Diego Devesa <[email protected]>
* add bf16 support
* use convert_from_bf16_cuda instead of convert_unary_cuda for f32
* revert 7ec5085
* move functionality into convert_unary with constexpr
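For context on why a dedicated bf16-to-f32 path is cheap: bf16 is simply the upper 16 bits of an IEEE-754 float32, so the conversion reduces to a 16-bit shift. The NumPy sketch below illustrates that relationship only; it is not the CUDA kernel added in this commit, and the function names are ours.

```python
# Illustrative sketch, not the convert_from_bf16_cuda kernel: bf16 is the top
# 16 bits of a float32, so widening to f32 is a left shift of the raw bits.
import numpy as np

def bf16_to_f32(bits: np.ndarray) -> np.ndarray:
    """bits: uint16 array holding bf16 values -> float32 array."""
    return (bits.astype(np.uint32) << 16).view(np.float32)

def f32_to_bf16(x: np.ndarray) -> np.ndarray:
    """Truncating float32 -> bf16 (real implementations typically round)."""
    return (x.astype(np.float32).view(np.uint32) >> 16).astype(np.uint16)

x = np.array([1.0, -2.5, 3.14159], dtype=np.float32)
print(bf16_to_f32(f32_to_bf16(x)))  # close to x, at bf16 precision
```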
* ggml : simplify Arm fp16 CPU logic
  ggml-ci
* cont : bring back CUDA/MUSA checks
  ggml-ci
ggml-ci
* llama4 conversion
* initial support, no chat template
* clean up a bit
* fix tokenizer conversion
* correct hparams
* try this
* fix shexp
* ffn_inp_normed
* chat template
* clean up model conversion
* add_bos
* add scale_before_ffn
* fix order
* weight_before_ffn
* llm_graph_input_attn_temp
* add chunk attn mask
* build_inp_attn_scale() (see the sketch below)
* add comment about ggml_repeat
* clarify comments
* fix build
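The llm_graph_input_attn_temp / build_inp_attn_scale() items refer to Llama 4's position-dependent attention temperature scaling. Below is a minimal Python sketch of the general idea, assuming the logarithmic form used by reference conversion code; the parameter names floor_scale and attn_scale and their default values are illustrative assumptions, not necessarily what llama.cpp uses.

```python
# Illustrative sketch (not the llama.cpp implementation): a per-position
# multiplier applied to the attention query/logits so attention stays sharper
# at very long context lengths. Constants and names are assumptions.
import numpy as np

def attn_temperature_scale(positions: np.ndarray,
                           floor_scale: float = 8192.0,
                           attn_scale: float = 0.1) -> np.ndarray:
    """Return a per-position scale; grows logarithmically with position."""
    return np.log(np.floor((positions + 1.0) / floor_scale) + 1.0) * attn_scale + 1.0

pos = np.array([0, 4095, 8191, 8192, 65535, 1_000_000], dtype=np.float64)
print(attn_temperature_scale(pos))  # 1.0 for short contexts, >1.0 far out
```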
* gguf-py : support lazy tensor splitting
  Splitting usually involves returning tuples of tensors, which need to be handled properly to avoid early eager evaluation (see the sketch below).
* gguf-py : fix flake8 lint
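A minimal sketch of what "lazy tensor splitting" means here, using a simplified stand-in for gguf-py's lazy tensors (this is not the actual gguf-py machinery): a split must return a tuple whose elements each defer their work, rather than materializing the source tensor the moment the tuple is built.

```python
# Illustrative sketch only - not the real gguf-py LazyTensor implementation.
from typing import Callable
import numpy as np

class LazyTensor:
    """A tensor whose data is produced only when .materialize() is called."""
    def __init__(self, shape: tuple[int, ...], load: Callable[[], np.ndarray]):
        self.shape = shape
        self._load = load

    def materialize(self) -> np.ndarray:
        return self._load()

def lazy_split(src: LazyTensor, parts: int, axis: int = 0) -> tuple[LazyTensor, ...]:
    """Split into `parts` lazy tensors; nothing is loaded or sliced yet."""
    assert src.shape[axis] % parts == 0
    step = src.shape[axis] // parts
    out_shape = list(src.shape)
    out_shape[axis] = step

    def make(i: int) -> LazyTensor:
        def load() -> np.ndarray:
            # Slicing happens only at materialize() time.
            full = src.materialize()
            index = [slice(None)] * full.ndim
            index[axis] = slice(i * step, (i + 1) * step)
            return full[tuple(index)]
        return LazyTensor(tuple(out_shape), load)

    return tuple(make(i) for i in range(parts))

# Usage: the split is described immediately, data is read only on demand.
big = LazyTensor((8, 4), lambda: np.arange(32, dtype=np.float32).reshape(8, 4))
halves = lazy_split(big, parts=2)
print(halves[0].shape)          # (4, 4) - known without touching the data
print(halves[1].materialize())  # the source is loaded and sliced only here
```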
…uffer_set_tensor" (ggml-org#12812)
* Revert "sycl: remove redundant memcopy in function ggml_backend_sycl_buffer_s…"
  This reverts commit 518a014.
* Update ggml/src/ggml-sycl/ggml-sycl.cpp
* Update ggml/src/ggml-sycl/ggml-sycl.cpp
* rm tail space
…rg#12785)
* Update ChatScreen.tsx
* useAutosizeTextarea.ts
  useAutosizeTextarea to encapsulate the logic.
* Implement responsive auto-sizing chat textarea
  Replaces the manual textarea resizing with an automatic height adjustment based on content.
  - `useChatTextarea` hook to manage textarea state and auto-sizing logic via refs, preserving the optimization
  - Textarea now grows vertically up to a maximum height (`lg:max-h-48`) on large screens (lg breakpoint and up).
  - Disables auto-sizing and enables manual vertical resizing (`resize-vertical`) on smaller screens for better mobile usability.
  - Aligns the "Send" button to the bottom of the textarea (`items-end`) for consistent positioning during resize.
* update compressed index.html.gz after npm run build
  refactor: replace OptimizedTextareaValue with AutosizeTextareaApi in VSCode context hook
* chore: normalize line endings to LF
  refactor: AutosizeTextareaApi -> chatTextareaApi
* refactor: Rename interface to PascalCase
---------
Co-authored-by: Xuan Son Nguyen <[email protected]>
…text (ggml-org#12824)
Signed-off-by: dm4 <[email protected]>
…12825)
* ggml : FA supports F32 V
* graph : cast KV to F16 when the KV cache is not used
  ggml-ci
* server : add test that exercises embeddings with FA enabled
  ggml-ci
This allows BF16 KV-cache on CUDA.
Updates the dev branch with the latest upstream release (b5083) from ggml-org/llama.cpp.