Releases: ochafik/llama.cpp
b4702
ggml-cpu : add chunking support to mul_mat_id (#11666)
* ggml-cpu : add chunking support to mul_mat_id
* allocate chunk counter in wdata; parallelize src1 quantization by column to allow parallelization even when there is only one row
* disable for arm
* cleanup
* better way to disable for arm
* fix uninitialized counter when using 1 thread only
* revert test-backend-ops changes
b4692
CUDA: fix CUDART_VERSION checks (#11821)
b4677
There's a better way of clearing lines (#11756)
Use the ANSI escape code for clearing a line.
Signed-off-by: Eric Curtin <[email protected]>
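As a rough illustration of what clearing a line with an ANSI escape code looks like, here is a minimal Python sketch. It is not the code from the commit: the helper names `clear_line_sequence` and `overwrite_line` are hypothetical, and the choice of the "Erase in Line" sequence with parameter 2 (`ESC [ 2 K`) is an assumption, since the entry above does not say which sequence the change emits.

```python
import sys

# ANSI/ECMA-48 "Erase in Line" (EL). Parameter 2 erases the entire current line.
# Assumption: the release note does not say which EL variant the commit uses.
CLEAR_LINE = "\033[2K"


def clear_line_sequence() -> str:
    """Return the characters that move to column 0 and erase the current line."""
    # "\r" returns the cursor to the start of the line; EL itself does not move it.
    return "\r" + CLEAR_LINE


def overwrite_line(text: str, stream=sys.stdout) -> None:
    """Replace whatever is on the current terminal line with `text`."""
    stream.write(clear_line_sequence() + text)
    stream.flush()
```

Calling `overwrite_line("50%")` and then `overwrite_line("100%")` on a terminal that understands ANSI sequences would leave only `100%` visible, with no need to pad the line with spaces.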
b4671
ggml: Fix data race in ggml threadpool (#11736)
After the barrier in the last iteration is executed, the loop termination condition is still evaluated. However, the main thread may already have destroyed the cgraph object and its nodes, so another thread can then access memory that is already gone. Trouble can also happen when n_nodes == 0 or abort is called, but I'm not sure whether the former situation is possible. The last synchronization should be done after the loop to ensure the cgraph/cplan won't be accessed after the main thread exits from the function.
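The ordering described above can be sketched in a few lines of Python. This is only an illustration of the pattern (one barrier per node, plus a final barrier placed after the loop); it is not ggml's code, and `run_graph`, `worker`, and the toy "double each node" workload are all hypothetical. Python objects are garbage collected, so the sketch shows where the synchronization goes and cannot reproduce the use-after-free itself.

```python
import threading


def run_graph(nodes, n_threads):
    """Process `nodes` with `n_threads` workers (the caller acts as worker 0).

    Returns only after every worker has left the loop, so no worker can
    evaluate the loop condition on `nodes` after this function returns.
    """
    barrier = threading.Barrier(n_threads)
    results = [0] * len(nodes)

    def worker(tid):
        i = 0
        while i < len(nodes):         # the termination check reads shared graph state
            if i % n_threads == tid:  # toy work split: one thread handles each node
                results[i] = nodes[i] * 2
            barrier.wait()            # per-node barrier
            i += 1
        # Final synchronization, placed after the loop: a thread only gets here
        # once it has finished its last termination check. This also covers the
        # empty-graph case, where the loop body never runs.
        barrier.wait()

    for tid in range(1, n_threads):
        threading.Thread(target=worker, args=(tid,)).start()
    worker(0)  # the calling ("main") thread takes part as worker 0
    # Past the final barrier, all workers are done with `nodes`; the caller
    # may now safely discard the graph.
    return results
```

If the final `barrier.wait()` sat inside the loop as the last per-node barrier instead, the caller could return while another worker was still about to evaluate `i < len(nodes)`, which is the access the commit message warns about.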
b4636
`tool-call`: command r7b fix for normal responses (#11608)
* fix command r7b normal response regex + add to server test
* test multiline non-tool-call responses in test-chat
b4628
`tool-call`: allow `--chat-template chatml` w/ `--jinja`, default to …
b4622
server : (webui) Fix Shift+Enter handling (#11609)
* Fix Shift+Enter handling: `exact` on the Enter handler means the message is not sent when Shift+Enter is pressed anyway
* build index.html.gz
---------
Co-authored-by: Xuan Son Nguyen <[email protected]>
b4615
`tool-call`: support Command R7B (+ return tool_plan "thoughts" in AP…
b4610
`sync`: minja (https://github.com/google/minja/commit/418a2364b56dc9b…
b4609
Implement s3:// protocol (#11511)
For those that want to pull from s3
Signed-off-by: Eric Curtin <[email protected]>