Merged

33 commits
8263e97
sync : ggml
ggerganov May 27, 2025
525d51b
Vulkan: Add f32 accumulator support to quantized mul mat to fix GLM4 …
0cc4m May 19, 2025
380aae7
sycl : Overcoming workaround for mmap() allocation on Windows (llama/…
s-Nick May 20, 2025
2192394
metal : fix typo in FA kernel comments (llama/13651)
ggerganov May 20, 2025
1e7f745
sycl: disable reorder for sycl mulmat (llama/13536)
sgeor255 May 20, 2025
5e90842
CUDA: skip fully masked-out KV in FA vec kernel (llama/13584)
JohannesGaessler May 20, 2025
e5f0301
vulkan: fix warnings (llama/13626)
netrunnereve May 20, 2025
55050e4
musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENT…
yeahdongcn May 21, 2025
39a2783
ggml : add ggml_gelu_erf() (llama/13667)
ngxson May 21, 2025
547c3cd
opencl: fix couple crashes (llama/12795)
May 21, 2025
0079d89
opencl: Add support for multiple devices (llama/12622)
May 21, 2025
09499e6
SYCL: Avoid using with SYCL-Graph for unsupported nodes (llama/13587)
May 22, 2025
28c7ab8
sycl : Remove waits from function calls (llama/13702)
s-Nick May 22, 2025
c10deb2
use LOG_WARN to replace `std::cerr` (llama/13657)
foldl May 23, 2025
8e2a08b
vulkan: Disable coopmat/coopmat2/bfloat extensions if glslc doesn't s…
jeffbolznv May 23, 2025
1fecf05
vulkan: support CPY from any type to itself (llama/13695)
jeffbolznv May 23, 2025
39dc9dd
ggml : fix the order of ggml_unary_op (llama/13718)
ngxson May 23, 2025
85c583d
CANN: Support MUL_MAT_ID for q8_0 and q4_0 (llama/13705)
noemotiovon May 23, 2025
d3b5380
CUDA: fix race condition in FA vector kernels (llama/13742)
JohannesGaessler May 24, 2025
093dfaa
ggml : add ggml_gelu_erf() CUDA kernel (llama/13719)
ngxson May 24, 2025
3df6086
ggml-cpu : set openmp wait time if not set (llama/13758)
slaren May 24, 2025
4d4a5d7
SYCL: revert "sycl: simplify bin_bcast_kernel (ggml/13383)" (llama/13…
qnixsynapse May 25, 2025
e2ba135
CANN: Add the basic supports of Flash Attention kernel (llama/13627)
shibizhao May 26, 2025
6370037
vulkan: mark IM2COL as supporting non-contig (llama/13783)
jeffbolznv May 26, 2025
02ed80e
sycl: Add more debug prints (llama/13640)
Rbiessy May 26, 2025
a26a34b
SYCL: Add non contiguous support in RMS_NORM and NORM kernels (llama/…
qnixsynapse May 26, 2025
45f2e0f
cuda : avoid cuGetErrorString (llama/13791)
ggerganov May 26, 2025
5eabeb7
ggml : allow CUDA graphs when using pipeline parallelism (llama/13814)
slaren May 27, 2025
4e61025
ggml-cpu: x86 feature detection is specific to x86 (llama/13811)
ckastner May 27, 2025
65206c2
ggml : riscv: add xtheadvector support (llama/13720)
xctan May 27, 2025
4575811
sync : ggml
ggerganov May 27, 2025
e2ac490
talk-llama : sync llama.cpp
ggerganov May 27, 2025
255eac6
sync : fix builds - musa, ruby
ggerganov May 27, 2025
6 changes: 3 additions & 3 deletions .devops/main-musa.Dockerfile
@@ -1,10 +1,10 @@
 ARG UBUNTU_VERSION=22.04
 # This needs to generally match the container host's environment.
-ARG MUSA_VERSION=rc3.1.1
+ARG MUSA_VERSION=rc4.0.1
 # Target the MUSA build image
-ARG BASE_MUSA_DEV_CONTAINER=mthreads/musa:${MUSA_VERSION}-devel-ubuntu${UBUNTU_VERSION}
+ARG BASE_MUSA_DEV_CONTAINER=mthreads/musa:${MUSA_VERSION}-mudnn-devel-ubuntu${UBUNTU_VERSION}
 # Target the MUSA runtime image
-ARG BASE_MUSA_RUN_CONTAINER=mthreads/musa:${MUSA_VERSION}-runtime-ubuntu${UBUNTU_VERSION}
+ARG BASE_MUSA_RUN_CONTAINER=mthreads/musa:${MUSA_VERSION}-mudnn-runtime-ubuntu${UBUNTU_VERSION}
 
 FROM ${BASE_MUSA_DEV_CONTAINER} AS build
 WORKDIR /app
2 changes: 1 addition & 1 deletion README.md
@@ -386,7 +386,7 @@ Run the inference examples as usual, for example:
 ## Moore Threads GPU support
 
 With Moore Threads cards the processing of the models is done efficiently on the GPU via muBLAS and custom MUSA kernels.
-First, make sure you have installed `MUSA SDK rc3.1.1`: https://developer.mthreads.com/sdk/download/musa?equipment=&os=&driverVersion=&version=rc3.1.1
+First, make sure you have installed `MUSA SDK rc4.0.1`: https://developer.mthreads.com/sdk/download/musa?equipment=&os=&driverVersion=&version=rc4.0.1
 
 Now build `whisper.cpp` with MUSA support:
 
1 change: 1 addition & 0 deletions bindings/ruby/ext/options.rb
@@ -160,6 +160,7 @@ def configure
bool "GGML_VULKAN_SHADER_DEBUG_INFO"
pending "GGML_VULKAN_VALIDATE"
bool "GGML_VXE"
bool "GGML_XTHEADVECTOR"
filepath "GIT_EXE"
filepath "MATH_LIBRARY"
filepath "METALKIT_FRAMEWORK"
4 changes: 3 additions & 1 deletion examples/talk-llama/llama-batch.cpp
@@ -1,5 +1,6 @@
#include "llama-batch.h"

#include <cassert>
#include <cstring>
#include <algorithm>

@@ -281,9 +282,10 @@ llama_batch_allocr::llama_batch_allocr(struct llama_batch in_batch, llama_pos p0
     batch = in_batch;
     GGML_ASSERT(batch.n_tokens > 0);
     if (!batch.pos) {
+        assert(p0 >= 0);
         pos.resize(batch.n_tokens);
         for (int32_t i = 0; i < batch.n_tokens; i++) {
-            pos[i] = i + p0;
+            pos[i] = p0 + i;
         }
         batch.pos = pos.data();
     }
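The llama-batch.cpp hunk above changes how default token positions are derived when the caller passes no explicit pos array: token i now gets position p0 + i, and a negative p0 is rejected up front by the new assert. A minimal standalone C++ sketch of that behavior, using simplified types rather than the actual llama_batch_allocr struct:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// Simplified stand-in for the fixed llama_batch_allocr logic: when the
// caller provides no explicit positions, token i gets position p0 + i,
// and a negative base position p0 is treated as a caller bug.
std::vector<int32_t> default_positions(int32_t n_tokens, int32_t p0) {
    assert(p0 >= 0); // the new assert from the hunk above
    std::vector<int32_t> pos(n_tokens);
    for (int32_t i = 0; i < n_tokens; i++) {
        pos[i] = p0 + i;
    }
    return pos;
}

int main() {
    // A 4-token batch appended at position 10 yields positions 10 11 12 13.
    for (int32_t p : default_positions(4, 10)) {
        std::printf("%d ", p);
    }
    std::printf("\n");
    return 0;
}
```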