Merged
Changes from all commits (97 commits)
52b3d71
CANN: fix typo in ggml-cann (#12733)
jeffzhou2000 Apr 7, 2025
bd3f59f
cmake : enable curl by default (#12761)
ngxson Apr 7, 2025
e391d3e
ci : no curl on ggml-ci (#12796)
ngxson Apr 7, 2025
518a014
sycl: remove redundant memcopy in function ggml_backend_sycl_buffer_s…
jeffzhou2000 Apr 7, 2025
995083e
cpu: move all the operators into a separate c++ file (except mul_mat)…
cmdr2 Apr 2, 2025
36ca8b3
CUDA: don't convert BF16 weights to FP32 (ggml/1174)
CISC Apr 4, 2025
ff067db
ggml : simplify Arm fp16 CPU logic (ggml/1177)
ggerganov Apr 7, 2025
a4e46e2
sync : ggml
ggerganov Apr 7, 2025
1a1ab7e
cuda : fix HIP and MUSA BF16 (#0)
ggerganov Apr 7, 2025
4ccea21
hellaswag: display estimated score confidence interval (#12797)
stduhpf Apr 7, 2025
8297401
opencl: better identify Adreno GPU (#12760)
lhez Apr 7, 2025
1466621
llama : Support llama 4 text-only (#12791)
ngxson Apr 7, 2025
a226bc7
gguf-py : support lazy tensor splitting (#12809)
compilade Apr 8, 2025
656babd
Revert "sycl:remove redundant memcopy in function ggml_backend_sycl_b…
NeoZhangJianyu Apr 8, 2025
8ca6e1c
server : webui : Improve Chat Input with Auto-Sizing Textarea (#12785)
characharm Apr 8, 2025
1d343b4
arg : Including limits file on AIX (#12822)
mehendarkarprajwal Apr 8, 2025
2dabf75
llava: add more helper functions to check projector types in clip con…
dm4 Apr 8, 2025
78a1ba0
server : fix thread.join() on exit (#12831)
ngxson Apr 8, 2025
a19b5ce
llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)
ggerganov Apr 8, 2025
b32efad
llava: improve clip_ctx destructor to not memleak load_image_size (#1…
mattjcly Apr 8, 2025
7538246
cuda : add f32 to bf16 copy op (#12806)
CISC Apr 8, 2025
7ecd780
vulkan: Use fp16 for the flash attention P*V multiplication (#12783)
jeffbolznv Apr 9, 2025
0090950
vulkan: In coopmat2 mmq, load q4_k/q5_k scales through shared memory …
jeffbolznv Apr 9, 2025
6e1c4ce
CANN: Support Opt CONV_TRANSPOSE_1D and ELU (#12786)
noemotiovon Apr 9, 2025
47277d6
readme : add rpc backend (#12842)
ggerganov Apr 9, 2025
65a69e6
clip : do not print ftype (#12832)
ngxson Apr 9, 2025
381603a
ci: detach common from the library (#12827)
pminev Apr 9, 2025
8ed7124
sycl: update documentation to use -no-cnv (#12845)
Rbiessy Apr 9, 2025
d9a63b2
musa: enable freediskspace for docker image build (#12839)
yeahdongcn Apr 9, 2025
d3bd719
llama : Support Qwen3 and Qwen3MoE (#12828)
bozheng-hit Apr 9, 2025
2391506
ggml-impl.h: fix build on POWER9 (#12855)
pkubaj Apr 9, 2025
31f7803
ggml-cpu-impl.h: do not redefine bool on POWER9 (#12856)
pkubaj Apr 9, 2025
b0091ec
docker : added all CPU to GPU images (#12749)
rudiservo Apr 9, 2025
11d07e1
Fixes #12823 (#12830)
mehendarkarprajwal Apr 9, 2025
fe5b78c
CANN: Support more ops (#12841)
noemotiovon Apr 10, 2025
64eda5d
convert : ability to lazy-load safetensors remotely without downloadi…
ngxson Apr 10, 2025
8b9cc7c
llava : introduce libmtmd (#12849)
ngxson Apr 10, 2025
e4bf72d
scripts : fix sync-ggml-am.sh
ggerganov Apr 10, 2025
459895c
ggml : add more generic custom op, remove deprecated custom ops (ggml…
slaren Apr 9, 2025
fe92821
ggml : add bilinear upscale support (ggml/1185)
slaren Apr 9, 2025
cb79c2e
ggml: don't include arm_neon.h when using CUDA 12 with ARM Neon (ggml…
cmdr2 Apr 10, 2025
eb420e1
sync : ggml
ggerganov Apr 10, 2025
1d2b613
tests : fix init order (#0)
ggerganov Apr 10, 2025
47ba87d
sync : ggml
ggerganov Apr 10, 2025
0fed24c
ggml: fix compilation error s390x (#12848)
taronaeo Apr 11, 2025
8b91d53
llama : correct rms norm for llama 4 (#12882)
ngxson Apr 11, 2025
5b1f13c
convert : proper tensor name mapping for llama4 (#12870)
ngxson Apr 11, 2025
12e9158
xcf : add check for visionos build version (#12854)
danbev Apr 11, 2025
8ac9f5d
ci : Replace freediskspace to free_disk_space in docker.yml (#12861)
yeahdongcn Apr 11, 2025
ec6c09d
convert : Llama4 RoPE fix (#12889)
danielhanchen Apr 11, 2025
fccf9ca
SYCL: Add fp16 type support to unary op kernels (#12788)
qnixsynapse Apr 11, 2025
0c50923
clip : use smart pointer (⚠️ breaking change) (#12869)
ngxson Apr 11, 2025
06bb53a
llama-model : add Glm4Model implementation for GLM-4-0414 (#12867)
zRzRzRzRzRzRzR Apr 11, 2025
b2034c2
contrib: support modelscope community (#12664)
tastelikefeet Apr 11, 2025
578754b
sycl: Support sycl_ext_oneapi_limited_graph (#12873)
EwanC Apr 11, 2025
68b08f3
common : Define cache directory on FreeBSD (#12892)
yurivict Apr 11, 2025
b6930eb
`tool-call`: fix non-tool-calling grammar crashes w/ Qwen / Hermes 2 …
ochafik Apr 11, 2025
e8a6263
rpc : Set cache directory in rpc-server.cpp on FreeBSD (#12903)
yurivict Apr 11, 2025
c94085d
server : add VSCode's Github Copilot Chat support (#12896)
ggerganov Apr 11, 2025
e59ea53
llava: Fix cpu-only clip image encoding segfault (#12907)
mattjcly Apr 12, 2025
a483757
vulkan: use aligned loads for flash attention mask (#12853)
jeffbolznv Apr 12, 2025
bc091a4
common : Define cache directory on AIX (#12915)
mehendarkarprajwal Apr 12, 2025
71e90e8
quantize: Handle user-defined quantization levels for additional tens…
EAddario Apr 13, 2025
307bfa2
ggml: disable CUDA graphs for unsupported DUP and CONT node types (#1…
agray3 Apr 13, 2025
e959d32
ggml: use _mm[512/256]_dpbusd[_avx]_epi32 to directly accumulate into…
SongXiaoXi Apr 14, 2025
a25355e
cpu: fix cpu backend's supports-op for GET_ROWS_BACK. fixes a fatal w…
cmdr2 Apr 11, 2025
526739b
sync : ggml
ggerganov Apr 14, 2025
81c7e64
disable curl lib check, this action is missed by commit bd3f59f81289b…
NeoZhangJianyu Apr 14, 2025
c772d54
rpc : use ggml_context_ptr (#12938)
rgerganov Apr 14, 2025
75afa0a
SYCL: Fix im2col (#12910)
qnixsynapse Apr 14, 2025
d6d2c2a
Add performance print for gemma3 in example (#12929)
Russyyds Apr 14, 2025
b0c75ac
CANN: Optimize CANN buffer pool memory management (#12875)
bachelor-dou Apr 15, 2025
0019279
CANN: Opt ROPE optimization (#12865)
noemotiovon Apr 15, 2025
eccc7a1
ggml : Add AVX512 implementation of GEMM - Q4_Kx8 (#12829)
Srihari-mcw Apr 15, 2025
daa4228
llama : DeepSeek V2/V3 MLA implementation (#12801)
jukofyork Apr 15, 2025
5106764
SYCL: Add ROPE vision kernel (#12887)
qnixsynapse Apr 15, 2025
84778e9
CUDA/HIP: Share the same unified memory allocation logic. (#12934)
hjc4869 Apr 15, 2025
54a7272
CANN: Add x86 build ci (#12950)
hipudding Apr 15, 2025
f8f820c
metal : add FA-vec kernels for head size 96 (#12952)
ggerganov Apr 15, 2025
80f19b4
opencl: split `ggml-opencl.cl` into multiple files and cleanup (#12886)
lhez Apr 15, 2025
b43d89e
CANN: Add 310P operator support check (#12962)
noemotiovon Apr 16, 2025
015022b
vulkan: enable coopmat2 FA gqa and split_k optimizations more often (…
jeffbolznv Apr 16, 2025
12b1750
opencl: fix incorrect local_size index in profiling log (#12868)
kimminsu38oo Apr 16, 2025
971f245
llama : recognize IBM Granite 3.3 FIM tokens (#12988)
Noeda Apr 17, 2025
7a395f6
CANN: Add support for async operator submission (#12864)
hipudding Apr 17, 2025
207c22e
ggml: Re-enable CUDA graphs in presence of CONT and DUP nodes (#12970)
agray3 Apr 17, 2025
2f74c35
graph : make FA compatible with MLA + add initial Metal kernels (#12953)
ggerganov Apr 17, 2025
2db9ba1
rpc : add RPC_CMD_HELLO (#12955)
rgerganov Apr 18, 2025
b9154ec
mtmd : add methods to access `mtmd_image_tokens` (#12906)
ngxson Apr 18, 2025
8d66005
SYCL: Refactor and enable FP16 in binary broadcast OPs (#12975)
qnixsynapse Apr 18, 2025
35370ba
server : use std::move whenever possible (#12936)
ngxson Apr 18, 2025
aff9d10
gguf-py : GGUF Editor GUI - Python + Qt6 (#12930)
christopherthompson81 Apr 18, 2025
6408210
main : Fix Ctrl+D/newline handling (#12951)
danielzgtg Apr 18, 2025
37b9f0d
clip : refactor, add `image_manipulation` and `llava_uhd` classes (#1…
ngxson Apr 19, 2025
fb28f4f
gguf-py : fix upload python package workflow (#13020)
CISC Apr 19, 2025
0013715
Disable CI cross-compile builds (#13022)
bandoti Apr 19, 2025
e1b1a5a
Merge branch 'layla-build' into merge
l3utterfly Apr 20, 2025
4 changes: 2 additions & 2 deletions .devops/cpu.Dockerfile
@@ -14,9 +14,9 @@ WORKDIR /app
 COPY . .

 RUN if [ "$TARGETARCH" = "amd64" ]; then \
-        cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=ON -DGGML_NATIVE=OFF -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON; \
+        cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON; \
     elif [ "$TARGETARCH" = "arm64" ]; then \
-        cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=ON -DGGML_NATIVE=OFF -DGGML_CPU_ARM_ARCH=${GGML_CPU_ARM_ARCH}; \
+        cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DGGML_CPU_ARM_ARCH=${GGML_CPU_ARM_ARCH}; \
     else \
         echo "Unsupported architecture"; \
         exit 1; \
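For context, the TARGETARCH branches above are selected by Docker's multi-platform build support. A minimal sketch of invoking both paths follows; the image tags and the ARM architecture value are illustrative assumptions, not taken from this PR:

    # amd64 path: builds every CPU variant, selected at runtime via GGML_BACKEND_DL
    docker buildx build --platform linux/amd64 -f .devops/cpu.Dockerfile -t llama-cpp:cpu .

    # arm64 path: compiles for a single ARM architecture level via the GGML_CPU_ARM_ARCH build arg
    docker buildx build --platform linux/arm64 --build-arg GGML_CPU_ARM_ARCH=armv8.2-a \
        -f .devops/cpu.Dockerfile -t llama-cpp:cpu-arm64 .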
2 changes: 1 addition & 1 deletion .devops/cuda.Dockerfile
@@ -21,7 +21,7 @@ COPY . .
 RUN if [ "${CUDA_DOCKER_ARCH}" != "default" ]; then \
         export CMAKE_ARGS="-DCMAKE_CUDA_ARCHITECTURES=${CUDA_DOCKER_ARCH}"; \
     fi && \
-    cmake -B build -DGGML_NATIVE=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
+    cmake -B build -DGGML_NATIVE=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
     cmake --build build --config Release -j$(nproc)

 RUN mkdir -p /app/lib && \
2 changes: 1 addition & 1 deletion .devops/intel.Dockerfile
@@ -17,7 +17,7 @@ RUN if [ "${GGML_SYCL_F16}" = "ON" ]; then \
         && export OPT_SYCL_F16="-DGGML_SYCL_F16=ON"; \
     fi && \
     echo "Building with dynamic libs" && \
-    cmake -B build -DGGML_NATIVE=OFF -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_CURL=ON ${OPT_SYCL_F16} && \
+    cmake -B build -DGGML_NATIVE=OFF -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_CURL=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON ${OPT_SYCL_F16} && \
     cmake --build build --config Release -j$(nproc)

 RUN mkdir -p /app/lib && \
4 changes: 2 additions & 2 deletions .devops/llama-cli-cann.Dockerfile
@@ -1,12 +1,12 @@
-ARG ASCEND_VERSION=8.0.rc2.alpha003-910b-openeuler22.03-py3.8
+ARG ASCEND_VERSION=8.1.RC1.alpha001-910b-openeuler22.03-py3.10

 FROM ascendai/cann:$ASCEND_VERSION AS build

 WORKDIR /app

 COPY . .

-RUN yum install -y gcc g++ cmake make
+RUN yum install -y gcc g++ cmake make libcurl-devel
 ENV ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
 ENV LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/lib64:$LIBRARY_PATH
 ENV LD_LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/lib64:${ASCEND_TOOLKIT_HOME}/lib64/plugin/opskernel:${ASCEND_TOOLKIT_HOME}/lib64/plugin/nnengine:${ASCEND_TOOLKIT_HOME}/opp/built-in/op_impl/ai_core/tbe/op_tiling:${LD_LIBRARY_PATH}
2 changes: 1 addition & 1 deletion .devops/musa.Dockerfile
@@ -35,7 +35,7 @@ COPY . .
 RUN if [ "${MUSA_DOCKER_ARCH}" != "default" ]; then \
         export CMAKE_ARGS="-DMUSA_ARCHITECTURES=${MUSA_DOCKER_ARCH}"; \
     fi && \
-    cmake -B build -DGGML_NATIVE=OFF -DGGML_MUSA=ON -DLLAMA_CURL=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
+    cmake -B build -DGGML_NATIVE=OFF -DGGML_MUSA=ON -DLLAMA_CURL=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
     cmake --build build --config Release -j$(nproc)

 RUN mkdir -p /app/lib && \
6 changes: 3 additions & 3 deletions .devops/rocm.Dockerfile
@@ -17,8 +17,8 @@ FROM ${BASE_ROCM_DEV_CONTAINER} AS build
 # gfx906 is deprecated
 #check https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.4/reference/system-requirements.html

-#ARG ROCM_DOCKER_ARCH='gfx803,gfx900,gfx906,gfx908,gfx90a,gfx942,gfx1010,gfx1030,gfx1032,gfx1100,gfx1101,gfx1102'
-ARG ROCM_DOCKER_ARCH=gfx1100
+ARG ROCM_DOCKER_ARCH='gfx803,gfx900,gfx906,gfx908,gfx90a,gfx942,gfx1010,gfx1030,gfx1032,gfx1100,gfx1101,gfx1102'
+#ARG ROCM_DOCKER_ARCH=gfx1100

 # Set nvcc architectured
 ENV AMDGPU_TARGETS=${ROCM_DOCKER_ARCH}
@@ -40,7 +40,7 @@ WORKDIR /app
 COPY . .

 RUN HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
-    cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=$ROCM_DOCKER_ARCH -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=ON \
+    cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=$ROCM_DOCKER_ARCH -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=ON \
     && cmake --build build --config Release -j$(nproc)

 RUN mkdir -p /app/lib \
2 changes: 1 addition & 1 deletion .devops/vulkan.Dockerfile
@@ -16,7 +16,7 @@ WORKDIR /app

 COPY . .

-RUN cmake -B build -DGGML_NATIVE=OFF -DGGML_VULKAN=1 -DLLAMA_CURL=1 && \
+RUN cmake -B build -DGGML_NATIVE=OFF -DGGML_VULKAN=1 -DLLAMA_CURL=1 -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON && \
     cmake --build build --config Release -j$(nproc)

 RUN mkdir -p /app/lib && \
25 changes: 25 additions & 0 deletions .github/actions/windows-setup-curl/action.yml
@@ -0,0 +1,25 @@
+name: 'Windows - Setup CURL'
+description: 'Composite action, to be reused in other workflow'
+inputs:
+  curl_version:
+    description: 'CURL version'
+    required: false
+    default: '8.6.0_6'
+outputs:
+  curl_path:
+    description: "Path to the downloaded libcurl"
+    value: ${{ steps.get_libcurl.outputs.curl_path }}
+
+runs:
+  using: "composite"
+  steps:
+    - name: libCURL
+      id: get_libcurl
+      shell: powershell
+      env:
+        CURL_VERSION: ${{ inputs.curl_version }}
+      run: |
+        curl.exe -o $env:RUNNER_TEMP/curl.zip -L "https://curl.se/windows/dl-${env:CURL_VERSION}/curl-${env:CURL_VERSION}-win64-mingw.zip"
+        mkdir $env:RUNNER_TEMP/libcurl
+        tar.exe -xvf $env:RUNNER_TEMP/curl.zip --strip-components=1 -C $env:RUNNER_TEMP/libcurl
+        echo "curl_path=$env:RUNNER_TEMP/libcurl" >> $env:GITHUB_OUTPUT
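For reference, a Windows workflow job would consume this composite action roughly as follows. This is a hypothetical sketch: the step names and the CMake CURL variables are assumptions, not part of this PR (CURL_LIBRARY and CURL_INCLUDE_DIR are the standard hints for CMake's FindCURL module):

    - name: libCURL
      id: get_libcurl
      uses: ./.github/actions/windows-setup-curl

    - name: Build
      env:
        CURL_PATH: ${{ steps.get_libcurl.outputs.curl_path }}
      run: |
        cmake -B build -DLLAMA_CURL=ON `
            -DCURL_LIBRARY="$env:CURL_PATH/lib/libcurl.dll.a" `
            -DCURL_INCLUDE_DIR="$env:CURL_PATH/include"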
1 change: 0 additions & 1 deletion .github/workflows/bench.yml.disabled
@@ -104,7 +104,6 @@ jobs:
         cmake -B build \
             -DGGML_NATIVE=OFF \
             -DLLAMA_BUILD_SERVER=ON \
-            -DLLAMA_CURL=ON \
             -DLLAMA_CUBLAS=ON \
             -DCUDAToolkit_ROOT=/usr/local/cuda \
             -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
9 changes: 6 additions & 3 deletions .github/workflows/build-linux-cross.yml
@@ -19,7 +19,8 @@
         sudo apt-get install -y --no-install-recommends \
             build-essential \
             gcc-14-riscv64-linux-gnu \
-            g++-14-riscv64-linux-gnu
+            g++-14-riscv64-linux-gnu \
+            libcurl4-openssl-dev:riscv64

     - name: Build
       run: |
@@ -59,7 +60,8 @@
             glslc \
             gcc-14-riscv64-linux-gnu \
             g++-14-riscv64-linux-gnu \
-            libvulkan-dev:riscv64
+            libvulkan-dev:riscv64 \
+            libcurl4-openssl-dev:riscv64

     - name: Build
       run: |
@@ -99,7 +101,8 @@
             build-essential \
             glslc \
             crossbuild-essential-arm64 \
-            libvulkan-dev:arm64
+            libvulkan-dev:arm64 \
+            libcurl4-openssl-dev:arm64

     - name: Build
       run: |
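Note that installing :riscv64 and :arm64 suffixed packages only works once the foreign architecture is registered with dpkg, which these jobs presumably do in an earlier step not shown in these hunks. A sketch of that prerequisite:

    sudo dpkg --add-architecture riscv64   # or arm64, matching the cross target
    sudo apt-get update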