Closed

SDPA f32 #15064

110 commits
388a133
Add ggml-openvino base files
YangleiZouIntel Oct 29, 2024
315f6af
add openvino as optional backend for Llama.cpp ggml
zhanmyz Nov 13, 2024
2e0c4cf
* Configure the device(default CPU) that uses OpenVINO to compile th…
zhanmyz Nov 19, 2024
d363ff6
Solve the issue of abnormal model output caused by using OpenVINO ADD…
zhanmyz Nov 21, 2024
0d36b77
Add OpenVINO MUL operator to GGML of Llama.cpp.
zhanmyz Dec 2, 2024
76a273b
Add compile options
zhanmyz Dec 2, 2024
fa4a6ab
add OpenVINO frontend convert process steps
zhanmyz Dec 4, 2024
13a8828
add get openvino available ops function
zhanmyz Dec 5, 2024
0a96956
Add PoC of integration of openvino frontend. Main changes: ggml-ov-fr…
yumengbo Nov 16, 2024
9f976bf
Implement GgmlOvDecoder. Add dump functions.
yumengbo Nov 19, 2024
b052547
Convert subgraph with add, sub, mul, div op to ov model and do infer …
yumengbo Nov 22, 2024
09ba65b
Add GGML_OV_FRONTEND option. Add readme.
yumengbo Nov 22, 2024
cde1059
Change output for infer request to set output tensor. Support scale, …
yumengbo Dec 5, 2024
3546a85
add GET_ROWS operator of OpenVINO to GGML of llama.cpp
zhanmyz Dec 9, 2024
e9668e4
Update build.md and add operation mapping(GGML to OpenVINO)
zhanmyz Dec 10, 2024
694d9c9
add the rms_norm operator implemented using OpenVINO to the GGML back…
zhanmyz Dec 16, 2024
d0426f4
Fix issue for output memory copy of infer request
yumengbo Dec 12, 2024
ad958b0
Change to implementation following pytorch frontend
yumengbo Dec 12, 2024
e0749e7
Add support for UNARY SILU op . Fix pytorch impl bugs.
yumengbo Dec 17, 2024
2ba30f7
Support Softmax op
yumengbo Dec 18, 2024
ec40661
Support Softmax op
yumengbo Dec 18, 2024
c6afac5
Support ROPE op.
yumengbo Dec 21, 2024
62ca2d5
Add support for RMS_NORM OP
zhanmyz Dec 19, 2024
8c9a672
Add MUL_MAT,CPY,CONT as operators implemented in OpenVINO for GGML ba…
zhanmyz Jan 14, 2025
f591d22
Move CPY from GGML OV Backend to OV Frontend
zhanmyz Jan 22, 2025
db56d18
add implementation of MUL_MAT, CPY, CONT of GGML ops using OV ops
zhanmyz Feb 18, 2025
4751087
add implementation of CPY when the output tensor is non-contiguous
zhanmyz Feb 19, 2025
dedd2fa
add tmp source code files
zhanmyz Feb 25, 2025
c73594c
Execute singel CONT operator is OK
zhanmyz Feb 25, 2025
79d3824
Execute CONT & VIEW operators in OV Frontend is OK
zhanmyz Mar 1, 2025
37c99aa
OV Frontend supports GET_ROWS/RMS_NORM/MUL/MUL_MAT graph conversion o…
zhanmyz Mar 3, 2025
28cda3e
OV Frontend supports GET_ROWS/RMS_NORM/MUL/MUL_MAT/ROPE/SCALE/SOFTMAX…
zhanmyz Mar 5, 2025
e7a963f
Change the input parameter shape of CONT operator
zhanmyz Mar 5, 2025
40986b8
Change the input and ouput node shape of MUL_MAT operator
zhanmyz Mar 5, 2025
94cc9a5
Change the input and ouput node shape of MUL_MAT operator
zhanmyz Mar 5, 2025
8c00c2f
change CONT and MULMAT input node shape
zhanmyz Mar 6, 2025
761e5f4
All adjacent ops can conversion but calculation result is wrong and n…
zhanmyz Mar 6, 2025
27c803a
1. All operators implemented using OpenVINO can be successfully execu…
zhanmyz Mar 9, 2025
5a0bb25
1. Update the implementation of CPY node when it's non-contiguous
zhanmyz Mar 11, 2025
6bf1266
Minor Update
zhanmyz Mar 11, 2025
7910549
Try to add VIEW node to OV Frontend and have some issues that need to…
zhanmyz Mar 12, 2025
f5ea647
1. In the Prompt process and predict first token stage, the PERMUTE n…
zhanmyz Mar 15, 2025
895bf6d
add debug info
zhanmyz Mar 17, 2025
1608df2
Process Prompt and predict first token is OK
zhanmyz Mar 26, 2025
04955d0
1. Solve the AC issue of Permute+VIEW and MULMAL issue in the phase o…
zhanmyz Mar 31, 2025
8f1a8e3
1. Delete some comments
zhanmyz Mar 31, 2025
8f84ab9
* Use find_package in CMake to configure OpenVINO
wine99 Apr 14, 2025
d5d215a
change op mappings to list in openvino_supports_op
wine99 Apr 15, 2025
355846c
2nd+ token correct by fix CPY in OV, remove single op backend compute…
wine99 Apr 15, 2025
629cabc
Arbitrary token len (>32) work; Fix bug in mulmat
wine99 Apr 17, 2025
96a0306
FEAT: do PERMUTE eagerly
wine99 Apr 21, 2025
d5476c0
FEAT: Add interleaved mode for ROPE
wine99 Apr 22, 2025
4dace1e
REFACTOR: support weigts as constant
wine99 Apr 28, 2025
8cc9230
STYLE: minor refactor
wine99 Apr 28, 2025
9619222
PERF: share const nodes for weights for diff infer
wine99 Apr 28, 2025
003eb05
BUILD: update build doc, add cmake preset, add CACHE_DIR env var
wine99 Apr 29, 2025
9c7956b
FEAT: improve debug capability
wine99 Apr 30, 2025
1a21dca
PERF: compile once (dynamic graph + cache)
wine99 May 8, 2025
ba3131d
Rebase - Bring up to date and fix build process
virajwad May 9, 2025
911b561
fix build error
wine99 May 13, 2025
d1eac74
FIX: backend buffer type issue
wine99 May 13, 2025
0a909ad
STYLE: clang-format
wine99 May 9, 2025
9e75636
FEAT: Add all conversion code from ov side
wine99 May 9, 2025
b8ddb80
PERF: favor low precision matmul
wine99 May 13, 2025
6a8c916
STYLE and minor REFACTOR
wine99 May 13, 2025
b7ee911
FIX: Re-add tensor names in cgraph, Add another case for RESHAPE
wine99 May 14, 2025
a315639
FIX: input shape of KQ_mask
wine99 May 14, 2025
b5b98a9
PERF: add weight constant in parallel
wine99 May 14, 2025
aa7cdf3
FIX: set_max_token_len
wine99 May 16, 2025
b369165
PERF: use Slice+Concat in writing cache_v
wine99 May 16, 2025
31abbe7
Update build doc
wine99 May 20, 2025
e98a534
Add cgraph tensor output name to OV op name
wine99 May 22, 2025
690ea57
Update openvino build instructions
ravi9 May 29, 2025
8ea8c31
Add initial NPU support
wine99 May 27, 2025
18dd664
draft NPU support version 2: prefill + kvcache
wine99 May 29, 2025
a8acce8
NPU support version 2: prefill + kvcache
wine99 Jun 3, 2025
7b918ae
Change due to ggml cgraph changes, not correct yet
wine99 Jun 4, 2025
3da0097
Change due to ggml cgraph changes, llama-3.2 CPU work
wine99 Jun 16, 2025
74c2268
Add AMD64 to CMakeLists
wine99 Jun 16, 2025
78f7149
Change due to ggml cgraph changes, all device work
wine99 Jun 16, 2025
9a51b00
Refactor: clean, fix warning
wine99 Jun 20, 2025
bb587b0
Update clang-format
wine99 Jun 23, 2025
f808452
Statful transformation for CPU GPU
wine99 Jun 26, 2025
f5bc7dc
Add SwiGLU
wine99 Jul 3, 2025
e8a8e35
Fuse to SDPA
wine99 Jul 3, 2025
0803a61
Replace Concat with Broadcast in MulMat for GQA
wine99 Jul 4, 2025
ca7731b
Pull out indices creation for kv cache update
wine99 Jul 6, 2025
58bfd1a
Refactor: remove past_token_len from extra_inputs
wine99 Jul 9, 2025
53ea793
Fix Phi3 SwiGLU and SoftMax
wine99 Jul 9, 2025
877761e
Pull out sin cos from rope
wine99 Jul 9, 2025
7da6294
Reduce memory: free ov weights node after graph conversion
wine99 Jul 11, 2025
a563598
Fix CPY due to cgraph change
wine99 Jul 17, 2025
bf74481
Added OpenVINO CI/CD. Updated docs
ravi9 Jul 18, 2025
967fff0
Fix llama-cli
wine99 Jul 23, 2025
3acc974
Fix Phi3 ROPE; Add test-backend-ops
wine99 Jul 21, 2025
b19ebd4
Fix NPU
wine99 Jul 23, 2025
032c0ac
Fix llama-bench; Clang-format
wine99 Jul 24, 2025
fb6bad1
Fix llama-perplexity
wine99 Jul 24, 2025
fb1a055
temp. changes for mark decomp
cavusmustafa Jul 29, 2025
f111537
matmul in fp32
wine99 Jul 29, 2025
ab12066
mulmat input conversion fix
cavusmustafa Jul 30, 2025
863f920
mulmat type conversion update
cavusmustafa Jul 30, 2025
7dc7bc3
add mark decomp pass
cavusmustafa Jul 30, 2025
c496612
Revert changes in fuse_to_sdpa
wine99 Jul 30, 2025
e6201f3
Update build.md
ravi9 Jul 31, 2025
9e5f8bd
Fix test-backend-ops
wine99 Jul 31, 2025
2bfd59b
Skip test-thread-safety; Run ctest only in ci/run.sh
wine99 Jul 31, 2025
44847a9
Merge pull request #6 from ravi9/ci
ravi9 Jul 31, 2025
ca5e725
Use CiD for NPU
wine99 Aug 1, 2025
b91f9e0
SDPA in f32
wine99 Aug 4, 2025
134 changes: 134 additions & 0 deletions .devops/openvino.Dockerfile
@@ -0,0 +1,134 @@
ARG OPENVINO_VERSION_MAJOR=2025.2
ARG OPENVINO_VERSION_FULL=2025.2.0.19140.c01cd93e24d
ARG UBUNTU_VERSION=24.04

# Optional proxy build arguments - empty by default
ARG http_proxy=
ARG https_proxy=

## Build Image
FROM ubuntu:${UBUNTU_VERSION} AS build

# Pass proxy args to build stage
ARG http_proxy
ARG https_proxy

RUN apt-get update && \
apt-get install -y --no-install-recommends \
ca-certificates \
gnupg \
wget \
git \
cmake \
ninja-build \
build-essential \
libtbb12 \
libcurl4-openssl-dev && \
rm -rf /var/lib/apt/lists/*

# Install OpenVINO for Ubuntu 24.04
ARG OPENVINO_VERSION_MAJOR
ARG OPENVINO_VERSION_FULL
RUN mkdir -p /opt/intel && \
wget https://storage.openvinotoolkit.org/repositories/openvino/packages/${OPENVINO_VERSION_MAJOR}/linux/openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64.tgz && \
tar -xf openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64.tgz && \
mv openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64 /opt/intel/openvino_${OPENVINO_VERSION_MAJOR} && \
cd /opt/intel/openvino_${OPENVINO_VERSION_MAJOR} && \
echo "Y" | ./install_dependencies/install_openvino_dependencies.sh && \
cd - && \
ln -s /opt/intel/openvino_${OPENVINO_VERSION_MAJOR} /opt/intel/openvino

ENV OpenVINO_DIR=/opt/intel/openvino

WORKDIR /app

COPY . .

# Build Stage
RUN bash -c "source ${OpenVINO_DIR}/setupvars.sh && \
cmake -B build/ReleaseOV -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DGGML_OPENVINO=ON && \
cmake --build build/ReleaseOV -j$(nproc)"

# Copy all necessary libraries
RUN mkdir -p /app/lib && \
find build/ReleaseOV -name '*.so*' -exec cp {} /app/lib \; && \
find ${OpenVINO_DIR}/runtime/lib/intel64 -name '*.so*' -exec cp -P {} /app/lib \; 2>/dev/null || \
find ${OpenVINO_DIR}/lib/intel64 -name '*.so*' -exec cp -P {} /app/lib \;

# Create runtime directories and copy binaries
RUN mkdir -p /app/full \
&& cp build/ReleaseOV/bin/* /app/full/ \
&& cp *.py /app/full \
&& cp -r gguf-py /app/full \
&& cp -r requirements /app/full \
&& cp requirements.txt /app/full \
&& cp .devops/tools.sh /app/full/tools.sh

## Base Runtime Image
FROM ubuntu:${UBUNTU_VERSION} AS base

# Pass proxy args to runtime stage
ARG http_proxy
ARG https_proxy

RUN apt-get update \
&& apt-get install -y libgomp1 libtbb12 curl \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
&& find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
&& find /var/cache -type f -delete

COPY --from=build /app/lib/ /app/

### Full (all binaries)
FROM base AS full

ARG http_proxy
ARG https_proxy

COPY --from=build /app/full /app/

WORKDIR /app

RUN apt-get update && \
apt-get install -y --no-install-recommends \
git \
python3 \
python3-venv \
python3-pip && \
python3 -m venv /ov-venv && \
/ov-venv/bin/pip install --no-cache-dir --upgrade pip setuptools wheel && \
/ov-venv/bin/pip install --no-cache-dir -r requirements.txt && \
apt-get autoremove -y && \
apt-get clean && \
rm -rf /tmp/* /var/tmp/* && \
find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete && \
find /var/cache -type f -delete

ENTRYPOINT ["/bin/bash", "-c", "source /ov-venv/bin/activate && exec /app/tools.sh \"$@\"", "--"]


### Light, CLI only
FROM base AS light

COPY --from=build /app/full/llama-cli /app/

WORKDIR /app

ENTRYPOINT [ "/app/llama-cli" ]

### Server, Server only
FROM base AS server

ENV LLAMA_ARG_HOST=0.0.0.0

COPY --from=build /app/full/llama-server /app/

WORKDIR /app

HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]

ENTRYPOINT [ "/app/llama-server" ]
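The `full` stage starts `tools.sh` through `bash -c` so that `/ov-venv` is activated first. With `bash -c '<script>' -- <args...>`, the word right after the script (`--` here) becomes `$0`, and the arguments Docker appends from `docker run` become `$1`, `$2`, and so on, which `"$@"` forwards with their original word boundaries. The following is a minimal sketch of that forwarding in isolation. `fake_entrypoint` and `fake_entrypoint_argv0` are hypothetical helpers, and `printf` stands in for `exec /app/tools.sh`.

```shell
#!/usr/bin/env bash
# Sketch of how the "full" stage's exec-form ENTRYPOINT forwards arguments.
# Hypothetical helpers; printf stands in for "exec /app/tools.sh".

# Same shape as the Dockerfile: bash -c '<script>' -- <docker run args...>
# Inside the inner shell, "--" is $0 and the caller's arguments are $1..$N.
fake_entrypoint() {
  bash -c 'printf "%s|" "$@"' -- "$@"
}

# Reports what the inner shell sees as $0 (the placeholder word).
fake_entrypoint_argv0() {
  bash -c 'printf "%s" "$0"' -- "$@"
}

# Each forwarded argument is printed followed by "|", spaces preserved.
fake_entrypoint first "two words"
echo
```

Without the trailing `--`, the first user argument would be consumed as `$0` and silently dropped from `"$@"`.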
39 changes: 39 additions & 0 deletions .github/workflows/build.yml
@@ -638,6 +638,45 @@ jobs:
-DGGML_SYCL_F16=ON
cmake --build build --config Release -j $(nproc)

  ubuntu-24-cmake-openvino:
    runs-on: ubuntu-24.04

    steps:
      - name: Clone
        id: checkout
        uses: actions/checkout@v4

      - name: ccache
        uses: hendrikmuhs/[email protected]
        with:
          key: ubuntu-24-cmake-openvino-no-preset-v1
          evict-old-files: 1d

      - name: Dependencies
        id: depends
        run: |
          export OPENVINO_VERSION_MAJOR=2025.2
          export OPENVINO_VERSION_FULL=2025.2.0.19140.c01cd93e24d
          sudo apt-get update
          sudo apt-get install -y build-essential libcurl4-openssl-dev libtbb12 cmake ninja-build python3-pip curl wget tar
          sudo mkdir -p /opt/intel
          wget -O openvino_${OPENVINO_VERSION_MAJOR}.tgz https://storage.openvinotoolkit.org/repositories/openvino/packages/${OPENVINO_VERSION_MAJOR}/linux/openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64.tgz
          tar -xf openvino_${OPENVINO_VERSION_MAJOR}.tgz
          sudo mv openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64 /opt/intel/openvino_${OPENVINO_VERSION_MAJOR}
          rm openvino_${OPENVINO_VERSION_MAJOR}.tgz
          cd /opt/intel/openvino_${OPENVINO_VERSION_MAJOR}
          echo "Y" | sudo -E ./install_dependencies/install_openvino_dependencies.sh && cd -
          sudo ln -s /opt/intel/openvino_${OPENVINO_VERSION_MAJOR} /opt/intel/openvino

      - name: Build
        id: cmake_build
        run: |
          source /opt/intel/openvino/setupvars.sh
          cmake -B build/ReleaseOV -G Ninja \
            -DCMAKE_BUILD_TYPE=Release \
            -DGGML_OPENVINO=ON
          cmake --build build/ReleaseOV --config Release -j $(nproc)

  build-linux-cross:
    uses: ./.github/workflows/build-linux-cross.yml

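Both the CI job and the release job download a version-specific archive, move it to `/opt/intel/openvino_<major>`, and expose it through a stable `/opt/intel/openvino` symlink, so later steps can `source /opt/intel/openvino/setupvars.sh` without knowing the version. The sketch below reproduces only that naming and layout logic against a temporary directory. `ov_archive_url` and `ov_link_install` are hypothetical helpers, and the sketch uses `ln -sfn` rather than the workflow's plain `ln -s`, so that rerunning it replaces the link instead of creating a new link inside the directory the old one points to.

```shell
#!/usr/bin/env bash
# Sketch of the archive naming and install layout used by the "Dependencies" step.
# Hypothetical helpers; nothing is downloaded here.

# Compose the archive URL from the major and full version strings.
ov_archive_url() {
  local major="$1" full="$2"
  echo "https://storage.openvinotoolkit.org/repositories/openvino/packages/${major}/linux/openvino_toolkit_ubuntu24_${full}_x86_64.tgz"
}

# Move an extracted toolkit directory to <root>/openvino_<major>
# and point the version-independent <root>/openvino link at it.
ov_link_install() {
  local root="$1" major="$2" extracted="$3"
  mkdir -p "$root"
  mv "$extracted" "$root/openvino_${major}"
  # -n: treat an existing link to a directory as a file, so it is replaced.
  ln -sfn "$root/openvino_${major}" "$root/openvino"
}

ov_archive_url 2025.2 2025.2.0.19140.c01cd93e24d
```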
1 change: 1 addition & 0 deletions .github/workflows/docker.yml
@@ -44,6 +44,7 @@ jobs:
- { tag: "musa", dockerfile: ".devops/musa.Dockerfile", platforms: "linux/amd64", full: true, light: true, server: true, free_disk_space: true }
- { tag: "intel", dockerfile: ".devops/intel.Dockerfile", platforms: "linux/amd64", full: true, light: true, server: true, free_disk_space: true }
- { tag: "vulkan", dockerfile: ".devops/vulkan.Dockerfile", platforms: "linux/amd64", full: true, light: true, server: true, free_disk_space: false }
- { tag: "openvino", dockerfile: ".devops/openvino.Dockerfile", platforms: "linux/amd64", full: true, light: true, server: true, free_disk_space: false }
# Note: the rocm images are failing due to a compiler error and are disabled until this is fixed to allow the workflow to complete
#- {tag: "rocm", dockerfile: ".devops/rocm.Dockerfile", platforms: "linux/amd64,linux/arm64", full: true, light: true, server: true, free_disk_space: true }
steps:
57 changes: 57 additions & 0 deletions .github/workflows/release.yml
@@ -240,6 +240,63 @@ jobs:
          path: llama-${{ steps.tag.outputs.name }}-bin-ubuntu-vulkan-x64.zip
          name: llama-bin-ubuntu-vulkan-x64.zip

  ubuntu-24-openvino:
    runs-on: ubuntu-24.04

    steps:
      - name: Clone
        id: checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: ccache
        uses: hendrikmuhs/[email protected]
        with:
          key: ubuntu-24-cmake-openvino-release-no-preset-v1
          evict-old-files: 1d

      - name: Dependencies
        id: depends
        run: |
          export OPENVINO_VERSION_MAJOR=2025.2
          export OPENVINO_VERSION_FULL=2025.2.0.19140.c01cd93e24d
          sudo apt-get update
          sudo apt-get install -y build-essential libcurl4-openssl-dev libtbb12 cmake ninja-build python3-pip curl wget tar
          sudo mkdir -p /opt/intel
          wget -O openvino_${OPENVINO_VERSION_MAJOR}.tgz https://storage.openvinotoolkit.org/repositories/openvino/packages/${OPENVINO_VERSION_MAJOR}/linux/openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64.tgz
          tar -xf openvino_${OPENVINO_VERSION_MAJOR}.tgz
          sudo mv openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64 /opt/intel/openvino_${OPENVINO_VERSION_MAJOR}
          rm openvino_${OPENVINO_VERSION_MAJOR}.tgz
          cd /opt/intel/openvino_${OPENVINO_VERSION_MAJOR}
          echo "Y" | sudo -E ./install_dependencies/install_openvino_dependencies.sh && cd -
          sudo ln -s /opt/intel/openvino_${OPENVINO_VERSION_MAJOR} /opt/intel/openvino

      - name: Build
        id: cmake_build
        run: |
          source /opt/intel/openvino/setupvars.sh
          cmake -B build/ReleaseOV -G Ninja \
            -DCMAKE_BUILD_TYPE=Release \
            -DGGML_OPENVINO=ON
          cmake --build build/ReleaseOV --config Release -j $(nproc)

      - name: Determine tag name
        id: tag
        uses: ./.github/actions/get-tag-name

      - name: Pack artifacts
        id: pack_artifacts
        run: |
          cp LICENSE ./build/ReleaseOV/bin/
          zip -r llama-${{ steps.tag.outputs.name }}-bin-ubuntu-openvino-x64.zip ./build/ReleaseOV/bin/*

      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          path: llama-${{ steps.tag.outputs.name }}-bin-ubuntu-openvino-x64.zip
          name: llama-bin-ubuntu-openvino-x64.zip

  windows-cpu:
    runs-on: windows-2025

20 changes: 20 additions & 0 deletions CMakePresets.json
@@ -1,6 +1,26 @@
{
"version": 4,
"configurePresets": [
{
"name": "ReleaseOV",
"generator": "Ninja",
"binaryDir": "${sourceDir}/build/${presetName}",
"installDir": "${sourceDir}/build/install/${presetName}",
"cacheVariables": {
"CMAKE_BUILD_TYPE": "Release",
"GGML_OPENVINO": true,
"OpenVINO_DIR": "$env{OPENVINO_LLAMA_PATH}/build/Release"
}
},
{
"name": "ReleaseCPU",
"generator": "Ninja",
"binaryDir": "${sourceDir}/build/${presetName}",
"installDir": "${sourceDir}/build/install/${presetName}",
"cacheVariables": {
"CMAKE_BUILD_TYPE": "Release"
}
},
{
"name": "base",
"hidden": true,
12 changes: 12 additions & 0 deletions ci/run.sh
@@ -22,6 +22,9 @@
# # with MUSA support
# GG_BUILD_MUSA=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
#
# # with OPENVINO support
# GG_BUILD_OPENVINO=1 GG_BUILD_LOW_PERF=1 GGML_OPENVINO_DEVICE=CPU bash ./ci/run.sh ./tmp/results ./tmp/mnt
#

if [ -z "$2" ]; then
echo "usage: $0 <output-dir> <mnt-dir>"
@@ -93,6 +96,15 @@ if [ ! -z ${GG_BUILD_MUSA} ]; then
MUSA_ARCH=${MUSA_ARCH:-21}
CMAKE_EXTRA="${CMAKE_EXTRA} -DGGML_MUSA=ON -DMUSA_ARCHITECTURES=${MUSA_ARCH}"
fi

if [ ! -z "${GG_BUILD_OPENVINO}" ]; then
    if [ -z "${OpenVINO_DIR}" ]; then
        echo "OpenVINO_DIR is not set. Install OpenVINO from the archive distribution, then enable it with:"
        echo "source /opt/intel/openvino/setupvars.sh"
        exit 1
    fi
    CMAKE_EXTRA="${CMAKE_EXTRA} -DGGML_OPENVINO=ON"
fi

## helpers

# download a file if it does not exist or if it is outdated
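The new `ci/run.sh` block refuses to configure an OpenVINO build unless `setupvars.sh` has been sourced (detected through `OpenVINO_DIR`), and otherwise appends `-DGGML_OPENVINO=ON` to `CMAKE_EXTRA`. The function below is a hypothetical, self-contained restatement of that guard, not code from the PR; it returns a status instead of calling `exit` so it can be exercised on its own.

```shell
#!/usr/bin/env bash
# Hypothetical restatement of the GG_BUILD_OPENVINO guard in ci/run.sh.
# Prints the updated CMake flags, or fails if OpenVINO is not set up.
openvino_cmake_extra() {
  local extra="$1"
  if [ -n "${GG_BUILD_OPENVINO:-}" ]; then
    # setupvars.sh exports OpenVINO_DIR; an empty value means it was not sourced.
    if [ -z "${OpenVINO_DIR:-}" ]; then
      echo "OpenVINO_DIR is not set; source /opt/intel/openvino/setupvars.sh first" >&2
      return 1
    fi
    extra="${extra} -DGGML_OPENVINO=ON"
  fi
  printf '%s\n' "$extra"
}
```

As in the script, any non-empty value of `GG_BUILD_OPENVINO` (even `0`) enables the OpenVINO build.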