
Commit 0e838a4

Authored by Zhen Zhao (Fiona), zhaohb, and ilya-lavrenov
Ollama openvino integration (#953)
* Add new module in README
* Add ollama_openvino module
* Add ollama_openvino module
* Add code of Ollama-OV module
* genai support streaming mode
* Update genai version (2025.2.0.0.dev20250320) along with exe. Disable cgocheck for runtime as well.
* Add action compile check
* add ollama_openvino_build_and_test.yml
* Modify ollama_openvino_build_and_test.yml
* modify ollama_openvino_build_and_test.yml go install step
* modify workflow
* debug workflow
* debug workflow
* update workflow build
* update workflow build
* update workflow test
* update workflow test
* debug workflow
* debug workflow
* Update mac.yml
* Update linux.yml
* update workflow test
* update workflow test
* Update windows.yml
* update ollama_ov workflow
* update workflow test
* update ollama_ov workflow
* debug workflow
* debug workflow
* debug workflow
* update ollama_ov workflow
* update ollama_ov workflow
* update ollama_ov workflow
* update ollama_ov workflow
* update ollama_ov workflow
* update ollama_ov workflow
* update ollama_ov workflow
* update ollama_ov workflow
* update ollama_ov workflow
* update ollama_ov workflow
* update ollama_ov workflow
* update ollama_ov workflow
* update ollama_ov workflow
* update ollama_ov workflow

---------

Co-authored-by: Zhao,Hongbo <[email protected]>
Co-authored-by: Ilya Lavrenov <[email protected]>
1 parent ef17290 commit 0e838a4

File tree: 797 files changed, +740430 −8 lines changed


.github/workflows/linux.yml

Lines changed: 1 addition & 0 deletions
@@ -131,6 +131,7 @@ jobs:
 -DENABLE_WHEEL=ON \
 -DENABLE_TESTS=ON \
 -DENABLE_INTEL_NPU=OFF \
+-DBUILD_ollama_openvino=OFF \
 -DCMAKE_CXX_COMPILER_LAUNCHER=${{ env.CMAKE_CXX_COMPILER_LAUNCHER }} \
 -DCMAKE_C_COMPILER_LAUNCHER=${{ env.CMAKE_C_COMPILER_LAUNCHER }} \
 -S ${OPENVINO_REPO} \

.github/workflows/mac.yml

Lines changed: 1 addition & 0 deletions
@@ -112,6 +112,7 @@ jobs:
 -DENABLE_OV_PYTORCH_FRONTEND=OFF \
 -DENABLE_CPPLINT=OFF \
 -DENABLE_INTEL_NPU=OFF \
+-DBUILD_ollama_openvino=OFF \
 -S ${{ env.OPENVINO_REPO }} \
 -B ${{ env.BUILD_DIR }}
.github/workflows/ollama_openvino_build_and_test.yml

Lines changed: 76 additions & 0 deletions

@@ -0,0 +1,76 @@
name: ollama_openvino_build_and_test

on:
  pull_request:
    paths:
      - 'modules/ollama_openvino/**'
      - '.github/workflows/ollama_openvino_build_and_test'

permissions: read-all

jobs:
  test_ubuntu20:
    runs-on: ubuntu-20.04
    steps:
      - name: Download repo
        uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29 # v4.1.6
        with:
          repository: zhaohb/ollama_ov
          path: ollama_ov

      - name: Setup python env
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install go
        run: |
          wget https://go.dev/dl/go1.24.1.linux-amd64.tar.gz
          mkdir -p go
          tar xvzf go1.24.1.linux-amd64.tar.gz

      - name: Download model
        run: |
          pip install -U huggingface_hub
          huggingface-cli download --resume-download OpenVINO/TinyLlama-1.1B-Chat-v1.0-int4-ov --local-dir TinyLlama-1.1B-Chat-v1.0-int4-ov --local-dir-use-symlinks False
          tar -zcvf TinyLlama-1.1B-Chat-v1.0-int4-ov.tar.gz TinyLlama-1.1B-Chat-v1.0-int4-ov

      - name: Install openvino_genai and init go env and build ollama_ov
        run: |
          wget https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.2.0.0.dev20250320/openvino_genai_ubuntu20_2025.2.0.0.dev20250320_x86_64.tar.gz
          tar -xzf openvino_genai_ubuntu20_2025.2.0.0.dev20250320_x86_64.tar.gz
          source openvino_genai_ubuntu20_2025.2.0.0.dev20250320_x86_64/setupvars.sh
          printenv OpenVINO_DIR
          chmod +x ./
          export LD_LIBRARY_PATH=${{ github.workspace }}/go/bin:$LD_LIBRARY_PATH
          go env -w CGO_ENABLED=1
          cd ${{ github.workspace }}/ollama_ov
          export GODEBUG=cgocheck=0
          export CGO_LDFLAGS=-L$OpenVINO_DIR/../lib/intel64/
          export CGO_CFLAGS=-I$OpenVINO_DIR/../include
          printenv CGO_LDFLAGS
          printenv CGO_CFLAGS
          go build -o ollama
          mkdir bin
          cp ollama ./bin

      - name: Create ollama model and test generate
        run: |
          export PATH=$PATH:${{ github.workspace }}/ollama_ov/bin
          source openvino_genai_ubuntu20_2025.2.0.0.dev20250320_x86_64/setupvars.sh
          export GODEBUG=cgocheck=0
          echo -e 'FROM TinyLlama-1.1B-Chat-v1.0-int4-ov.tar.gz\nModelType "OpenVINO"\nInferDevice "GPU"' > Modelfile
          ollama serve &
          sleep 10
          tmux new-session -d -s createsession 'ollama create TinyLlama-1.1B-Chat-v1.0-int4-ov:v1 -f Modelfile'
          sleep 20
          tmux new-session -d -s runsession 'ollama run TinyLlama-1.1B-Chat-v1.0-int4-ov:v1 "Who are you? Please give a brief answer" > output.txt'
          sleep 60
          cat output.txt
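For reference, a minimal local reproduction of the last workflow step, distilled from the commands above. This is a sketch, not an official quick-start: it assumes the ollama binary built in the previous step is on PATH, setupvars.sh has been sourced, and the model tarball is in the current directory; the tmux sessions are dropped since an interactive shell does not need them.

# Sketch: create and query an OpenVINO model with the ollama_ov build (assumptions listed above).
export GODEBUG=cgocheck=0            # same cgo pointer-check workaround the workflow uses

cat > Modelfile <<'EOF'
FROM TinyLlama-1.1B-Chat-v1.0-int4-ov.tar.gz
ModelType "OpenVINO"
InferDevice "GPU"
EOF

ollama serve &                       # start the server in the background
sleep 10                             # give it time to come up
ollama create TinyLlama-1.1B-Chat-v1.0-int4-ov:v1 -f Modelfile
ollama run TinyLlama-1.1B-Chat-v1.0-int4-ov:v1 "Who are you? Please give a brief answer"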

.github/workflows/windows.yml

Lines changed: 1 addition & 0 deletions
@@ -137,6 +137,7 @@ jobs:
 -DENABLE_PYTHON=ON `
 -DENABLE_INTEL_NPU=OFF `
 -DENABLE_JS=OFF `
+-DBUILD_ollama_openvino=OFF `
 -DOPENVINO_EXTRA_MODULES=${{ env.OPENVINO_CONTRIB_REPO }}/modules `
 -DCMAKE_CXX_COMPILER_LAUNCHER=${{ env.CMAKE_CXX_COMPILER_LAUNCHER }} `
 -DCMAKE_C_COMPILER_LAUNCHER=${{ env.CMAKE_C_COMPILER_LAUNCHER }} `

.gitignore

Lines changed: 0 additions & 7 deletions
This file was deleted.

README.md

Lines changed: 2 additions & 1 deletion
@@ -14,6 +14,7 @@ This list gives an overview of all modules available inside the contrib repository
 * [**custom_operations**](./modules/custom_operations/): Collection of Custom Operations -- implement Custom Operations with OpenVINO Extensibility Mechanism.
 * [**Token Merging**](./modules/token_merging/): adaptation of [Token Merging method](https://arxiv.org/abs/2210.09461) for OpenVINO.
 * [**OpenVINO Code**](./modules/openvino_code): VSCode extension for AI code completion with OpenVINO.
+* [**Ollama-OpenVINO**](./modules/ollama_openvino): OpenVINO GenAI-empowered Ollama that accelerates LLMs on Intel platforms (including CPU, iGPU/dGPU, and NPU).
 
 ## How to build OpenVINO with extra modules
 You can build OpenVINO, so it will include the modules from this repository. Contrib modules are under constant development and it is recommended to use them alongside the master branch or latest releases of OpenVINO.
@@ -36,7 +37,7 @@ Additional build instructions are available for the following modules:
 
 * [**nvidia_plugin**](./modules/nvidia_plugin/README.md)
 * [**custom_operations**](./modules/custom_operations/README.md)
-
+* [**ollama_OpenVINO**](./modules/ollama_openvino)
 ## Update the repository documentation
 In order to keep a clean overview containing all contributed modules, the following files need to be created/adapted:

modules/ollama_openvino/CMakeLists.txt

Lines changed: 129 additions & 0 deletions

@@ -0,0 +1,129 @@
cmake_minimum_required(VERSION 3.21)

project(Ollama C CXX)

include(CheckLanguage)

find_package(Threads REQUIRED)

set(CMAKE_BUILD_TYPE Release)
set(BUILD_SHARED_LIBS ON)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)

set(GGML_BUILD ON)
set(GGML_SHARED ON)
set(GGML_CCACHE ON)
set(GGML_BACKEND_DL ON)
set(GGML_BACKEND_SHARED ON)
set(GGML_SCHED_MAX_COPIES 4)

set(GGML_LLAMAFILE ON)
set(GGML_CUDA_PEER_MAX_BATCH_SIZE 128)
set(GGML_CUDA_GRAPHS ON)

if((NOT CMAKE_OSX_ARCHITECTURES MATCHES "arm64")
    OR (NOT CMAKE_OSX_ARCHITECTURES AND NOT CMAKE_SYSTEM_PROCESSOR MATCHES "arm|aarch64|ARM64|ARMv[0-9]+"))
    set(GGML_CPU_ALL_VARIANTS ON)
endif()

if (CMAKE_OSX_ARCHITECTURES MATCHES "x86_64")
    set(CMAKE_BUILD_RPATH "@loader_path")
    set(CMAKE_INSTALL_RPATH "@loader_path")
endif()

set(OLLAMA_BUILD_DIR ${CMAKE_BINARY_DIR}/lib/ollama)
set(OLLAMA_INSTALL_DIR ${CMAKE_INSTALL_PREFIX}/lib/ollama)

set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${OLLAMA_BUILD_DIR})
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY_DEBUG ${OLLAMA_BUILD_DIR})
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY_RELEASE ${OLLAMA_BUILD_DIR})
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${OLLAMA_BUILD_DIR})
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY_DEBUG ${OLLAMA_BUILD_DIR})
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY_RELEASE ${OLLAMA_BUILD_DIR})

include_directories(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src)
include_directories(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src/include)
include_directories(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src/ggml-cpu)
include_directories(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src/ggml-cpu/amx)

set(GGML_CPU ON)
add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src)
set_property(TARGET ggml PROPERTY EXCLUDE_FROM_ALL TRUE)

get_target_property(CPU_VARIANTS ggml-cpu MANUALLY_ADDED_DEPENDENCIES)
if(NOT CPU_VARIANTS)
    set(CPU_VARIANTS "ggml-cpu")
endif()

install(TARGETS ggml-base ${CPU_VARIANTS}
    RUNTIME_DEPENDENCIES
        PRE_EXCLUDE_REGEXES ".*"
    RUNTIME DESTINATION ${OLLAMA_INSTALL_DIR} COMPONENT CPU
    LIBRARY DESTINATION ${OLLAMA_INSTALL_DIR} COMPONENT CPU
    FRAMEWORK DESTINATION ${OLLAMA_INSTALL_DIR} COMPONENT CPU
)

check_language(CUDA)
if(CMAKE_CUDA_COMPILER)
    if(CMAKE_VERSION VERSION_GREATER_EQUAL "3.24" AND NOT CMAKE_CUDA_ARCHITECTURES)
        set(CMAKE_CUDA_ARCHITECTURES "native")
    endif()

    find_package(CUDAToolkit)
    add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src/ggml-cuda)
    set(OLLAMA_CUDA_INSTALL_DIR ${OLLAMA_INSTALL_DIR}/cuda_v${CUDAToolkit_VERSION_MAJOR})
    install(TARGETS ggml-cuda
        RUNTIME_DEPENDENCIES
            DIRECTORIES ${CUDAToolkit_BIN_DIR} ${CUDAToolkit_LIBRARY_DIR}
            PRE_INCLUDE_REGEXES cublas cublasLt cudart
            PRE_EXCLUDE_REGEXES ".*"
        RUNTIME DESTINATION ${OLLAMA_CUDA_INSTALL_DIR} COMPONENT CUDA
        LIBRARY DESTINATION ${OLLAMA_CUDA_INSTALL_DIR} COMPONENT CUDA
    )
endif()

set(WINDOWS_AMDGPU_TARGETS_EXCLUDE_REGEX "^gfx(906|908|90a):xnack[+-]$"
    CACHE STRING
    "Regular expression describing AMDGPU_TARGETS not supported on Windows. Override to force building these targets. Default \"^gfx(906|908|90a):xnack[+-]$\"."
)

check_language(HIP)
if(CMAKE_HIP_COMPILER)
    set(HIP_PLATFORM "amd")

    find_package(hip REQUIRED)
    if(NOT AMDGPU_TARGETS)
        list(FILTER AMDGPU_TARGETS INCLUDE REGEX "^gfx(900|94[012]|101[02]|1030|110[012])$")
    elseif(WIN32 AND WINDOWS_AMDGPU_TARGETS_EXCLUDE_REGEX)
        list(FILTER AMDGPU_TARGETS EXCLUDE REGEX ${WINDOWS_AMDGPU_TARGETS_EXCLUDE_REGEX})
    endif()

    if(AMDGPU_TARGETS)
        add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src/ggml-hip)

        if (WIN32)
            target_compile_definitions(ggml-hip PRIVATE GGML_CUDA_NO_PEER_COPY=1)
        endif()

        set(OLLAMA_HIP_INSTALL_DIR ${OLLAMA_INSTALL_DIR}/rocm)
        install(TARGETS ggml-hip
            RUNTIME_DEPENDENCIES
                DIRECTORIES ${HIP_BIN_INSTALL_DIR} ${HIP_LIB_INSTALL_DIR}
                PRE_INCLUDE_REGEXES hipblas rocblas amdhip64 rocsolver amd_comgr hsa-runtime64 rocsparse tinfo rocprofiler-register drm drm_amdgpu numa elf
                PRE_EXCLUDE_REGEXES ".*"
                POST_EXCLUDE_REGEXES "system32"
            RUNTIME DESTINATION ${OLLAMA_HIP_INSTALL_DIR} COMPONENT HIP
            LIBRARY DESTINATION ${OLLAMA_HIP_INSTALL_DIR} COMPONENT HIP
        )

        foreach(HIP_LIB_BIN_INSTALL_DIR IN ITEMS ${HIP_BIN_INSTALL_DIR} ${HIP_LIB_INSTALL_DIR})
            if(EXISTS ${HIP_LIB_BIN_INSTALL_DIR}/rocblas)
                install(DIRECTORY ${HIP_LIB_BIN_INSTALL_DIR}/rocblas DESTINATION ${OLLAMA_HIP_INSTALL_DIR} COMPONENT HIP)
                break()
            endif()
        endforeach()
    endif()
endif()
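This CMakeLists.txt only wires up the ggml compute backends (CPU, plus CUDA and HIP when their compilers are found); the ollama binary itself is built separately with go build, as the workflow earlier in this commit shows. A hedged sketch of building just the CPU backend, mirroring the "CPU" build preset defined in the CMakePresets.json that follows:

# Sketch: configure and build only the ggml CPU backend from this CMakeLists.txt
# (assumes it is run from the module's source root; mirrors the "CPU" build preset).
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --target ggml-cpu --parallel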
modules/ollama_openvino/CMakePresets.json

Lines changed: 110 additions & 0 deletions

@@ -0,0 +1,110 @@
{
  "version": 3,
  "configurePresets": [
    {
      "name": "Default",
      "binaryDir": "${sourceDir}/build",
      "installDir": "${sourceDir}/dist",
      "cacheVariables": {
        "CMAKE_BUILD_TYPE": "Release"
      }
    },
    {
      "name": "CPU",
      "inherits": [ "Default" ]
    },
    {
      "name": "CUDA",
      "inherits": [ "Default" ]
    },
    {
      "name": "CUDA 11",
      "inherits": [ "CUDA" ],
      "cacheVariables": {
        "CMAKE_CUDA_ARCHITECTURES": "50;52;53;60;61;62;70;72;75;80;86"
      }
    },
    {
      "name": "CUDA 12",
      "inherits": [ "CUDA" ],
      "cacheVariables": {
        "CMAKE_CUDA_ARCHITECTURES": "60;61;62;70;72;75;80;86;87;89;90;90a"
      }
    },
    {
      "name": "JetPack 5",
      "inherits": [ "CUDA" ],
      "cacheVariables": {
        "CMAKE_CUDA_ARCHITECTURES": "72;87"
      }
    },
    {
      "name": "JetPack 6",
      "inherits": [ "CUDA" ],
      "cacheVariables": {
        "CMAKE_CUDA_ARCHITECTURES": "87"
      }
    },
    {
      "name": "ROCm",
      "inherits": [ "Default" ],
      "cacheVariables": {
        "CMAKE_HIP_PLATFORM": "amd"
      }
    },
    {
      "name": "ROCm 6",
      "inherits": [ "ROCm" ],
      "cacheVariables": {
        "AMDGPU_TARGETS": "gfx900;gfx940;gfx941;gfx942;gfx1010;gfx1012;gfx1030;gfx1100;gfx1101;gfx1102;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack+;gfx90a:xnack-"
      }
    }
  ],
  "buildPresets": [
    {
      "name": "Default",
      "configurePreset": "Default",
      "configuration": "Release"
    },
    {
      "name": "CPU",
      "configurePreset": "Default",
      "targets": [ "ggml-cpu" ]
    },
    {
      "name": "CUDA",
      "configurePreset": "CUDA",
      "targets": [ "ggml-cuda" ]
    },
    {
      "name": "CUDA 11",
      "inherits": [ "CUDA" ],
      "configurePreset": "CUDA 11"
    },
    {
      "name": "CUDA 12",
      "inherits": [ "CUDA" ],
      "configurePreset": "CUDA 12"
    },
    {
      "name": "JetPack 5",
      "inherits": [ "CUDA" ],
      "configurePreset": "JetPack 5"
    },
    {
      "name": "JetPack 6",
      "inherits": [ "CUDA" ],
      "configurePreset": "JetPack 6"
    },
    {
      "name": "ROCm",
      "configurePreset": "ROCm",
      "targets": [ "ggml-hip" ]
    },
    {
      "name": "ROCm 6",
      "inherits": [ "ROCm" ],
      "configurePreset": "ROCm 6"
    }
  ]
}
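A brief usage sketch for these presets, under the assumption that a preset-capable CMake is invoked from the directory containing this CMakePresets.json; preset names with spaces need quoting.

# Sketch: build the CUDA backend via the "CUDA 12" preset defined above.
cmake --preset "CUDA 12"                 # configure into ./build with the listed CUDA architectures
cmake --build --preset "CUDA 12"         # builds the ggml-cuda target inherited from the CUDA preset
cmake --install build --component CUDA   # install runtime libraries under ./dist (the preset's installDir)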
