
Commit bc968f5

zhaohb and Xiake Sun authored
add support gguf reader (#981)
* Update Dockerfile: use Ubuntu 24.04 as the base image.
* Show the generated token count.
* Update README: add links to the NPU models for Qwen3.
* Add support for a GGUF reader.
* Update README:
  1. Update GenAI to 2025.3.0.0.dev20250630.
  2. Add a list of models that support the GGUF format.
  3. Update download links.

Co-authored-by: Xiake Sun <[email protected]>
1 parent 9ddc8dd commit bc968f5

File tree: 3 files changed (+161, -21 lines)


modules/ollama_openvino/README.md

Lines changed: 48 additions & 10 deletions
@@ -584,17 +584,17 @@

 We provide two ways to download the executable file of Ollama: one from Google Drive, the other from Baidu Drive:

 ## Google Drive
 ### Windows
-[Download exe](https://drive.google.com/file/d/1Xo3ohbfC852KtJy_4xtn_YrYaH4Y_507/view?usp=sharing) + [Download OpenVINO GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.2.0.0.dev20250513/openvino_genai_windows_2025.2.0.0.dev20250513_x86_64.zip)
+[Download exe](https://drive.google.com/file/d/12eXPdCSSNx53fmK7KnEZ3WFMSiaX2M-Y/view?usp=sharing) + [Download OpenVINO GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.3.0.0.dev20250630/openvino_genai_windows_2025.3.0.0.dev20250630_x86_64.zip)

 ### Linux (Ubuntu 22.04)
-[Download](https://drive.google.com/file/d/1_P7CQqFUqeyx4q5y5bQ-xQsb10T9gzJD/view?usp=sharing) + [Download OpenVINO GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.2.0.0.dev20250513/openvino_genai_ubuntu22_2025.2.0.0.dev20250513_x86_64.tar.gz)
+[Download](https://drive.google.com/file/d/11-Gmk9nEMsr7lrUV2E_gFOAhxXErLsoh/view?usp=sharing) + [Download OpenVINO GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.3.0.0.dev20250630/openvino_genai_ubuntu22_2025.3.0.0.dev20250630_x86_64.tar.gz)

 ## Baidu Drive
 ### Windows
-[Download exe](https://pan.baidu.com/s/1uIUjji7Mxf594CJy1vbrVw?pwd=36mq) + [Download OpenVINO GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.2.0.0.dev20250513/openvino_genai_windows_2025.2.0.0.dev20250513_x86_64.zip)
+[Download exe](https://pan.baidu.com/s/1nFok-DqBy-VoiXIwghE71Q?pwd=3m2m) + [Download OpenVINO GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.3.0.0.dev20250630/openvino_genai_windows_2025.3.0.0.dev20250630_x86_64.zip)

 ### Linux (Ubuntu 22.04)
-[Download](https://pan.baidu.com/s/1OCq3aKJBiCrtjLKa7kXbMw?pwd=exhz) + [Download OpenVINO GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.2.0.0.dev20250513/openvino_genai_ubuntu22_2025.2.0.0.dev20250513_x86_64.tar.gz)
+[Download](https://pan.baidu.com/s/16roqb9JVN_k1H_fk2JFXHg?pwd=t5q7) + [Download OpenVINO GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.3.0.0.dev20250630/openvino_genai_ubuntu22_2025.3.0.0.dev20250630_x86_64.tar.gz)

 ## Docker
 ### Linux

@@ -608,7 +608,7 @@ docker run -it --rm --entrypoint /bin/bash ollama_openvino_ubuntu24:v1
 ```
 Execute the following inside the container:
 ```shell
-source /home/ollama_ov_server/openvino_genai_ubuntu24_2025.2.0.0.dev20250513_x86_64/setupvars.sh
+source /home/ollama_ov_server/openvino_genai_ubuntu22_2025.3.0.0.dev20250630_x86_64/setupvars.sh
 ollama serve
 ```

@@ -735,7 +735,7 @@ Let's take [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://hf-mirror.com/deeps

 4. Unzip OpenVINO GenAI package and set environment
 ```shell
-cd openvino_genai_windows_2025.2.0.0.dev20250513_x86_64
+cd openvino_genai_windows_2025.3.0.0.dev20250630_x86_64
 setupvars.bat
 ```

@@ -762,6 +762,44 @@ Let's take [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://hf-mirror.com/deeps
 ollama run DeepSeek-R1-Distill-Qwen-7B-int4-ov:v1
 ```

+### Import from GGUF file (experimental feature, not recommended for production use)
+
+| GGUF | Model Link | GGUF Size | Precision | Status | Device |
+| ---- | ---------- | --------- | --------- | ------ | ------ |
+| DeepSeek-R1-Distill-Qwen-1.5B-GGUF | [HuggingFace](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF), [ModelScope](https://modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF) | 1.12GB | Q4_K_M | ✔️ | CPU, GPU |
+| DeepSeek-R1-Distill-Qwen-7B-GGUF | [HuggingFace](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF), [ModelScope](https://modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF) | 4.68GB | Q4_K_M | ✔️ | CPU, GPU |
+| Qwen2.5-1.5B-Instruct-GGUF | [HuggingFace](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF), [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-1.5B-Instruct-GGUF) | 1.12GB | Q4_K_M | ✔️ | CPU, GPU |
+| Qwen2.5-3B-Instruct-GGUF | [HuggingFace](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GGUF), [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-3B-Instruct-GGUF) | 2.1GB | Q4_K_M | ✔️ | CPU, GPU |
+| Qwen2.5-7B-Instruct-GGUF | [HuggingFace](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-GGUF), [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-7B-Instruct-GGUF) | 4.68GB | Q4_K_M | ✔️ | CPU, GPU |
+| llama-3.2-1b-instruct-GGUF | [HuggingFace](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct-GGUF), [ModelScope](https://modelscope.cn/models/unsloth/Llama-3.2-1B-Instruct-GGUF) | 0.75GB | Q4_K_M | ✔️ | CPU, GPU |
+| llama-3.2-3b-instruct-GGUF | [HuggingFace](https://huggingface.co/unsloth/Llama-3.2-3B-Instruct-GGUF), [ModelScope](https://modelscope.cn/models/unsloth/Llama-3.2-3B-Instruct-GGUF) | 2.02GB | Q4_K_M | ✔️ | CPU, GPU |
+| llama-3.1-8b-instruct-GGUF | [HuggingFace](https://huggingface.co/modularai/Llama-3.1-8B-Instruct-GGUF) | 4.92GB | Q4_K_M | ✔️ | CPU, GPU |
+
+#### Example
+Using the qwen2.5-3b-instruct-q4_k_m.gguf model as an example:
+1. The corresponding Modelfile is as follows:
+```shell
+FROM qwen2.5-3b-instruct-q4_k_m.gguf
+ModelType "OpenVINO"
+InferDevice "GPU"
+PARAMETER stop "<|im_end|>"
+PARAMETER repeat_penalty 1.0
+PARAMETER top_p 1.0
+PARAMETER temperature 1.0
+```
+2. Create the model in Ollama:
+```shell
+ollama create qwen2.5-3b-gguf-ov-gpu:v1 -f Modelfile
+```
+3. Run the model:
+```shell
+ollama run qwen2.5-3b-gguf-ov-gpu:v1
+```
+Reference link: [openvino-genai-supports-gguf-models](https://blog.openvino.ai/blog-posts/openvino-genai-supports-gguf-models)
+
 ## CLI Reference

 ### Show model information

@@ -813,9 +851,9 @@ Then build and run Ollama from the root directory of the repository:

 3. Initialize the GenAI environment

-Download GenAI runtime from [GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.2.0.0.dev20250513/openvino_genai_windows_2025.2.0.0.dev20250513_x86_64.zip), then extract it to a directory openvino_genai_windows_2025.2.0.0.dev20250513_x86_64.
+Download GenAI runtime from [GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.3.0.0.dev20250630/openvino_genai_windows_2025.3.0.0.dev20250630_x86_64.zip), then extract it to a directory openvino_genai_windows_2025.3.0.0.dev20250630_x86_64.
 ```shell
-cd openvino_genai_windows_2025.2.0.0.dev20250513_x86_64
+cd openvino_genai_windows_2025.3.0.0.dev20250630_x86_64
 setupvars.bat
 ```

@@ -852,9 +890,9 @@ Then build and run Ollama from the root directory of the repository:

 3. Initialize the GenAI environment

-Download GenAI runtime from [GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.2.0.0.dev20250513/openvino_genai_ubuntu22_2025.2.0.0.dev20250513_x86_64.tar.gz), then extract it to a directory openvino_genai_ubuntu22_2025.2.0.0.dev20250513_x86_64.
+Download GenAI runtime from [GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.3.0.0.dev20250630/openvino_genai_ubuntu22_2025.3.0.0.dev20250630_x86_64.tar.gz), then extract it to a directory openvino_genai_ubuntu22_2025.3.0.0.dev20250630_x86_64.
 ```shell
-cd openvino_genai_ubuntu22_2025.2.0.0.dev20250513_x86_64
+cd openvino_genai_ubuntu22_2025.3.0.0.dev20250630_x86_64
 source setupvars.sh
 ```

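The new GGUF import path hinges on file-type sniffing, so it is worth verifying that a downloaded file really is GGUF before writing a Modelfile for it. A minimal sketch of the check (the file name is hypothetical, and a stand-in file is created here only so the snippet is self-contained):

```shell
# GGUF containers begin with the ASCII magic "GGUF" in their first 4 bytes.
f=/tmp/qwen2.5-3b-instruct-q4_k_m.gguf
printf 'GGUFdemo-payload' > "$f"   # stand-in file for this sketch
magic=$(head -c 4 "$f")
if [ "$magic" = "GGUF" ]; then
  echo "looks like a GGUF file"
else
  echo "not a GGUF file"
fi
```

A real download would replace the `printf` line; a file that fails this check will not be recognized by the GGUF branch of the loader.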
modules/ollama_openvino/genai/genai.go

Lines changed: 70 additions & 0 deletions
@@ -40,6 +40,8 @@ import "C"

 import (
 	"archive/tar"
+	"bufio"
+	"bytes"
 	"compress/gzip"
 	"fmt"
 	"io"

@@ -67,6 +69,74 @@ type SamplingParams struct {

 type Model *C.ov_genai_llm_pipeline

+// IsGGUF reports whether the file at filePath starts with the 4-byte
+// GGUF magic number ("GGUF" in ASCII).
+func IsGGUF(filePath string) (bool, error) {
+	file, err := os.Open(filePath)
+	if err != nil {
+		return false, fmt.Errorf("failed to open file: %v", err)
+	}
+	defer file.Close()
+
+	// Read exactly the first 4 bytes; io.ReadFull avoids the short read
+	// a bare Read may return.
+	reader := bufio.NewReader(file)
+	magicBytes := make([]byte, 4)
+	if _, err := io.ReadFull(reader, magicBytes); err != nil {
+		return false, fmt.Errorf("failed to read magic number: %v", err)
+	}
+
+	// Compare against the GGUF magic number.
+	return bytes.Equal(magicBytes, []byte("GGUF")), nil
+}
+
+// IsGzipByMagicBytes reports whether the file starts with the gzip
+// magic bytes 0x1F 0x8B.
+func IsGzipByMagicBytes(filepath string) (bool, error) {
+	file, err := os.Open(filepath)
+	if err != nil {
+		return false, err
+	}
+	defer file.Close()
+
+	magicBytes := make([]byte, 2)
+	if _, err := io.ReadFull(file, magicBytes); err != nil {
+		return false, err
+	}
+
+	return bytes.Equal(magicBytes, []byte{0x1F, 0x8B}), nil
+}
+
+// CopyFile copies src to dst and flushes the destination to disk.
+func CopyFile(src, dst string) error {
+	srcFile, err := os.Open(src)
+	if err != nil {
+		return err
+	}
+	defer srcFile.Close()
+
+	dstFile, err := os.Create(dst)
+	if err != nil {
+		return err
+	}
+	defer dstFile.Close()
+
+	if _, err := io.Copy(dstFile, srcFile); err != nil {
+		return err
+	}
+
+	return dstFile.Sync()
+}

 func UnpackTarGz(tarGzPath string, destDir string) error {
 	file, err := os.Open(tarGzPath)
 	if err != nil {

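`IsGzipByMagicBytes` relies on the same idea as `IsGGUF`: gzip streams, including the tar.gz archives that carry OpenVINO IR models, always start with the two bytes `0x1f 0x8b`. A quick shell equivalent of the check (the archive name is made up for the sketch):

```shell
# Build a small gzip file and read its first two bytes as hex.
printf 'hello' | gzip -c > /tmp/model_demo.tar.gz
magic=$(od -An -tx1 -N2 /tmp/model_demo.tar.gz | tr -d ' \n')
echo "$magic"   # gzip magic: 1f8b
```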
modules/ollama_openvino/genai/runner/runner.go

Lines changed: 43 additions & 11 deletions
@@ -381,24 +381,56 @@ func (s *Server) loadModel(mpath string, mname string, device string) {
 	var err error
 	ov_ir_dir := strings.ReplaceAll(mname, ":", "_")
 	tempDir := filepath.Join("/tmp", ov_ir_dir)
+	ov_model_path := ""

-	_, err = os.Stat(tempDir)
-	if os.IsNotExist(err) {
-		err = genai.UnpackTarGz(mpath, tempDir)
-		if err != nil {
-			panic(err)
-		}
-	}
+	isGGUF, err := genai.IsGGUF(mpath)
+	if err != nil {
+		fmt.Printf("Error checking file: %v\n", err)
+		panic(err)
+	}
+	if isGGUF {
+		log.Printf("The model is a GGUF file.")
+		ov_model_path = filepath.Join(tempDir, "tmp.gguf")
+		// For the GGUF reader: copy the blob into the temp dir once.
+		if _, err := os.Stat(tempDir); os.IsNotExist(err) {
+			if err := os.MkdirAll(tempDir, 0755); err != nil {
+				fmt.Printf("Error creating dir: %v\n", err)
+				panic(err)
+			}
+			if err := genai.CopyFile(mpath, ov_model_path); err != nil {
+				panic(err)
+			}
+		}
+	}

-	entries, err := os.ReadDir(tempDir)
-	var subdirs []string
-	for _, entry := range entries {
-		if entry.IsDir() {
-			subdirs = append(subdirs, entry.Name())
-		}
-	}
+	isGzip, err := genai.IsGzipByMagicBytes(mpath)
+	if err != nil {
+		fmt.Printf("Error checking file: %v\n", err)
+	}
+	if isGzip {
+		log.Printf("The model is an OpenVINO IR archive.")
+		// For OpenVINO IR: unpack the tar.gz once, then use its first subdirectory.
+		if _, err := os.Stat(tempDir); os.IsNotExist(err) {
+			if err := genai.UnpackTarGz(mpath, tempDir); err != nil {
+				panic(err)
+			}
+		}
+
+		entries, _ := os.ReadDir(tempDir)
+		var subdirs []string
+		for _, entry := range entries {
+			if entry.IsDir() {
+				subdirs = append(subdirs, entry.Name())
+			}
+		}
+
+		ov_model_path = filepath.Join(tempDir, subdirs[0])
+	}

-	ov_model_path := filepath.Join(tempDir, subdirs[0])
 	s.model = genai.CreatePipeline(ov_model_path, device)
 	log.Printf("The model had been load by GenAI, ov_model_path: %s , %s", ov_model_path, device)
 	s.status = ServerStatusReady

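The dispatch above derives its working directory from the model name before sniffing the file type: `:` is replaced with `_` and the result is placed under `/tmp`. The same mapping in shell, using the model name from the README example:

```shell
# Mirror loadModel's tempDir derivation for a name like "name:tag".
mname="qwen2.5-3b-gguf-ov-gpu:v1"
tempDir="/tmp/$(printf '%s' "$mname" | tr ':' '_')"
echo "$tempDir"   # /tmp/qwen2.5-3b-gguf-ov-gpu_v1
```

A GGUF model is then copied to `$tempDir/tmp.gguf`, while an IR tar.gz is unpacked into `$tempDir` and its first subdirectory is handed to `CreatePipeline`.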