docs/build.md (61 additions, 2 deletions)
@@ -614,7 +614,7 @@ Follow the instructions below to install OpenVINO runtime and build llama.cpp wi
- Follow the guide to install OpenVINO Runtime from an archive file: [Linux](https://docs.openvino.ai/2025/get-started/install-openvino/install-openvino-archive-linux.html) | [Windows](https://docs.openvino.ai/2025/get-started/install-openvino/install-openvino-archive-windows.html)
<details>
- <summary>📦 Click to expand OpenVINO 2025.3 installation on Ubuntu</summary>
+ <summary>📦 Click to expand OpenVINO 2025.3 installation from an archive file on Ubuntu</summary>
<br>
```bash
@@ -700,9 +700,68 @@ Control OpenVINO behavior using these environment variables:
export GGML_OPENVINO_CACHE_DIR=/tmp/ov_cache
export GGML_OPENVINO_PROFILING=1

- ./build/ReleaseOV/bin/llama-simple -m ~/models/Llama-3.2-1B-Instruct.fp16.gguf -n 50 "The story of AI is "
+ GGML_OPENVINO_DEVICE=GPU ./build/ReleaseOV/bin/llama-simple -m ~/models/Llama-3.2-1B-Instruct.fp16.gguf -n 50 "The story of AI is "
+ ```
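
For reference, `GGML_OPENVINO_DEVICE` selects which OpenVINO device the backend targets. A minimal sketch of device selection, assuming the backend accepts the standard OpenVINO device names (`CPU`, `GPU`, `NPU` — an assumption, not confirmed by this diff):

```bash
# GPU run, as shown above
GGML_OPENVINO_DEVICE=GPU ./build/ReleaseOV/bin/llama-simple \
  -m ~/models/Llama-3.2-1B-Instruct.fp16.gguf -n 50 "The story of AI is "

# Assumed: CPU follows standard OpenVINO device naming; verify against your build
GGML_OPENVINO_DEVICE=CPU ./build/ReleaseOV/bin/llama-simple \
  -m ~/models/Llama-3.2-1B-Instruct.fp16.gguf -n 50 "The story of AI is "

# Combine with the cache and profiling variables from this section
GGML_OPENVINO_CACHE_DIR=/tmp/ov_cache GGML_OPENVINO_PROFILING=1 GGML_OPENVINO_DEVICE=GPU \
  ./build/ReleaseOV/bin/llama-simple -m ~/models/Llama-3.2-1B-Instruct.fp16.gguf -n 50 "The story of AI is "
```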
+
+ ### Docker build of llama.cpp with the OpenVINO backend
+ You can build and run llama.cpp with the OpenVINO backend using Docker.
+
+ ```bash
+ # Build the base runtime image with compiled shared libraries and minimal dependencies.
… (lines 711–751 not shown)
+ # If you are behind a proxy, set NO_PROXY so that requests to localhost bypass the proxy.
+ export NO_PROXY=localhost,127.0.0.1
+
+ # Test the health endpoint
+ curl -f http://localhost:8080/health
+
+ # Test with a simple prompt
+ curl -X POST "http://localhost:8080/v1/chat/completions" -H "Content-Type: application/json" \
+   -d '{"messages":[{"role":"user","content":"Write a poem about OpenVINO"}],"max_tokens":100}' | jq .
+
+ ```
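
The docker build and `docker run` steps themselves are collapsed in the hunk above, so as a rough sketch only: assuming the image is tagged `llama-cpp-openvino` and runs `llama-server` on port 8080 (both hypothetical; match them to the actual commands in docs/build.md), the server the curl tests talk to could be started like this:

```bash
# Hypothetical image tag and mount points -- adjust to the real build instructions
docker run --rm -p 8080:8080 \
  -v ~/models:/models \
  llama-cpp-openvino \
  -m /models/Llama-3.2-1B-Instruct.fp16.gguf --host 0.0.0.0 --port 8080

# For GPU inference inside the container, Intel GPUs typically also need the render device:
#   docker run --device /dev/dri ...
```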
+
+
+ ---
## Notes about GPU-accelerated backends
The GPU may still be used to accelerate some parts of the computation even when using the `-ngl 0` option. You can fully disable GPU acceleration by using `--device none`.
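
For example (a sketch using `llama-cli`, which accepts both flags; the binary path is assumed to match the OpenVINO build used above):

```bash
# -ngl 0 offloads no layers, but the GPU may still accelerate some operations
./build/ReleaseOV/bin/llama-cli -m ~/models/Llama-3.2-1B-Instruct.fp16.gguf -ngl 0 -p "The story of AI is "

# --device none fully disables GPU acceleration
./build/ReleaseOV/bin/llama-cli -m ~/models/Llama-3.2-1B-Instruct.fp16.gguf --device none -p "The story of AI is "
```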