Documentation is available at [https://llama-cpp-python.readthedocs.io/en/latest](https://llama-cpp-python.readthedocs.io/en/latest).

> [!WARNING]
> Starting with version 0.1.79 the model format has changed from `ggmlv3` to `gguf`. Old model files can be converted using the `convert-llama-ggmlv3-to-gguf.py` script in [`llama.cpp`](https://github.com/ggerganov/llama.cpp).
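
For reference, a conversion run looks roughly like the sketch below. The model file names are placeholders, and the script's flags can change between `llama.cpp` revisions, so check `--help` first.

```
# Run from inside a llama.cpp checkout; file names here are hypothetical.
# Verify the flags with: python convert-llama-ggmlv3-to-gguf.py --help
python convert-llama-ggmlv3-to-gguf.py --input llama-2-7b.ggmlv3.q4_0.bin --output llama-2-7b.q4_0.gguf
```
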
## Installation from PyPI (recommended)
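
For context, the basic CPU-only install is a single pip command; backend-specific build options are covered elsewhere in the README:

```
pip install llama-cpp-python
```
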
This package is under active development and I welcome any contributions.

To get started, clone the repository and install the package in editable / development mode:
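
The steps are roughly the following sketch (assuming a standard git + pip setup; the vendored `llama.cpp` lives in a git submodule, hence `--recurse-submodules`):

```
git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python.git
cd llama-cpp-python
pip install --upgrade pip
pip install -e .
```
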
## Simple Dockerfiles for building the llama-cpp-python server with external model bin files
### openblas_simple
A simple Dockerfile for non-GPU OpenBLAS, where the model is located outside the Docker image:
```
cd ./openblas_simple
docker build -t openblas_simple .
docker run -e USE_MLOCK=0 -e MODEL=/var/model/<model-path> -v <model-root-path>:/var/model -t openblas_simple
```

where `<model-root-path>/<model-path>` is the full path to the model file on the Docker host system.
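
As a concrete (hypothetical) example, with models stored under `/srv/models` on the host and a file named `llama-2-7b.Q4_K_M.gguf`, the run command becomes:

```
# Host path /srv/models and the model file name are placeholders
docker run -e USE_MLOCK=0 -e MODEL=/var/model/llama-2-7b.Q4_K_M.gguf -v /srv/models:/var/model -t openblas_simple
```
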
### cuda_simple
> [!WARNING]
> Nvidia GPU CuBLAS support requires an Nvidia GPU with sufficient VRAM (approximately as much as the size in the table below) and Docker Nvidia support (see [container-toolkit/install-guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)).

A simple Dockerfile for CUDA-accelerated CuBLAS, where the model is located outside the Docker image:
```
cd ./cuda_simple
docker build -t cuda_simple .
docker run -e USE_MLOCK=0 -e MODEL=/var/model/<model-path> -v <model-root-path>:/var/model -t cuda_simple
```
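
Depending on the host's Docker and NVIDIA Container Toolkit configuration, the container may also need the GPU passed in explicitly; a sketch:

```
# --gpus=all assumes Docker 19.03+ with the NVIDIA Container Toolkit installed
docker run --gpus=all -e USE_MLOCK=0 -e MODEL=/var/model/<model-path> -v <model-root-path>:/var/model -t cuda_simple
```
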