-NOTIFY_MSG = @echo -e '\n***\nYou did a basic CPU build. For faster speeds, consider installing and linking a GPU BLAS library. For example, set LLAMA_VULKAN=1 to compile with Vulkan support. Read the KoboldCpp Wiki for more information. This is just a reminder, not an error.\n***\n'
-endif
-endif
-endif
-endif
-endif
+ifndef LLAMA_CLBLAST
+ifndef LLAMA_CUBLAS
+ifndef LLAMA_HIPBLAS
+ifndef LLAMA_VULKAN
+ifndef LLAMA_METAL
+NOTIFY_MSG = @echo -e '\n***\nYou did a basic CPU build. For faster speeds, consider installing and linking a GPU BLAS library. For example, set LLAMA_CLBLAST=1 LLAMA_VULKAN=1 to compile with Vulkan and CLBlast support. Add LLAMA_PORTABLE=1 to make a sharable build that other devices can use. Read the KoboldCpp Wiki for more information. This is just a reminder, not an error.\n***\n'
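The effect of this guard is that the reminder only fires on a plain CPU build: if any of the listed backend flags is defined on the `make` command line, one of the `ifndef` checks fails and `NOTIFY_MSG` is never set. A rough sketch of the two cases, assuming a standard GNU make invocation from the repo root:

```sh
# Plain CPU build: no LLAMA_* backend flag is defined, so the nested
# ifndef chain defines NOTIFY_MSG and the reminder is echoed after the build.
make

# GPU-enabled build: LLAMA_CLBLAST and LLAMA_VULKAN are defined, so the guard
# is skipped and no reminder is printed. LLAMA_PORTABLE=1 additionally makes
# the resulting binaries sharable with other devices.
make LLAMA_CLBLAST=1 LLAMA_VULKAN=1 LLAMA_PORTABLE=1
```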
README.md (+6 −1: 6 additions, 1 deletion)
@@ -83,13 +83,16 @@ when you can't use the precompiled binary directly, we provide an automated build
 - For Debian: Install `libclblast-dev`.
 - You can attempt a CuBLAS build with `LLAMA_CUBLAS=1` (or `LLAMA_HIPBLAS=1` for AMD). You will need the CUDA Toolkit installed. Some have also reported success with the CMake file, though that is more for Windows.
 - For a full featured build (all backends), do `make LLAMA_CLBLAST=1 LLAMA_CUBLAS=1 LLAMA_VULKAN=1`. (Note that `LLAMA_CUBLAS=1` will not work on Windows; you need Visual Studio.)
+- To make your build sharable and capable of working on other devices, you must use `LLAMA_PORTABLE=1`
 - After all binaries are built, you can run the python script with the command `koboldcpp.py [ggml_model.gguf] [port]`
 
 ### Compiling on Windows
 - You're encouraged to use the released .exe, but if you want to compile your binaries from source on Windows, the easiest way is:
 - Get the latest release of w64devkit (https://github.com/skeeto/w64devkit). Be sure to use the "vanilla" one, not i686 or other variants, as they will conflict with the precompiled libs!
 - Clone the repo with `git clone https://github.com/LostRuins/koboldcpp.git`
-- Make sure you are using the w64devkit integrated terminal, then run `make` at the KoboldCpp source folder. This will create the .dll files.
+- Make sure you are using the w64devkit integrated terminal, then run `make` at the KoboldCpp source folder. This will create the .dll files for a pure CPU native build.
+- For a full featured build (all backends), do `make LLAMA_CLBLAST=1 LLAMA_VULKAN=1`. (Note that `LLAMA_CUBLAS=1` will not work on Windows; you need Visual Studio.)
+- To make your build sharable and capable of working on other devices, you must use `LLAMA_PORTABLE=1`
 - If you want to generate the .exe file, make sure you have the python module PyInstaller installed with pip (`pip install PyInstaller`). Then run the script `make_pyinstaller.bat`
 - The koboldcpp.exe file will be at your dist folder.
 - **Building with CUDA**: Visual Studio, CMake and the CUDA Toolkit are required. Clone the repo, then open the CMake file and compile it in Visual Studio. Copy the generated `koboldcpp_cublas.dll` into the same directory as the `koboldcpp.py` file. If you are bundling executables, you may need to include CUDA dynamic libraries (such as `cublasLt64_11.dll` and `cublas64_11.dll`) in order for the executable to work correctly on a different PC.
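Putting the Linux steps above together, a typical full-featured, portable build and first run could look like the following sketch (the model filename and port are placeholders, not part of this diff):

```sh
# Build every backend (CLBlast, CuBLAS, Vulkan) and make the result sharable
make LLAMA_CLBLAST=1 LLAMA_CUBLAS=1 LLAMA_VULKAN=1 LLAMA_PORTABLE=1

# Launch the server with a GGUF model on a chosen port
python koboldcpp.py ggml_model.gguf 5001
```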
@@ -104,6 +107,7 @@ when you can't use the precompiled binary directly, we provide an automated build
 - You can compile your binaries from source. You can clone the repo with `git clone https://github.com/LostRuins/koboldcpp.git`
 - A makefile is provided, simply run `make`.
 - If you want Metal GPU support, instead run `make LLAMA_METAL=1`; note that the macOS Metal libraries need to be installed.
+- To make your build sharable and capable of working on other devices, you must use `LLAMA_PORTABLE=1`
 - After all binaries are built, you can run the python script with the command `koboldcpp.py --model [ggml_model.gguf]` (and add `--gpulayers (number of layers)` if you wish to offload layers to GPU).
 
 ### Compiling on Android (Termux Installation)
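As a worked example of the macOS steps above, assuming the Metal libraries are installed (the model name and layer count below are only placeholders):

```sh
# Metal-enabled, sharable build on macOS
make LLAMA_METAL=1 LLAMA_PORTABLE=1

# Run with part of the model offloaded to the GPU
python koboldcpp.py --model ggml_model.gguf --gpulayers 20
```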
@@ -114,6 +118,7 @@ when you can't use the precompiled binary directly, we provide an automated build
 - Clone the repo `git clone https://github.com/LostRuins/koboldcpp.git`
 - Navigate to the koboldcpp folder `cd koboldcpp`
 - Build the project `make`
+- To make your build sharable and capable of working on other devices, you must use `LLAMA_PORTABLE=1`; this disables usage of ARM intrinsics.
 - Grab a small GGUF model, such as `wget https://huggingface.co/concedo/KobbleTinyV2-1.1B-GGUF/resolve/main/KobbleTiny-Q4_K.gguf`
 - Start the python server `python koboldcpp.py --model KobbleTiny-Q4_K.gguf`
 - Connect to `http://localhost:5001` on your mobile browser
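Collected into a single Termux session, the Android steps above look roughly like this (the model URL is the one given in the README; the rest follows the listed commands):

```sh
git clone https://github.com/LostRuins/koboldcpp.git
cd koboldcpp
# LLAMA_PORTABLE=1 disables ARM intrinsics so the build can be shared with other devices
make LLAMA_PORTABLE=1
wget https://huggingface.co/concedo/KobbleTinyV2-1.1B-GGUF/resolve/main/KobbleTiny-Q4_K.gguf
python koboldcpp.py --model KobbleTiny-Q4_K.gguf
# Then open http://localhost:5001 in the mobile browser
```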
ggml/src/ggml-backend.cpp (+1 −0: 1 addition, 0 deletions)
@@ -748,6 +748,7 @@ static int ggml_backend_sched_backend_id_from_cur(ggml_backend_sched_t sched, st
     if (!backend_prealloc_warn) {
         backend_prealloc_warn = true;
         printf("\nCaution: pre-allocated tensor (%s) in a buffer (%s) that cannot run the operation (%s)\n", tensor->name, ggml_backend_buffer_name(buffer), ggml_op_name(tensor->op));
+        printf("\nNote that if you are using Quantized KV, not all backends support it!\n");