
Commit a682809

more build.md updates

1 parent 0ec5b62

File tree: 1 file changed (+16, -8 lines)


docs/build.md

Lines changed: 16 additions & 8 deletions
````diff
@@ -52,13 +52,6 @@ cmake --build build --config Release
 ```
 Building for arm64 can also be done with the MSVC compiler with the build-arm64-windows-MSVC preset, or the standard CMake build instructions. However, note that the MSVC compiler does not support inline ARM assembly code, used e.g. for the accelerated Q4_0_4_8 CPU kernels.
 
-## Metal Build
-
-On MacOS, Metal is enabled by default. Using Metal makes the computation run on the GPU.
-To disable the Metal build at compile time use the `-DGGML_METAL=OFF` cmake option.
-
-When built with Metal support, you can explicitly disable GPU inference with the `--n-gpu-layers 0` command-line argument.
-
 ## BLAS Build
 
 Building the program with BLAS support may lead to some performance improvements in prompt processing using batch sizes higher than 32 (the default is 512). Using BLAS doesn't affect the generation performance. There are currently several different BLAS implementations available for build and use:
````
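The arm64 note in the hunk above mentions two MSVC routes. A minimal sketch of the standard-CMake route, assuming a Visual Studio 2022 toolchain (the preset route would be `cmake --preset <name>` with the preset named in the text):

```bash
# Sketch only: configure an arm64 build with the MSVC toolchain.
# The generator name and architecture flag are assumptions for
# illustration, not taken from the diff.
cmake -B build -G "Visual Studio 17 2022" -A ARM64
cmake --build build --config Release
```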
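For the BLAS paragraph that closes the hunk, a minimal sketch of a BLAS-enabled build, assuming OpenBLAS as the vendor (any vendor accepted by `GGML_BLAS_VENDOR`, discussed in the next hunk, works the same way):

```bash
# Sketch: enable BLAS for prompt processing; OpenBLAS is an
# assumed example vendor.
cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
cmake --build build --config Release
```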
````diff
@@ -103,6 +96,13 @@ Check [Optimizing and Running LLaMA2 on Intel® CPU](https://www.intel.com/conte
 
 Any other BLAS library can be used by setting the `GGML_BLAS_VENDOR` option. See the [CMake documentation](https://cmake.org/cmake/help/latest/module/FindBLAS.html#blas-lapack-vendors) for a list of supported vendors.
 
+## Metal Build
+
+On MacOS, Metal is enabled by default. Using Metal makes the computation run on the GPU.
+To disable the Metal build at compile time use the `-DGGML_METAL=OFF` cmake option.
+
+When built with Metal support, you can explicitly disable GPU inference with the `--n-gpu-layers 0` command-line argument.
+
 ## SYCL
 
 SYCL is a higher-level programming model to improve programming productivity on various hardware accelerators.
````
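The relocated Metal section names one compile-time switch and one runtime switch; a minimal sketch combining both (binary and model names are placeholders):

```bash
# Compile-time: build with Metal disabled entirely.
cmake -B build -DGGML_METAL=OFF
cmake --build build --config Release

# Runtime: keep the default Metal build but run fully on CPU
# (llama-cli and model.gguf are placeholder names).
./llama-cli -m model.gguf --n-gpu-layers 0
```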
````diff
@@ -113,7 +113,7 @@ For detailed info, please refer to [llama.cpp for SYCL](./backend/SYCL.md).
 
 ## CUDA
 
-This provides GPU acceleration using an NVIDIA GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager (e.g. `apt install nvidia-cuda-toolkit`) or from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads).
+This provides GPU acceleration using an NVIDIA GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager (e.g. `apt install nvidia-cuda-toolkit`) or from the [NVIDIA developer site](https://developer.nvidia.com/cuda-downloads).
 
 - Using `CMake`:
 
````
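The hunk's context is cut off just after the "Using `CMake`:" bullet; for orientation, a minimal sketch of a CUDA build (the `GGML_CUDA` flag is taken from the notes added later in this commit):

```bash
# Sketch: typical CUDA build; requires the CUDA toolkit.
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```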
````diff
@@ -339,3 +339,11 @@ For detailed info, such as model/device supports, CANN install, please refer to
 ## Android
 
 To read documentation for how to build on Android, [click here](./android.md)
+
+## Notes about GPU-accelerated backends
+
+The GPU may still be used to accelerate some parts of the computation even when using the `-ngl 0` option. You can fully disable GPU acceleration by using `--device none`.
+
+In most cases, it is possible to build and use multiple backends at the same time. For example, you can build llama.cpp with both CUDA and Vulkan support by using the `-DGGML_CUDA=ON -DGGML_VULKAN=ON` options with CMake. At runtime, you can specify which backend devices to use with the `--device` option. To see a list of available devices, use the `--list-devices` option.
+
+Backends can be built as dynamic libraries that can be loaded dynamically at runtime. This allows you to use the same llama.cpp binary on different machines with different GPUs. To enable this feature, use the `GGML_BACKEND_DL` option when building.
````
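A short sketch tying the new notes together: a multi-backend build plus runtime device selection (binary and model names are placeholders; the flags are the ones named in the added text):

```bash
# Build with both CUDA and Vulkan backends enabled.
cmake -B build -DGGML_CUDA=ON -DGGML_VULKAN=ON
cmake --build build --config Release

# Optionally, build backends as dynamically loadable libraries instead:
# cmake -B build -DGGML_BACKEND_DL=ON

# Inspect available devices, then select (or disable) them at runtime.
./llama-cli --list-devices
./llama-cli -m model.gguf --device none   # no GPU acceleration at all
```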
