# `node-llama-cpp` CUDA support
## Prerequisites
* [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) 12.0 or higher
* [`cmake-js` dependencies](https://github.com/cmake-js/cmake-js#:~:text=projectRoot/build%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%5Bstring%5D-,Requirements%3A,-CMake)
* [CMake](https://cmake.org/download/) 3.26 or higher (optional, recommended if you have build issues)

## Building `node-llama-cpp` with CUDA support
Run this command inside of your project:
```bash
npx --no node-llama-cpp download --cuda
```

> If `cmake` is not installed on your machine, `node-llama-cpp` will automatically download `cmake` to an internal directory and try to use it to build `llama.cpp` from source.

> If you see the message `cuBLAS not found` during the build process,
> it means that CUDA Toolkit is not installed on your machine or that it is not detected by the build process.

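One quick way to check whether the CUDA Toolkit is visible to the build is to verify that the `nvcc` compiler is available on your `PATH` and reports the expected version (a general diagnostic step, not something specific to `node-llama-cpp`):
```bash
nvcc --version
```
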
### Custom `llama.cpp` cmake options
`llama.cpp` has some options you can use to customize your CUDA build; you can find them [here](https://github.com/ggerganov/llama.cpp/tree/master#cublas).

To build `node-llama-cpp` with any of these options, set an environment variable named after the option, prefixed with `NODE_LLAMA_CPP_CMAKE_OPTION_`.

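For example, a minimal sketch on Linux, assuming you want to enable llama.cpp's `LLAMA_CUDA_F16` option (check the list linked above for the options your `llama.cpp` version actually supports):
```bash
# Assumption: the part of the variable name after the NODE_LLAMA_CPP_CMAKE_OPTION_
# prefix is forwarded to the llama.cpp cmake build as a cmake option (here, LLAMA_CUDA_F16=1)
export NODE_LLAMA_CPP_CMAKE_OPTION_LLAMA_CUDA_F16=1
npx --no node-llama-cpp download --cuda
```
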
### Fix the `Failed to detect a default CUDA architecture` build error
To fix this issue, set the `CUDACXX` environment variable to the path of the `nvcc` compiler.

For example, if you installed CUDA Toolkit 12.2 on Windows, run the following command:
```cmd
set CUDACXX=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.exe
```

On Linux, it would be something like this:
```bash
export CUDACXX=/usr/local/cuda-12.2/bin/nvcc
```

Then run the build command again to check whether setting the `CUDACXX` environment variable fixed the issue.

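For example, on Linux the full sequence might look like this (using the same build command as above; adjust the path to match your installed CUDA version):
```bash
export CUDACXX=/usr/local/cuda-12.2/bin/nvcc
npx --no node-llama-cpp download --cuda
```
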
## Using `node-llama-cpp` with CUDA
After you build `node-llama-cpp` with CUDA support, you can use it normally.

To configure how many layers of the model are run on the GPU, set `gpuLayers` on `LlamaModel` in your code:
```typescript
import {LlamaModel} from "node-llama-cpp";

const modelPath = "path/to/your/model.gguf"; // placeholder: path to a local GGUF model file

const model = new LlamaModel({
    modelPath,
    gpuLayers: 64 // or any other number of layers you want
});
```

You'll see logs like these in the console when the model loads:
```
llm_load_tensors: ggml ctx size = 0.09 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 41.11 MB (+ 2048.00 MB per state)
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloading v cache to GPU
llm_load_tensors: offloading k cache to GPU
llm_load_tensors: offloaded 35/35 layers to GPU
llm_load_tensors: VRAM used: 4741 MB
```

On Linux, you can monitor GPU usage with this command:
```bash
watch -d nvidia-smi
```