# Using Vulkan
> Vulkan is a low-overhead, cross-platform 3D graphics and computing API

`node-llama-cpp` ships with prebuilt binaries with Vulkan support for Windows and Linux, and these are automatically used when Vulkan support is detected on your machine.

**Windows:** Vulkan drivers are usually provided together with your GPU drivers, so you most likely don't have to install anything.

**Linux:** you have to [install the Vulkan SDK](#vulkan-sdk-ubuntu).

## Testing Vulkan support
To check whether Vulkan support works on your machine, run this command:
```bash
npx --no node-llama-cpp inspect gpu
```

You should see an output like this:
```ansi
[33mVulkan:[39m [32mavailable[39m

[33mVulkan used VRAM:[39m 0% [90m(64KB/21.33GB)[39m
[33mVulkan free VRAM:[39m 99.99% [90m(21.33GB/21.33GB)[39m

[33mUsed RAM:[39m 97.37% [90m(31.16GB/32GB)[39m
[33mFree RAM:[39m 2.62% [90m(860.72MB/32GB)[39m
```

If you see `Vulkan used VRAM` in the output, it means that Vulkan support is working on your machine.
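
You can also run a similar check from your own code. The following is a minimal sketch, not the exact output of the command above; it assumes a `getVramState()` method on the object returned by [`getLlama`](../api/functions/getLlama) that reports VRAM usage in bytes, so verify it against the API reference of the version you have installed:
```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();

// assumption: getVramState() resolves to {total, used, free} in bytes
const {total, used, free} = await llama.getVramState();
console.log(`VRAM used: ${used} of ${total} bytes (${free} bytes free)`);
```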

## Building `node-llama-cpp` with Vulkan support
### Prerequisites
* [`cmake-js` dependencies](https://github.com/cmake-js/cmake-js#:~:text=projectRoot/build%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%5Bstring%5D-,Requirements%3A,-CMake)
* [CMake](https://cmake.org/download/) 3.26 or higher (optional, recommended if you have build issues)
* <a id="vulkan-sdk" />[Vulkan SDK](https://vulkan.lunarg.com/sdk/home):
  >
  #### Windows: [Vulkan SDK installer](https://sdk.lunarg.com/sdk/download/latest/windows/vulkan-sdk.exe) {#vulkan-sdk-windows}
  >
  #### Ubuntu {#vulkan-sdk-ubuntu}
  ::: code-group

  ```bash [Ubuntu 22.04]
  wget -qO- https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo tee /etc/apt/trusted.gpg.d/lunarg.asc
  sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list
  sudo apt update
  sudo apt install vulkan-sdk
  ```

  ```bash [Ubuntu 20.04]
  wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo apt-key add -
  sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-focal.list https://packages.lunarg.com/vulkan/lunarg-vulkan-focal.list
  sudo apt update
  sudo apt install vulkan-sdk
  ```

  :::

## Building from source
When you use the [`getLlama`](../api/functions/getLlama) method, if there's no binary that matches the provided options, it'll automatically build `llama.cpp` from source.

Manually building from source using the [`download`](./cli/download) command is recommended for troubleshooting build issues.

To manually build from source, run this command inside your project:
```bash
npx --no node-llama-cpp download --gpu vulkan
```

> If `cmake` is not installed on your machine, `node-llama-cpp` will automatically download `cmake` to an internal directory and try to use it to build `llama.cpp` from source.

> If you see the message `Vulkan not found` during the build process,
> it means that the Vulkan SDK is not installed on your machine or that it is not detected by the build process.

## Using `node-llama-cpp` with Vulkan
It's recommended to use [`getLlama`](../api/functions/getLlama) without specifying a GPU type, so it'll detect the available GPU types and use the best one automatically.

To do this, just use [`getLlama`](../api/functions/getLlama) without any parameters:
```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
```
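
To see which GPU type was actually picked, you can inspect the returned object. This is a brief sketch that assumes a `gpu` property on the returned instance (it may differ between versions, so check the API reference):
```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();

// assumption: `gpu` holds the loaded GPU type, e.g. "vulkan",
// another GPU type, or `false` when running CPU-only
console.log("Selected GPU type:", llama.gpu);
```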

To force it to use Vulkan, you can use the [`gpu`](../api/type-aliases/LlamaOptions#gpu) option:
```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama({
    gpu: "vulkan"
});
```
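
Note that forcing a GPU type can fail when that type isn't usable on the current machine. As a hedged sketch, assuming [`getLlama`](../api/functions/getLlama) throws in that case and that `gpu: false` disables GPU usage, you could fall back to CPU-only mode like this:
```typescript
import {getLlama} from "node-llama-cpp";

// assumption: getLlama throws when the requested GPU type can't be used
let llama;
try {
    llama = await getLlama({gpu: "vulkan"});
} catch (error) {
    console.warn("Vulkan is not available, falling back to CPU", error);

    // assumption: `gpu: false` disables GPU usage entirely
    llama = await getLlama({gpu: false});
}
```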

To configure how many layers of the model are run on the GPU, configure `gpuLayers` on `LlamaModel` in your code:
```typescript
const model = new LlamaModel({
    llama,
    modelPath,
    gpuLayers: 64 // or any other number of layers you want
});
```

You'll see logs like these in the console when the model loads:
```
llm_load_tensors: ggml ctx size = 0.09 MB
llm_load_tensors: mem required = 41.11 MB (+ 2048.00 MB per state)
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloading v cache to GPU
llm_load_tensors: offloading k cache to GPU
llm_load_tensors: offloaded 35/35 layers to GPU
llm_load_tensors: VRAM used: 4741 MB
```

On Linux, you can monitor GPU usage with this command:
```bash
watch -d "npx --no node-llama-cpp inspect gpu"
```
| 113 | +``` |
0 commit comments