Commit c94a7fa

docs: improve CUDA documentation (#52)
1 parent 4e274ce · commit c94a7fa

File tree: 6 files changed, +75 −7 lines

.config/typedoc.json

Lines changed: 1 addition & 1 deletion

@@ -1,7 +1,7 @@
 {
     "$schema": "https://typedoc.org/schema.json",
     "entryPoints": ["../src/index.ts"],
-    "out": "../docs",
+    "out": "../docs-site",
     "tsconfig": "../tsconfig.json",
     "customCss": "./typedoc.css",
     "readme": "../README.md",

.github/workflows/build.yml

Lines changed: 4 additions & 4 deletions

@@ -34,8 +34,8 @@ jobs:
       - name: Upload build artifact
         uses: actions/upload-artifact@v3
         with:
-          name: "docs"
-          path: "docs"
+          name: "docs-site"
+          path: "docs-site"
       - name: Upload llama.cpp artifact
         uses: actions/upload-artifact@v3
         with:

@@ -230,7 +230,7 @@ jobs:
           mkdir -p llamaBins
           mv artifacts/bins-*/* llamaBins/
           mv artifacts/build dist/
-          mv artifacts/docs docs/
+          mv artifacts/docs-site docs-site/

           cp -r artifacts/llama.cpp/grammars llama/grammars

@@ -257,7 +257,7 @@ jobs:
         uses: actions/upload-pages-artifact@v2
         with:
           name: pages-docs
-          path: docs
+          path: docs-site
       - name: Deploy docs to GitHub Pages
         if: steps.set-npm-url.outputs.npm-url != ''
         uses: actions/deploy-pages@v2

.gitignore

Lines changed: 1 addition & 1 deletion

@@ -4,7 +4,7 @@ node_modules
 .DS_Store

 /dist
-/docs
+/docs-site

 /.env
 /.eslintcache

README.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -221,7 +221,9 @@ npx node-llama-cpp download --cuda
221221

222222
> If `cmake` is not installed on your machine, `node-llama-cpp` will automatically download `cmake` to an internal directory and try to use it to build `llama.cpp` from source.
223223
>
224-
> If the build fails, make sure you have the required dependencies of `cmake` installed on your machine. More info is available [here](https://github.com/cmake-js/cmake-js#installation:~:text=projectRoot/build%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%5Bstring%5D-,Requirements%3A,-CMake) (you don't have to install `cmake` or `cmake-js`, just the dependencies).
224+
> If the build fails, make sure you have the required dependencies of `cmake` installed on your machine. More info is available [here](https://github.com/cmake-js/cmake-js#:~:text=projectRoot/build%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%5Bstring%5D-,Requirements%3A,-CMake) (you don't have to install `cmake` or `cmake-js`, just the dependencies).
225+
226+
To troubleshoot CUDA issues, visit the [CUDA documentation](https://github.com/withcatai/node-llama-cpp/blob/master/docs/CUDA.md).
225227

226228
### CLI
227229
```

docs/CUDA.md

Lines changed: 65 additions & 0 deletions

@@ -0,0 +1,65 @@
# `node-llama-cpp` CUDA support
## Prerequisites
* [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) 12.0 or higher
* [`cmake-js` dependencies](https://github.com/cmake-js/cmake-js#:~:text=projectRoot/build%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%5Bstring%5D-,Requirements%3A,-CMake)
* [CMake](https://cmake.org/download/) 3.26 or higher (optional, recommended if you have build issues)

## Building `node-llama-cpp` with CUDA support
Run this command inside of your project:
```bash
npx --no node-llama-cpp download --cuda
```

> If `cmake` is not installed on your machine, `node-llama-cpp` will automatically download `cmake` to an internal directory and try to use it to build `llama.cpp` from source.

> If you see the message `cuBLAS not found` during the build process,
> it means that the CUDA Toolkit is not installed on your machine or was not detected by the build process.

### Custom `llama.cpp` cmake options
`llama.cpp` has some options you can use to customize your CUDA build; you can find them [here](https://github.com/ggerganov/llama.cpp/tree/master#cublas).

To build `node-llama-cpp` with any of these options, set an environment variable named after the option, prefixed with `NODE_LLAMA_CPP_CMAKE_OPTION_`.
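
For example, a minimal sketch of how that could look; `LLAMA_CUDA_F16` is used here only as an illustration (it is one of the CUDA-related options that `compileLLamaCpp.ts` also recognizes), and the value `1` is just an example:
```bash
# Sketch: pass the llama.cpp cmake option LLAMA_CUDA_F16=1 through the
# NODE_LLAMA_CPP_CMAKE_OPTION_ prefix described above, then rebuild with CUDA.
export NODE_LLAMA_CPP_CMAKE_OPTION_LLAMA_CUDA_F16=1
npx --no node-llama-cpp download --cuda
```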

### Fix the `Failed to detect a default CUDA architecture` build error
To fix this issue, you have to set the `CUDACXX` environment variable to the path of the `nvcc` compiler.

For example, if you installed CUDA Toolkit 12.2 on Windows, you have to run the following command:
```bash
set CUDACXX=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.exe
```

On Linux, it would be something like this:
```bash
export CUDACXX=/usr/local/cuda-12.2/bin/nvcc
```

Then run the build command again to check whether setting the `CUDACXX` environment variable fixed the issue.
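
As a sketch, on Linux the two steps can also be combined into a single invocation (reusing the CUDA 12.2 path from the example above):
```bash
# Set CUDACXX only for this command and retry the CUDA build
CUDACXX=/usr/local/cuda-12.2/bin/nvcc npx --no node-llama-cpp download --cuda
```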

## Using `node-llama-cpp` with CUDA
After you build `node-llama-cpp` with CUDA support, you can use it normally.

To configure how many layers of the model are run on the GPU, configure `gpuLayers` on `LlamaModel` in your code:
```typescript
const model = new LlamaModel({
    modelPath,
    gpuLayers: 64 // or any other number of layers you want
});
```
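
For context, here is a more complete sketch that loads a model with `gpuLayers` set; the imports and the `LlamaContext`/`LlamaChatSession` usage follow the library's basic-usage example, and the model file name is a placeholder:
```typescript
import {fileURLToPath} from "url";
import path from "path";
import {LlamaModel, LlamaContext, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const model = new LlamaModel({
    // placeholder path; point this at a model file you have locally
    modelPath: path.join(__dirname, "models", "model.gguf"),
    gpuLayers: 64 // number of model layers to offload to the GPU
});
const context = new LlamaContext({model});
const session = new LlamaChatSession({context});

console.log(await session.prompt("Hi there, how are you?"));
```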

You'll see logs like these in the console when the model loads:
```
llm_load_tensors: ggml ctx size = 0.09 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 41.11 MB (+ 2048.00 MB per state)
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloading v cache to GPU
llm_load_tensors: offloading k cache to GPU
llm_load_tensors: offloaded 35/35 layers to GPU
llm_load_tensors: VRAM used: 4741 MB
```

On Linux, you can monitor GPU usage with this command:
```bash
watch -d nvidia-smi
```

src/utils/compileLLamaCpp.ts

Lines changed: 1 addition & 0 deletions

@@ -37,6 +37,7 @@ export async function compileLlamaCpp({
     if (process.env.LLAMA_CUDA_MMV_Y != null) cmakeCustomOptions.push("LLAMA_CUDA_MMV_Y=" + process.env.LLAMA_CUDA_MMV_Y);
     if (process.env.LLAMA_CUDA_F16 != null) cmakeCustomOptions.push("LLAMA_CUDA_F16=" + process.env.LLAMA_CUDA_F16);
     if (process.env.LLAMA_CUDA_KQUANTS_ITER != null) cmakeCustomOptions.push("LLAMA_CUDA_KQUANTS_ITER=" + process.env.LLAMA_CUDA_KQUANTS_ITER);
+    if (process.env.LLAMA_CUDA_PEER_MAX_BATCH_SIZE != null) cmakeCustomOptions.push("LLAMA_CUDA_PEER_MAX_BATCH_SIZE=" + process.env.LLAMA_CUDA_PEER_MAX_BATCH_SIZE);
     if (process.env.LLAMA_HIPBLAS === "1") cmakeCustomOptions.push("LLAMA_HIPBLAS=1");
     if (process.env.LLAMA_CLBLAST === "1") cmakeCustomOptions.push("LLAMA_CLBLAST=1");
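
Since these variables are read from the environment at build time, the newly added option can be exercised with something like the following sketch; the value `128` is only illustrative:
```bash
# Forwarded by compileLlamaCpp.ts to cmake as LLAMA_CUDA_PEER_MAX_BATCH_SIZE=128
LLAMA_CUDA_PEER_MAX_BATCH_SIZE=128 npx --no node-llama-cpp download --cuda
```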
