
Commit 12749c0

fix(Vulkan): context creation edge cases (#492)
* fix(Vulkan): context creation edge cases
* fix: don't share loaded shared libraries between backends
* fix(`inspect gpu` command): validate the loaded GPU type before printing info
* fix: more CUDA compilation issues
* fix: remove unused dependency
* docs: change CUDA 12.2 to CUDA 12.4
1 parent f849cd9 commit 12749c0

File tree

15 files changed: +821 -751 lines changed


docs/blog/v3.12-gpt-oss.md

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@ image:

 Here are a few highlights of these models:
 * Due to the low number of active parameters, these models are very fast
-* These are reasoning models, and you can adjust their reasoning efforts
+* These are reasoning models, and you can adjust their reasoning effort
 * They are very good at function calling, and are built with agentic capabilities in mind
 * These models were trained with native MXFP4 precision, so no need to quantize them further.
 They're small compared to their capabilities already

@@ -74,7 +74,7 @@ but offers better precision and thus better quality.
 To quickly try out [`gpt-oss-20b`](https://huggingface.co/giladgd/gpt-oss-20b-GGUF), you can use the [CLI `chat` command](../cli/chat.md):

 ```shell
-npx -y node-llama-cpp chat --ef --prompt "Hi there" hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf
+npx -y node-llama-cpp chat --prompt "Hi there" hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf
 ```


docs/guide/CUDA.md

Lines changed: 12 additions & 9 deletions
@@ -9,14 +9,14 @@ description: CUDA support in node-llama-cpp
 and these are automatically used when CUDA is detected on your machine.

 To use `node-llama-cpp`'s CUDA support with your NVIDIA GPU,
-make sure you have [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) 12.2 or higher installed on your machine.
+make sure you have [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) 12.4 or higher installed on your machine.

 If the pre-built binaries don't work with your CUDA installation,
 `node-llama-cpp` will automatically download a release of `llama.cpp` and build it from source with CUDA support.
 Building from source with CUDA support is slow and can take up to an hour.

-The pre-built binaries are compiled with CUDA Toolkit 12.2,
-so any version of CUDA Toolkit that is 12.2 or higher should work with the pre-built binaries.
+The pre-built binaries are compiled with CUDA Toolkit 12.4,
+so any version of CUDA Toolkit that is 12.4 or higher should work with the pre-built binaries.
 If you have an older version of CUDA Toolkit installed on your machine,
 consider updating it to avoid having to wait the long build time.

@@ -42,7 +42,7 @@ You should see an output like this:
 If you see `CUDA used VRAM` in the output, it means that CUDA support is working on your machine.

 ## Prerequisites
-* [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) 12.2 or higher
+* [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) 12.4 or higher
 * [`cmake-js` dependencies](https://github.com/cmake-js/cmake-js#:~:text=projectRoot/build%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%5Bstring%5D-,Requirements%3A,-CMake)
 * [CMake](https://cmake.org/download/) 3.26 or higher (optional, recommended if you have build issues)

@@ -79,20 +79,23 @@ const cudaCmakeOptionsTable = data.cudaCmakeOptionsTable;
 To build `node-llama-cpp` with any of these options, set an environment variable of an option prefixed with `NODE_LLAMA_CPP_CMAKE_OPTION_`.

 ### Fix the `Failed to detect a default CUDA architecture` Build Error
-To fix this issue you have to set the `CUDACXX` environment variable to the path of the `nvcc` compiler.
+To fix this issue you have to set the `CUDACXX` environment variable to the path of the `nvcc` compiler,
+and the `CUDA_PATH` environment variable to the path of the CUDA home directory that contains the `nvcc` compiler.

-For example, if you have installed CUDA Toolkit 12.2, you have to run a command like this:
+For example, if you have installed CUDA Toolkit 12.4, you have to run a command like this:
 ::: code-group
 ```shell [Linux]
-export CUDACXX=/usr/local/cuda-12.2/bin/nvcc
+export CUDACXX=/usr/local/cuda-12.4/bin/nvcc
+export CUDA_PATH=/usr/local/cuda-12.4
 ```

 ```cmd [Windows]
-set CUDACXX=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.exe
+set CUDACXX=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc.exe
+set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4
 ```
 :::

-Then run the build command again to check whether setting the `CUDACXX` environment variable fixed the issue.
+Then run the build command again to check whether setting the `CUDACXX` and `CUDA_PATH` environment variables fixed the issue.

 ### Fix the `The CUDA compiler identification is unknown` Build Error
 The solution to this error is the same as [the solution to the `Failed to detect a default CUDA architecture` error](#fix-the-failed-to-detect-a-default-cuda-architecture-build-error).

llama/CMakeLists.txt

Lines changed: 5 additions & 0 deletions
@@ -4,6 +4,8 @@ if (NLC_CURRENT_PLATFORM STREQUAL "win-x64" OR NLC_CURRENT_PLATFORM STREQUAL "wi
     set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)
 endif()

+include("./cmake/addVariantSuffix.cmake")
+
 if (NLC_CURRENT_PLATFORM STREQUAL "win-x64")
     if (CMAKE_BUILD_TYPE STREQUAL "Debug")
         set(CMAKE_MSVC_RUNTIME_LIBRARY "MultiThreadedDebugDLL" CACHE STRING "" FORCE)
@@ -109,6 +111,9 @@ list(REMOVE_DUPLICATES GPU_INFO_HEADERS)
 list(REMOVE_DUPLICATES GPU_INFO_SOURCES)
 list(REMOVE_DUPLICATES GPU_INFO_EXTRA_LIBS)

+addVariantSuffix(llama ${NLC_VARIANT})
+addVariantSuffix(ggml ${NLC_VARIANT})
+
 file(GLOB SOURCE_FILES "addon/*.cpp" "addon/**/*.cpp" ${GPU_INFO_SOURCES})

 add_library(${PROJECT_NAME} SHARED ${SOURCE_FILES} ${CMAKE_JS_SRC} ${GPU_INFO_HEADERS})
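
The two `addVariantSuffix` calls pass `${NLC_VARIANT}` to the helper added in `llama/cmake/addVariantSuffix.cmake` (shown further below), presumably so that each backend variant's `llama` and `ggml` libraries are built under distinct file names and one variant's loaded shared libraries aren't reused by another backend, matching the "don't share loaded shared libraries between backends" item in the commit message.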

llama/addon/globals/getGpuInfo.cpp

Lines changed: 8 additions & 4 deletions
@@ -54,9 +54,13 @@ Napi::Value getGpuVramInfo(const Napi::CallbackInfo& info) {
             // this means that we counted memory from devices that aren't used by llama.cpp
             vulkanDeviceUnifiedVramSize = 0;
         }
-
+
         unifiedVramSize += vulkanDeviceUnifiedVramSize;
     }
+
+    if (used == 0 && vulkanDeviceUsed != 0) {
+        used = vulkanDeviceUsed;
+    }
 #endif

     Napi::Object result = Napi::Object::New(info.Env());
@@ -93,7 +97,7 @@ std::pair<ggml_backend_dev_t, std::string> getGpuDevice() {
     for (size_t i = 0; i < ggml_backend_dev_count(); i++) {
         ggml_backend_dev_t device = ggml_backend_dev_get(i);
         const auto deviceName = std::string(ggml_backend_dev_name(device));
-
+
         if (deviceName == "Metal") {
             return std::pair<ggml_backend_dev_t, std::string>(device, "metal");
         } else if (std::string(deviceName).find("Vulkan") == 0) {
@@ -106,7 +110,7 @@ std::pair<ggml_backend_dev_t, std::string> getGpuDevice() {
     for (size_t i = 0; i < ggml_backend_dev_count(); i++) {
         ggml_backend_dev_t device = ggml_backend_dev_get(i);
         const auto deviceName = std::string(ggml_backend_dev_name(device));
-
+
         if (deviceName == "CPU") {
             return std::pair<ggml_backend_dev_t, std::string>(device, "cpu");
         }
@@ -119,7 +123,7 @@ Napi::Value getGpuType(const Napi::CallbackInfo& info) {
     const auto gpuDeviceRes = getGpuDevice();
     const auto device = gpuDeviceRes.first;
     const auto deviceType = gpuDeviceRes.second;
-
+
     if (deviceType == "cpu") {
         return Napi::Boolean::New(info.Env(), false);
     } else if (device != nullptr && deviceType != "") {
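
For context, here is a minimal sketch of the device-lookup pattern these hunks touch, reassembled from the fragments visible above. The `ggml_backend_dev_*` calls are the ones shown in the diff; the `"vulkan"` return value and the control flow outside the visible lines are assumptions, not the addon's exact code:

```cpp
// Sketch only: reconstructs the getGpuDevice() enumeration from the diff fragments above.
// Anything outside the visible hunks (the Vulkan branch body, the final fallback) is assumed.
#include <string>
#include <utility>

#include "ggml-backend.h"

std::pair<ggml_backend_dev_t, std::string> getGpuDeviceSketch() {
    // first pass: look for a GPU backend device by name
    for (size_t i = 0; i < ggml_backend_dev_count(); i++) {
        ggml_backend_dev_t device = ggml_backend_dev_get(i);
        const auto deviceName = std::string(ggml_backend_dev_name(device));

        if (deviceName == "Metal") {
            return std::pair<ggml_backend_dev_t, std::string>(device, "metal");
        } else if (deviceName.find("Vulkan") == 0) {
            // assumption: the truncated branch in the diff returns the Vulkan device
            return std::pair<ggml_backend_dev_t, std::string>(device, "vulkan");
        }
    }

    // second pass: fall back to the CPU device, which getGpuType() maps to "no GPU"
    for (size_t i = 0; i < ggml_backend_dev_count(); i++) {
        ggml_backend_dev_t device = ggml_backend_dev_get(i);
        const auto deviceName = std::string(ggml_backend_dev_name(device));

        if (deviceName == "CPU") {
            return std::pair<ggml_backend_dev_t, std::string>(device, "cpu");
        }
    }

    // no device matched; getGpuType() checks for nullptr / empty type before reporting
    return std::pair<ggml_backend_dev_t, std::string>(nullptr, "");
}
```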

llama/cmake/addVariantSuffix.cmake

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+function(addVariantSuffix originalTarget variantSuffix)
+    if (NOT TARGET ${originalTarget} OR variantSuffix STREQUAL "")
+        return()
+    endif()
+
+    set(_name "${originalTarget}.${variantSuffix}")
+
+    set_target_properties(${originalTarget} PROPERTIES
+        OUTPUT_NAME "${_name}"
+        RUNTIME_OUTPUT_NAME "${_name}" # Windows .dll
+        LIBRARY_OUTPUT_NAME "${_name}" # Unix shared lib
+        ARCHIVE_OUTPUT_NAME "${_name}" # static / import lib
+    )
+
+    if (APPLE)
+        set_target_properties(${originalTarget} PROPERTIES
+            MACOSX_RPATH ON
+            INSTALL_NAME_DIR "@rpath"
+        )
+    endif()
+endfunction()
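
Given the `OUTPUT_NAME "${_name}"` logic above, calling `addVariantSuffix(ggml ${NLC_VARIANT})` with `NLC_VARIANT` set to, say, `vulkan` (an illustrative value, not confirmed by this diff) should produce artifacts named `ggml.vulkan` rather than plain `ggml`, while an empty suffix or a missing target leaves the original names untouched thanks to the early `return()`.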
