Merged
38 commits
11b5404
fix: adapt to breaking `llama.cpp` changes
giladgd May 11, 2025
8b98cf0
fix: improve GPU backend loading error description
giladgd May 11, 2025
1e8111c
chore: update template dependencies
giladgd May 11, 2025
2f9858a
test: Qwen 3 template
giladgd May 11, 2025
4c6e2b1
feat: configure Hugging Face remote endpoint for resolving URIs
giladgd May 11, 2025
d39d261
fix: race condition when reading extremely long gguf metadata
giladgd May 11, 2025
e740078
docs: typo
giladgd May 11, 2025
d6e852e
fix: update gguf types
giladgd May 11, 2025
9ab3c6d
fix: capture multi-token segment separators
giladgd May 11, 2025
656f2be
docs: solutions to more CUDA issues
giladgd May 11, 2025
6926425
feat: stream function call parameters
giladgd May 11, 2025
b369eaf
docs: update the awesome list
giladgd May 11, 2025
72c30dc
chore: update modules
giladgd May 11, 2025
df05d70
docs: more clear default values for custom cmake options
giladgd May 11, 2025
b3d510e
chore: reorder Vitepress config keys
giladgd May 11, 2025
3233603
fix: update gguf types
giladgd May 11, 2025
96c78da
docs: document new env vars
giladgd May 11, 2025
f7063d8
chore: module versions
giladgd May 12, 2025
123e524
chore: update GitHub issue templates
giladgd May 12, 2025
53a5206
test: check recommended model URIs
giladgd May 13, 2025
2e1a7ce
test: fix tests
giladgd May 14, 2025
9463ccc
feat(`QwenChatWrapper`): support discouraging the generation of thoughts
giladgd May 15, 2025
631a7e7
test: fix tests
giladgd May 15, 2025
a0cc198
feat: save and restore context sequence state
giladgd May 15, 2025
185b734
docs: save and restore context sequence state
giladgd May 15, 2025
d36670c
fix: adapt memory estimation to new added model architectures
giladgd May 15, 2025
a68590a
feat(`getLlama`): `dryRun` option
giladgd May 16, 2025
8c6134d
feat: `getLlamaGpuTypes` to get the list of available GPU types for t…
giladgd May 16, 2025
71babfa
fix: skip binary testing on certain problematic conditions
giladgd May 16, 2025
12cec69
docs: fix dead link
giladgd May 16, 2025
de3a360
fix: Paperspace tests setup script nodejs version
giladgd May 16, 2025
8eff306
fix: Windows build
giladgd May 17, 2025
f76e899
fix: types
giladgd May 17, 2025
0cbb572
test: fix tests
giladgd May 17, 2025
2c01084
fix: performance improvements
giladgd May 17, 2025
5d4c8c3
fix: remove unused files from the build dir
giladgd May 17, 2025
69d30cd
fix: remove unused line
giladgd May 17, 2025
62c8020
fix: performance improvements
giladgd May 17, 2025
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/bug-report.yml
@@ -3,6 +3,8 @@ description: Report a reproducible bug
labels:
- requires triage
- bug
title: "bug: "
type: "Bug"
body:
- type: markdown
attributes:
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/documentation-issue.yml
@@ -3,6 +3,8 @@ description: Documentation is unclear or otherwise insufficient.
labels:
- requires triage
- documentation
title: "docs: "
type: "Documentation"
body:
- type: markdown
attributes:
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/feature-request.yml
@@ -3,6 +3,8 @@ description: Suggest an new idea for this project
labels:
- requires triage
- new feature
title: "feat: "
type: "Feature"
body:
- type: markdown
attributes:
2 changes: 2 additions & 0 deletions .gitignore
@@ -14,8 +14,10 @@ node_modules
/.vitepress/.cache
/test/.models
/test/temp
/test/.temp
/temp
/coverage
/test-runner-profile

/llama/compile_commands.json
/llama/llama.cpp
6 changes: 3 additions & 3 deletions .vitepress/config.ts
@@ -470,8 +470,6 @@ export default defineConfig({
}
},
sidebar: {
"/api/": getApiReferenceSidebar(),

"/guide/": [{
text: "Guide",
base: "/guide",
@@ -550,7 +548,9 @@ export default defineConfig({
]
}
]
}]
}],

"/api/": getApiReferenceSidebar()
},
socialLinks: [
{icon: "npm", link: "https://www.npmjs.com/package/node-llama-cpp"},
2 changes: 1 addition & 1 deletion docs/cli/pull.md
@@ -20,7 +20,7 @@ If a file already exists and its size matches the expected size, it will not be

The supported URI schemes are:
- **HTTP:** `https://`, `http://`
- **Hugging Face:** `hf:<user>/<model>:<quant>` (`#<quant>` is optional, [but recommended](../guide/downloading-models.md#hf-scheme-specify-quant))
- **Hugging Face:** `hf:<user>/<model>:<quant>` (`:<quant>` is optional, [but recommended](../guide/downloading-models.md#hf-scheme-specify-quant))
- **Hugging Face:** `hf:<user>/<model>/<file-path>#<branch>` (`#<branch>` is optional)

Learn more about using model URIs in the [Downloading Models guide](../guide/downloading-models.md#model-uris).
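A minimal usage sketch of the corrected `hf:` URI scheme with the `pull` command documented above; the repository and quant names below are placeholder examples, not part of this change:

```shell
npx -y node-llama-cpp pull "hf:mradermacher/Meta-Llama-3.1-8B-Instruct-GGUF:Q4_K_M"
```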
27 changes: 27 additions & 0 deletions docs/guide/CUDA.md
@@ -114,6 +114,33 @@ set NODE_LLAMA_CPP_CMAKE_OPTION_CMAKE_GENERATOR_TOOLSET=%CUDA_PATH%

Then run the build command again to check whether setting the `CMAKE_GENERATOR_TOOLSET` cmake option fixed the issue.

### Fix the `forward compatibility was attempted on non supported HW` Error {#fix-cuda-forward-compatibility}
This error usually happens when the CUDA version you have installed on your machine is older than the CUDA version used in the prebuilt binaries supplied by `node-llama-cpp`.

To resolve this issue, you can either [update your CUDA installation](https://developer.nvidia.com/cuda-downloads) to the latest version (recommended) or [build `node-llama-cpp` on your machine](#building) against the CUDA version you have installed.
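A sketch of the second option, assuming the `source download` build command from the building guide:

```shell
# check which CUDA toolkit version is currently installed
nvcc --version

# rebuild node-llama-cpp from source against the installed CUDA version
npx --no node-llama-cpp source download --gpu cuda
```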

### Fix the `Binary GPU type mismatch. Expected: cuda, got: false` Error {#fix-cuda-gpu-type-mismatch}
This error usually happens when you have multiple conflicting CUDA versions installed on your machine.

To fix it, uninstall older CUDA versions and restart your machine (important).

:::: details Check which CUDA libraries are picked up by `node-llama-cpp`'s prebuilt binaries on your machine

Run this command inside of your project:

::: code-group
```shell [Linux]
ldd ./node_modules/@node-llama-cpp/linux-x64-cuda/bins/linux-x64-cuda/libggml-cuda.so
```

```cmd [Windows]
"C:\Program Files\Git\usr\bin\ldd.exe" node_modules\@node-llama-cpp\win-x64-cuda\bins\win-x64-cuda\ggml-cuda.dll
```
:::

::::


## Using `node-llama-cpp` With CUDA
It's recommended to use [`getLlama`](../api/functions/getLlama) without specifying a GPU type,
so it'll detect the available GPU types and use the best one automatically.
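A minimal sketch of that recommendation; the `gpu` property read here is assumed from the API reference:

```typescript
import {getLlama} from "node-llama-cpp";

// detect the available GPU types and pick the best one automatically
const llama = await getLlama();

// "cuda" when the CUDA backend was loaded; another GPU type or false (CPU) otherwise
console.log("Using GPU type:", llama.gpu);
```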
19 changes: 17 additions & 2 deletions docs/guide/awesome.md
@@ -2,17 +2,32 @@
description: Awesome projects that use node-llama-cpp
---
# Awesome `node-llama-cpp`
Awesome projects that use `node-llama-cpp`.
:sunglasses: Awesome projects that use `node-llama-cpp`.

<script setup lang="ts">
import DataBadge from "../../.vitepress/components/DataBadge/DataBadge.vue";
</script>

## Open Source
* [CatAI](https://github.com/withcatai/catai) - a simplified AI assistant API for Node.js, with REST API support
<br /><DataBadge title="License" content="MIT"/>

* [Manzoni](https://manzoni.app/) ([GitHub](https://github.com/gems-platforms/manzoni-app)) - a text editor running local LLMs
<br /><DataBadge title="License" content="AGPL-3.0"/>


## Proprietary
> List your project here!
* [BashBuddy](https://bashbuddy.run) ([GitHub](https://github.com/wosherco/bashbuddy)) - write bash commands with natural language
<br /><DataBadge title="Partially open source" content="Source available" href="https://github.com/wosherco/bashbuddy/blob/main/LICENSE.md"/>

* [nutshell](https://withnutshell.com) - Private AI meeting notes processed completely on your device



<br />

---

> To add a project to this list, [open a PR](https://github.com/withcatai/node-llama-cpp/edit/master/docs/guide/awesome.md).
>
> To have a project listed here, it should clearly state that it uses `node-llama-cpp`.
81 changes: 81 additions & 0 deletions docs/guide/chat-session.md
@@ -446,6 +446,87 @@ console.log("AI: " + a2);
```
:::

:::: details Saving and restoring a context sequence evaluation state {#save-and-restore-with-context-sequence-state}
You can also save and restore the context sequence evaluation state to avoid re-evaluating the chat history
when you load it on a new context sequence.

Please note that context sequence state files can get very large (109MB for only 1K tokens).
Using this feature is only recommended when the chat history is very long and you plan to load it often,
or when the evaluation is too slow due to hardware limitations.

::: warning
When loading a context sequence state from a file,
always ensure that the model used to create the context sequence is exactly the same as the one used to save the state file.

Loading a state file created from a different model can crash the process,
thus you have to pass `{acceptRisk: true}` to the [`loadStateFromFile`](../api/classes/LlamaContextSequence.md#loadstatefromfile) method to use it.

Use with caution.
:::

::: code-group
```typescript [Save chat history and context sequence state]
import {fileURLToPath} from "url";
import path from "path";
import fs from "fs/promises";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
modelPath: path.join(__dirname, "models", "Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const contextSequence = context.getSequence();
const session = new LlamaChatSession({contextSequence});


const q1 = "Hi there, how are you?";
console.log("User: " + q1);

const a1 = await session.prompt(q1);
console.log("AI: " + a1);

const chatHistory = session.getChatHistory();// [!code highlight]
await Promise.all([// [!code highlight]
contextSequence.saveStateToFile("state.bin"),// [!code highlight]
fs.writeFile("chatHistory.json", JSON.stringify(chatHistory), "utf8")// [!code highlight]
]);// [!code highlight]
```
:::

::: code-group
```typescript [Restore chat history and context sequence state]
import {fileURLToPath} from "url";
import path from "path";
import fs from "fs/promises";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));
// ---cut---
const llama = await getLlama();
const model = await llama.loadModel({
modelPath: path.join(__dirname, "models", "Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const contextSequence = context.getSequence();
const session = new LlamaChatSession({contextSequence});

await contextSequence.loadStateFromFile("state.bin", {acceptRisk: true});// [!code highlight]
const chatHistory = JSON.parse(await fs.readFile("chatHistory.json", "utf8"));// [!code highlight]
session.setChatHistory(chatHistory);// [!code highlight]

const q2 = "Summarize what you said";
console.log("User: " + q2);

const a2 = await session.prompt(q2);
console.log("AI: " + a2);
```
:::

::::

## Prompt Without Updating Chat History {#prompt-without-updating-chat-history}
Prompt without saving the prompt to the chat history.

6 changes: 6 additions & 0 deletions docs/guide/cmakeOptions.data.ts
@@ -90,6 +90,12 @@ function parseCmakeOptions(cmakeListsTxt: string, optionFilter: ((key: string) =
}
} else if (option.defaultValue === "${BUILD_SHARED_LIBS_DEFAULT}")
option.defaultValue = htmlEscapeWithCodeMarkdown("`OFF` on MinGW, `ON` otherwise");
else if (option.defaultValue === "${GGML_CUDA_GRAPHS_DEFAULT}")
option.defaultValue = htmlEscapeWithCodeMarkdown("`ON`");
else if (option.defaultValue === "${GGML_NATIVE_DEFAULT}")
option.defaultValue = htmlEscapeWithCodeMarkdown("`OFF` when building for a different architecture,\n`ON` otherwise");
else if (option.key === "LLAMA_CURL")
option.defaultValue = htmlEscapeWithCodeMarkdown("`OFF`");
else
option.defaultValue = htmlEscapeWithCodeMarkdown(
option.defaultValue != null
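For readers of the rendered options table, a hedged sketch of overriding one of these options at build time; `GGML_CUDA_GRAPHS` is taken from the snippet above, and the env-var prefix follows the convention shown in the CUDA guide:

```shell
# override a custom cmake option when building from source
NODE_LLAMA_CPP_CMAKE_OPTION_GGML_CUDA_GRAPHS=OFF \
  npx --no node-llama-cpp source download --gpu cuda
```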
2 changes: 1 addition & 1 deletion docs/guide/downloading-models.md
@@ -76,7 +76,7 @@ You can reference models using a URI instead of their full download URL when usi
When downloading a model from a URI, the model files will be prefixed with a corresponding adaptation of the URI.

To reference a model from Hugging Face, you can use one of these schemes:
* `hf:<user>/<model>:<quant>` (`#<quant>` is optional, [but recommended](#hf-scheme-specify-quant))
* `hf:<user>/<model>:<quant>` (`:<quant>` is optional, [but recommended](#hf-scheme-specify-quant))
* `hf:<user>/<model>/<file-path>#<branch>` (`#<branch>` is optional)

Here are example usages of the Hugging Face URI scheme:
1 change: 1 addition & 0 deletions docs/guide/index.md
@@ -316,4 +316,5 @@ Explore the [API reference](../api/functions/getLlama.md) to learn more about th
and use the search bar (press <kbd class="doc-kbd">/</kbd>) to find documentation for a specific topic or API.

Check out the [roadmap](https://github.com/orgs/withcatai/projects/1) to see what's coming next,<br/>
visit the [awesome list](./awesome.md) to find great projects that use `node-llama-cpp`,<br/>
and consider [sponsoring `node-llama-cpp`](https://github.com/sponsors/giladgd) to accelerate the development of new features.
82 changes: 82 additions & 0 deletions docs/guide/low-level-api.md
@@ -391,3 +391,85 @@ console.log(
newTokens
);
```

### Save and Restore State {#save-and-restore-state}
You can save the evaluation state of a context sequence to then later load it back.

This is useful for avoiding the evaluation of tokens that you've already evaluated in the past.

::: warning
When loading a context sequence state from a file,
always ensure that the model used to create the context sequence is exactly the same as the one used to save the state file.

Loading a state file created from a different model can crash the process,
thus you have to pass `{acceptRisk: true}` to the [`loadStateFromFile`](../api/classes/LlamaContextSequence.md#loadstatefromfile) method to use it.

Use with caution.
:::

::: code-group
```typescript [Save state]
import {fileURLToPath} from "url";
import path from "path";
import {getLlama} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
modelPath: path.join(__dirname, "models", "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const sequence = context.getSequence();

const input = "The best way to";
const tokens = model.tokenize(input);
await sequence.evaluateWithoutGeneratingNewTokens(tokens);

console.log(
"Current state:",
model.detokenize(sequence.contextTokens, true),
sequence.contextTokens
);

await sequence.saveStateToFile("state.bin");// [!code highlight]
```
:::

::: code-group
```typescript [Load state]
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, Token} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));
// ---cut---
const llama = await getLlama();
const model = await llama.loadModel({
modelPath: path.join(__dirname, "models", "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const sequence = context.getSequence();

await sequence.loadStateFromFile("state.bin", {acceptRisk: true});// [!code highlight]

console.log(
"Loaded state:",
model.detokenize(sequence.contextTokens, true),
sequence.contextTokens
);

const input = " find";
const inputTokens = model.tokenize(input);
const maxTokens = 10;
const res: Token[] = [];
for await (const token of sequence.evaluate(inputTokens)) {
res.push(token);

if (res.length >= maxTokens)
break;
}

console.log("Result:", model.detokenize(res));
```
:::
13 changes: 12 additions & 1 deletion llama/CMakeLists.txt
@@ -1,9 +1,17 @@
cmake_minimum_required(VERSION 3.14)
cmake_minimum_required(VERSION 3.19)

if (NLC_CURRENT_PLATFORM STREQUAL "win-x64" OR NLC_CURRENT_PLATFORM STREQUAL "win-arm64")
set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)
endif()

if (NLC_CURRENT_PLATFORM STREQUAL "win-x64")
if (CMAKE_BUILD_TYPE STREQUAL "Debug")
set(CMAKE_MSVC_RUNTIME_LIBRARY "MultiThreadedDebugDLL" CACHE STRING "" FORCE)
else()
set(CMAKE_MSVC_RUNTIME_LIBRARY "MultiThreadedDLL" CACHE STRING "" FORCE)
endif()
endif()

if (NLC_TARGET_PLATFORM STREQUAL "win-arm64" AND (CMAKE_GENERATOR STREQUAL "Ninja" OR CMAKE_GENERATOR STREQUAL "Ninja Multi-Config") AND NOT MINGW)
if(NLC_CURRENT_PLATFORM STREQUAL "win-x64")
include("./profiles/llvm.win32.host-x64.target-arm64.cmake")
@@ -70,6 +78,9 @@ add_subdirectory("llama.cpp")
include_directories("llama.cpp")
include_directories("./llama.cpp/common")

# This is needed to use methods in "llama-grammar.h" and "unicode.h"
target_include_directories(llama PUBLIC "./llama.cpp/src")

unset(GPU_INFO_HEADERS)
unset(GPU_INFO_SOURCES)
unset(GPU_INFO_EXTRA_LIBS)