
Commit ea12dc5

feat: chat session response prefix (#375)
* feat: chat session response prefix
* feat: improve context shift strategy
* feat: use RAM and swap sizes in memory usage estimations
* feat(`inspect gguf` command): print a single key flag
* feat: faster building from source
* fix: Electron crash with some models on macOS when not using Metal
* fix: adapt to `llama.cpp` breaking changes
* fix: improve CPU compatibility score
1 parent (8145c94) · commit ea12dc5


49 files changed: 2,797 additions, 833 deletions

.github/ISSUE_TEMPLATE/bug-report.yml

Lines changed: 4 additions & 6 deletions
@@ -35,11 +35,10 @@ body:
     id: steps
     attributes:
       label: Steps to reproduce
-      description: >-
+      description: |-
         Your bug can be investigated much faster if your code can be run without any dependencies other than `node-llama-cpp`.
         Issues without reproduction steps or code examples may be closed as not actionable.
-        Please try to provide a Minimal, Complete, and Verifiable example ([link](http://stackoverflow.com/help/mcve)).
-        Please include a link to the model file you used if possible.
+        Please try to provide a Minimal, Complete, and Verifiable example ([link](http://stackoverflow.com/help/mcve)), including a link to the model file you used if possible.
         Also, please enable enable debug logs by using `getLlama({debug: true})` to get more information.
       placeholder: >-
         Please try to provide a Minimal, Complete, and Verifiable example.
@@ -50,10 +49,9 @@ body:
     id: env
     attributes:
       label: My Environment
-      description: >-
+      description: |-
        Please include the result of the command `npx --yes node-llama-cpp inspect gpu`.
-        Please also add any other relevant dependencies to this table at the end.
-        For example: Electron, Bun, Webpack.
+        Please also add any other relevant dependencies to this table at the end. For example: Electron, Bun, Webpack.
       value: |
         | Dependency | Version |
         | --- | --- |

.github/ISSUE_TEMPLATE/documentation-issue.yml

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ body:
     id: details
     attributes:
       label: What was unclear or otherwise insufficient?
-      description: >-
+      description: |-
        If relevant, please be clear about the documentation URL, as well as the location within the page.
        Add a link to the relevant documentation you're referring to.
      placeholder: >-

.github/ISSUE_TEMPLATE/feature-request.yml

Lines changed: 4 additions & 0 deletions
@@ -51,8 +51,12 @@ body:
         required: false
       - label: CUDA support
         required: false
+      - label: Vulkan support
+        required: false
       - label: Grammar
         required: false
+      - label: Function calling
+        required: false
   - type: dropdown
     id: pr
     attributes:

.github/workflows/build.yml

Lines changed: 4 additions & 1 deletion
@@ -383,7 +383,7 @@ jobs:
 
   model-dependent-tests:
     name: Model dependent tests
-    runs-on: macos-13
+    runs-on: macos-12
     env:
       NODE_LLAMA_CPP_GPU: false
     needs:
@@ -417,6 +417,9 @@ jobs:
       - name: Build binary
        run: node ./dist/cli/cli.js source build --noUsageExample
 
+      - name: Inspect hardware
+        run: node ./dist/cli/cli.js inspect gpu
+
       - name: Cache models
        id: cache-test-models
        uses: actions/cache@v4

.vitepress/config.ts

Lines changed: 30 additions & 4 deletions
@@ -34,7 +34,8 @@ const packageVersion = env.get("DOCS_PACKAGE_VERSION")
     .default(packageJson.version)
     .asString();
 
-const hostname = "https://node-llama-cpp.withcat.ai/";
+const hostname = "https://node-llama-cpp.withcat.ai/"
+const buildDate = new Date();
 
 const socialPosterLink = hostname + "social.poster.jpg";
 const defaultPageTitle = "node-llama-cpp - node.js bindings for llama.cpp";
@@ -90,7 +91,7 @@ export default defineConfig({
     base: urlBase,
     sitemap: {
         hostname,
-        transformItems(items) {
+        async transformItems(items) {
             function priorityMatch(a: {url: string}, b: {url: string}, matchers: ((url: string) => boolean)[]): number {
                 for (const matcher of matchers) {
                     const aMatch = matcher(a.url);
@@ -105,13 +106,38 @@ export default defineConfig({
                 return 0;
             }
 
+            const blogPosts = await createContentLoader("blog/*.md", {
+                excerpt: true,
+                render: true
+            })
+                .load();
+            const blogPostMap = new Map<string, typeof blogPosts[number]>();
+            for (const blogPost of blogPosts) {
+                let url = blogPost.url;
+                if (url.startsWith("/"))
+                    url = url.slice("/".length);
+
+                blogPostMap.set(url, blogPost);
+            }
+
             return items
                 .map((item) => {
-                    if (item.url.startsWith("api/") || item.url.startsWith("cli/")) {
+                    if (item.url === "" || item.url === "blog/") {
+                        item.lastmod = new Date(buildDate);
+                    } else if (item.url.startsWith("api/") || item.url.startsWith("cli/")) {
                         item = {
                             ...item,
-                            lastmod: undefined
+                            lastmod: new Date(buildDate)
                         };
+                    } else if (item.lastmod == null && item.url.startsWith("blog/")) {
+                        const postDate = blogPostMap.get(item.url)?.frontmatter.date;
+                        if (postDate != null) {
+                            const parsedDate = new Date(postDate);
+                            if (Number.isFinite(parsedDate.getTime()))
+                                item.lastmod = parsedDate;
+                        }
+                    } else if (item.lastmod == null) {
+                        item.lastmod = new Date(buildDate);
                     }
 
                     return item;

README.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 <div align="center">
-    <img alt="node-llama-cpp Logo" src="https://raw.githubusercontent.com/withcatai/node-llama-cpp/master/assets/logo.v3.roundEdges.avif" width="360px" />
+    <a href="https://node-llama-cpp.withcat.ai" target="_blank"><img alt="node-llama-cpp Logo" src="https://raw.githubusercontent.com/withcatai/node-llama-cpp/master/assets/logo.v3.roundEdges.avif" width="360px" /></a>
     <h1>node-llama-cpp</h1>
     <p>Run AI models locally on your machine</p>
     <sub>Pre-built bindings are provided with a fallback to building from source with cmake</sub>

docs/guide/chat-session.md

Lines changed: 31 additions & 0 deletions
@@ -671,3 +671,34 @@ await new Promise(resolve => setTimeout(resolve, 1500));
 const cachedCompletion = completionEngine.complete("Hi there! How");
 console.log("Cached completion:", cachedCompletion);
 ```
+
+## Response Prefix {#response-prefix}
+You can force the model response to start with a specific prefix,
+to make the model follow a certain direction in its response.
+
+```typescript
+import {fileURLToPath} from "url";
+import path from "path";
+import {getLlama, LlamaChatSession, GeneralChatWrapper} from "node-llama-cpp";
+
+const __dirname = path.dirname(fileURLToPath(import.meta.url));
+
+const llama = await getLlama();
+const model = await llama.loadModel({
+    modelPath: path.join(__dirname, "models", "Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf")
+});
+const context = await model.createContext();
+const session = new LlamaChatSession({
+    contextSequence: context.getSequence(),
+    chatWrapper: new GeneralChatWrapper()
+});
+
+
+const q1 = "Hi there, how are you?";
+console.log("User: " + q1);
+
+const a1 = await session.prompt(q1, {
+    responsePrefix: "The weather today is"
+});
+console.log("AI: " + a1);
+```

docs/guide/electron.md

Lines changed: 24 additions & 0 deletions
@@ -37,3 +37,27 @@ so that `node-llama-cpp` can find them.
 Cross packaging from one platform to another is not supported, since binaries for other platforms are not downloaded to you machine when your run `npm install`.
 
 Packaging an `arm64` app on an `x64` machine is supported, but packaging an `x64` app on an `arm64` machine is not.
+
+## Bundling
+When bundling your code for Electron using [Electron Vite](https://electron-vite.org) or Webpack,
+ensure that `node-llama-cpp` is not bundled, and is instead treated as an external module.
+
+Marking `node-llama-cpp` as an external module will prevent its code from being bundled with your application code,
+and instead, it'll be loaded from the `node_modules` directory at runtime (which should be packed into a `.asar` archive).
+
+The file structure of `node-llama-cpp` is crucial for it to function correctly,
+so bundling it will break its functionality.
+Moreover, since `node-llama-cpp` includes prebuilt binaries (and also local builds from source),
+those files must be retained in their original structure for it to work.
+
+Electron has [its own bundling solution called ASAR](https://www.electronjs.org/docs/latest/tutorial/asar-archives) that is designed to work with node modules.
+ASAR retains the original file structure of node modules by packing all the files into a single `.asar` archive file that Electron will read from at runtime like it would from the file system.
+This method ensures node modules work as intended in Electron applications, even though they are bundled into a single file.
+
+Using ASAR is the recommended way to bundle `node-llama-cpp` in your Electron app.
+
+If you're using the scaffolded Electron app, this is already taken care of.
+
+::: tip NOTE
+We recommend using [Electron Vite](https://electron-vite.org) over Webpack for your Electron app due to to Vite's speed and Webpack's lack of proper ESM support in the output bundle, which complicates the bundling process.
+:::
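
As an illustration of the external-module setup described in the added guide above (not part of this commit), a minimal Electron Vite config sketch could look like the following; it assumes a standard `electron-vite` project layout and uses Rollup's `external` option, so adapt it to your own setup:

```typescript
// electron.vite.config.ts — illustrative sketch only, not taken from this commit.
// It keeps node-llama-cpp out of the main-process bundle so it is loaded
// from node_modules (packed into the .asar archive) at runtime.
import {defineConfig} from "electron-vite";

export default defineConfig({
    main: {
        build: {
            rollupOptions: {
                // treat node-llama-cpp as an external module instead of bundling it
                external: ["node-llama-cpp"]
            }
        }
    }
});
```

Webpack users can achieve the same effect with its `externals` option.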

docs/guide/tips-and-tricks.md

Lines changed: 34 additions & 0 deletions
@@ -85,3 +85,37 @@ npx --no node-llama-cpp source download
 ```
 
 Now, just use `node-llama-cpp` as you normally would.
+
+## Intel AMX {#intel-amx}
+> Intel AMX (Advanced Matrix Extensions) is a dedicated hardware block found on Intel Xeon processors
+> that helps optimize and accelerate matrix multiplication operations.
+>
+> It's available on the 4th Gen and newer Intel Xeon processors.
+
+Intel AMX can improve CPU inference performance [by 2x and up to even 14x](https://github.com/ggerganov/llama.cpp/pull/7707) faster inference times on supported CPUs (on specific conditions).
+
+If you're using a 4th Gen or newer Intel Xeon processor,
+you might want to [build `llama.cpp` from source](./building-from-source.md) to utilize these hardware-specific optimizations available on your hardware.
+
+To do this, run this command inside your project on the machine you run your project on:
+```shell
+npx --no node-llama-cpp source download
+```
+
+Alternatively, you can force `node-llama-cpp` to not use its prebuilt binaries
+and instead build from source when calling [`getLlama`](../api/functions/getLlama.md) for the first time on a Xeon CPU:
+
+```typescript
+import os from "os";
+import {getLlama} from "node-llama-cpp";
+
+const llama = await getLlama({
+    usePrebuiltBinaries: !os.cpus().some((cpu) => (
+        cpu.model.toLowerCase().includes("Xeon".toLowerCase())
+    ))
+});
+```
+::: info NOTE
+Building from source can take some time (when using CUDA even up to an hour in extreme cases),
+so ensure you dedicate some time for this as part of the deployment process.
+:::

llama/CMakeLists.txt

Lines changed: 6 additions & 0 deletions
@@ -22,6 +22,12 @@ execute_process(COMMAND node -p "require('node-addon-api').include.slice(1,-1)"
                 OUTPUT_VARIABLE NODE_ADDON_API_DIR
                 OUTPUT_STRIP_TRAILING_WHITESPACE)
 
+set(LLAMA_BUILD_COMMON ON)
+
+if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang" OR CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
+    add_compile_options(-Wno-c++17-extensions)
+endif()
+
 include_directories(${NODE_ADDON_API_DIR} ${CMAKE_JS_INC})
 
 add_subdirectory("llama.cpp")
