
Commit c28d2de

feat: add grammar support (#13)

* feat: add grammar support
* feat: add `temperature`, `topK`, and `topP` parameters support
* feat: add `maxTokens` support in `LlamaChatSession`
* feat: make `GeneralChatPromptWrapper` and `LlamaChatPromptWrapper` more stable
* feat: add `ChatMLPromptWrapper`
* feat: add aliases to commands
* docs: improve `README.md`

1 parent 95b7c43 commit c28d2de
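
The headline change is grammar-constrained generation. As a quick orientation before the diff, here is a condensed sketch of the feature, based directly on the example this commit adds to `README.md` (the model path is a placeholder, not a file from this repository):

```typescript
import {LlamaModel, LlamaGrammar, LlamaContext, LlamaChatSession} from "node-llama-cpp";

// Load a model and constrain its output to JSON using one of the bundled grammars.
// "model.gguf" is a placeholder path.
const model = new LlamaModel({modelPath: "model.gguf"});
const grammar = await LlamaGrammar.getFor("json");
const context = new LlamaContext({model, grammar});
const session = new LlamaChatSession({context});

// Some grammars can keep the model generating indefinitely, so the README below
// advises capping the response with `maxTokens` set to the context size.
const answer = await session.prompt('Create a JSON that contains a message saying "hi there"', {
    maxTokens: context.getContextSize()
});
console.log(JSON.parse(answer));
```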

28 files changed: +810 −116 lines

.eslintrc.json

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
         "browser": false,
         "es6": true
     },
-    "ignorePatterns": ["/dist", "/llama"],
+    "ignorePatterns": ["/dist", "/llama", "/docs"],
     "extends": [
         "eslint:recommended"
     ],

.github/workflows/build.yml

Lines changed: 2 additions & 0 deletions
@@ -232,6 +232,8 @@ jobs:
   mv artifacts/build dist/
   mv artifacts/docs docs/

+  cp -r artifacts/llama.cpp/grammars llama/grammars
+
   rm -f ./llama/binariesGithubRelease.json
   mv artifacts/binariesGithubRelease/binariesGithubRelease.json ./llama/binariesGithubRelease.json

README.md

Lines changed: 97 additions & 25 deletions
@@ -1,11 +1,16 @@
+<div align="center">
+
 # Node Llama.cpp
 Node.js bindings for llama.cpp.

-Pre-built bindings are provided with a fallback to building from source with `node-gyp`.
+<sub>Pre-built bindings are provided with a fallback to building from source with `node-gyp`.<sub>

 [![Build](https://github.com/withcatai/node-llama-cpp/actions/workflows/build.yml/badge.svg)](https://github.com/withcatai/node-llama-cpp/actions/workflows/build.yml)
+[![License](https://badgen.net/badge/color/MIT/green?label=license)](https://www.npmjs.com/package/node-llama-cpp)
+[![License](https://badgen.net/badge/color/TypeScript/blue?label=types)](https://www.npmjs.com/package/node-llama-cpp)
 [![Version](https://badgen.net/npm/v/node-llama-cpp)](https://www.npmjs.com/package/node-llama-cpp)

+</div>

 ## Installation
 ```bash
@@ -113,8 +118,8 @@ console.log("AI: " + q1);

 const tokens = context.encode(q1);
 const res: number[] = [];
-for await (const chunk of context.evaluate(tokens)) {
-    res.push(chunk);
+for await (const modelToken of context.evaluate(tokens)) {
+    res.push(modelToken);

     // it's important to not concatinate the results as strings,
     // as doing so will break some characters (like some emojis) that are made of multiple tokens.
@@ -130,15 +135,55 @@ const a1 = context.decode(Uint32Array.from(res)).split("USER:")[0];
 console.log("AI: " + a1);
 ```

+#### With grammar
+Use this to direct the model to generate a specific format of text, like `JSON` for example.
+
+> **Note:** there's an issue with some grammars where the model won't stop generating output,
+> so it's advised to use it together with `maxTokens` set to the context size of the model
+
+```typescript
+import {fileURLToPath} from "url";
+import path from "path";
+import {LlamaModel, LlamaGrammar, LlamaContext, LlamaChatSession} from "node-llama-cpp";
+
+const __dirname = path.dirname(fileURLToPath(import.meta.url));
+
+const model = new LlamaModel({
+    modelPath: path.join(__dirname, "models", "codellama-13b.Q3_K_M.gguf")
+})
+const grammar = await LlamaGrammar.getFor("json");
+const context = new LlamaContext({
+    model,
+    grammar
+});
+const session = new LlamaChatSession({context});
+
+
+const q1 = 'Create a JSON that contains a message saying "hi there"';
+console.log("User: " + q1);
+
+const a1 = await session.prompt(q1, {maxTokens: context.getContextSize()});
+console.log("AI: " + a1);
+console.log(JSON.parse(a1));
+
+
+const q2 = 'Add another field to the JSON with the key being "author" and the value being "LLama"';
+console.log("User: " + q2);
+
+const a2 = await session.prompt(q2, {maxTokens: context.getContextSize()});
+console.log("AI: " + a2);
+console.log(JSON.parse(a2));
+```
+
 ### CLI
 ```
 Usage: node-llama-cpp <command> [options]

 Commands:
-  node-llama-cpp download      Download a release of llama.cpp and compile it
-  node-llama-cpp build         Compile the currently downloaded llama.cpp
-  node-llama-cpp clear [type]  Clear files created by llama-cli
-  node-llama-cpp chat          Chat with a LLama model
+  node-llama-cpp download      Download a release of llama.cpp and compile it
+  node-llama-cpp build         Compile the currently downloaded llama.cpp
+  node-llama-cpp clear [type]  Clear files created by node-llama-cpp
+  node-llama-cpp chat          Chat with a LLama model

 Options:
   -h, --help     Show help                                           [boolean]
@@ -152,15 +197,17 @@ node-llama-cpp download
 Download a release of llama.cpp and compile it

 Options:
-  -h, --help        Show help                                         [boolean]
-      --repo        The GitHub repository to download a release of llama.cpp from. Can also be set v
-                    ia the NODE_LLAMA_CPP_REPO environment variable
+  -h, --help             Show help                                    [boolean]
+      --repo             The GitHub repository to download a release of llama.cpp from. Can also be
+                         set via the NODE_LLAMA_CPP_REPO environment variable
                                               [string] [default: "ggerganov/llama.cpp"]
-      --release     The tag of the llama.cpp release to download. Can also be set via the NODE_LLAMA
-                    _CPP_REPO_RELEASE environment variable   [string] [default: "latest"]
-      --arch        The architecture to compile llama.cpp for                      [string]
-      --nodeTarget  The Node.js version to compile llama.cpp for. Example: v18.0.0 [string]
-  -v, --version     Show version number                                           [boolean]
+      --release          The tag of the llama.cpp release to download. Set to "latest" to download t
+                         he latest release. Can also be set via the NODE_LLAMA_CPP_REPO_RELEASE envi
+                         ronment variable                    [string] [default: "latest"]
+  -a, --arch             The architecture to compile llama.cpp for                      [string]
+  -t, --nodeTarget       The Node.js version to compile llama.cpp for. Example: v18.0.0 [string]
+      --skipBuild, --sb  Skip building llama.cpp after downloading it   [boolean] [default: false]
+  -v, --version          Show version number                                           [boolean]
 ```

 #### `build` command
@@ -171,16 +218,16 @@ Compile the currently downloaded llama.cpp

 Options:
   -h, --help        Show help                                                     [boolean]
-      --arch        The architecture to compile llama.cpp for                      [string]
-      --nodeTarget  The Node.js version to compile llama.cpp for. Example: v18.0.0 [string]
+  -a, --arch        The architecture to compile llama.cpp for                      [string]
+  -t, --nodeTarget  The Node.js version to compile llama.cpp for. Example: v18.0.0 [string]
   -v, --version     Show version number                                           [boolean]
 ```

 #### `clear` command
 ```
 node-llama-cpp clear [type]

-Clear files created by llama-cli
+Clear files created by node-llama-cpp

 Options:
   -h, --help     Show help                                           [boolean]
@@ -195,20 +242,45 @@ node-llama-cpp chat
 Chat with a LLama model

 Required:
-      --model  LLama model file to use for the chat        [string] [required]
+  -m, --model  LLama model file to use for the chat        [string] [required]

 Optional:
-      --systemInfo    Print llama.cpp system info         [boolean] [default: false]
-      --systemPrompt  System prompt to use against the model. [default value: You are a helpful, res
-                      pectful and honest assistant. Always answer as helpfully as possible. If a que
-                      stion does not make any sense, or is not factually coherent, explain why inste
-                      ad of answering something not correct. If you don't know the answer to a quest
-                      ion, please don't share false information.]
+  -i, --systemInfo     Print llama.cpp system info        [boolean] [default: false]
+  -s, --systemPrompt   System prompt to use against the model. [default value: You are a helpful,
+                       respectful and honest assistant. Always answer as helpfully as possible. If
+                       a question does not make any sense, or is not factually coherent, explain
+                       why instead of answering something not correct. If you don't know the answe
+                       r to a question, please don't share false information.]
   [string] [default: "You are a helpful, respectful and honest assistant. Always answer as helpfully
  as possible.
 If a question does not make any sense, or is not factually coherent, explain why ins
 tead of answering something not correct. If you don't know the answer to a question, please don't
 share false information."]
+  -w, --wrapper        Chat wrapper to use. Use `auto` to automatically select a wrapper based on
+                       the model's BOS token
+                       [string] [choices: "auto", "general", "llamaChat", "chatML"] [default: "general"]
+  -c, --contextSize    Context size to use for the model              [number] [default: 4096]
+  -g, --grammar        Restrict the model response to a specific grammar, like JSON for example
+                       [string] [choices: "text", "json", "list", "arithmetic", "japanese", "chess"] [default: "text"]
+  -t, --temperature    Temperature is a hyperparameter that controls the randomness of the generat
+                       ed text. It affects the probability distribution of the model's output toke
+                       ns. A higher temperature (e.g., 1.5) makes the output more random and creat
+                       ive, while a lower temperature (e.g., 0.5) makes the output more focused, d
+                       eterministic, and conservative. The suggested temperature is 0.8, which pro
+                       vides a balance between randomness and determinism. At the extreme, a tempe
+                       rature of 0 will always pick the most likely next token, leading to identic
+                       al outputs in each run. Set to `0` to disable.   [number] [default: 0]
+  -k, --topK           Limits the model to consider only the K most likely next tokens for samplin
+                       g at each step of sequence generation. An integer number between `1` and th
+                       e size of the vocabulary. Set to `0` to disable (which uses the full vocabu
+                       lary). Only relevant when `temperature` is set to a value greater than 0.
+                                                                        [number] [default: 40]
+  -p, --topP           Dynamically selects the smallest set of tokens whose cumulative probability
+                       exceeds the threshold P, and samples the next token only from this set. A
+                       float number between `0` and `1`. Set to `1` to disable. Only relevant when
+                       `temperature` is set to a value greater than `0`.  [number] [default: 0.95]
+      --maxTokens, --mt  Maximum number of tokens to generate in responses. Set to `0` to disable. S
+                         et to `-1` to set to the context size          [number] [default: 0]

 Options:
   -h, --help     Show help                                           [boolean]
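
The `--temperature`, `--topK`, `--topP`, and `--maxTokens` descriptions above document the new sampling controls for the CLI, and the commit message lists the same names as library parameters. A minimal sketch of how they might be passed to `LlamaChatSession.prompt`, assuming they share the option names shown for the CLI (only `maxTokens` appears as a prompt option in the README diff, so the other option names here are an assumption):

```typescript
import {LlamaModel, LlamaContext, LlamaChatSession} from "node-llama-cpp";

const model = new LlamaModel({modelPath: "model.gguf"}); // placeholder path
const context = new LlamaContext({model});
const session = new LlamaChatSession({context});

// Assumed option names mirroring the CLI flags documented above:
// a temperature above 0 enables sampling; topK and topP then narrow the candidate tokens.
const answer = await session.prompt("Tell me a short joke", {
    temperature: 0.8, // 0 would always pick the most likely token
    topK: 40,         // sample only from the 40 most likely next tokens
    topP: 0.95,       // ...further limited to a 0.95 cumulative-probability set
    maxTokens: 128    // cap the response length
});
console.log(answer);
```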
