<div align="center">

# Node Llama.cpp
Node.js bindings for llama.cpp.

<sub>Pre-built bindings are provided with a fallback to building from source with `node-gyp`.</sub>

[![Build](https://github.com/withcatai/node-llama-cpp/actions/workflows/build.yml/badge.svg)](https://github.com/withcatai/node-llama-cpp/actions/workflows/build.yml)
[![License](https://badgen.net/badge/color/MIT/green?label=license)](https://www.npmjs.com/package/node-llama-cpp)
[![Types](https://badgen.net/badge/color/TypeScript/blue?label=types)](https://www.npmjs.com/package/node-llama-cpp)
[![Version](https://badgen.net/npm/v/node-llama-cpp)](https://www.npmjs.com/package/node-llama-cpp)

</div>

## Installation
```bash
npm install --save node-llama-cpp
```
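
The raw-evaluation example below assumes a loaded model, a `LlamaContext`, and a prompt string `q1` already exist. A minimal setup sketch for that, reusing the same classes that appear in the grammar example further down (the model file name and the prompt text here are placeholders, not values required by the library):

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {LlamaModel, LlamaContext} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

// placeholder model file; point this at whichever GGUF model you have locally
const model = new LlamaModel({
    modelPath: path.join(__dirname, "models", "codellama-13b.Q3_K_M.gguf")
});
const context = new LlamaContext({model});

// placeholder prompt; the example below splits the output on "USER:",
// so the prompt is written in that chat-style format
const q1 = "USER: Hi there, how are you?\nASSISTANT:";
console.log("User: " + q1);
```

With that in place, the snippet below encodes the prompt, feeds the tokens to the model, and decodes the collected output tokens together: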
```typescript
const tokens = context.encode(q1);
const res: number[] = [];
for await (const modelToken of context.evaluate(tokens)) {
    res.push(modelToken);

    // it's important to not concatenate the results as strings,
    // as doing so will break some characters (like some emojis) that are made of multiple tokens.
    // ...
}

const a1 = context.decode(Uint32Array.from(res)).split("USER:")[0];
console.log("AI: " + a1);
```
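
Because a single visible character (such as an emoji) can span several tokens, decoding tokens one at a time and concatenating the resulting strings can corrupt it, while decoding the accumulated token array in one call does not. A small illustrative comparison using only the `encode`/`decode` calls shown above:

```typescript
const emojiTokens = context.encode("Hi there 👋");

// decoding each token separately may break characters that span multiple tokens
let perToken = "";
for (const token of emojiTokens)
    perToken += context.decode(Uint32Array.from([token]));

// decoding the whole token array together keeps such characters intact
const together = context.decode(Uint32Array.from(emojiTokens));

console.log({perToken, together});
```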

#### With grammar
Use this to direct the model to generate output in a specific format, such as `JSON`.

> **Note:** there's an issue with some grammars where the model won't stop generating output,
> so it's advised to use it together with `maxTokens` set to the context size of the model.

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {LlamaModel, LlamaGrammar, LlamaContext, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const model = new LlamaModel({
    modelPath: path.join(__dirname, "models", "codellama-13b.Q3_K_M.gguf")
});
const grammar = await LlamaGrammar.getFor("json");
const context = new LlamaContext({
    model,
    grammar
});
const session = new LlamaChatSession({context});


const q1 = 'Create a JSON that contains a message saying "hi there"';
console.log("User: " + q1);

const a1 = await session.prompt(q1, {maxTokens: context.getContextSize()});
console.log("AI: " + a1);
console.log(JSON.parse(a1));


const q2 = 'Add another field to the JSON with the key being "author" and the value being "LLama"';
console.log("User: " + q2);

const a2 = await session.prompt(q2, {maxTokens: context.getContextSize()});
console.log("AI: " + a2);
console.log(JSON.parse(a2));
```

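Besides `"json"`, the CLI help below lists other bundled grammars (`list`, `arithmetic`, `japanese`, `chess`). Assuming `LlamaGrammar.getFor` accepts the same names as the CLI's `--grammar` choices (worth confirming against the package typings), switching the output format is a one-line change:

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {LlamaModel, LlamaGrammar, LlamaContext, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const model = new LlamaModel({
    modelPath: path.join(__dirname, "models", "codellama-13b.Q3_K_M.gguf")
});

// assumption: bundled grammar names mirror the CLI's --grammar choices
const grammar = await LlamaGrammar.getFor("list");

const context = new LlamaContext({model, grammar});
const session = new LlamaChatSession({context});

const answer = await session.prompt(
    "Give me a list of three reasons to run models locally",
    {maxTokens: context.getContextSize()}
);
console.log(answer);
```
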
### CLI
```
Usage: node-llama-cpp <command> [options]

Commands:
  node-llama-cpp download      Download a release of llama.cpp and compile it
  node-llama-cpp build         Compile the currently downloaded llama.cpp
  node-llama-cpp clear [type]  Clear files created by node-llama-cpp
  node-llama-cpp chat          Chat with a LLama model

Options:
  -h, --help     Show help                                                [boolean]
  -v, --version  Show version number                                      [boolean]
```

#### `download` command
```
node-llama-cpp download

Download a release of llama.cpp and compile it

Options:
  -h, --help         Show help                                            [boolean]
      --repo         The GitHub repository to download a release of llama.cpp
                     from. Can also be set via the NODE_LLAMA_CPP_REPO
                     environment variable
                                             [string] [default: "ggerganov/llama.cpp"]
      --release      The tag of the llama.cpp release to download. Set to "latest"
                     to download the latest release. Can also be set via the
                     NODE_LLAMA_CPP_REPO_RELEASE environment variable
                                                        [string] [default: "latest"]
  -a, --arch         The architecture to compile llama.cpp for              [string]
  -t, --nodeTarget   The Node.js version to compile llama.cpp for.
                     Example: v18.0.0                                       [string]
      --skipBuild, --sb  Skip building llama.cpp after downloading it
                                                         [boolean] [default: false]
  -v, --version      Show version number                                   [boolean]
```
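
For example, the flags above can be combined like this (the values are illustrative):

```bash
# download the latest llama.cpp release and compile it for Node.js v18
node-llama-cpp download --release latest --nodeTarget v18.0.0

# download the sources only, skipping the build step
node-llama-cpp download --skipBuild
```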

#### `build` command
```
node-llama-cpp build

Compile the currently downloaded llama.cpp

Options:
  -h, --help        Show help                                              [boolean]
  -a, --arch        The architecture to compile llama.cpp for              [string]
  -t, --nodeTarget  The Node.js version to compile llama.cpp for.
                    Example: v18.0.0                                       [string]
  -v, --version     Show version number                                    [boolean]
```
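
For instance, to rebuild the previously downloaded llama.cpp for a specific architecture and Node.js version (illustrative values):

```bash
node-llama-cpp build --arch x64 --nodeTarget v18.0.0
```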

#### `clear` command
```
node-llama-cpp clear [type]

Clear files created by node-llama-cpp

Options:
  -h, --help     Show help                                                 [boolean]
  -v, --version  Show version number                                       [boolean]
```

#### `chat` command
```
node-llama-cpp chat

Chat with a LLama model

Required:
  -m, --model         LLama model file to use for the chat          [string] [required]

Optional:
  -i, --systemInfo    Print llama.cpp system info               [boolean] [default: false]
  -s, --systemPrompt  System prompt to use against the model. [default value: You are a
                      helpful, respectful and honest assistant. Always answer as helpfully
                      as possible. If a question does not make any sense, or is not
                      factually coherent, explain why instead of answering something not
                      correct. If you don't know the answer to a question, please don't
                      share false information.]
                      [string] [default: "You are a helpful, respectful and honest assistant.
                      Always answer as helpfully as possible. If a question does not make
                      any sense, or is not factually coherent, explain why instead of
                      answering something not correct. If you don't know the answer to a
                      question, please don't share false information."]
  -w, --wrapper       Chat wrapper to use. Use `auto` to automatically select a wrapper
                      based on the model's BOS token
                      [string] [choices: "auto", "general", "llamaChat", "chatML"] [default: "general"]
  -c, --contextSize   Context size to use for the model            [number] [default: 4096]
  -g, --grammar       Restrict the model response to a specific grammar, like JSON for
                      example
                      [string] [choices: "text", "json", "list", "arithmetic", "japanese", "chess"] [default: "text"]
  -t, --temperature   Temperature is a hyperparameter that controls the randomness of the
                      generated text. It affects the probability distribution of the
                      model's output tokens. A higher temperature (e.g., 1.5) makes the
                      output more random and creative, while a lower temperature (e.g.,
                      0.5) makes the output more focused, deterministic, and conservative.
                      The suggested temperature is 0.8, which provides a balance between
                      randomness and determinism. At the extreme, a temperature of 0 will
                      always pick the most likely next token, leading to identical outputs
                      in each run. Set to `0` to disable.              [number] [default: 0]
  -k, --topK          Limits the model to consider only the K most likely next tokens for
                      sampling at each step of sequence generation. An integer number
                      between `1` and the size of the vocabulary. Set to `0` to disable
                      (which uses the full vocabulary). Only relevant when `temperature`
                      is set to a value greater than 0.               [number] [default: 40]
  -p, --topP          Dynamically selects the smallest set of tokens whose cumulative
                      probability exceeds the threshold P, and samples the next token only
                      from this set. A float number between `0` and `1`. Set to `1` to
                      disable. Only relevant when `temperature` is set to a value greater
                      than `0`.                                     [number] [default: 0.95]
  --maxTokens, --mt   Maximum number of tokens to generate in responses. Set to `0` to
                      disable. Set to `-1` to set to the context size  [number] [default: 0]

Options:
  -h, --help          Show help                                                  [boolean]
  -v, --version       Show version number                                        [boolean]
```
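
A typical invocation combining several of the flags above (the model path is a placeholder; the sampling values mirror the defaults and suggestions documented in the help text):

```bash
node-llama-cpp chat \
    --model ./models/codellama-13b.Q3_K_M.gguf \
    --wrapper auto \
    --contextSize 4096 \
    --grammar json \
    --temperature 0.8 \
    --topK 40 \
    --topP 0.95 \
    --maxTokens -1
```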