
Commit 288fb82

docs: response segments, stop generation
1 parent e9e94c8 commit 288fb82

File tree

- docs/guide/chat-session.md
- docs/guide/external-chat-state.md
- docs/guide/function-calling.md
- docs/index.md

4 files changed: +167 −8 lines changed

docs/guide/chat-session.md

Lines changed: 116 additions & 4 deletions
@@ -100,9 +100,10 @@ const a1 = await session.prompt(q1, {
         process.stdout.write(chunk);
     }
 });
-
 ```
 
+> To stream `thought` segments, see [Stream Response Segments](#stream-response-segments)
+
 ## Repeat Penalty Customization {#repeat-penalty}
 You can see all the possible options of the [`prompt`](../api/classes/LlamaChatSession.md#prompt) function [here](../api/type-aliases/LLamaChatPromptOptions.md).
 ```typescript
@@ -682,7 +683,7 @@ to make the model follow a certain direction in its response.
 ```typescript
 import {fileURLToPath} from "url";
 import path from "path";
-import {getLlama, LlamaChatSession, GeneralChatWrapper} from "node-llama-cpp";
+import {getLlama, LlamaChatSession} from "node-llama-cpp";
 
 const __dirname = path.dirname(fileURLToPath(import.meta.url));
 
@@ -692,8 +693,7 @@ const model = await llama.loadModel({
 });
 const context = await model.createContext();
 const session = new LlamaChatSession({
-    contextSequence: context.getSequence(),
-    chatWrapper: new GeneralChatWrapper()
+    contextSequence: context.getSequence()
 });
 
 
@@ -705,3 +705,115 @@ const a1 = await session.prompt(q1, {
 });
 console.log("AI: " + a1);
 ```
+
+## Stop Response Generation {#stop-response-generation}
+To stop the generation of the current response, without removing the existing partial generation from the chat history,
+you can use the [`stopOnAbortSignal`](../api/type-aliases/LLamaChatPromptOptions.md#stoponabortsignal) option
+to configure what happens when the given [`signal`](../api/type-aliases/LLamaChatPromptOptions.md#signal) is aborted.
+
+```typescript
+import {fileURLToPath} from "url";
+import path from "path";
+import {getLlama, LlamaChatSession} from "node-llama-cpp";
+
+const __dirname = path.dirname(fileURLToPath(import.meta.url));
+
+const llama = await getLlama();
+const model = await llama.loadModel({
+    modelPath: path.join(__dirname, "models", "Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf")
+});
+const context = await model.createContext();
+const session = new LlamaChatSession({
+    contextSequence: context.getSequence()
+});
+
+
+const abortController = new AbortController();
+const q1 = "Hi there, how are you?";
+console.log("User: " + q1);
+
+let response = "";
+
+const a1 = await session.prompt(q1, {
+    // stop the generation, instead of cancelling it
+    stopOnAbortSignal: true,
+
+    signal: abortController.signal,
+    onTextChunk(chunk) {
+        response += chunk;
+
+        if (response.length >= 10)
+            abortController.abort();
+    }
+});
+console.log("AI: " + a1);
+```
+
+
+## Stream Response Segments {#stream-response-segments}
+The raw model response is automatically segmented into different types of segments.
+The main response is not segmented, but other kinds of sections, like thoughts (chain of thought), are segmented.
+
+To stream response segments you can use the [`onResponseChunk`](../api/type-aliases/LLamaChatPromptOptions.md#onresponsechunk) option.
+
+```typescript
+import {fileURLToPath} from "url";
+import path from "path";
+import {getLlama, LlamaChatSession} from "node-llama-cpp";
+
+const __dirname = path.dirname(fileURLToPath(import.meta.url));
+
+const llama = await getLlama();
+const model = await llama.loadModel({
+    modelPath: path.join(__dirname, "models", "DeepSeek-R1-Distill-Qwen-14B.Q4_K_M.gguf")
+});
+const context = await model.createContext();
+const session = new LlamaChatSession({
+    contextSequence: context.getSequence()
+});
+
+
+const q1 = "Hi there, how are you?";
+console.log("User: " + q1);
+
+process.stdout.write("AI: ");
+const a1 = await session.promptWithMeta(q1, {
+    onResponseChunk(chunk) {
+        const isThoughtSegment = chunk.type === "segment" &&
+            chunk.segmentType === "thought";
+
+        if (chunk.type === "segment" && chunk.segmentStartTime != null)
+            process.stdout.write(` [segment start: ${chunk.segmentType}] `);
+
+        process.stdout.write(chunk.text);
+
+        if (chunk.type === "segment" && chunk.segmentEndTime != null)
+            process.stdout.write(` [segment end: ${chunk.segmentType}] `);
+    }
+});
+
+const fullResponse = a1.response
+    .map((item) => {
+        if (typeof item === "string")
+            return item;
+        else if (item.type === "segment") {
+            const isThoughtSegment = item.segmentType === "thought";
+            let res = "";
+
+            if (item.startTime != null)
+                res += ` [segment start: ${item.segmentType}] `;
+
+            res += item.text;
+
+            if (item.endTime != null)
+                res += ` [segment end: ${item.segmentType}] `;
+
+            return res;
+        }
+
+        return "";
+    })
+    .join("");
+
+console.log("Full response: " + fullResponse);
+```
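
The committed stop-generation example aborts from inside `onTextChunk` once 10 characters have arrived. The same `stopOnAbortSignal` behavior works with any other `AbortSignal` source as well; the sketch below is only an illustration, assuming a `session` created as in the examples above, and uses Node's built-in `AbortSignal.timeout()` with an arbitrary 5-second limit.

```typescript
// a minimal sketch, not part of the commit: stop (rather than cancel) generation
// after a time limit; `session` is assumed to be an existing LlamaChatSession
const q2 = "Write a long story about llamas";
console.log("User: " + q2);

const a2 = await session.prompt(q2, {
    // keep the partial response in the chat history when the signal aborts
    stopOnAbortSignal: true,

    // abort generation after 5 seconds (arbitrary value)
    signal: AbortSignal.timeout(5_000)
});
console.log("AI (possibly truncated): " + a2);
```

The streaming example above defines an `isThoughtSegment` flag that the snippet itself doesn't consume; one possible use, sketched under the same assumptions, is to hide chain-of-thought output and stream only the main response.

```typescript
// a minimal sketch, not part of the commit: skip `thought` segments while streaming
await session.prompt("Hi there, how are you?", {
    onResponseChunk(chunk) {
        const isThoughtSegment = chunk.type === "segment" &&
            chunk.segmentType === "thought";

        // don't print chain-of-thought text; only stream the main response
        if (isThoughtSegment)
            return;

        process.stdout.write(chunk.text);
    }
});
```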

docs/guide/external-chat-state.md

Lines changed: 47 additions & 1 deletion
@@ -66,7 +66,29 @@ const res = await llamaChat.generateResponse(chatHistory, {
     }
 });
 
+const fullResponse = res.fullResponse
+    .map((item) => {
+        if (typeof item === "string")
+            return item;
+        else if (item.type === "segment") {
+            let res = "";
+            if (item.startTime != null)
+                res += ` [segment start: ${item.segmentType}] `;
+
+            res += item.text;
+
+            if (item.endTime != null)
+                res += ` [segment end: ${item.segmentType}] `;
+
+            return res;
+        }
+
+        return "";
+    })
+    .join("");
+
 console.log("AI: " + res.response);
+console.log("Full response:", fullResponse);
 ```
 
 Now, let's say we want to ask the model a follow-up question based on the previous response.
@@ -169,6 +191,7 @@ const res2 = await llamaChat.generateResponse(chatHistory, {
 });
 
 console.log("AI: " + res2.response);
+console.log("Full response:", res2.fullResponse);
 ```
 
 ## Handling Function Calling {#function-calling}
@@ -270,8 +293,31 @@ while (true) {
     lastContextShiftMetadata = res.lastEvaluation.contextShiftMetadata;
 
     // print the text the model generated before calling functions
-    if (res.response !== "")
+    if (res.response !== "") {
+        const fullResponse = res.fullResponse
+            .map((item) => {
+                if (typeof item === "string")
+                    return item;
+                else if (item.type === "segment") {
+                    let res = "";
+                    if (item.startTime != null)
+                        res += ` [segment start: ${item.segmentType}] `;
+
+                    res += item.text;
+
+                    if (item.endTime != null)
+                        res += ` [segment end: ${item.segmentType}] `;
+
+                    return res;
+                }
+
+                return "";
+            })
+            .join("");
+
         console.log("AI: " + res.response);
+        console.log("Full response:", fullResponse);
+    }
 
     // when there are no function calls,
     // it means the model has finished generating the response
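
The `fullResponse`-to-string mapping added above now appears in three places across this commit (twice in this file and once in `chat-session.md`). A possible follow-up, sketched below, is to factor it into a small helper; the `renderFullResponse` name and the item typing are hypothetical, inferred only from the shapes used in the diffs above.

```typescript
// a hypothetical helper, not part of the commit; the item shape is inferred
// from the examples above (plain strings and {type: "segment"} objects)
type FullResponseItem = string | {
    type: "segment",
    segmentType: string,
    text: string,
    startTime?: unknown,
    endTime?: unknown
};

function renderFullResponse(fullResponse: readonly FullResponseItem[]): string {
    return fullResponse
        .map((item) => {
            if (typeof item === "string")
                return item;
            else if (item.type === "segment") {
                let res = "";
                if (item.startTime != null)
                    res += ` [segment start: ${item.segmentType}] `;

                res += item.text;

                if (item.endTime != null)
                    res += ` [segment end: ${item.segmentType}] `;

                return res;
            }

            return "";
        })
        .join("");
}

// possible usage in the examples above:
// console.log("Full response:", renderFullResponse(res.fullResponse));
```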

docs/guide/function-calling.md

Lines changed: 2 additions & 2 deletions
@@ -7,15 +7,15 @@ description: Using function calling
 When prompting a model using a [`LlamaChatSession`](../api/classes/LlamaChatSession.md), you can provide a list of functions that a model can call during generation to retrieve information or perform actions.
 
 For this to work, `node-llama-cpp` tells the model what functions are available and what parameters they take, and instructs it to call those as needed.
-It also ensures that the model can only call functions with the correct parameters.
+It also ensures that when the model calls a function, it always uses the correct parameters.
 
 Some models have built-in support for function calling, and some of them are not trained for that.
 
 For example, _Llama 3_ is not trained for function calling.
 When using a _Llama 3_ model, the [`Llama3ChatWrapper`](../api/classes/Llama3ChatWrapper.md) is automatically used, and it includes a custom handling for function calling,
 which contains a fine-tuned instruction for explaining the model how to call functions and when to do so.
 
-There are also model that do have built-in support for function calling, like _Llama 3.1_.
+There are also models that do have built-in support for function calling, like _Llama 3.1_.
 When using a _Llama 3.1_ model, the [`Llama3_1ChatWrapper`](../api/classes/Llama3_1ChatWrapper.md) is automatically used, and it knows how to handle function calling for this model.
 
 In order for the model to know what functions can do and what they return, you need to provide this information in the function description.
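
The last context line above notes that the function description is how the model learns what a function does and returns. For reference, this is roughly what providing such a function can look like with `defineChatSessionFunction`; the `getFruitPrice` function, its schema, and the hard-coded prices below are illustrative only, and `session` is assumed to be an existing `LlamaChatSession`.

```typescript
import {defineChatSessionFunction} from "node-llama-cpp";

// an illustrative sketch, not part of this commit
const fruitPrices: Record<string, string> = {
    apple: "$6",
    banana: "$4"
};
const functions = {
    getFruitPrice: defineChatSessionFunction({
        // the description tells the model what the function does and what it returns
        description: "Get the price of a fruit. Returns the price as a string, " +
            "or a message when the fruit is not recognized.",
        params: {
            type: "object",
            properties: {
                name: {
                    type: "string"
                }
            }
        },
        handler(params) {
            const name = params.name.toLowerCase();
            if (fruitPrices[name] != null)
                return {name, price: fruitPrices[name]};

            return `Unrecognized fruit "${params.name}"`;
        }
    })
};

const answer = await session.prompt("Is an apple more expensive than a banana?", {functions});
console.log(answer);
```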

docs/index.md

Lines changed: 2 additions & 1 deletion
@@ -42,7 +42,7 @@ features:
     linkText: Learn more
   - icon: <svg xmlns="http://www.w3.org/2000/svg" height="24" viewBox="0 -960 960 960" width="24" fill="currentColor"><path d="M600-160q-17 0-28.5-11.5T560-200q0-17 11.5-28.5T600-240h80q17 0 28.5-11.5T720-280v-80q0-38 22-69t58-44v-14q-36-13-58-44t-22-69v-80q0-17-11.5-28.5T680-720h-80q-17 0-28.5-11.5T560-760q0-17 11.5-28.5T600-800h80q50 0 85 35t35 85v80q0 17 11.5 28.5T840-560t28.5 11.5Q880-537 880-520v80q0 17-11.5 28.5T840-400t-28.5 11.5Q800-377 800-360v80q0 50-35 85t-85 35h-80Zm-320 0q-50 0-85-35t-35-85v-80q0-17-11.5-28.5T120-400t-28.5-11.5Q80-423 80-440v-80q0-17 11.5-28.5T120-560t28.5-11.5Q160-583 160-600v-80q0-50 35-85t85-35h80q17 0 28.5 11.5T400-760q0 17-11.5 28.5T360-720h-80q-17 0-28.5 11.5T240-680v80q0 38-22 69t-58 44v14q36 13 58 44t22 69v80q0 17 11.5 28.5T280-240h80q17 0 28.5 11.5T400-200q0 17-11.5 28.5T360-160h-80Z"/></svg>
     title: Powerful features
-    details: Enforce a model to generate output according to a JSON schema, provide a model with functions it can call on demand, and much more
+    details: Force a model to generate output according to a JSON schema, provide a model with functions it can call on demand, and much more
     link: /guide/grammar#json-schema
     linkText: Learn more
 ---
@@ -98,6 +98,7 @@ npx -y node-llama-cpp inspect gpu
 * [User input safety](./guide/llama-text.md#input-safety-in-node-llama-cpp)
 * [Token prediction](./guide/token-prediction.md)
 * [Reranking](./guide/embedding.md#reranking)
+* [Thought segmentation](./guide/chat-session.md#stream-response-segments)
 
 </template>
 <template v-slot:simple-code>
