Commit c23bfc5

docs: update server streaming mode documentation

Provide more documentation for streaming mode, including an example script.

1 parent 23e0d70 · commit c23bfc5

1 file changed: examples/server/README.md (+44, -7)
@@ -249,23 +249,23 @@ mkdir llama-client
 cd llama-client
 ```

-Create a index.js file and put this inside:
+Create an index.js file and put this inside:

 ```javascript
-const prompt = `Building a website can be done in 10 simple steps:`;
+const prompt = `Building a website can be done in 10 simple steps:`

-async function Test() {
+async function test() {
     let response = await fetch("http://127.0.0.1:8080/completion", {
-        method: 'POST',
+        method: "POST",
         body: JSON.stringify({
             prompt,
-            n_predict: 512,
+            n_predict: 64,
         })
     })
     console.log((await response.json()).content)
 }

-Test()
+test()
 ```

 And run it:
@@ -274,6 +274,33 @@ And run it:
 node index.js
 ```

+Alternative script to test streaming mode (chunk splitting and error handling should be enhanced for production use):
+
+```javascript
+(async () => {
+    const response = await fetch("http://localhost:8080/completion", {
+        method: "POST",
+        body: JSON.stringify({
+            prompt: "To write an essay quickly",
+            n_predict: 256,
+            stream: true
+        })
+    })
+    for await (const chunk of response.body.pipeThrough(new TextDecoderStream("utf-8"))) {
+        for (const event of chunk.split(/(?<=\n\n)/v)) {
+            if (event.startsWith("error")) {
+                break
+            }
+            const data = JSON.parse(event.substring(6))
+            if (data.stop) {
+                break
+            }
+            process.stdout.write(data.content)
+        }
+    }
+})()
+```
+
 ## API Endpoints

 ### GET `/health`: Returns health check result
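
The streaming script in the hunk above says its chunk splitting should be enhanced for production use: a network chunk can end in the middle of a `data: ...\n\n` event. Below is a minimal sketch (not part of the commit) of buffered splitting under that framing assumption, parsing an event only once its `\n\n` terminator has arrived:

```javascript
// Sketch only: accumulate decoded text in a buffer and parse an event only
// after its "\n\n" terminator is seen, so events split across network
// chunks are handled correctly.
(async () => {
    const response = await fetch("http://localhost:8080/completion", {
        method: "POST",
        body: JSON.stringify({
            prompt: "To write an essay quickly",
            n_predict: 256,
            stream: true
        })
    })
    let buffer = ""
    for await (const chunk of response.body.pipeThrough(new TextDecoderStream("utf-8"))) {
        buffer += chunk
        let end
        while ((end = buffer.indexOf("\n\n")) !== -1) {
            const event = buffer.slice(0, end)  // one complete event
            buffer = buffer.slice(end + 2)
            if (event.startsWith("error")) {
                return
            }
            const data = JSON.parse(event.substring(6))  // strip "data: "
            if (data.stop) {
                return
            }
            process.stdout.write(data.content)
        }
    }
})()
```
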
@@ -314,7 +341,7 @@ node index.js
 `n_keep`: Specify the number of tokens from the prompt to retain when the context size is exceeded and tokens need to be discarded. The number excludes the BOS token.
 By default, this value is set to `0`, meaning no tokens are kept. Use `-1` to retain all tokens from the prompt.

-`stream`: It allows receiving each predicted token in real-time instead of waiting for the completion to finish. To enable this, set to `true`.
+`stream`: Allows receiving each predicted token in real-time instead of waiting for the completion to finish (uses a different response format). To enable this, set to `true`.

 `stop`: Specify a JSON array of stopping strings.
 These words will not be included in the completion, so make sure to add them to the prompt for the next iteration. Default: `[]`
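
For illustration, a request body combining the parameters documented in the hunk above might look like the sketch below; the values are arbitrary examples, not server defaults:

```javascript
// Illustrative request body only; values are examples, not defaults.
const body = {
    prompt: "Building a website can be done in 10 simple steps:",
    n_predict: 64,   // generate at most 64 tokens
    n_keep: -1,      // on context overflow, retain all prompt tokens
    stream: true,    // deliver tokens as they are generated
    stop: ["\n\n"]   // stop when the model emits a blank line
}
// Send it exactly as in the earlier examples:
// fetch("http://127.0.0.1:8080/completion", { method: "POST", body: JSON.stringify(body) })
```
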
@@ -402,6 +429,16 @@ Notice that each `probs` is an array of length `n_probs`.
 - `tokens_evaluated`: Number of tokens evaluated in total from the prompt
 - `truncated`: Boolean indicating if the context size was exceeded during generation, i.e. the number of tokens provided in the prompt (`tokens_evaluated`) plus tokens generated (`tokens_predicted`) exceeded the context size (`n_ctx`)

+In streaming mode, response chunks currently use the following format, with chunks separated by `\n\n`:
+
+```
+data: {"content":" token","stop":false,"id_slot":0,"multimodal":false,"index":0}
+
+data: {"content":",","stop":false,"id_slot":0,"multimodal":false,"index":0}
+```
+
+Although this resembles the [Server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events) standard, the `EventSource` interface cannot be used due to its lack of `POST` request support.
+
 ### POST `/tokenize`: Tokenize a given text

 *Options:*
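
To make the chunk framing concrete, here is a small sketch that parses the two example chunks shown in the hunk above; the input string is copied from the diff, while the parsing loop itself is illustrative and not part of the commit:

```javascript
// Input: exactly the two documented example chunks, separated by "\n\n".
const raw =
    'data: {"content":" token","stop":false,"id_slot":0,"multimodal":false,"index":0}\n\n' +
    'data: {"content":",","stop":false,"id_slot":0,"multimodal":false,"index":0}\n\n'

for (const event of raw.split("\n\n")) {
    if (!event.startsWith("data: ")) continue    // skip the empty trailing piece
    const data = JSON.parse(event.slice("data: ".length))
    process.stdout.write(data.content)           // prints " token" then ","
}
```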
