`examples/server/README.md` — 32 additions & 7 deletions
````diff
@@ -343,6 +343,10 @@ node index.js
 
 ### POST `/completion`: Given a `prompt`, it returns the predicted completion.
 
+> [!IMPORTANT]
+>
+> This endpoint is **not** OAI-compatible
+
 *Options:*
 
 `prompt`: Provide the prompt for this completion as a string or as an array of strings or numbers representing tokens. Internally, if `cache_prompt` is `true`, the prompt is compared to the previous completion and only the "unseen" suffix is evaluated. A `BOS` token is inserted at the start, if all of the following conditions are true:
````
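To make the non-OAI request shape documented above concrete, here is a minimal sketch of calling the endpoint over HTTP. It is not part of the diff: the base URL assumes a locally running server on its default port, and the prompt and option values are illustrative only.

```python
# Minimal sketch: POST /completion with a plain prompt string.
# Assumes a local server on the default port 8080; prompt text and
# option values are illustrative, not taken from the diff.
import requests

payload = {
    "prompt": "Building a website can be done in 10 simple steps:",
    "n_predict": 64,       # maximum number of tokens to generate
    "cache_prompt": True,  # reuse the evaluated prefix from the previous request
}

resp = requests.post("http://localhost:8080/completion", json=payload)
resp.raise_for_status()
print(resp.json()["content"])  # the generated completion text
```

Per the option description above, `prompt` could equally be passed as an array of strings or token ids instead of a single string.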
````diff
@@ -448,27 +452,48 @@ These words will not be included in the completion, so make sure to add them to
 
 - Note: When using streaming mode (`stream`), only `content` and `stop` will be returned until end of completion.
 
-- `completion_probabilities`: An array of token probabilities for each completion. The array's length is `n_predict`. Each item in the array has the following structure:
+- `completion_probabilities`: An array of token probabilities for each completion. The array's length is `n_predict`. Each item in the array has a nested array `top_logprobs`. It contains at **maximum** `n_probs` elements:
 
 ```json
 {
-  "content": "<the token selected by the model>",
-  "probs": [
+  "content": "<the generated completion text>",
+  ...
+  "completion_probabilities": [
     {
+      "id": <token id>,
       "prob": float,
-      "tok_str": "<most likely token>"
+      "token": "<most likely token>",
+      "bytes": [int, int, ...],
+      "top_logprobs": [
+        {
+          "id": <token id>,
+          "prob": float,
+          "token": "<token text>",
+          "bytes": [int, int, ...],
+        },
+        {
+          "id": <token id>,
+          "prob": float,
+          "token": "<token text>",
+          "bytes": [int, int, ...],
+        },
+        ...
+      ]
     },
     {
+      "id": <token id>,
       "prob": float,
-      "tok_str": "<second most likely token>"
+      "token": "<most likely token>",
+      "bytes": [int, int, ...],
+      "top_logprobs": [
+        ...
+      ]
     },
     ...
   ]
 },
 ```
 
-Notice that each `probs` is an array of length `n_probs`.
-
 - `content`: Completion result as a string (excluding `stopping_word` if any). In case of streaming mode, will contain the next token as a string.
 - `stop`: Boolean for use with `stream` to check whether the generation has stopped (Note: This is not related to stopping words array `stop` from input options)
 - `generation_settings`: The provided options above excluding `prompt` but including `n_ctx`, `model`. These options may differ from the original ones in some way (e.g. bad values filtered out, strings converted to tokens, etc.).
````
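The new `completion_probabilities` layout is easiest to see by requesting a few alternatives per token. Below is a hedged sketch that asks for `n_probs` alternatives and walks the documented `token` / `prob` / `top_logprobs` fields. Only the field names come from the README excerpt above; the server URL, prompt, and numeric values are assumptions for illustration.

```python
# Sketch: request top-3 alternatives per generated token and walk the
# documented completion_probabilities / top_logprobs structure.
# Field names follow the README excerpt above; URL, prompt, and values
# are assumptions, not part of the diff.
import requests

resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "The capital of France is",
        "n_predict": 8,
        "n_probs": 3,  # ask for up to 3 alternatives per position
    },
)
resp.raise_for_status()
data = resp.json()

print(data["content"])
for tok in data.get("completion_probabilities", []):
    alts = ", ".join(
        f"{alt['token']!r}={alt['prob']:.3f}" for alt in tok["top_logprobs"]
    )
    print(f"{tok['token']!r} (p={tok['prob']:.3f}) -> {alts}")
```

Note that, per the streaming note above, only `content` and `stop` are returned until the end of completion when `stream` is set, so this kind of inspection is most natural on a non-streamed response.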