examples/server/README.md
`timings_per_token`: Include prompt processing and text generation speed information in each response. Default: `false`

`post_sampling_probs`: Returns the probabilities of the top `n_probs` tokens after applying the sampling chain.

**Response format**
- Note: In streaming mode (`stream`), only `content`, `tokens` and `stop` will be returned until end of completion. Responses are sent using the [Server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html) standard. Note: the browser's `EventSource` interface cannot be used due to its lack of `POST` request support.
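Because `EventSource` cannot issue `POST` requests, clients typically read the stream body themselves and split it into events. The sketch below is a simplified, hypothetical SSE parser (not part of the server); it only handles the `data:` field that the server uses and ignores the other SSE fields (`event:`, `id:`, `retry:`):

```python
import json

def parse_sse_events(raw: str):
    """Parse Server-sent events text into a list of JSON payloads.

    Each event line starts with "data: "; the server sends one JSON
    object per event. Simplified: ignores event:, id:, and retry: fields.
    """
    events = []
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("data: "):
            events.append(json.loads(line[len("data: "):]))
    return events

# Illustrative stream: two content chunks followed by the final stop event
raw = (
    'data: {"content": "Hel", "stop": false}\n\n'
    'data: {"content": "lo", "stop": false}\n\n'
    'data: {"content": "", "stop": true}\n\n'
)
chunks = parse_sse_events(raw)
text = "".join(c["content"] for c in chunks)
```

In a real client the same logic would run incrementally over the response body as bytes arrive, rather than over a complete string.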
- `completion_probabilities`: An array of token probabilities for each completion. The array's length is `n_predict`. Each item in the array has a nested array `top_logprobs`. It contains at **maximum** `n_probs` elements:

```json
{
  "content": "<the generated completion text>",
  "tokens": [ generated token ids if requested ],
  ...
  "probs": [
    {
      "id": <token id>,
      "logprob": float,
      "token": "<most likely token>",
      "bytes": [int, int, ...],
      "top_logprobs": [
        {
          "id": <token id>,
          "logprob": float,
          "token": "<token text>",
          "bytes": [int, int, ...]
        },
        {
          "id": <token id>,
          "logprob": float,
          "token": "<token text>",
          "bytes": [int, int, ...]
        },
        ...
      ]
    },
    {
      "id": <token id>,
      "logprob": float,
      "token": "<most likely token>",
      "bytes": [int, int, ...],
      "top_logprobs": [
        ...
      ]
    },
    ...
  ]
},
```
Please note that if `post_sampling_probs` is set to `true`:
- `logprob` will be replaced with `prob`, with the value between 0.0 and 1.0
- The returned number of probabilities may be less than `n_probs`
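To illustrate the default shape (with `post_sampling_probs` left at `false`, so each entry carries a `logprob`), the sketch below walks a trimmed response and recovers linear probabilities with `exp`. The field values here are illustrative, not real model output:

```python
import math

# A trimmed example of a non-streaming response with "n_probs" set.
# Token ids, logprobs, and byte values are made up for illustration.
response = {
    "content": "Hello",
    "probs": [
        {
            "id": 15339,
            "logprob": -0.12,
            "token": "Hello",
            "bytes": [72, 101, 108, 108, 111],
            "top_logprobs": [
                {"id": 15339, "logprob": -0.12, "token": "Hello",
                 "bytes": [72, 101, 108, 108, 111]},
                {"id": 13347, "logprob": -2.50, "token": "Hi",
                 "bytes": [72, 105]},
            ],
        }
    ],
}

for tok in response["probs"]:
    # With post_sampling_probs=false the server reports log-probabilities;
    # exp() maps them back into the 0.0-1.0 range.
    prob = math.exp(tok["logprob"])
    alternatives = {alt["token"]: math.exp(alt["logprob"])
                    for alt in tok["top_logprobs"]}
```

With `post_sampling_probs: true` the `exp` step is unnecessary, since the server already reports `prob` in linear space.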
- `content`: Completion result as a string (excluding `stopping_word` if any). In case of streaming mode, will contain the next token as a string.
- `tokens`: Same as `content` but represented as raw token ids. Only populated if `"return_tokens": true` or `"stream": true` in the request.
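Putting the fields above together, a request body that exercises these options might look like the following (the prompt and values are illustrative only):

```json
{
  "prompt": "Building a website can be done in 10 simple steps:",
  "n_predict": 16,
  "n_probs": 2,
  "post_sampling_probs": false,
  "return_tokens": true,
  "stream": false
}
```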