`n_keep`: Specify the number of tokens from the prompt to retain when the context size is exceeded and tokens need to be discarded. The number excludes the BOS token.
By default, this value is set to `0`, meaning no tokens are kept. Use `-1` to retain all tokens from the prompt.
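For example, a request that keeps the first 32 prompt tokens when the context fills up might look like this (an illustrative sketch; the prompt and token counts are placeholders, and the server is assumed to be listening on the default `http://localhost:8080`):

```shell
curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "You are a helpful assistant.\nUser: Hello!", "n_predict": 512, "n_keep": 32}'
```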
`stream`: Allows receiving each predicted token in real time instead of waiting for the completion to finish (this uses a different response format, described below). Set to `true` to enable.
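For instance, a streaming request could be issued with `curl` like this (a sketch; `--no-buffer` simply tells `curl` to print chunks as they arrive rather than buffering the output):

```shell
curl --no-buffer --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Once upon a time", "n_predict": 64, "stream": true}'
```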
`stop`: Specify a JSON array of stopping strings. These words will not be included in the completion, so make sure to add them to the prompt for the next iteration. Default: `[]`
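As an illustration, a Q&A-style prompt could use a stop string to end generation before the model invents the next question (the prompt and stop string here are placeholders):

```shell
curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Q: What is the capital of France?\nA:", "n_predict": 32, "stop": ["\nQ:"]}'
```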
Notice that each `probs` is an array of length `n_probs`.
- `tokens_evaluated`: Number of tokens evaluated in total from the prompt
- `truncated`: Boolean indicating if the context size was exceeded during generation, i.e. the number of tokens provided in the prompt (`tokens_evaluated`) plus tokens generated (`tokens_predicted`) exceeded the context size (`n_ctx`)
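For illustration, a non-streaming response might include these fields alongside the generated `content` (a trimmed sketch; real responses carry additional fields, and the values shown here are made up):

```json
{
  "content": " Paris.",
  "tokens_evaluated": 14,
  "tokens_predicted": 2,
  "truncated": false
}
```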
In streaming mode, response chunks currently use the following format, with chunks separated by `\n\n`:
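```
data: {"content": "hello", "stop": false}
```

(The chunk above is an illustrative reconstruction: each chunk is the literal prefix `data: ` followed by a JSON object. The exact fields depend on the request, and the final chunk sets `stop` to `true`.)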
Although this resembles the [Server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events) standard, the `EventSource` interface cannot be used due to its lack of `POST` request support.