`examples/server/README.md`: 15 additions & 3 deletions
````diff
@@ -460,7 +460,7 @@ These words will not be included in the completion, so make sure to add them to
 - Note: In streaming mode (`stream`), only `content`, `tokens` and `stop` will be returned until end of completion. Responses are sent using the [Server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html) standard. Note: the browser's `EventSource` interface cannot be used due to its lack of `POST` request support.
 
 - `completion_probabilities`: An array of token probabilities for each completion. The array's length is `n_predict`. Each item in the array has a nested array `top_logprobs`. It contains at **maximum** `n_probs` elements:
-  ```json
+  ```
   {
     "content": "<the generated completion text>",
     "tokens": [ generated token ids if requested ],
````
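Since the browser's `EventSource` interface cannot issue `POST` requests, a streaming client has to read the SSE stream itself. A minimal sketch of extracting the JSON payloads from an SSE byte stream (real streams may also carry comments and multi-line `data:` fields, which this ignores):

```python
import json

def parse_sse_events(raw: bytes):
    """Collect the JSON payloads carried by `data:` lines of a
    Server-sent-events stream, e.g. as read from a streaming
    POST to /completion."""
    events = []
    for line in raw.decode("utf-8").splitlines():
        if line.startswith("data: "):
            events.append(json.loads(line[len("data: "):]))
    return events
```

Concatenating the `content` fields of the parsed events reassembles the generated text.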
````diff
@@ -561,7 +561,7 @@ If `with_pieces` is `true`:
 ```
 
 With input 'á' (utf8 hex: C3 A1) on tinyllama/stories260k
-```json
+```
 {
   "tokens": [
     {"id": 198, "piece": [195]}, // hex C3
````
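Because `with_pieces` returns raw bytes per token, a multi-byte UTF-8 character such as `á` can be split across tokens, so the pieces must be joined before decoding. A small sketch of recovering text from such a response:

```python
def pieces_to_text(tokens):
    """Join the raw `piece` byte arrays from a tokenize response
    (with_pieces=true) and decode the result as UTF-8. Bytes of a
    multi-byte character may span several tokens, so decoding is
    done only after joining."""
    data = bytes(b for tok in tokens for b in tok["piece"])
    return data.decode("utf-8")
```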
````diff
@@ -576,6 +576,18 @@ With input 'á' (utf8 hex: C3 A1) on tinyllama/stories260k
 
 `tokens`: Set the tokens to detokenize.
 
+### POST `/apply-template`: Apply chat template to a conversation
+
+Uses the server's prompt template formatting functionality to convert chat messages to a single string expected by a chat model as input, but does not perform inference. Instead, the prompt string is returned in the `prompt` field of the JSON response. The prompt can then be modified as desired (for example, to insert "Sure!" at the beginning of the model's response) before sending to `/completion` to generate the chat response.
+
+*Options:*
+
+`messages`: (Required) Chat turns in the same format as `/v1/chat/completions`.
+
+**Response format**
+
+Returns a JSON object with a field `prompt` containing a string of the input messages formatted according to the model's chat template format.
+
 ### POST `/embedding`: Generate embedding of a given text
 
 > [!IMPORTANT]
````
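The `/apply-template` endpoint added above could be exercised as follows; this is a sketch only, and the server URL and the example messages are illustrative assumptions:

```python
import json
import urllib.request

# Chat turns in the same format as /v1/chat/completions
# (illustrative content, not a fixed requirement).
payload = {
    "messages": [
        {"role": "system", "content": "You are a test."},
        {"role": "user", "content": "Hi there"},
    ]
}

def apply_template(base_url="http://localhost:8080"):
    """POST the messages to /apply-template and return the templated
    prompt string from the `prompt` field of the JSON response."""
    req = urllib.request.Request(
        base_url + "/apply-template",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["prompt"]
```

The returned prompt can then be edited (for example, appending "Sure!" as the start of the assistant's reply) before being sent to `/completion`.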
```diff
@@ -768,7 +780,7 @@ Same as the `/v1/embeddings` endpoint.
```
`examples/server/tests/unit/test_chat_completion.py`: 15 additions & 0 deletions
```diff
@@ -121,6 +121,21 @@ def test_chat_template():
     assert res.body["__verbose"]["prompt"] == "<s> <|start_header_id|>system<|end_header_id|>\n\nBook<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nWhat is the best book<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
+    assert res.body["prompt"] == "<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>You are a test.<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hi there<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
```