examples/server/README.md (6 additions, 0 deletions)
@@ -452,6 +452,8 @@ These words will not be included in the completion, so make sure to add them to
`response_fields`: A list of response fields, for example: `"response_fields": ["content", "generation_settings/n_predict"]`. If the specified field is missing, it will simply be omitted from the response without triggering an error. Note that fields with a slash will be unnested; for example, `generation_settings/n_predict` will move the field `n_predict` from the `generation_settings` object to the root of the response and give it a new name.
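The selection rule described above can be modelled on the client side. This is only a sketch of the documented behaviour, not the server's code: `select_fields` is a hypothetical helper, the sample response is made up, and keying the unnested value by its full slash-separated path is an assumption, since the text only says the field gets "a new name".

```python
from typing import Any, Dict, List


def select_fields(response: Dict[str, Any], fields: List[str]) -> Dict[str, Any]:
    """Keep only the requested fields; silently skip any that are missing."""
    selected: Dict[str, Any] = {}
    for path in fields:
        node: Any = response
        found = True
        # Walk the slash-separated path one key at a time.
        for key in path.split("/"):
            if isinstance(node, dict) and key in node:
                node = node[key]
            else:
                found = False
                break
        if found:
            # Assumption: the unnested value is stored at the root under the full path.
            selected[path] = node
    return selected


# Made-up response, for illustration only.
full = {"content": "Hello", "generation_settings": {"n_predict": 8, "seed": 1}}
picked = select_fields(full, ["content", "generation_settings/n_predict", "missing"])
print(picked)
```

The `missing` entry is dropped without an error, which mirrors the "simply omitted" behaviour described in the text.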
+`lora`: A list of LoRA adapters to apply to this request. Each object in the list must contain `id` and `scale` fields, for example: `[{"id": 0, "scale": 0.5}, {"id": 1, "scale": 1.1}]`. An adapter that is not listed defaults to a scale of `0.0`. Requests with different LoRA configurations are not batched together, which may degrade performance.
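As a rough illustration, a request body carrying the `lora` field could be assembled as follows. `lora_field` is a hypothetical helper, and the `prompt` key is an assumption about the rest of the request body that is not shown in this diff.

```python
import json
from typing import Dict, List


def lora_field(scales: Dict[int, float]) -> List[dict]:
    """Build the `lora` list; every entry carries the required `id` and `scale` keys."""
    return [{"id": adapter_id, "scale": scale} for adapter_id, scale in sorted(scales.items())]


# Assumption: `prompt` stands in for whatever else the request body contains.
body = {"prompt": "Hello", "lora": lora_field({0: 0.5, 1: 1.1})}
payload = json.dumps(body)
print(payload)
```

Because requests with different LoRA configurations are not batched together, reusing one `lora` list across requests that can share it may help throughput.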
+
**Response format**
- Note: In streaming mode (`stream`), only `content`, `tokens` and `stop` will be returned until end of completion. Responses are sent using the [Server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html) standard. Note: the browser's `EventSource` interface cannot be used due to its lack of `POST` request support.
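Since `EventSource` is not an option, a client has to read the `POST` response body itself and pick out the `data:` lines. The parser below is a minimal sketch that assumes each event is a single `data:` line holding one JSON object; the sample lines are made up and only use the field names mentioned above (`content`, `tokens`, `stop`).

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of every `data:` line in a server-sent event stream."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields are skipped
        payload = line[len("data:"):]
        if payload.startswith(" "):
            payload = payload[1:]  # the SSE format allows one optional space after the colon
        yield json.loads(payload)


# Made-up stream, for illustration only.
sample = [
    'data: {"content": "Hel", "tokens": [], "stop": false}\n',
    "\n",
    'data: {"content": "lo", "tokens": [], "stop": true}\n',
    "\n",
]
text = "".join(chunk["content"] for chunk in iter_sse_json(sample))
print(text)
```

In practice `lines` would come from iterating over the streamed HTTP response rather than from a list.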
@@ -945,6 +947,8 @@ This endpoint returns the loaded LoRA adapters. You can add adapters using `--lo
By default, all adapters are loaded with scale set to 1. To initialize every adapter's scale to 0, add `--lora-init-without-apply`.
+Note that the `lora` field of a request overrides this scale for that request.
+
If an adapter is disabled, the scale will be set to 0.
**Response format**
@@ -966,6 +970,8 @@ If an adapter is disabled, the scale will be set to 0.
### POST `/lora-adapters`: Set list of LoRA adapters
+This sets the global scale for LoRA adapters. Note that the `lora` field of a request overrides this scale for that request.
+
To disable an adapter, either remove it from the list below, or set its scale to 0.
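The interaction between the global scales and a per-request `lora` field can be summarised in a small model. This is a sketch of the documented precedence, not the server's implementation: `resolve_scales` and the `global_scales` mapping are hypothetical names introduced for illustration.

```python
from typing import Dict, List, Optional


def resolve_scales(
    global_scales: Dict[int, float],
    request_lora: Optional[List[dict]],
) -> Dict[int, float]:
    """Return the scale each loaded adapter would use for one request."""
    if request_lora is None:
        # No `lora` field in the request: the global scales apply unchanged.
        return dict(global_scales)
    requested = {entry["id"]: entry["scale"] for entry in request_lora}
    # With a `lora` field, adapters that are not listed fall back to 0.0.
    return {adapter_id: requested.get(adapter_id, 0.0) for adapter_id in global_scales}


global_scales = {0: 1.0, 1: 1.0}
print(resolve_scales(global_scales, None))
print(resolve_scales(global_scales, [{"id": 1, "scale": 0.5}]))
```

In this model, a request that lists only adapter 1 effectively disables adapter 0 for that request, even though its global scale is non-zero.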