2 files changed, +2 -2 lines changed
003-model-server-protocol
004-endpoint-picker-protocol
@@ -43,7 +43,7 @@ The model server MUST expose the following LoRA adapter metrics via the same Pro
 * Metric value: The last updated timestamp (so the EPP can find the latest).
 * Metric labels:
   * `max_lora`: The maximum number of adapters that can be loaded to GPU memory to serve a batch.
-    Requests will be queued if the model server has reached MaxActiveAdapter and canno load the
+    Requests will be queued if the model server has reached MaxActiveAdapter and cannot load the
     requested adapter. Example: `"max_lora": "8"`.
   * `running_lora_adapters`: A comma separated list of adapters that are currently loaded in GPU
     memory and ready to serve requests. Example: `"running_lora_adapters": "adapter1, adapter2"`
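
Reviewer note: for readers of the hunk above, here is a minimal sketch of how a consumer might interpret the two labels shown. It assumes the label values arrive as plain strings in a dictionary. The names `LoraState`, `parse_lora_labels`, and `can_serve_without_queueing` are hypothetical and not part of the protocol. The queueing check is one simplified reading of the `max_lora` description and ignores any state not shown in this hunk.

```python
from dataclasses import dataclass


@dataclass
class LoraState:
    """Typed view of the two labels shown in the hunk (hypothetical helper type)."""
    max_lora: int
    running: list[str]


def parse_lora_labels(labels: dict[str, str]) -> LoraState:
    """Parse the string-valued metric labels into typed fields."""
    # max_lora is carried as a string label, e.g. "8".
    max_lora = int(labels["max_lora"])
    # running_lora_adapters is a comma separated list; tolerate spaces and an empty value.
    raw = labels.get("running_lora_adapters", "")
    running = [name.strip() for name in raw.split(",") if name.strip()]
    return LoraState(max_lora=max_lora, running=running)


def can_serve_without_queueing(state: LoraState, adapter: str) -> bool:
    """Assumed reading: no queueing if the adapter is loaded or a slot is free."""
    return adapter in state.running or len(state.running) < state.max_lora


state = parse_lora_labels(
    {"max_lora": "8", "running_lora_adapters": "adapter1, adapter2"}
)
print(state.running, can_serve_without_queueing(state, "adapter3"))
```

Splitting on commas and stripping whitespace accepts both `"adapter1,adapter2"` and the spaced form used in the example value.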
@@ -7,7 +7,7 @@ found [here](../../../pkg/epp/).
 This doc defines the protocol between the EPP and the proxy (e.g, Envoy).
 
 The EPP MUST implement the Envoy
-[external processing service](https://www.envoyproxy.io/docs/envoy/latest/api-v3/service/ext_proc/v3/external_processor) protocol.
+[external processing service](https://www.envoyproxy.io/docs/envoy/latest/api-v3/service/ext_proc/v3/external_processor) protocol.
 
 For each HTTP request, the EPP MUST communicate to the proxy the picked model server endpoint via:
 