-| fallback_strategy | string | False | | instance_health_and_rate_limiting | Fallback strategy. When set, the Plugin will check whether the specified instance's token has been exhausted when a request is forwarded. If so, forward the request to the next instance regardless of the instance priority. When not set, the Plugin will not forward the request to low priority instances when token of the high priority instance is exhausted. |
+| fallback_strategy | string or array | False | | string: "instance_health_and_rate_limiting", "http_429", "http_5xx"<br>array: ["rate_limiting", "http_429", "http_5xx"] | Fallback strategy. When set, the Plugin checks on each request whether the configured condition is met, such as the instance's token quota being exhausted (`rate_limiting`) or the upstream returning the corresponding status code (`http_429`, `http_5xx`). If so, the request is forwarded to the next instance regardless of instance priority. When not set, the Plugin will not forward requests to lower priority instances when the tokens of the higher priority instance are exhausted. |
| balancer.algorithm | string | False | roundrobin | [roundrobin, chash] | Load balancing algorithm. When set to `roundrobin`, weighted round robin algorithm is used. When set to `chash`, consistent hashing algorithm is used. |
| balancer.hash_on | string | False | | [vars, headers, cookie, consumer, vars_combinations] | Used when `type` is `chash`. Support hashing on [NGINX variables](https://nginx.org/en/docs/varindex.html), headers, cookie, consumer, or a combination of [NGINX variables](https://nginx.org/en/docs/varindex.html). |
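To illustrate how the attributes above fit together, here is a minimal plugin configuration sketch. The instance names, providers, models, and key placeholders are illustrative assumptions, not values from this change; consult the full plugin reference for the complete instance schema.

```json
{
  "plugins": {
    "ai-proxy-multi": {
      "fallback_strategy": ["rate_limiting", "http_429"],
      "balancer": {
        "algorithm": "roundrobin"
      },
      "instances": [
        {
          "name": "openai-instance",
          "provider": "openai",
          "priority": 1,
          "auth": { "header": { "Authorization": "Bearer <OPENAI_API_KEY>" } },
          "options": { "model": "gpt-4o" }
        },
        {
          "name": "deepseek-instance",
          "provider": "deepseek",
          "priority": 0,
          "auth": { "header": { "Authorization": "Bearer <DEEPSEEK_API_KEY>" } },
          "options": { "model": "deepseek-chat" }
        }
      ]
    }
  }
}
```

With this sketch, requests favor the higher priority `openai-instance`; because `fallback_strategy` is an array, exhausted rate-limiting quota or an upstream 429 would trigger fallback to `deepseek-instance`.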
@@ -186,7 +186,7 @@ DeepSeek responses: 2
### Configure Instance Priority and Rate Limiting
-The following example demonstrates how you can configure two models with different priorities and apply rate limiting on the instance with a higher priority. In the case where `fallback_strategy` is set to `instance_health_and_rate_limiting`, the Plugin should continue to forward requests to the low priority instance once the high priority instance's rate limiting quota is fully consumed.
+The following example demonstrates how you can configure two models with different priorities and apply rate limiting on the instance with a higher priority. In the case where `fallback_strategy` is set to `["rate_limiting"]`, the Plugin should continue to forward requests to the low priority instance once the high priority instance's rate limiting quota is fully consumed.
Create a Route as such and update with your LLM providers, models, API keys, and endpoints if applicable:
@@ -199,7 +199,7 @@ curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-The following example demonstrates how you can configure two models with different priorities and apply rate limiting on the instance with a higher priority. In the case where `fallback_strategy` is set to `instance_health_and_rate_limiting`, the Plugin should continue to forward requests to the low priority instance once the high priority instance's rate limiting quota is fully consumed.
+The following example demonstrates how you can configure two models with different priorities and apply rate limiting on the instance with a higher priority. In the case where `fallback_strategy` is set to `["rate_limiting"]`, the Plugin should continue to forward requests to the low priority instance once the high priority instance's rate limiting quota is fully consumed.
-Create a Route as such to set rate limiting and a higher priority on `openai-instance` instance and set the `fallback_strategy` to `instance_health_and_rate_limiting`. Update with your LLM providers, models, API keys, and endpoints, if applicable:
+Create a Route as such to set rate limiting and a higher priority on the `openai-instance` instance, and set the `fallback_strategy` to `["rate_limiting"]`. Update with your LLM providers, models, API keys, and endpoints, if applicable:
```shell
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
@@ -426,7 +426,7 @@ curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \