@@ -211,15 +207,11 @@ it will result in the following [ShouldRateLimit gRPC](https://www.envoyproxy.io
 }
 ```
 
-- Implement the rate-limiting logic during the processing of the **downstream** request body, as it must be parsed to determine which model is being targeted.
-- Initial descriptors would include the request path, user ID (if available), and the requested model.
 - Implement the rate-limiting logic during the processing of the **upstream** response body, as it must be parsed to determine the counter increment based on usage metrics.
 - Look at ways to avoid two requests to Limitador per single request to a model. Having the limit check and the counter increment happen separately is not ideal due to scaling concerns. However, this approach is sufficient for an initial implementation.
 - A new action type is not being considered. The WASM module will only initiate a ShouldRateLimit gRPC call to Limitador when all associated CEL expressions (namely `predicates` and `data`) can be evaluated.
 - The order of actions is important and will be enforced:
-  - If any CEL expression references the `requestBodyJSON()` CEL function, the gRPC request will be triggered after the **downstream** request body has been parsed.
   - If any CEL expression references the `responseBodyJSON()` CEL function, the gRPC request will be triggered after the **upstream** response body has been parsed.
-  - If one action requires evaluation of `requestBodyJSON()` and a subsequent action can be performed during the request headers phase, both actions will be executed during the request body phase.
   - If one action requires evaluation of `responseBodyJSON()` and a subsequent action can be performed during any of the previous request phases, both actions will be executed during the response body phase.
 - Usage metrics are flushed as part of the body of LLM responses (whether complete or streamed). Some additional notes on our existing filters, including our "internal to WASM" HTTP filter chain, are in this thread: https://kubernetes.slack.com/archives/C05J0D0V525/p1744098001098719. The flow diagram below attempts to capture this at a high level.
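
The ordering rules in this hunk can be read as a small scheduling problem: each action has an earliest phase at which all of its CEL expressions can be evaluated, and because order is enforced, an action can never run in an earlier phase than the action before it. The sketch below is only an illustration of that reading, not the WASM module's actual code. The `Phase`, `Action`, `earliest_phase`, and `schedule` names are hypothetical, and detecting the body functions by substring match is a simplifying assumption (a real implementation would presumably inspect the parsed CEL expression). The request body phase is modeled only because the removed lines mention it.

```python
from dataclasses import dataclass
from enum import IntEnum


class Phase(IntEnum):
    """Filter phases, in the order they occur for a single request (hypothetical names)."""
    REQUEST_HEADERS = 1
    REQUEST_BODY = 2
    RESPONSE_BODY = 3


@dataclass
class Action:
    """A rate-limit action with its CEL predicates and data expressions as plain strings."""
    name: str
    expressions: list[str]


def earliest_phase(action: Action) -> Phase:
    """Return the earliest phase at which every expression of the action can be evaluated."""
    phase = Phase.REQUEST_HEADERS
    for expr in action.expressions:
        # Assumption: a naive substring check stands in for real CEL analysis.
        if "responseBodyJSON(" in expr:
            phase = max(phase, Phase.RESPONSE_BODY)
        elif "requestBodyJSON(" in expr:
            phase = max(phase, Phase.REQUEST_BODY)
    return phase


def schedule(actions: list[Action]) -> list[tuple[str, Phase]]:
    """Assign a phase to each action while preserving the declared order."""
    scheduled = []
    current = Phase.REQUEST_HEADERS
    for action in actions:
        # An action is deferred to the latest phase required by itself
        # or by any action declared before it.
        current = max(current, earliest_phase(action))
        scheduled.append((action.name, current))
    return scheduled
```

With three actions declared in order, where only the second references `responseBodyJSON()`, `schedule` keeps the first in the request headers phase and places both the second and the third in the response body phase, which matches the last ordering rule above.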