Commit d0a64f9

Merge pull request #134 from Kuadrant/update-trlp-design
Remove references to requestBodyJSON cel function as not supported
2 parents 0842c00 + da1ae12 commit d0a64f9

1 file changed: +2 -11 lines changed

rfcs/0013-ai-policies.md

Lines changed: 2 additions & 11 deletions
@@ -179,9 +179,6 @@ actions:
 - service: ratelimit-service
 scope: ratelimit-scope-a
 data:
-- expression:
-key: model
-value: requestBodyJSON('model')
 - expression:
 key: limit.low_limit__346b5e73
 value: "1"
@@ -193,16 +190,15 @@ actions:
 value: responseBodyJSON('usage.total_tokens')
 ```
 
-When a prompt request for model `gpt-4.1`, whose response generates `35` tokens, reaches the WASM module with the configuration above,
-it will result in the following [ShouldRateLimit gRPC](https://www.envoyproxy.io/docs/envoy/latest/api-v3/service/ratelimit/v3/rls.proto) call:
+When a prompt request, whose response generates `35` tokens, reaches the WASM module with the configuration above,
+it will result in the following [Report gRPC](https://github.com/Kuadrant/limitador/blob/server-v2.1.0/limitador-server/proto/kuadrantrls.proto) call:
 
 ```json
 {
 "domain": "foobar",
 "descriptors": [
 {
 "entries": [
-{"key": "model", "value": "gpt-4.1"},
 {"key": "limit.low_limit__346b5e73", "value": "1"}
 ]
 }
@@ -211,15 +207,11 @@ it will result in the following [ShouldRateLimit gRPC](https://www.envoyproxy.io
 }
 ```
 
-- Implement the rate-limiting logic during the processing of the **downstream** request body, as it must be parsed to determine which model is being targeted.
-- Initial descriptors would include the request path, user id (if available) and the requested model.
 - Implement the rate-limiting logic during the processing of the **upstream** response body, as it must be parsed to determine the counter increment based on usage metrics.
 - Look at ways to avoid 2 requests to limitador per single request to a model. This is not ideal to have a limit check and counter increment happen separately due to scaling concerns. However, this approach is sufficient for an initial implementation.
 - A new action type is not being considered. The WASM module will only initiate a ShouldRateLimit gRPC call to Limitador when all associated CEL expressions (namely `predicates` and `data`) can be evaluated.
 - The order of actions is important and will be enforced:
-- If any CEL expression references the `requestBodyJSON()` CEL function, the gRPC request will be triggered after the **downstream** request body has been parsed.
 - If any CEL expression references the `responseBodyJSON()` CEL function, the gRPC request will be triggered after the **upstream** response body has been parsed.
-- If one action requires evaluation of the `requestBodyJSON()` and a subsequent action can be performed during the request headers phase, both actions will be executed during the request body phase.
 - If one action requires evaluation of the `responseBodyJSON()` and a subsequent action can be performed during any of the previous request phases, both actions will be executed during the response body phase.
 - Usage metrics are flushed as part of the body of LLM responses (either complete responses, or when streamed). Some additional notes on our existing filters, including our "internal to WASM" http filter chain, in this thread: https://kubernetes.slack.com/archives/C05J0D0V525/p1744098001098719. A flow diagram below attempts to capture this flow at a high level.
 
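Note on the hunk above: with the `requestBodyJSON()` rules removed, the ordering rule that remains can be sketched roughly as follows. This is an illustrative Python model only, not the WASM module's code. The `Phase`, `Action`, and `schedule` names are hypothetical, and detecting the function by substring match is an assumption made to keep the sketch short.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class Phase(IntEnum):
    """Hypothetical processing phases, ordered as they occur."""
    REQUEST_HEADERS = 0
    RESPONSE_BODY = 1


@dataclass
class Action:
    service: str
    # All CEL expressions of the action (predicates and data), as plain strings.
    expressions: list[str] = field(default_factory=list)

    def earliest_phase(self) -> Phase:
        # Assumption: a substring check stands in for real CEL analysis.
        if any("responseBodyJSON(" in expr for expr in self.expressions):
            return Phase.RESPONSE_BODY
        return Phase.REQUEST_HEADERS


def schedule(actions: list[Action]) -> list[tuple[str, Phase]]:
    """Give each action the latest phase needed by itself or any earlier action."""
    plan = []
    current = Phase.REQUEST_HEADERS
    for action in actions:
        # Order is enforced: once one action waits for the response body,
        # every subsequent action waits too.
        current = max(current, action.earliest_phase())
        plan.append((action.service, current))
    return plan
```

In this model, an action listed after one that references `responseBodyJSON()` is deferred to the response body phase even if its own expressions could be evaluated earlier, which matches the last ordering bullet in the hunk.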
@@ -252,7 +244,6 @@ sequenceDiagram
 end
 
 %% pre-model-server token rate limiting check
-GW->>GW: Parse model from request body
 GW->>L: CheckRateLimit (read only op)
 alt Limit not reached
 L-->>GW: Rate limit OK
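Note on the retained `responseBodyJSON('usage.total_tokens')` expression: the counter increment it yields for the `35`-token example in the RFC text can be illustrated with a small Python stand-in. This is a sketch under assumptions, not the CEL function's implementation. The `response_body_json` helper is hypothetical, the dot-separated path handling is assumed from the expression's appearance, and the sample body only imitates a typical LLM usage block.

```python
import json
from typing import Any


def response_body_json(body: str, path: str) -> Any:
    """Hypothetical stand-in: walk a dot-separated path through a JSON object."""
    node = json.loads(body)
    for key in path.split("."):
        node = node[key]  # raises KeyError if the path is absent
    return node


# Assumed response shape; only usage.total_tokens matters for the example.
body = '{"usage": {"prompt_tokens": 10, "completion_tokens": 25, "total_tokens": 35}}'

# The value the RFC describes as the counter increment for this response.
increment = response_body_json(body, "usage.total_tokens")
```

Because the value only exists once the upstream response body has been parsed, this lookup cannot run during any request phase, which is why the report is tied to the response body phase.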
