Commit 63d7d83

Merge pull request #27 from krakend/sc-999/plugin-quota-inject-example
Plugin quota processor injection example
2 parents 056abe0 + c01fe41 commit 63d7d83

10 files changed: +776 -0 lines changed

12.plugins-and-quotas/Dockerfile

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
ARG KRAKEND_BUILDER
ARG KRAKEND_VERSION

FROM krakend/builder-ee:$KRAKEND_BUILDER AS builder

WORKDIR /app
COPY --chown=krakend:nogroup plugins /app

RUN cd quota-control-mw && make amd64

FROM krakend/krakend-ee:$KRAKEND_VERSION

WORKDIR /etc/krakend

COPY config/krakend .
COPY LICENSE LICENSE
COPY --from=builder --chown=krakend:nogroup /app/*.so /opt/krakend/plugins/

CMD [ "run", "-c", "/etc/krakend/krakend.json" ]

12.plugins-and-quotas/README.md

Lines changed: 175 additions & 0 deletions
@@ -0,0 +1,175 @@
# KrakenD Plugins and Quotas Example

This example demonstrates how to implement **token-based quota management** for streaming AI/LLM endpoints using KrakenD Enterprise Edition with a custom middleware plugin.

## Overview

The example showcases:

- **Custom Middleware Plugin**: A Go plugin that intercepts streaming responses and tracks token usage in real-time
- **Quota Management**: Multi-tier rate limiting (gold, silver, bronze) using KrakenD's governance/processors
- **Stream Processing**: Parsing streaming responses to extract token usage from the response body
- **Persistent Tracking**: Quota consumption tracked across requests with automatic storage management
- **Mock Backend**: An OpenAI-compatible mock server that streams responses with token metadata

## Architecture

```
┌──────────┐     ┌─────────────────────────────┐     ┌──────────┐
│  Client  │────▶│  KrakenD EE                 │────▶│ Backend  │
└──────────┘     │  - Quota Pre-check          │     │ (Mock    │
                 │  - Stream Wrapper           │     │  OpenAI) │
                 │  - Token Extraction         │     └──────────┘
                 │  - Quota Update             │
                 │                             │
                 │  Quota Processor handles    │
                 │  storage transparently      │
                 └─────────────────────────────┘
```

## Components

### 1. KrakenD Configuration (`config/krakend/krakend.json`)
- Defines quota rules for three tiers (gold, silver, bronze)
- Configures the quota processor with storage backend
- Sets up endpoint with the custom middleware plugin

### 2. Middleware Plugin (`plugins/quota-control-mw/middleware.go`)
- **Pre-check**: Validates if user has quota before forwarding request
- **Stream Wrapping**: Intercepts the response stream from the backend
- **Token Extraction**: Uses regex to parse token usage from streaming chunks
- **Quota Update**: Dynamically updates quota based on actual token consumption

### 3. Mock Backend (`backend/openai-mock/main.go`)
- Simulates OpenAI's streaming API format
- Returns Server-Sent Events (SSE) with mock responses
- Includes token usage metadata in the final chunk

## Prerequisites

- Docker and Docker Compose
- KrakenD Enterprise Edition license (place `LICENSE` file in the root directory)

## Running the Example

1. **Ensure you have a KrakenD EE license file** named `LICENSE` in this directory

2. **Start all services**:
   ```bash
   docker compose up --build
   ```

   This will start:
   - KrakenD EE on port `8080`
   - Mock backend on port `8090`
   - Supporting services (quota storage)

3. **Test the endpoint**:
   ```bash
   curl -X POST http://localhost:8080/ \
     -H "Content-Type: application/json" \
     -d '{
       "messages": [{"role": "user", "content": "Hello!"}]
     }'
   ```

   You should see a streaming response with the mock data.

## How It Works

1. **Request arrives** at KrakenD's `/` endpoint
2. **Middleware pre-checks** quota limits (weightless check with 0 tokens)
3. If quota allows, **request is forwarded** to the backend
4. **Backend streams** the response in OpenAI-compatible SSE format
5. **Middleware wraps** the response stream with a custom reader
6. As **chunks are read**, the plugin:
   - Searches for usage metadata in the stream
   - Extracts token counts using regex pattern
   - Updates the quota processor with actual consumption
7. **Client receives** the full streaming response
8. **Quota is accurately tracked** based on real token usage

## Quota Tiers

The configuration defines three quota tiers:

| Tier   | Hourly Limit | Daily Limit  |
|--------|--------------|--------------|
| Gold   | 1,000 tokens | 5,000 tokens |
| Silver | 500 tokens   | 2,000 tokens |
| Bronze | 200 tokens   | 1,000 tokens |

**Note**: In the plugin code (`middleware.go:73`), the tier is hardcoded to `"admin"` (which maps to the `gold` tier). In a production setup, you would extract this from:
- JWT claims
- Request headers (e.g., `X-User-Tier`)
- API key lookup
- Query parameters

## Key Features

### 1. **Streaming-Aware Quota Management**
Unlike traditional rate limiting that counts requests, this example counts tokens consumed during streaming, making it ideal for AI/LLM APIs where cost is token-based.

### 2. **Pre-check + Post-update Pattern**
- Pre-check (weight=0): Fast rejection of users who've already exceeded quota
- Post-update (weight=actual): Accurate quota deduction after response completes
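
The pattern is easier to reason about with a toy model. The sketch below uses a hypothetical in-memory counter (`memQuota`) as a stand-in for the KrakenD quota processor, whose real API is not shown in this README. The limit of 200 is the Bronze hourly limit from the table above, and the cost of 120 tokens per response is invented.

```go
package main

import (
	"fmt"
	"sync"
)

// memQuota is a hypothetical in-memory stand-in for the quota processor.
type memQuota struct {
	mu    sync.Mutex
	limit int
	used  map[string]int
}

func newMemQuota(limit int) *memQuota {
	return &memQuota{limit: limit, used: make(map[string]int)}
}

// Consume adds weight to key's usage and reports whether key was still under
// its limit before the call. A weight of 0 is a pure check.
func (q *memQuota) Consume(key string, weight int) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.used[key] >= q.limit {
		return false
	}
	q.used[key] += weight
	return true
}

// handle models one request: pre-check with weight 0, call the backend,
// then deduct the tokens the backend reported.
func handle(q *memQuota, user string, backend func() int) bool {
	if !q.Consume(user, 0) {
		return false // rejected before the backend is contacted
	}
	tokens := backend()
	q.Consume(user, tokens)
	return true
}

func main() {
	q := newMemQuota(200)                // Bronze hourly limit
	backend := func() int { return 120 } // pretend each response costs 120 tokens

	for i := 1; i <= 3; i++ {
		fmt.Printf("request %d allowed: %v\n", i, handle(q, "user-1", backend))
	}
	// request 1 allowed: true
	// request 2 allowed: true
	// request 3 allowed: false
}
```

In this sketch the second request pushes usage to 240, past the limit of 200. That is the trade-off of charging after the fact: the cost is unknown until the response finishes, so a user can overshoot by at most one response before being rejected.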

### 3. **Regex-based Token Extraction**
The plugin uses a regex pattern to extract token usage from the streaming response:
```go
usagePattern: `{"prompt_tokens":(\d+),"completion_tokens":(\d+),"total_tokens":(\d+)}`
```
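
As a rough illustration of how such a pattern can be applied with Go's standard `regexp` package, the sketch below compiles it and converts the three capture groups to integers. The `Usage` type and `extractUsage` function are hypothetical names, not taken from the plugin. Note that the pattern only matches compact JSON with the fields in exactly this order, so a backend that adds whitespace or reorders keys would not match.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var usagePattern = regexp.MustCompile(`{"prompt_tokens":(\d+),"completion_tokens":(\d+),"total_tokens":(\d+)}`)

// Usage is a hypothetical container for the three captured counters.
type Usage struct {
	Prompt, Completion, Total int
}

// extractUsage returns the usage found in chunk, or false if there is none.
func extractUsage(chunk []byte) (Usage, bool) {
	m := usagePattern.FindSubmatch(chunk)
	if m == nil {
		return Usage{}, false
	}
	prompt, err1 := strconv.Atoi(string(m[1]))
	completion, err2 := strconv.Atoi(string(m[2]))
	total, err3 := strconv.Atoi(string(m[3]))
	if err1 != nil || err2 != nil || err3 != nil {
		return Usage{}, false // e.g. a number too large for int
	}
	return Usage{Prompt: prompt, Completion: completion, Total: total}, true
}

func main() {
	chunk := []byte(`data: {"choices":[],"usage":{"prompt_tokens":12,"completion_tokens":30,"total_tokens":42}}`)
	if u, ok := extractUsage(chunk); ok {
		fmt.Printf("prompt=%d completion=%d total=%d\n", u.Prompt, u.Completion, u.Total)
	}
	// prints: prompt=12 completion=30 total=42
}
```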

### 4. **Bloom Filter Optimization**
The configuration includes a rejecter cache (Bloom filter) to quickly deny previously blocked users without storage lookups.
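
For readers unfamiliar with the data structure, here is a small generic Bloom filter in Go. It is not KrakenD's implementation and says nothing about how the rejecter cache is configured; the sizes (1024 bits, 3 hash positions) and the FNV-based double hashing are arbitrary choices for the sketch. The property that matters is that a key that was added is always reported as possibly present, while a key that was never added is usually, but not always, reported as absent.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a fixed-size Bloom filter with k hash positions per key.
type bloom struct {
	bits []bool
	k    int
}

func newBloom(m, k int) *bloom {
	return &bloom{bits: make([]bool, m), k: k}
}

// indexes derives k bit positions from two FNV hashes (double hashing).
func (b *bloom) indexes(key string) []int {
	h1 := fnv.New64a()
	h1.Write([]byte(key))
	h2 := fnv.New64()
	h2.Write([]byte(key))
	a, c := h1.Sum64(), h2.Sum64()

	out := make([]int, b.k)
	for i := range out {
		out[i] = int((a + uint64(i)*c) % uint64(len(b.bits)))
	}
	return out
}

// Add marks key as seen.
func (b *bloom) Add(key string) {
	for _, i := range b.indexes(key) {
		b.bits[i] = true
	}
}

// MayContain returns false only if key was definitely never added.
func (b *bloom) MayContain(key string) bool {
	for _, i := range b.indexes(key) {
		if !b.bits[i] {
			return false
		}
	}
	return true
}

func main() {
	rejected := newBloom(1024, 3)
	rejected.Add("blocked-user")

	// An added key is always reported as possibly present.
	fmt.Println("blocked-user:", rejected.MayContain("blocked-user"))
	// A key that was never added is normally reported absent, but false
	// positives are possible by design.
	fmt.Println("other-user:", rejected.MayContain("other-user"))
}
```

Because false positives are possible, a filter like this suits a fast "probably already blocked" check better than an authoritative decision.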

## Customization

### Change User/Tier Extraction
Modify `middleware.go:73-74` to extract tier and user ID from the request. The two options below are alternatives, so use only one of them:

```go
// Option A: extract from JWT claims (extractFromJWT is a helper you would write)
tier := extractFromJWT(reqw, "tier")
userId := extractFromJWT(reqw, "user_id")

// Option B (instead of A): extract from headers, guarding against missing ones
var tier, userId string
headers := reqw.Headers()
if v := headers["X-User-Tier"]; len(v) > 0 {
	tier = v[0]
}
if v := headers["X-User-Id"]; len(v) > 0 {
	userId = v[0]
}
```
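
If the tier comes from a client-supplied header, it is worth validating it before using it as a quota key. The following sketch assumes the header map has canonicalized keys (as Go's `net/http` produces) and that the tier names match the table above; `tierFromHeaders`, `knownTiers`, and the `bronze` fallback are hypothetical choices, not part of the plugin.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// knownTiers uses the tier names from the README table (an assumption about
// how the rules are named in krakend.json).
var knownTiers = map[string]bool{"gold": true, "silver": true, "bronze": true}

// defaultTier is an assumed fallback: the most restrictive tier.
const defaultTier = "bronze"

// tierFromHeaders returns the tier named in X-User-Tier, or defaultTier when
// the header is missing, empty, or names an unknown tier.
func tierFromHeaders(h map[string][]string) string {
	vals := h[http.CanonicalHeaderKey("X-User-Tier")]
	if len(vals) == 0 {
		return defaultTier
	}
	tier := strings.ToLower(strings.TrimSpace(vals[0]))
	if !knownTiers[tier] {
		return defaultTier
	}
	return tier
}

func main() {
	fmt.Println(tierFromHeaders(map[string][]string{"X-User-Tier": {"Gold"}}))     // gold
	fmt.Println(tierFromHeaders(map[string][]string{}))                            // bronze
	fmt.Println(tierFromHeaders(map[string][]string{"X-User-Tier": {"platinum"}})) // bronze
}
```

Falling back to the lowest tier keeps a missing or spoofed header from granting a larger quota than intended. In production a signed source such as JWT claims is usually a safer input than a raw header.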

### Adjust Quota Limits
Edit `config/krakend/krakend.json` in the `governance/processors.quotas.rules` section.

### Change Token Pattern
If your backend uses a different format for token usage, update the regex pattern in `middleware.go:62`.

## Building the Plugin Manually

If you want to build the plugin outside Docker:

```bash
cd plugins/quota-control-mw
make go.mod   # Initialize Go module
make amd64    # Build for AMD64
# or
make arm64    # Build for ARM64
```

## Troubleshooting

**Plugin not loading**: Check KrakenD logs for plugin-related errors. Ensure the `.so` file is in `/opt/krakend/plugins/` inside the container.

**Quota not working**: Verify all services are running with `docker compose ps`. Check KrakenD logs for quota processor initialization errors.

**Tokens not being tracked**: Check that the token usage format in the backend response matches the regex pattern in the plugin.

## License

This example requires a valid KrakenD Enterprise Edition license. Place your `LICENSE` file in the root directory.

## References

- [KrakenD Plugins Documentation](https://www.krakend.io/docs/enterprise/extending/)
- [KrakenD Quota Management](https://www.krakend.io/docs/enterprise/governance/quota/)
- [OpenAI Streaming Format](https://platform.openai.com/docs/api-reference/streaming)

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
FROM golang:1.25.7-alpine3.23

WORKDIR /app
COPY openai-mock .
RUN go build -o openai-mock .

EXPOSE 8090
CMD ["/app/openai-mock"]

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
module openai-mock-server

go 1.25.7
