Commit 65c9b9a

blog: add load-balancing-between-ai-ml-api-with-apisix (#1940)
1 parent a08ee2c commit 65c9b9a

1 file changed

+230
-0
lines changed

@@ -0,0 +1,230 @@
1+
---
title: "Load Balancing AI/ML API with Apache APISIX"
authors:
  - name: "Sergey Nuzhnyy"
    title: "Author"
    url: "https://github.com/OctavianTheI"
    image_url: "https://github.com/OctavianTheI.png"
  - name: "Yilia Lin"
    title: "Technical Writer"
    url: "https://github.com/Yilialinn"
    image_url: "https://github.com/Yilialinn.png"
keywords:
  - API gateway
  - Apache APISIX
  - AI
  - AI/ML API
  - traffic management
description: This blog provides a step-by-step guide to configure Apache APISIX for AI traffic splitting and load balancing between API versions, covering security setup, canary testing, and deployment monitoring.
tags: [Ecosystem]
image: https://static.api7.ai/uploads/2025/07/23/d1O3mllW_apisix-ai-ml-api.webp
---
22+
23+
> This blog provides a step-by-step guide to configure Apache APISIX for AI traffic splitting and load balancing between API versions, covering security setup, canary testing, and deployment monitoring.
24+
25+
<!--truncate-->
26+
27+
## Overview
28+
29+
[**AI/ML API**](https://aimlapi.com/) is a one-stop, OpenAI-compatible endpoint trusted by 150,000+ developers. It provides access to 300+ state-of-the-art models (chat, vision, image/video/music generation, embeddings, OCR, and more) from Google, Meta, OpenAI, Anthropic, Mistral, and others.
30+
31+
[**Apache APISIX**](https://github.com/apache/apisix) is a dynamic, real-time, high-performance API Gateway. APISIX API Gateway provides rich traffic management features and can serve as an AI Gateway through its flexible plugin system.
32+
33+
Modern AI workloads often require smooth version migrations, A/B testing, and rolling updates. This guide shows you how to:
34+
35+
1. **Install** Apache APISIX with Docker quickstart.
36+
2. **Secure** the Admin API with keys and IP whitelisting.
37+
3. **Define** separate routes for API versions v1 and v2.
38+
4. **Implement** weighted traffic splitting (50/50) via the `traffic-split` plugin.
39+
5. **Verify** the newly created split endpoint functionality.
40+
6. **Load test** and **monitor** distribution accuracy.
41+
42+
To perform authenticated requests, you'll need an AI/ML API key. You can get one at [https://aimlapi.com/app/keys/](https://aimlapi.com/app/keys?utm_source=apisix&utm_medium=guide&utm_campaign=integration) and use it as a Bearer token in your Authorization headers.
43+
44+
![Generate AI/ML API Key](https://static.api7.ai/uploads/2025/07/30/XdAXZUT6_generate-ai-ml-api-key.webp)
45+
46+
## Quickstart Installation
47+
48+
```bash
# 1. Download and run the quickstart script (includes etcd + APISIX)
curl -sL https://run.api7.ai/apisix/quickstart | sh

# 2. Confirm APISIX is up and running (-s hides the progress meter when piping)
curl -sI http://127.0.0.1:9080 | grep Server
# ➜ Server: APISIX/3.13.0
```
56+
57+
> **Tip:** If you encounter port conflicts, adjust Docker host networking or map to different ports in the quickstart script.
58+
59+
## Secure the Admin API
60+
61+
By default, quickstart bypasses Admin API authentication. For any non-development environment, enforce security:
62+
63+
### 1. Set an Admin Key
64+
65+
Edit `conf/config.yaml` inside the APISIX container or local install directory, replacing the example key with a strong admin key of your own. This Admin API key is separate from the AI/ML API key obtained above:
66+
67+
```yaml
apisix:
  enable_admin: true              # Enable Admin API
deployment:
  admin:                          # APISIX 3.x keeps Admin API settings under deployment.admin
    admin_key_required: true      # Reject unauthenticated Admin requests
    admin_key:
      - name: admin
        key: YOUR_ADMIN_KEY_HERE  # Replace with your own secure key
        role: admin
```
76+
77+
> **Security Best Practice:** Use at least 32 characters, mix letters/numbers/symbols, and rotate keys quarterly.
78+
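One way to produce a key that meets the length guideline above is Python's standard `secrets` module. This is a minimal sketch: the helper name `generate_admin_key` is illustrative and not part of APISIX, and the output uses only letters, digits, `-`, and `_`.

```python
import secrets


def generate_admin_key(nbytes: int = 32) -> str:
    """Return a random URL-safe key (hypothetical helper, not an APISIX tool).

    24 random bytes encode to 32 characters, so anything below that is rejected
    to stay within the 32-character guideline.
    """
    if nbytes < 24:
        raise ValueError("use at least 24 random bytes (32 characters)")
    return secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    # Paste the printed value into config.yaml as the admin key.
    print(generate_admin_key())
```

Quoting the value in `config.yaml` avoids any YAML parsing surprises if the key happens to start with `-`.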
79+
### 2. Whitelist Management IPs (allow\_admin)
80+
81+
Add your management or local networks to `allow_admin` in the `deployment.admin` section:
82+
83+
```yaml
deployment:
  admin:
    allow_admin:
      - 127.0.0.0/24   # Localhost & host network
      - 0.0.0.0/0      # Allow all (temporary/testing only)
```
89+
90+
> **Warning:** `0.0.0.0/0` opens Admin API to the world! Lock this down to specific subnets in production.
91+
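To reason about which clients a given `allow_admin` list would admit, the CIDR matching can be mimicked with Python's `ipaddress` module. This is only a sketch of the matching idea, not APISIX's implementation; the function name `is_admin_allowed` and the sample addresses are assumptions.

```python
import ipaddress


def is_admin_allowed(client_ip: str, allow_admin: list[str]) -> bool:
    """Return True if client_ip falls inside any CIDR block in allow_admin."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in allow_admin)


if __name__ == "__main__":
    locked_down = ["127.0.0.0/24"]
    wide_open = ["127.0.0.0/24", "0.0.0.0/0"]
    # 203.0.113.7 is a documentation-range address standing in for an outside client.
    for ip in ("127.0.0.1", "203.0.113.7"):
        print(ip, is_admin_allowed(ip, locked_down), is_admin_allowed(ip, wide_open))
```

The second list shows why `0.0.0.0/0` is risky: every IPv4 client matches it.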
92+
### 3. Restart APISIX
93+
94+
```bash
docker restart apisix-quickstart
```
97+
98+
> **Check Logs:** Run `docker logs apisix-quickstart --tail 50` and confirm there are no errors about admin authentication.
99+
100+
## Define Basic Routes for v1 and v2
101+
102+
Before splitting traffic, ensure each version works individually.
103+
104+
### 1. Route for v1
105+
106+
```bash
curl -i http://127.0.0.1:9180/apisix/admin/routes/test-v1 \
  -X PUT \
  -H "X-API-KEY: YOUR_ADMIN_KEY_HERE" \
  -d '{
    "uri": "/test/v1",
    "upstream": {
      "type": "roundrobin",
      "nodes": {"api.aimlapi.com:443": 1},
      "scheme": "https",
      "pass_host": "node"
    }
  }'
```
120+
121+
> **Tip:** The route ID is the last path segment of the Admin API URL (`test-v1` here). Reuse it to update or delete the route later.
122+
123+
### 2. Route for v2
124+
125+
```bash
curl -i http://127.0.0.1:9180/apisix/admin/routes/test-v2 \
  -X PUT \
  -H "X-API-KEY: YOUR_ADMIN_KEY_HERE" \
  -d '{
    "uri": "/test/v2",
    "upstream": {
      "type": "roundrobin",
      "nodes": {"api.aimlapi.com:443": 1},
      "scheme": "https",
      "pass_host": "node"
    }
  }'
```
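The two route definitions differ only in their ID and URI, so they can also be generated from one template. The sketch below builds the same `PUT` requests with Python's standard library without sending them; `build_route_request` and `ADMIN_URL` are illustrative names, and the admin key is a placeholder.

```python
import json
import urllib.request

# Admin API base URL used by the curl commands in this guide.
ADMIN_URL = "http://127.0.0.1:9180/apisix/admin/routes"


def build_route_request(route_id: str, uri: str, admin_key: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that creates or replaces a route."""
    body = {
        "uri": uri,
        "upstream": {
            "type": "roundrobin",
            "nodes": {"api.aimlapi.com:443": 1},
            "scheme": "https",
            "pass_host": "node",
        },
    }
    return urllib.request.Request(
        f"{ADMIN_URL}/{route_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"X-API-KEY": admin_key, "Content-Type": "application/json"},
    )


if __name__ == "__main__":
    for version in ("v1", "v2"):
        req = build_route_request(f"test-{version}", f"/test/{version}", "YOUR_ADMIN_KEY_HERE")
        # Pass req to urllib.request.urlopen() to actually apply it.
        print(req.get_method(), req.full_url)
```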
139+
140+
## Implement Traffic Splitting (50/50)
141+
142+
Use the [`traffic-split`](https://apisix.apache.org/docs/apisix/plugins/traffic-split/) plugin for controlled distribution between v1 and v2. In the admin request below, replace `YOUR_ADMIN_KEY_HERE` with your actual key.
143+
144+
```bash
curl -i http://127.0.0.1:9180/apisix/admin/routes/aimlapi-split \
  -X PUT \
  -H "X-API-KEY: YOUR_ADMIN_KEY_HERE" \
  -d '{
    "id": "aimlapi-split",
    "uri": "/chat/completions",
    "upstream": {
      "type": "roundrobin",
      "nodes": {"api.aimlapi.com:443": 1},
      "scheme": "https",
      "pass_host": "node"
    },
    "plugins": {
      "traffic-split": {
        "rules": [
          {
            "weight": 50,
            "upstream": {"type":"roundrobin","nodes":{"api.aimlapi.com:443":1},"scheme":"https","pass_host":"node"},
            "rewrite": {"uri":"/v1/chat/completions"}
          },
          {
            "weight": 50,
            "upstream": {"type":"roundrobin","nodes":{"api.aimlapi.com:443":1},"scheme":"https","pass_host":"node"},
            "rewrite": {"uri":"/v2/chat/completions"}
          }
        ]
      }
    }
  }'
```
175+
176+
> **Tip:** Adjust the `weight` values to shift traffic ratios (e.g., 80/20 for canary).
177+
>
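178+
> **Note:** The `rewrite` URI must match the upstream API path exactly (for example, `/v1/chat/completions`). Check the rule layout against the `traffic-split` plugin reference for your APISIX version, which documents targets under `weighted_upstreams`.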
179+
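The `traffic-split` plugin reference describes each rule as a list of `weighted_upstreams`, where an entry's share of traffic is its weight divided by the sum of all weights. The sketch below builds a rule in that documented shape and computes the resulting shares. The helper names and the upstream names `v1` and `v2` are illustrative assumptions, and the sketch does not cover per-version URI rewriting.

```python
def build_split_rule(weights: dict[str, int], node: str = "api.aimlapi.com:443") -> dict:
    """Return one traffic-split rule with a weighted upstream per name."""
    return {
        "weighted_upstreams": [
            {
                "upstream": {
                    "name": name,  # label only; hypothetical names
                    "type": "roundrobin",
                    "nodes": {node: 1},
                    "scheme": "https",
                    "pass_host": "node",
                },
                "weight": weight,
            }
            for name, weight in weights.items()
        ]
    }


def traffic_shares(rule: dict) -> list[float]:
    """Return each upstream's fraction of traffic: weight / sum of weights."""
    entries = rule["weighted_upstreams"]
    total = sum(entry["weight"] for entry in entries)
    return [entry["weight"] / total for entry in entries]


if __name__ == "__main__":
    canary = build_split_rule({"v1": 80, "v2": 20})
    print(traffic_shares(canary))
```

Wrapping the returned rule as `{"traffic-split": {"rules": [rule]}}` gives the `plugins` value for a route body.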
180+
## Verify Split Endpoint Functionality
181+
182+
Test the `/chat/completions` endpoint you just created. Replace `<AIML_API_KEY>` with the key obtained earlier and use it as a Bearer token:
183+
184+
```bash
curl -v -X POST http://127.0.0.1:9080/chat/completions \
  -H "Authorization: Bearer <AIML_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"ping"}]}'
```
190+
191+
**Expected Output:**
192+
193+
```json
{"content":"Pong! How can I assist you today?"}
```
196+
197+
> **Tip:** Use `-v` for verbose output to troubleshoot headers or TLS issues.
198+
199+
## Load Test & Distribution Validation
200+
201+
After configuring the split route, use the following commands to validate distribution. Replace `<AIML_API_KEY>` with your Bearer token.
202+
203+
```bash
# 1. Send 100 test requests
time seq 100 | xargs -I {} curl -s -o /dev/null -X POST http://127.0.0.1:9080/chat/completions \
  -H "Authorization: Bearer <AIML_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"ping"}]}'

# 2. Check APISIX logs for upstream hits (replace IPs with actual resolved IPs)
echo "v1 hits: $(docker logs apisix-quickstart --since 5m | grep -c '188.114.97.3:443')"
echo "v2 hits: $(docker logs apisix-quickstart --since 5m | grep -c '188.114.96.3:443')"
```
214+
215+
**Expected:** Approximately 50 requests to each upstream.
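How close to 50 counts as "approximately" depends on how the gateway picks an upstream. As a rough guide, if each request were an independent random choice (an assumption; a round-robin picker would typically land closer to the exact ratio), the count for one upstream follows a binomial distribution. The sketch below computes a two-standard-deviation band; `expected_range` is an illustrative name.

```python
import math


def expected_range(n: int, p: float, z: float = 2.0) -> tuple[float, float]:
    """Return (low, high) hit counts within z standard deviations of the mean.

    Models n independent requests, each routed to this upstream with probability p.
    """
    mean = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return (mean - z * sigma, mean + z * sigma)


if __name__ == "__main__":
    low, high = expected_range(100, 0.5)
    print(f"100 requests at 50%: roughly {low:.0f} to {high:.0f} hits per upstream")
```

Under that model, a count well outside the band is a hint that the weights are not being applied as configured.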
216+
217+
> **Tip:** Use Prometheus or OpenTelemetry plugins for real‑time metrics instead of manual log parsing.
218+
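If you stay with log parsing, the two `grep -c` calls can be folded into a single pass that tallies every upstream address it sees. This is a minimal sketch: it assumes the access log line contains the upstream address as `IP:443`, and the sample lines are made up for illustration rather than copied from real APISIX output.

```python
import re
from collections import Counter

# Matches an IPv4 address followed by :443, e.g. 188.114.97.3:443
UPSTREAM_ADDR = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}:443)\b")


def count_upstream_hits(log_lines) -> Counter:
    """Count log lines per upstream address (first IP:443 match on each line)."""
    hits = Counter()
    for line in log_lines:
        match = UPSTREAM_ADDR.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


if __name__ == "__main__":
    # Invented sample lines; real access logs carry more fields.
    sample = [
        '172.17.0.1 "POST /chat/completions HTTP/1.1" 200 188.114.97.3:443',
        '172.17.0.1 "POST /chat/completions HTTP/1.1" 200 188.114.96.3:443',
        '172.17.0.1 "POST /chat/completions HTTP/1.1" 200 188.114.97.3:443',
    ]
    for addr, count in count_upstream_hits(sample).most_common():
        print(addr, count)
```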
219+
## Best Practices & Next Steps
220+
221+
* **Rate Limiting & Quotas**: Add [`limit-count`](https://apisix.apache.org/docs/apisix/plugins/limit-count/) plugin to protect your upstream from spikes.
222+
* **Authentication**: Layer on the [`key-auth`](https://apisix.apache.org/docs/apisix/plugins/key-auth/) plugin for consumer management.
223+
* **Circuit Breaker**: Prevent cascading failures with the [`api-breaker`](https://apisix.apache.org/docs/apisix/plugins/api-breaker/) plugin.
224+
* **Observability**: Integrate Prometheus, SkyWalking, or Loki for dashboards and alerts.
225+
* **Infrastructure as Code**: Consider managing APISIX config via Kubernetes CRDs or ADC for reproducibility.
226+
227+
## References
228+
229+
* [APISIX Load Balancing Documentation](https://apisix.apache.org/docs/apisix/getting-started/load-balancing/)
230+
* [AI/ML API Documentation](https://docs.aimlapi.com/?utm_source=apisix&utm_medium=guide&utm_campaign=integration)
