| 1 | +--- |
| 2 | +title: "Load Balancing AI/ML API with Apache APISIX" |
| 3 | +authors: |
| 4 | + - name: "Sergey Nuzhnyy" |
| 5 | + title: "Author" |
| 6 | + url: "https://github.com/OctavianTheI" |
| 7 | + image_url: "https://github.com/OctavianTheI.png" |
| 8 | + - name: "Yilia Lin" |
| 9 | + title: "Technical Writer" |
| 10 | + url: "https://github.com/Yilialinn" |
| 11 | + image_url: "https://github.com/Yilialinn.png" |
| 12 | +keywords: |
| 13 | +- API gateway |
| 14 | +- Apache APISIX |
| 15 | +- AI |
| 16 | +- AI/ML API |
| 17 | +- traffic management |
| 18 | +description: This blog provides a step-by-step guide to configuring Apache APISIX for AI traffic splitting and load balancing between API versions, covering security setup, canary testing, and deployment monitoring. |
| 19 | +tags: [Ecosystem] |
| 20 | +image: https://static.api7.ai/uploads/2025/07/23/d1O3mllW_apisix-ai-ml-api.webp |
| 21 | +--- |
| 22 | + |
| 23 | +> This blog provides a step-by-step guide to configuring Apache APISIX for AI traffic splitting and load balancing between API versions, covering security setup, canary testing, and deployment monitoring. |
| 24 | + |
| 25 | +<!--truncate--> |
| 26 | + |
| 27 | +## Overview |
| 28 | + |
| 29 | +[**AI/ML API**](https://aimlapi.com/) is a one-stop, OpenAI-compatible endpoint trusted by 150,000+ developers. It provides access to 300+ state-of-the-art models (chat, vision, image/video/music generation, embeddings, OCR, and more) from Google, Meta, OpenAI, Anthropic, Mistral, and others. |
| 30 | + |
| 31 | +[**Apache APISIX**](https://github.com/apache/apisix) is a dynamic, real-time, high-performance API Gateway. APISIX API Gateway provides rich traffic management features and can serve as an AI Gateway through its flexible plugin system. |
| 32 | + |
| 33 | +Modern AI workloads often require smooth version migrations, A/B testing, and rolling updates. This guide shows you how to: |
| 34 | + |
| 35 | +1. **Install** Apache APISIX with Docker quickstart. |
| 36 | +2. **Secure** the Admin API with keys and IP whitelisting. |
| 37 | +3. **Define** separate routes for API versions v1 and v2. |
| 38 | +4. **Implement** weighted traffic splitting (50/50) via the `traffic-split` plugin. |
| 39 | +5. **Verify** the newly created split endpoint functionality. |
| 40 | +6. **Load test** and **monitor** distribution accuracy. |
| 41 | + |
| 42 | +To perform authenticated requests, you'll need an AI/ML API key. You can get one at [https://aimlapi.com/app/keys/](https://aimlapi.com/app/keys?utm_source=apisix&utm_medium=guide&utm_campaign=integration) and use it as a Bearer token in your Authorization headers. |
| 43 | + |
| 44 | + |
| 45 | + |
| 46 | +## Quickstart Installation |
| 47 | + |
| 48 | +```bash |
| 49 | +# 1. Download and run the quickstart script (includes etcd + APISIX) |
| 50 | +curl -sL https://run.api7.ai/apisix/quickstart | sh |
| 51 | + |
| 52 | +# 2. Confirm APISIX is up and running |
| 53 | +curl -sI http://127.0.0.1:9080 | grep Server |
| 54 | +# ➜ Server: APISIX/3.13.0 |
| 55 | +``` |
| 56 | + |
| 57 | +> **Tip:** If you encounter port conflicts, adjust Docker host networking or map to different ports in the quickstart script. |
| 58 | + |
| 59 | +## Secure the Admin API |
| 60 | + |
| 61 | +By default, quickstart bypasses Admin API authentication. For any non-development environment, enforce security: |
| 62 | + |
| 63 | +### 1. Set an Admin Key |
| 64 | + |
| 65 | +Edit `conf/config.yaml` inside the APISIX container or local install directory, replacing the example key with an admin key of your own. This key protects the Admin API and is separate from the AI/ML API key mentioned above: |
| 66 | + |
| 67 | +```yaml |
| 68 | +deployment: |
| 69 | +  admin:                          # Admin API is enabled by default (`apisix.enable_admin`) |
| 70 | +    admin_key_required: true      # Reject unauthenticated Admin requests |
| 71 | +    admin_key: |
| 72 | +      - name: admin |
| 73 | +        key: YOUR_ADMIN_KEY_HERE  # Replace with your own secure key |
| 74 | +        role: admin |
| 75 | +``` |
| 76 | + |
| 77 | +> **Security Best Practice:** Use at least 32 characters, mix letters/numbers/symbols, and rotate keys quarterly. |
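One way to produce a key that meets the length recommendation is to read random bytes from the kernel. The sketch below assumes a Unix-like shell with `/dev/urandom`, `head -c`, `od`, and `tr` available. `gen_admin_key` is a hypothetical helper name, not part of APISIX, and its hex output contains only letters and digits, so add symbols yourself if your policy requires them.

```bash
# gen_admin_key: print a 64-character hex string built from 32 random bytes.
# Hypothetical helper for illustration; not shipped with APISIX.
gen_admin_key() {
  # od prints the bytes as space-separated hex pairs; tr strips spaces and newlines.
  head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
  echo
}
```

Running `gen_admin_key` prints a fresh key each time; paste the result in place of `YOUR_ADMIN_KEY_HERE`.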
| 78 | + |
| 79 | +### 2. Whitelist Management IPs (`allow_admin`) |
| 80 | + |
| 81 | +Add your management or local networks under the `admin:` section, which is nested under `deployment:` in `conf/config.yaml`: |
| 82 | + |
| 83 | +```yaml |
| 84 | +admin: |
| 85 | + allow_admin: |
| 86 | + - 127.0.0.0/24 # Localhost & host network |
| 87 | + - 0.0.0.0/0 # Allow all (temporary/testing only) |
| 88 | +``` |
| 89 | + |
| 90 | +> **Warning:** `0.0.0.0/0` opens the Admin API to the world! Lock this down to specific subnets in production. |
| 91 | + |
| 92 | +### 3. Restart APISIX |
| 93 | + |
| 94 | +```bash |
| 95 | +docker restart apisix-quickstart |
| 96 | +``` |
| 97 | + |
| 98 | +> **Check Logs:** Run `docker logs apisix-quickstart --tail 50` and confirm there are no errors about admin authentication. |
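Beyond reading the logs, a quick way to confirm that key enforcement is active is to call the Admin API without a key and look at the status code. This sketch assumes the Admin API listens on `127.0.0.1:9180` as in the quickstart, and that a request without a key is answered with HTTP 401. `admin_status` and `explain_status` are hypothetical helper names.

```bash
# admin_status: print the HTTP status code of an Admin API request.
# An optional first argument is sent as the X-API-KEY header.
admin_status() {
  if [ -n "$1" ]; then
    curl -s -o /dev/null -w '%{http_code}' -H "X-API-KEY: $1" \
      http://127.0.0.1:9180/apisix/admin/routes
  else
    curl -s -o /dev/null -w '%{http_code}' \
      http://127.0.0.1:9180/apisix/admin/routes
  fi
}

# explain_status: translate a status code into a short verdict.
explain_status() {
  case "$1" in
    401) echo "key required" ;;
    200) echo "request accepted" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}
```

With enforcement on, `explain_status "$(admin_status)"` should report `key required`, while `explain_status "$(admin_status YOUR_ADMIN_KEY_HERE)"` should report `request accepted`. Any other status (for example, a rejection caused by `allow_admin`) is printed as-is for further investigation.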
| 99 | + |
| 100 | +## Define Basic Routes for v1 and v2 |
| 101 | + |
| 102 | +Before splitting traffic, ensure each version works individually. |
| 103 | + |
| 104 | +### 1. Route for v1 |
| 105 | + |
| 106 | +```bash |
| 107 | +curl -i http://127.0.0.1:9180/apisix/admin/routes/test-v1 \ |
| 108 | + -X PUT \ |
| 109 | + -H "X-API-KEY: YOUR_ADMIN_KEY_HERE" \ |
| 110 | + -d '{ |
| 111 | + "uri": "/test/v1", |
| 112 | + "upstream": { |
| 113 | + "type": "roundrobin", |
| 114 | + "nodes": {"api.aimlapi.com:443": 1}, |
| 115 | + "scheme": "https", |
| 116 | + "pass_host": "node" |
| 117 | + } |
| 118 | + }' |
| 119 | +``` |
| 120 | + |
| 121 | +> **Tip:** The route ID (`test-v1` in the URL path above) is what you use to update or delete the route later. |
| 122 | + |
| 123 | +### 2. Route for v2 |
| 124 | + |
| 125 | +```bash |
| 126 | +curl -i http://127.0.0.1:9180/apisix/admin/routes/test-v2 \ |
| 127 | + -X PUT \ |
| 128 | + -H "X-API-KEY: YOUR_ADMIN_KEY_HERE" \ |
| 129 | + -d '{ |
| 130 | + "uri": "/test/v2", |
| 131 | + "upstream": { |
| 132 | + "type": "roundrobin", |
| 133 | + "nodes": {"api.aimlapi.com:443": 1}, |
| 134 | + "scheme": "https", |
| 135 | + "pass_host": "node" |
| 136 | + } |
| 137 | + }' |
| 138 | +``` |
| 139 | + |
| 140 | +## Implement Traffic Splitting (50/50) |
| 141 | + |
| 142 | +Use the [`traffic-split`](https://apisix.apache.org/docs/apisix/plugins/traffic-split/) plugin for controlled distribution between v1 and v2. In the admin request below, replace `YOUR_ADMIN_KEY_HERE` with your actual key. |
| 143 | + |
| 144 | +```bash |
| 145 | +curl -i http://127.0.0.1:9180/apisix/admin/routes/aimlapi-split \ |
| 146 | + -X PUT \ |
| 147 | + -H "X-API-KEY: YOUR_ADMIN_KEY_HERE" \ |
| 148 | + -d '{ |
| 149 | + "id": "aimlapi-split", |
| 150 | + "uri": "/chat/completions", |
| 151 | + "upstream": { |
| 152 | + "type": "roundrobin", |
| 153 | + "nodes": {"api.aimlapi.com:443": 1}, |
| 154 | + "scheme": "https", |
| 155 | + "pass_host": "node" |
| 156 | + }, |
| 157 | + "plugins": { |
| 158 | + "traffic-split": { |
| 159 | + "rules": [ |
| 160 | + { |
| 161 | + "weight": 50, |
| 162 | + "upstream": {"type":"roundrobin","nodes":{"api.aimlapi.com:443":1},"scheme":"https","pass_host":"node"}, |
| 163 | + "rewrite": {"uri":"/v1/chat/completions"} |
| 164 | + }, |
| 165 | + { |
| 166 | + "weight": 50, |
| 167 | + "upstream": {"type":"roundrobin","nodes":{"api.aimlapi.com:443":1},"scheme":"https","pass_host":"node"}, |
| 168 | + "rewrite": {"uri":"/v2/chat/completions"} |
| 169 | + } |
| 170 | + ] |
| 171 | + } |
| 172 | + } |
| 173 | + }' |
| 174 | +``` |
| 175 | + |
| 176 | +> **Tip:** Adjust the `weight` values to shift traffic ratios (e.g., 80/20 for canary). |
| 177 | +> |
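| 178 | +> **Note:** The rewritten `uri` must match the upstream API path exactly. The `traffic-split` plugin reference describes each rule as an optional `match` plus a `weighted_upstreams` list, so check the linked documentation to confirm that your APISIX version honors the `weight` and `rewrite` fields shown here. |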
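When choosing a ratio, it can help to see what share of traffic a pair of weights implies. The sketch below assumes the weights are treated as relative proportions rather than percentages, so `80`/`20` and `4`/`1` describe the same split. `weight_share` is a hypothetical helper name, and it only needs `awk`.

```bash
# weight_share: print the expected share of traffic, in percent,
# for two relative weights (first weight, then second weight).
weight_share() {
  awk -v a="$1" -v b="$2" \
    'BEGIN { printf "%.1f %.1f\n", 100 * a / (a + b), 100 * b / (a + b) }'
}

weight_share 80 20   # prints: 80.0 20.0
weight_share 1 3     # prints: 25.0 75.0
```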
| 179 | + |
| 180 | +## Verify Split Endpoint Functionality |
| 181 | + |
| 182 | +Test the `/chat/completions` endpoint you just created. Replace `<AIML_API_KEY>` with the key obtained earlier and use it as a Bearer token: |
| 183 | + |
| 184 | +```bash |
| 185 | +curl -v -X POST http://127.0.0.1:9080/chat/completions \ |
| 186 | + -H "Authorization: Bearer <AIML_API_KEY>" \ |
| 187 | + -H "Content-Type: application/json" \ |
| 188 | + -d '{"model":"gpt-4","messages":[{"role":"user","content":"ping"}]}' |
| 189 | +``` |
| 190 | + |
| 191 | +**Expected Output (abridged):** |
| 192 | + |
| 193 | +```json |
| 194 | +{"content":"Pong! How can I assist you today?"} |
| 195 | +``` |
| 196 | + |
| 197 | +> **Tip:** Use `-v` for verbose output to troubleshoot headers or TLS issues. |
| 198 | + |
| 199 | +## Load Test & Distribution Validation |
| 200 | + |
| 201 | +After configuring the split route, use the following commands to validate distribution. Replace `<AIML_API_KEY>` with your AI/ML API key. |
| 202 | + |
| 203 | +```bash |
| 204 | +# 1. Send 100 test requests |
| 205 | +time seq 100 | xargs -I {} curl -s -o /dev/null -X POST http://127.0.0.1:9080/chat/completions \ |
| 206 | + -H "Authorization: Bearer <AIML_API_KEY>" \ |
| 207 | + -H "Content-Type: application/json" \ |
| 208 | + -d '{"model":"gpt-4","messages":[{"role":"user","content":"ping"}]}' |
| 209 | + |
| 210 | +# 2. Check APISIX logs for upstream hits (replace IPs with actual resolved IPs) |
| 211 | +echo "v1 hits: $(docker logs apisix-quickstart --since 5m | grep -cF '188.114.97.3:443')" |
| 212 | +echo "v2 hits: $(docker logs apisix-quickstart --since 5m | grep -cF '188.114.96.3:443')" |
| 213 | +``` |
| 214 | + |
| 215 | +**Expected:** Approximately 50 requests to each upstream. |
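To judge how close to 50 each count should be, the log tally can be paired with a rough tolerance. The sketch below makes two assumptions: the two versions show up in the access log under distinct upstream addresses, and each request picks an upstream independently at random (a simplification; a weighted round-robin picker would land closer to an even split). `count_hits` and `split_stddev` are hypothetical helper names.

```bash
# count_hits: count log lines on stdin that contain a literal upstream address.
# -F treats the address as a fixed string, so the dots are not regex wildcards.
count_hits() {
  grep -cF -- "$1"
}

# split_stddev: standard deviation of one upstream's hit count for n requests,
# if each request independently chose that upstream with probability p.
split_stddev() {
  awk -v n="$1" -v p="$2" 'BEGIN { printf "%.1f\n", sqrt(n * p * (1 - p)) }'
}

split_stddev 100 0.5   # prints: 5.0
```

For example, `docker logs apisix-quickstart --since 5m | count_hits '188.114.97.3:443'` returns the tally for one address. Under the random-pick assumption, a count between roughly 40 and 60 out of 100 is within two standard deviations of an even split.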
| 216 | + |
| 217 | +> **Tip:** Use Prometheus or OpenTelemetry plugins for real-time metrics instead of manual log parsing. |
| 218 | + |
| 219 | +## Best Practices & Next Steps |
| 220 | + |
| 221 | +* **Rate Limiting & Quotas**: Add the [`limit-count`](https://apisix.apache.org/docs/apisix/plugins/limit-count/) plugin to protect your upstream from spikes. |
| 222 | +* **Authentication**: Layer on the [`key-auth`](https://apisix.apache.org/docs/apisix/plugins/key-auth/) plugin for consumer management. |
| 223 | +* **Circuit Breaker**: Prevent cascading failures with the [`api-breaker`](https://apisix.apache.org/docs/apisix/plugins/api-breaker/) plugin. |
| 224 | +* **Observability**: Integrate Prometheus, SkyWalking, or Loki for dashboards and alerts. |
| 225 | +* **Infrastructure as Code**: Consider managing APISIX config via Kubernetes CRDs or ADC for reproducibility. |
| 226 | + |
| 227 | +## References |
| 228 | + |
| 229 | +* [APISIX Load Balancing Documentation](https://apisix.apache.org/docs/apisix/getting-started/load-balancing/) |
| 230 | +* [AI/ML API Documentation](https://docs.aimlapi.com/?utm_source=apisix&utm_medium=guide&utm_campaign=integration) |