Commit a0fae87

feat: add completion types and optimize union unmarshal (envoyproxy#1242)
**Description**

This PR adds OpenAI's legacy `/completions` endpoint support to Envoy AI Gateway, enabling compatibility with models like `babbage-002` and `gpt-3.5-turbo-instruct`. Ideal for cost-optimized or fine-tuned setups (e.g., vLLM/llama.cpp) needing suffix params or token prompts.

Refactored unions (`ContentUnion`, `EmbeddingRequestInput`, `PromptUnion`) with custom unmarshaling for **1.6x–10x speedups**, **60%–95% less memory**, and **52%–93% fewer allocations** vs. openai-go. Gains are critical for batch/token-heavy workloads.

*Key Changes*

- New types: `CompletionRequest`, `CompletionResponse`, `PromptUnion`.
- Model constants for legacy testing.
- Unified `Usage` across APIs.
- New Benchmarks!

**Related Issues/PRs**

Starts envoyproxy#1231

---------

Signed-off-by: Adrian Cole <[email protected]>
1 parent cf48588 commit a0fae87

40 files changed: 7207 additions, 4978 deletions

.codespell.skip

Lines changed: 2 additions & 0 deletions
@@ -11,3 +11,5 @@ go.sum
 ./tests/e2e/logs
 *_for_tests.yaml
 ./tests/extproc/testdata/server.*
+./tests/internal/testopenai/cassettes/*.yaml
+./tests/internal/testopeninference/spans/*.json

cmd/aigw/docker-compose.yaml

Lines changed: 1 addition & 1 deletion
@@ -77,6 +77,6 @@ services:
     -X POST http://aigw:1975/v1/embeddings \
     -H "Authorization: Bearer unused" \
     -H "Content-Type: application/json" \
-    -d "{\"model\":\"$$EMBEDDINGS_MODEL\",\"input\":\"What is RAG?\"}"
+    -d "{\"model\":\"$$EMBEDDINGS_MODEL\",\"input\":\"How do I reset my password?\"}"
     extra_hosts: # localhost:host-gateway trick doesn't work with aigw
     - "host.docker.internal:host-gateway"

internal/apischema/openai/openai.go

Lines changed: 288 additions & 65 deletions
Large diffs are not rendered by default.
