⚠️ This is a fork of the original ai-proxy project by Evgeny Shepelyuk. This fork includes significant architectural improvements, enhanced security features, and performance optimizations.
This service acts as a proxy for various Large Language Model (LLM) providers, including OpenAI, Groq, HuggingFace, and others. It allows users to seamlessly interact with multiple LLMs through a unified API, simplifying integration and request management with secure proxy access token authentication.
- Unified API: Work with multiple LLM providers using a single API endpoint.
- Provider Flexibility: Easily switch between different LLM providers without changing your application code.
- Secure Authentication: Proxy access token authentication for API security.
- Request Management: Routes requests to the configured provider and handles errors.
- Rate Limiting: Supports per-model request limits (minute/hour/day).
- Simple Configuration: YAML-based setup with support for multiple models.
- Streaming Support: Supports streaming responses for providers that implement this feature.
- Enhanced Error Handling: Structured error types with specific error codes and metadata.
- Memory Efficiency: Buffer pooling for efficient memory management.
- Retry Mechanisms: Streaming retry mechanism with exponential backoff and jitter.
- Modular Architecture: Clean separation of concerns with dedicated packages for each functionality.
This fork includes several significant improvements over the original project:
- Modular Package Structure: Clear separation of concerns with dedicated packages for server, handlers, middleware, config, errors, etc.
- Dependency Injection: Eliminated global state through proper dependency injection patterns.
- Provider Registry: Centralized provider management with runtime registration capability.
- Buffer Pooling: Implemented buffer pooling with `sync.Pool` for efficient memory management.
- Connection Pooling: Enhanced HTTP client configuration with optimized connection pooling settings.
- Memory Management: Improved memory management with buffer pooling across all handlers.
- Streaming Retry: Implemented streaming retry mechanism with exponential backoff and jitter (see the sketch after this list).
- Structured Error Handling: Enhanced error handling with structured error types and retryable error detection.
- Secure Authentication: Improved authentication middleware with proper security headers.
- Configuration Management: Enhanced configuration management with environment variable support.
- Error Handling: Secure error handling without information disclosure.
- Provider Interface: Standardized provider interface contract with a `Call` method for all AI providers.
- HTTP Client Factory: Centralized HTTP client creation with configurable timeouts and connection pooling.
- Rate Limiting: Improved rate limiting implementation with fine-grained locking.
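The streaming retry noted above is the kind of logic sketched below. This is a minimal illustration of exponential backoff with full jitter, not this fork's actual implementation; the package, function name, and parameters are hypothetical.

```go
// Package retry - illustrative sketch of exponential backoff with jitter,
// in the spirit of the streaming retry mechanism described above.
// Names, limits, and delays are hypothetical, not this project's API.
package retry

import (
	"context"
	"math/rand"
	"time"
)

// WithBackoff retries fn up to maxAttempts times, doubling the backoff
// window after each failure and sleeping a random ("jittered") duration
// within that window to avoid synchronized retries.
func WithBackoff(ctx context.Context, maxAttempts int, base time.Duration, fn func() error) error {
	delay := base
	if delay <= 0 {
		delay = 50 * time.Millisecond // jitter needs a positive window
	}
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = fn(); err == nil {
			return nil
		}
		// Full jitter: wait a random duration in [0, delay) before retrying.
		wait := time.Duration(rand.Int63n(int64(delay)))
		select {
		case <-time.After(wait):
		case <-ctx.Done():
			return ctx.Err()
		}
		delay *= 2 // exponential growth of the backoff window
	}
	return err
}
```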
- Go (version 1.24 or higher)
Clone the repository:
```bash
git clone https://github.com/your-username/ai-proxy.git
cd ai-proxy
```
The proxy reads two configuration files:

- `.env` - Proxy settings (proxy access token, port)
  - Copy from `scripts/.env.example`
  - Set `AUTH_TOKEN` to a secure proxy access token for authenticating requests to this proxy
- `provider-config.yaml` - Provider-specific configurations
  - Copy from the example below
  - The configuration file is required; use the `--config` CLI flag to specify its path
  - Add API keys for each provider
✅ You can list multiple models from different providers in the same file. 🛡️ Sensitive values like API tokens should be stored securely.
```yaml
models:
  - name: deepseek/deepseek-r1:free
    provider: openrouter
    priority: 1
    requests_per_minute: 20
    requests_per_hour: 1000
    requests_per_day: 1000
    url: "https://openrouter.ai/api/v1/chat/completions"
    token: "your_openrouter_api_token"
    max_request_length: 167731
    model_size: BIG
    http_client_config:
      timeout_seconds: 30
      idle_conn_timeout_seconds: 90
```
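To illustrate the note above about listing models from different providers, a second entry could be appended under `models:`. The `groq` provider name, model id, URL, and limits below are assumptions for illustration only, not values taken from this project:

```yaml
  # Hypothetical second entry - provider name, model id, URL, and limits
  # are illustrative assumptions, not values shipped with this project.
  - name: llama-3.1-8b-instant
    provider: groq
    priority: 2
    requests_per_minute: 30
    requests_per_hour: 500
    requests_per_day: 5000
    url: "https://api.groq.com/openai/v1/chat/completions"
    token: "your_groq_api_token"
    max_request_length: 8000
    model_size: SMALL
```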
- Copy configuration templates:

  ```bash
  cp scripts/.env.example .env
  ```

- Edit `.env` and set:

  ```bash
  AUTH_TOKEN="your_token_here"   # REQUIRED
  PORT="8080"                    # Optional (default: 8080)
  ```

- Create and edit `provider-config.yaml` with your provider API keys (the file is required; see the example above)
- Build and run:

  ```bash
  go build -o ai-proxy && ./ai-proxy
  ```
To start the proxy server, run:

```bash
go run main.go
```

The server will start on `http://localhost:8080` by default. You can change the port by setting the `PORT` environment variable:

```bash
PORT=9090 go run main.go
```

Replace `9090` with your desired port number.
The service supports several CLI flags for configuration:
```bash
go run main.go --help

# Available flags:
#   -addr string
#         listen address override, e.g., :8080
#   -config string
#         path to provider-config.yaml (required)
#   -env-file string
#         path to .env file
#   -version
#         print version and exit
```
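For example, to point the proxy at an explicit config file, load variables from `.env`, and override the listen address (the values shown are illustrative):

```bash
go run main.go --config provider-config.yaml --env-file .env --addr :9090
```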
The project includes Docker support for easy deployment:
```bash
# Build the Docker image
docker build -t ai-proxy -f scripts/Dockerfile .

# Run the container
docker run -d \
  --name ai-proxy \
  -p 8080:80 \
  -v $(pwd)/config:/config \
  ai-proxy
```
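If you prefer Docker Compose, the `docker run` command above translates roughly to the sketch below. The port mapping and volume path mirror that command; this compose file is not shipped with the project:

```yaml
# docker-compose.yaml (sketch) - mirrors the docker run command above
services:
  ai-proxy:
    build:
      context: .
      dockerfile: scripts/Dockerfile
    container_name: ai-proxy
    ports:
      - "8080:80"
    volumes:
      - ./config:/config
```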
To use the proxy, you need to include the proxy access token in the Authorization header of your requests:
```bash
curl -X POST http://localhost:8080/chat/completions \
  -H "Authorization: Bearer ${AUTH_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-r1:free",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Tell me a joke."
      }
    ]
  }'
```
Replace `${AUTH_TOKEN}` with your actual proxy access token.
The proxy supports streaming responses for providers that implement this feature. To use streaming, clients must send `"stream": true` in the JSON body of their `POST` request to `/chat/completions`. You also need to include the proxy access token in the Authorization header.
Here's an example of a streaming request using `curl`:
```bash
curl -N -X POST http://localhost:8080/chat/completions \
  -H "Authorization: Bearer ${AUTH_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-r1:free",
    "messages": [
      {
        "role": "user",
        "content": "Tell me a very long story."
      }
    ],
    "stream": true
  }'
```
Replace `${AUTH_TOKEN}` with your actual proxy access token.
When streaming is enabled, the response will be a `text/event-stream` containing server-sent events (SSE). Each event will contain a chunk of the response as it's generated by the model.
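As a sketch of how a client might consume the stream, the Go program below posts a streaming request and prints each `data:` line as it arrives. It assumes the proxy is running locally with `AUTH_TOKEN` exported in the environment; the exact chunk payload depends on the upstream provider.

```go
// sse_client.go - minimal sketch of consuming the proxy's SSE stream.
// Assumes a local proxy and an exported AUTH_TOKEN; the chunk format
// depends on the upstream provider.
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"os"
	"strings"
)

func main() {
	body := strings.NewReader(`{
		"model": "deepseek/deepseek-r1:free",
		"messages": [{"role": "user", "content": "Tell me a very long story."}],
		"stream": true
	}`)

	req, err := http.NewRequest("POST", "http://localhost:8080/chat/completions", body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("AUTH_TOKEN"))
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Server-sent events arrive as lines prefixed with "data: ";
	// print each chunk as it is received.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "data: ") {
			fmt.Println(strings.TrimPrefix(line, "data: "))
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "stream error:", err)
	}
}
```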
The proxy currently supports the following AI providers:
- OpenRouter - Access to multiple models including DeepSeek, Qwen, and others
- Cloudflare - Cloudflare AI Gateway
- Google Gemini - Google's Gemini models
- Sberbank GigaChat - Russian LLM provider
- Groq - Fast inference models
- OpenAI - OpenAI models (GPT-3.5, GPT-4, etc.)
- Cohere - Cohere models
The proxy exposes the following API endpoints:
- `POST /chat/completions` - Text generation with streaming support
- `POST /image` - Image generation processing (not fully implemented)
- `GET /models` - List available models
- `GET /ping` - Health check endpoint
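For quick checks, the health and model-listing endpoints can be called with `curl`. The proxy access token header is included on both calls on the assumption that they are authenticated like the rest of the API:

```bash
# Health check
curl -H "Authorization: Bearer ${AUTH_TOKEN}" http://localhost:8080/ping

# List the models configured in provider-config.yaml
curl -H "Authorization: Bearer ${AUTH_TOKEN}" http://localhost:8080/models
```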
Contributions are welcome! Please submit a pull request or open an issue to discuss improvements.
This project is licensed under the MIT License - see the LICENSE file for details.
- Original project by Evgeny Shepelyuk
- This fork contains significant improvements and modifications