Commit dcfc9b9 (1 parent 6faa0a6)

Add LiteLLM proxy config for Venice API in Cursor

Cursor over-allocates max_tokens for models with 1M context windows (e.g. claude-opus-4-6), causing Venice to reject requests. This adds a LiteLLM proxy config that clamps output tokens to safe limits.

File tree: 2 files changed, +113 −0 lines changed

scripts/venice-litellm/README.md (56 additions, 0 deletions)
# Venice API + Cursor IDE via LiteLLM Proxy

Venice models with 1M token context windows (e.g. `claude-opus-4-6`, `claude-sonnet-4-6`) fail in Cursor because Cursor derives `max_tokens` from the context window and sends values that exceed Venice's output token limits.

Models with ≤200k context (e.g. `claude-opus-45`) work without a proxy.

LiteLLM sits between Cursor and Venice, clamping `max_tokens` to safe values.
## Prerequisites

- `VENICE_API_KEY` exported in your shell (e.g. in `~/.zshrc`)
- Python 3.9+
## Setup

```bash
pip install 'litellm[proxy]'
```
## Start the proxy

```bash
litellm --config /path/to/litellm-config.yaml --port 8765
```
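To sanity-check the proxy before pointing Cursor at it, you can query its OpenAI-compatible model list. This is a sketch assuming the port above; the `check_proxy` helper name is ours, not part of LiteLLM:

```bash
# Query the proxy's OpenAI-compatible model listing. Prints the model
# list if the proxy is up, or a hint if nothing answers on the port.
check_proxy() {
  curl -sf --max-time 2 http://localhost:8765/v1/models 2>/dev/null \
    || echo "proxy not reachable on :8765 -- is litellm running?"
}
check_proxy
```

The listed model IDs should match the `model_name` entries in `litellm-config.yaml`.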
## Configure Cursor

1. Open **Settings > Models**
2. Add custom models by name: `claude-opus-4-6`, `openai-gpt-52`, etc.
3. Under **OpenAI API Key**, enter your Venice API key
4. Set **Override OpenAI Base URL** to `http://localhost:8765`
## Available models

| Model | Max Output Tokens | Notes |
|-------|-------------------|-------|
| `claude-opus-4-6` | 8192 | Anthropic's most capable reasoning model |
| `claude-opus-45` | 8192 | Works without proxy (198k context) |
| `claude-sonnet-4-6` | 8192 | Best speed/intelligence balance |
| `claude-sonnet-45` | 8192 | Works without proxy (198k context) |
| `openai-gpt-52` | 16384 | GPT-5.2 frontier model |
| `openai-gpt-52-codex` | 16384 | GPT-5.2 optimized for code |
## Adding models

Edit `litellm-config.yaml` following the existing pattern. Use Venice model IDs from their [models endpoint](https://docs.venice.ai/api-reference/endpoint/models/list).
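For example, a new entry would follow the same shape as the existing ones (the model ID here is hypothetical; substitute a real ID from the models endpoint):

```yaml
  # Hypothetical entry: replace "some-new-model" with a real Venice model ID.
  - model_name: some-new-model
    litellm_params:
      model: openai/some-new-model
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY
    model_info:
      max_tokens: 8192   # keep within the model's real output limit
```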
## Why not use Venice directly?

Venice advertises `availableContextTokens: 1000000` for newer Claude/Gemini models. Cursor uses this to budget `max_tokens`, often requesting 200k+ output tokens. Venice rejects these with:

```
max_tokens: 232001 > 128000, which is the maximum allowed number of output tokens for claude-opus-4-6
```

The proxy prevents this by setting `model_info.max_tokens` per model, which LiteLLM uses to constrain requests.
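The effect of the clamp can be sketched in a few lines (a simplification, not LiteLLM's actual implementation; the `clamp_max_tokens` helper is ours):

```bash
# Cap the requested output tokens at the per-model limit that
# litellm-config.yaml sets via model_info.max_tokens.
clamp_max_tokens() {
  local requested=$1 limit=$2
  if [ "$requested" -gt "$limit" ]; then
    echo "$limit"
  else
    echo "$requested"
  fi
}

# Cursor budgets 232001 output tokens; claude-opus-4-6 is capped at 8192.
clamp_max_tokens 232001 8192   # prints 8192
```

Requests already under the limit pass through unchanged; only the over-budget values Cursor derives from the 1M context window are reduced.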
scripts/venice-litellm/litellm-config.yaml (57 additions, 0 deletions)

model_list:
  # Claude Opus 4.6 - clamped to safe limits
  - model_name: claude-opus-4-6
    litellm_params:
      model: openai/claude-opus-4-6
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY
    model_info:
      max_tokens: 8192

  # Claude Opus 4.5
  - model_name: claude-opus-45
    litellm_params:
      model: openai/claude-opus-45
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY
    model_info:
      max_tokens: 8192

  # Claude Sonnet 4.6
  - model_name: claude-sonnet-4-6
    litellm_params:
      model: openai/claude-sonnet-4-6
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY
    model_info:
      max_tokens: 8192

  # Claude Sonnet 4.5
  - model_name: claude-sonnet-45
    litellm_params:
      model: openai/claude-sonnet-45
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY
    model_info:
      max_tokens: 8192

  # GPT-5.2
  - model_name: openai-gpt-52
    litellm_params:
      model: openai/openai-gpt-52
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY
    model_info:
      max_tokens: 16384

  # GPT-5.2 Codex
  - model_name: openai-gpt-52-codex
    litellm_params:
      model: openai/openai-gpt-52-codex
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY
    model_info:
      max_tokens: 16384

router_settings:
  enable_pre_call_checks: true  # Check context window before call
