# Venice API + Cursor IDE via LiteLLM Proxy

Venice models with 1M-token context windows (e.g. `claude-opus-4-6`, `claude-sonnet-4-6`) fail in Cursor because Cursor derives `max_tokens` from the context window and sends values that exceed Venice's output token limits.

Models with ≤200k context (e.g. `claude-opus-45`) work without a proxy.

LiteLLM sits between Cursor and Venice, clamping `max_tokens` to safe values.

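The clamping the proxy performs can be sketched as plain logic (illustrative names and a hand-written table; LiteLLM applies the equivalent internally from each model's configured limit):

```python
# Sketch of the proxy's behavior: cap the client's requested max_tokens
# at the model's real output limit before forwarding to Venice.
# The limits below mirror this setup's table; the dict is illustrative.

OUTPUT_LIMITS = {
    "claude-opus-4-6": 8192,
    "claude-sonnet-4-6": 8192,
    "openai-gpt-52": 16384,
}

def clamp_max_tokens(model: str, requested: int) -> int:
    """Return a max_tokens value the upstream API will accept."""
    limit = OUTPUT_LIMITS.get(model)
    return requested if limit is None else min(requested, limit)

# Cursor may request 232001 output tokens for a 1M-context model;
# the proxy forwards at most the model's limit.
print(clamp_max_tokens("claude-opus-4-6", 232001))  # 8192
```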
## Prerequisites

- `VENICE_API_KEY` exported in your shell (e.g. in `~/.zshrc`)
- Python 3.9+

## Setup

```bash
pip install 'litellm[proxy]'
```

## Start the proxy

```bash
litellm --config /path/to/litellm-config.yaml --port 8765
```

## Configure Cursor

1. Open **Settings > Models**
2. Add custom models by name: `claude-opus-4-6`, `openai-gpt-52`, etc.
3. Under **OpenAI API Key**, enter your Venice API key
4. Set **Override OpenAI Base URL** to `http://localhost:8765`

## Available models

| Model | Max Output Tokens | Notes |
|-------|-------------------|-------|
| `claude-opus-4-6` | 8192 | Anthropic's most capable reasoning model |
| `claude-opus-45` | 8192 | Works without proxy (198k context) |
| `claude-sonnet-4-6` | 8192 | Best speed/intelligence balance |
| `claude-sonnet-45` | 8192 | Works without proxy (198k context) |
| `openai-gpt-52` | 16384 | GPT-5.2 frontier model |
| `openai-gpt-52-codex` | 16384 | GPT-5.2 optimized for code |

## Adding models

Edit `litellm-config.yaml` following the existing pattern. Use Venice model IDs from their [models endpoint](https://docs.venice.ai/api-reference/endpoint/models/list).

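A minimal sketch of one entry, following LiteLLM's proxy config conventions — the `api_base` URL here is an assumption (verify against Venice's API docs), and the `model_info.max_tokens` value should match the table above:

```yaml
model_list:
  - model_name: claude-opus-4-6            # name Cursor will use
    litellm_params:
      model: openai/claude-opus-4-6        # route via LiteLLM's OpenAI-compatible provider
      api_base: https://api.venice.ai/api/v1   # assumed Venice base URL; check their docs
      api_key: os.environ/VENICE_API_KEY
    model_info:
      max_tokens: 8192                     # output cap LiteLLM enforces per request
```

Restart the proxy after editing the config so the new entry is picked up.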
## Why not use Venice directly?

Venice advertises `availableContextTokens: 1000000` for newer Claude/Gemini models. Cursor uses this to budget `max_tokens`, often requesting 200k+ output tokens. Venice rejects these with:

```
max_tokens: 232001 > 128000, which is the maximum allowed number of output tokens for claude-opus-4-6
```

The proxy prevents this by setting `model_info.max_tokens` per model, which LiteLLM uses to cap the `max_tokens` it forwards.
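The arithmetic behind the failure can be illustrated with the numbers from the error above (the prompt size is inferred, not from the source):

```python
# Cursor budgets output tokens from the advertised context window:
# roughly max_tokens = context_window - prompt_tokens.
context_window = 1_000_000      # Venice advertises availableContextTokens: 1000000
venice_output_limit = 128_000   # per the error message above

prompt_tokens = 767_999         # inferred value that yields the 232001 in the error
requested = context_window - prompt_tokens
print(requested)                        # 232001
print(requested > venice_output_limit)  # True -> Venice rejects the request
```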