
Commit a9037ab

Merge branch 'main' into litellm_dev_10_03_2025_p1
2 parents f5cda40 + 85c4dd1 commit a9037ab

105 files changed: +4790 −1548 lines changed


.circleci/config.yml

Lines changed: 18 additions & 0 deletions
```diff
@@ -616,6 +616,24 @@ jobs:
             wget https://github.com/jwilder/dockerize/releases/download/v0.6.1/dockerize-linux-amd64-v0.6.1.tar.gz
             sudo tar -C /usr/local/bin -xzvf dockerize-linux-amd64-v0.6.1.tar.gz
             rm dockerize-linux-amd64-v0.6.1.tar.gz
+      - run:
+          name: Start PostgreSQL Database
+          command: |
+            docker run -d \
+              --name postgres-db \
+              -e POSTGRES_USER=postgres \
+              -e POSTGRES_PASSWORD=postgres \
+              -e POSTGRES_DB=circle_test \
+              -p 5432:5432 \
+              postgres:14
+      - run:
+          name: Wait for PostgreSQL to be ready
+          command: dockerize -wait tcp://localhost:5432 -timeout 1m
+      - run:
+          name: Set DATABASE_URL environment variable
+          command: |
+            echo 'export DATABASE_URL="postgresql://postgres:postgres@localhost:5432/circle_test"' >> $BASH_ENV
+            source $BASH_ENV
       - run:
           name: Run Security Scans
           command: |
```

docs/my-website/docs/providers/vertex.md

Lines changed: 157 additions & 0 deletions
@@ -621,6 +621,163 @@ curl -X POST 'http://0.0.0.0:4000/chat/completions' \

#### **Google Maps**

Use Google Maps to provide location-based context to your Gemini models.

[**Relevant Vertex AI Docs**](https://ai.google.dev/gemini-api/docs/grounding#google-maps)

<Tabs>
<TabItem value="sdk" label="SDK">

**Basic Usage - Enable Widget Only**

```python showLineNumbers
from litellm import completion

## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env

tools = [{"googleMaps": {"enableWidget": "ENABLE_WIDGET"}}] # 👈 ADD GOOGLE MAPS

resp = completion(
    model="vertex_ai/gemini-2.0-flash",
    messages=[{"role": "user", "content": "What restaurants are nearby?"}],
    tools=tools,
)

print(resp)
```

**With Location Data**

You can specify a location to ground the model's responses with location-specific information:

```python showLineNumbers
from litellm import completion

## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env

tools = [{
    "googleMaps": {
        "enableWidget": "ENABLE_WIDGET",
        "latitude": 37.7749,     # San Francisco latitude
        "longitude": -122.4194,  # San Francisco longitude
        "languageCode": "en_US"  # Optional: language for results
    }
}] # 👈 ADD GOOGLE MAPS WITH LOCATION

resp = completion(
    model="vertex_ai/gemini-2.0-flash",
    messages=[{"role": "user", "content": "What restaurants are nearby?"}],
    tools=tools,
)

print(resp)
```

</TabItem>
<TabItem value="proxy" label="PROXY">

<Tabs>
<TabItem value="openai" label="OpenAI Python SDK">

**Basic Usage - Enable Widget Only**

```python showLineNumbers
from openai import OpenAI

client = OpenAI(
    api_key="sk-1234",                  # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000/v1/"  # point to litellm proxy
)

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "What restaurants are nearby?"}],
    tools=[{"googleMaps": {"enableWidget": "ENABLE_WIDGET"}}],
)

print(response)
```

**With Location Data**

```python showLineNumbers
from openai import OpenAI

client = OpenAI(
    api_key="sk-1234",                  # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000/v1/"  # point to litellm proxy
)

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "What restaurants are nearby?"}],
    tools=[{
        "googleMaps": {
            "enableWidget": "ENABLE_WIDGET",
            "latitude": 37.7749,     # San Francisco latitude
            "longitude": -122.4194,  # San Francisco longitude
            "languageCode": "en_US"  # Optional: language for results
        }
    }],
)

print(response)
```

</TabItem>
<TabItem value="curl" label="cURL">

**Basic Usage - Enable Widget Only**

```bash showLineNumbers
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gemini-2.0-flash",
    "messages": [
      {"role": "user", "content": "What restaurants are nearby?"}
    ],
    "tools": [
      {
        "googleMaps": {"enableWidget": "ENABLE_WIDGET"}
      }
    ]
  }'
```

**With Location Data**

```bash showLineNumbers
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gemini-2.0-flash",
    "messages": [
      {"role": "user", "content": "What restaurants are nearby?"}
    ],
    "tools": [
      {
        "googleMaps": {
          "enableWidget": "ENABLE_WIDGET",
          "latitude": 37.7749,
          "longitude": -122.4194,
          "languageCode": "en_US"
        }
      }
    ]
  }'
```

</TabItem>
</Tabs>

</TabItem>
</Tabs>

#### **Moving from Vertex AI SDK to LiteLLM (GROUNDING)**

docs/my-website/docs/proxy/cost_tracking.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -8,6 +8,10 @@ Track spend for keys, users, and teams across 100+ LLMs.
 
 LiteLLM automatically tracks spend for all known models. See our [model cost map](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)
 
+:::tip Keep Pricing Data Updated
+[Sync model pricing data from GitHub](../sync_models_github.md) to ensure accurate cost tracking.
+:::
+
 ### How to Track Spend with LiteLLM
 
 **Step 1**
```

docs/my-website/docs/proxy/model_management.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -19,6 +19,10 @@ model_list:
 
 Retrieve detailed information about each model listed in the `/model/info` endpoint, including descriptions from the `config.yaml` file, and additional model info (e.g. max tokens, cost per input token, etc.) pulled from the model_info you set and the [litellm model cost map](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json). Sensitive details like API keys are excluded for security purposes.
 
+:::tip Sync Model Data
+Keep your model pricing data up to date by [syncing models from GitHub](../sync_models_github.md).
+:::
+
 <Tabs
   defaultValue="curl"
   values={[
```

sync_models_github.md (new file)

Lines changed: 61 additions & 0 deletions

@@ -0,0 +1,61 @@

# Syncing Model Pricing Data from GitHub

Sync model pricing data from GitHub's `model_prices_and_context_window.json` file outside of the LiteLLM UI.

> **📹 Video Tutorial**: [Watch how to sync models via the Admin UI](https://www.loom.com/share/ba41acc1882d41b284bbddbb0e9c27ce?sid=bdae351e-2026-4e39-932b-fcb185ff612c)

## Quick Start

**Manual sync:**

```bash
curl -X POST "https://your-proxy-url/reload/model_cost_map" \
  -H "Authorization: Bearer YOUR_ADMIN_TOKEN" \
  -H "Content-Type: application/json"
```

**Automatic sync every 6 hours:**

```bash
curl -X POST "https://your-proxy-url/schedule/model_cost_map_reload?hours=6" \
  -H "Authorization: Bearer YOUR_ADMIN_TOKEN" \
  -H "Content-Type: application/json"
```

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/reload/model_cost_map` | POST | Manual sync |
| `/schedule/model_cost_map_reload?hours={hours}` | POST | Schedule periodic sync |
| `/schedule/model_cost_map_reload` | DELETE | Cancel scheduled sync |
| `/schedule/model_cost_map_reload/status` | GET | Check sync status |

**Authentication:** Requires admin role or master key.

## Python Example

```python
import requests

def sync_models(proxy_url, admin_token):
    # Trigger a manual sync of the model cost map
    response = requests.post(
        f"{proxy_url}/reload/model_cost_map",
        headers={"Authorization": f"Bearer {admin_token}"},
    )
    return response.json()

# Usage
result = sync_models("https://your-proxy-url", "your-admin-token")
print(result["message"])
```
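
The scheduling endpoints from the table above can be driven the same way. A minimal sketch; the exact response payloads aren't documented here, so the `print` calls are just for inspection:

```python
import requests

BASE_URL = "https://your-proxy-url"
HEADERS = {"Authorization": "Bearer your-admin-token"}

# Schedule an automatic sync every 6 hours
resp = requests.post(
    f"{BASE_URL}/schedule/model_cost_map_reload",
    params={"hours": 6},
    headers=HEADERS,
)
print(resp.json())  # response shape not documented here

# Check whether a periodic sync is currently scheduled
status = requests.get(
    f"{BASE_URL}/schedule/model_cost_map_reload/status",
    headers=HEADERS,
)
print(status.json())

# Cancel the scheduled sync
requests.delete(f"{BASE_URL}/schedule/model_cost_map_reload", headers=HEADERS)
```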

## Configuration

**Custom model cost map URL:**

```bash
export LITELLM_MODEL_COST_MAP_URL="https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json"
```

**Use local model cost map:**

```bash
export LITELLM_LOCAL_MODEL_COST_MAP=True
```
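
These variables are read when `litellm` is imported, so set them first. A quick way to confirm which map was loaded, assuming the `litellm` package's in-memory pricing dict `litellm.model_cost` (worth verifying against your installed version):

```python
import os

# Assumption: this env var is read at import time, so set it before importing litellm
os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] = "True"

import litellm

# litellm.model_cost holds the loaded pricing map; spot-check an entry
print(f"{len(litellm.model_cost)} models loaded")
print(litellm.model_cost.get("gpt-4o", {}).get("input_cost_per_token"))
```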

docs/my-website/docs/proxy/ui.md

Lines changed: 14 additions & 0 deletions
```diff
@@ -54,6 +54,20 @@ Allow others to create/delete their own keys.
 
 [**Go Here**](./self_serve.md)
 
+## Model Management
+
+The Admin UI provides comprehensive model management capabilities:
+
+- **Add Models**: Add new models through the UI without restarting the proxy
+- **Model Hub**: Make models public so developers can discover the available models
+- **Price Data Sync**: Keep model pricing data up to date by syncing from GitHub
+
+For detailed information on model management, see [Model Management](./model_management.md).
+
+:::tip Sync Model Pricing Data
+[Sync model pricing data from GitHub](./sync_models_github.md) to keep your model cost information current.
+:::
+
 ## Disable Admin UI
 
 Set `DISABLE_ADMIN_UI="True"` in your environment to disable the Admin UI.
```

docs/my-website/release_notes/v1.75.5-stable/index.md

Lines changed: 25 additions & 0 deletions
@@ -51,6 +51,31 @@ pip install litellm==1.75.5.post2

- **Digital Ocean's Gradient AI** - New LLM provider for calling models on Digital Ocean's Gradient AI platform.

### 54% RPS Improvement

Throughput increased by 54% per instance (1,040 → 1,602 RPS, aggregated) while maintaining a 40 ms median overhead. The improvement comes from fixing major O(n²) inefficiencies in the router, primarily caused by repeated `in` membership checks against large lists inside loops. Tests were run with a database-only setup (no cache hits). As a result, p95 latency improved by 30% (2,700 → 1,900 ms), enhancing overall stability and scalability under heavy load.
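
To illustrate the class of fix (an illustrative sketch, not the actual LiteLLM router code): an `in` check against a list rescans the list on every loop iteration, which is O(n²) overall, while hoisting the list into a set makes each lookup O(1):

```python
# Illustrative sketch of the pattern described above, not the actual router code.
deployments = [f"deployment-{i}" for i in range(10_000)]
cooldown = [f"deployment-{i}" for i in range(0, 10_000, 2)]

# Before: `in` on a list rescans `cooldown` for every deployment -> O(n^2) overall
healthy_slow = [d for d in deployments if d not in cooldown]

# After: build the set once (O(n)), then each membership check is O(1) -> O(n) overall
cooldown_set = set(cooldown)
healthy_fast = [d for d in deployments if d not in cooldown_set]

assert healthy_slow == healthy_fast
```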

---

### Test Setup

All benchmarks were executed using Locust with 1,000 concurrent users and a ramp-up of 500. The environment was configured to stress the routing layer and eliminate caching as a variable.

**System Specs**

- **CPU:** 8 vCPUs
- **Memory:** 32 GB RAM

**Configuration (config.yaml)**

View the complete configuration: [gist.github.com/AlexsanderHamir/config.yaml](https://gist.github.com/AlexsanderHamir/53f7d554a5d2afcf2c4edb5b6be68ff4)

**Load Script (no_cache_hits.py)**

View the complete load testing script: [gist.github.com/AlexsanderHamir/no_cache_hits.py](https://gist.github.com/AlexsanderHamir/42c33d7a4dc7a57f56a78b560dee3a42)

---

### Risk of Upgrade

If you build the proxy from the pip package, you should hold off on upgrading. This version makes `prisma migrate deploy` our default for managing the DB. This is safer, as it doesn't reset the DB, but it requires a manual `prisma generate` step.
