
Commit bb5127b

Merge branch 'BerriAI:main' into citation-supported-text-3

2 parents: 4497dcf + 63c4a30

56 files changed: +3076 additions, -642 deletions

.circleci/config.yml

Lines changed: 1 addition & 0 deletions

@@ -1913,6 +1913,7 @@ jobs:
   -e APORIA_API_BASE_1=$APORIA_API_BASE_1 \
   -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
   -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
+  -e DEFAULT_NUM_WORKERS_LITELLM_PROXY=1 \
   -e USE_DDTRACE=True \
   -e DD_API_KEY=$DD_API_KEY \
   -e DD_SITE=$DD_SITE \
.github/workflows/test-litellm.yml

Lines changed: 1 addition & 0 deletions

@@ -31,6 +31,7 @@ jobs:
           poetry run pip install "pytest-retry==1.6.3"
           poetry run pip install pytest-xdist
           poetry run pip install "google-genai==1.22.0"
+          poetry run pip install "google-cloud-aiplatform>=1.38"
           poetry run pip install "fastapi-offline==1.7.3"
       - name: Setup litellm-enterprise as local package
         run: |

docs/my-website/docs/completion/input.md

Lines changed: 3 additions & 0 deletions

@@ -106,6 +106,7 @@ def completion(
     parallel_tool_calls: Optional[bool] = None,
     logprobs: Optional[bool] = None,
     top_logprobs: Optional[int] = None,
+    safety_identifier: Optional[str] = None,
     deployment_id=None,
     # soon to be deprecated params by OpenAI
     functions: Optional[List] = None,
@@ -196,6 +197,8 @@ def completion(

 - `top_logprobs`: *int (optional)* - An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to true if this parameter is used.

+- `safety_identifier`: *string (optional)* - A unique identifier for tracking and managing safety-related requests. This parameter helps with safety monitoring and compliance tracking.
+
 - `headers`: *dict (optional)* - A dictionary of headers to be sent with the request.

 - `extra_headers`: *dict (optional)* - Alternative to `headers`, used to send extra headers in LLM API request.
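For illustration, a minimal SDK sketch of the newly documented parameter, assuming `safety_identifier` is forwarded like the other optional OpenAI params above; the model name and identifier value are placeholders:

```python
from litellm import completion

# `safety_identifier` is a stable per-end-user string used for safety
# monitoring and compliance tracking; "user-12345" is a made-up example.
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hi"}],
    safety_identifier="user-12345",
)
print(response.choices[0].message.content)
```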

docs/my-website/docs/providers/vertex_partner.md

Lines changed: 136 additions & 0 deletions

@@ -15,6 +15,7 @@ import TabItem from '@theme/TabItem';
 | Mistral | `vertex_ai/mistral-*` | [Vertex AI - Mistral Models](https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/mistral) |
 | AI21 (Jamba) | `vertex_ai/jamba-*` | [Vertex AI - AI21 Models](https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/ai21) |
 | Qwen | `vertex_ai/qwen/*` | [Vertex AI - Qwen Models](https://cloud.google.com/vertex-ai/generative-ai/docs/maas/qwen) |
+| OpenAI (GPT-OSS) | `vertex_ai/openai/gpt-oss-*` | [Vertex AI - GPT-OSS Models](https://console.cloud.google.com/vertex-ai/publishers/openai/model-garden/) |
 | Model Garden | `vertex_ai/openai/{MODEL_ID}` or `vertex_ai/{MODEL_ID}` | [Vertex Model Garden](https://cloud.google.com/model-garden?hl=en) |

 ## Vertex AI - Anthropic (Claude)
@@ -658,6 +659,141 @@ curl --location 'http://0.0.0.0:4000/chat/completions' \
 </Tabs>


+## VertexAI GPT-OSS Models
+
+| Property | Details |
+|----------|---------|
+| Provider Route | `vertex_ai/openai/{MODEL}` |
+| Vertex Documentation | [Vertex AI - GPT-OSS Models](https://console.cloud.google.com/vertex-ai/publishers/openai/model-garden/) |
+
+**LiteLLM supports all Vertex AI GPT-OSS models.** Ensure you use the `vertex_ai/openai/` prefix for all Vertex AI GPT-OSS models.
+
+| Model Name | Usage |
+|------------------|------------------------------|
+| vertex_ai/openai/gpt-oss-20b-maas | `completion('vertex_ai/openai/gpt-oss-20b-maas', messages)` |
+
+#### Usage
+
+<Tabs>
+<TabItem value="sdk" label="SDK">
+
+```python
+from litellm import completion
+import os
+
+os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""
+
+model = "openai/gpt-oss-20b-maas"
+
+vertex_ai_project = "your-vertex-project"  # can also set this as os.environ["VERTEXAI_PROJECT"]
+vertex_ai_location = "your-vertex-location"  # can also set this as os.environ["VERTEXAI_LOCATION"]
+
+response = completion(
+    model="vertex_ai/" + model,
+    messages=[{"role": "user", "content": "hi"}],
+    vertex_ai_project=vertex_ai_project,
+    vertex_ai_location=vertex_ai_location,
+)
+print("\nModel Response", response)
+```
+</TabItem>
+<TabItem value="proxy" label="Proxy">
+
+**1. Add to config**
+
+```yaml
+model_list:
+  - model_name: gpt-oss
+    litellm_params:
+      model: vertex_ai/openai/gpt-oss-20b-maas
+      vertex_ai_project: "my-test-project"
+      vertex_ai_location: "us-central1"
+```
+
+**2. Start proxy**
+
+```bash
+litellm --config /path/to/config.yaml
+
+# RUNNING at http://0.0.0.0:4000
+```
+
+**3. Test it!**
+
+```bash
+curl --location 'http://0.0.0.0:4000/chat/completions' \
+    --header 'Authorization: Bearer sk-1234' \
+    --header 'Content-Type: application/json' \
+    --data '{
+        "model": "gpt-oss", # 👈 the 'model_name' in config
+        "messages": [
+            {
+                "role": "user",
+                "content": "what llm are you"
+            }
+        ]
+    }'
+```
+
+</TabItem>
+</Tabs>
+
+#### Usage - `reasoning_effort`
+
+GPT-OSS models support the `reasoning_effort` parameter for enhanced reasoning capabilities.
+
+<Tabs>
+<TabItem value="sdk" label="SDK">
+
+```python
+from litellm import completion
+
+response = completion(
+    model="vertex_ai/openai/gpt-oss-20b-maas",
+    messages=[{"role": "user", "content": "Solve this complex problem step by step"}],
+    reasoning_effort="low",  # Options: "minimal", "low", "medium", "high"
+    vertex_ai_project="your-vertex-project",
+    vertex_ai_location="us-central1",
+)
+```
+
+</TabItem>
+
+<TabItem value="proxy" label="PROXY">
+
+1. Setup config.yaml
+
+```yaml
+model_list:
+  - model_name: gpt-oss
+    litellm_params:
+      model: vertex_ai/openai/gpt-oss-20b-maas
+      vertex_ai_project: "my-test-project"
+      vertex_ai_location: "us-central1"
+```
+
+2. Start proxy
+
+```bash
+litellm --config /path/to/config.yaml
+```
+
+3. Test it!

+```bash
+curl http://0.0.0.0:4000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
+  -d '{
+    "model": "gpt-oss",
+    "messages": [{"role": "user", "content": "Solve this complex problem step by step"}],
+    "reasoning_effort": "low"
+  }'
+```
+
+</TabItem>
+</Tabs>
+
 ## Model Garden

 :::tip
docs/my-website/docs/proxy/config_settings.md

Lines changed: 4 additions & 0 deletions

@@ -573,6 +573,10 @@ router_settings:
 | LITELLM_LOCAL_MODEL_COST_MAP | Local configuration for model cost mapping in LiteLLM
 | LITELLM_LOG | Enable detailed logging for LiteLLM
 | LITELLM_LOG_FILE | File path to write LiteLLM logs to. When set, logs will be written to both console and the specified file
+| LITELLM_LOGGER_NAME | Name for the OTEL logger
+| LITELLM_METER_NAME | Name for the OTEL meter
+| LITELLM_OTEL_INTEGRATION_ENABLE_EVENTS | Optionally enable semantic logs for OTEL
+| LITELLM_OTEL_INTEGRATION_ENABLE_METRICS | Optionally enable semantic metrics for OTEL
 | LITELLM_MASTER_KEY | Master key for proxy authentication
 | LITELLM_MODE | Operating mode for LiteLLM (e.g., production, development)
 | LITELLM_RATE_LIMIT_WINDOW_SIZE | Rate limit window size for LiteLLM. Default is 60
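For illustration, a minimal sketch of wiring up the four new OTEL variables from Python, assuming the integration reads them from the environment before LiteLLM is imported; all values are placeholders:

```python
import os

# Placeholder names for the OTEL logger/meter; any string works.
os.environ["LITELLM_LOGGER_NAME"] = "litellm-otel-logger"
os.environ["LITELLM_METER_NAME"] = "litellm-otel-meter"

# Opt in to semantic logs and metrics for the OTEL integration.
os.environ["LITELLM_OTEL_INTEGRATION_ENABLE_EVENTS"] = "true"
os.environ["LITELLM_OTEL_INTEGRATION_ENABLE_METRICS"] = "true"

import litellm  # imported after env setup so the integration sees the flags
```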

docs/my-website/docs/proxy/enterprise.md

Lines changed: 87 additions & 0 deletions

@@ -439,6 +439,33 @@ response = client.chat.completions.create(

 print(response)
 ```
+
+**Using Headers:**
+
+```python
+import openai
+client = openai.OpenAI(
+    api_key="sk-1234",
+    base_url="http://0.0.0.0:4000"
+)
+
+# Pass spend logs metadata via headers
+response = client.chat.completions.create(
+    model="gpt-3.5-turbo",
+    messages=[
+        {
+            "role": "user",
+            "content": "this is a test request, write a short poem"
+        }
+    ],
+    extra_headers={
+        "x-litellm-spend-logs-metadata": '{"user_id": "12345", "project_id": "proj_abc", "request_type": "chat_completion"}'
+    }
+)
+
+print(response)
+```
+
 </TabItem>


@@ -478,6 +505,43 @@ async function runOpenAI() {
 // Call the asynchronous function
 runOpenAI();
 ```
+
+**Using Headers:**
+
+```js
+const openai = require('openai');
+
+async function runOpenAI() {
+  const client = new openai.OpenAI({
+    apiKey: 'sk-1234',
+    baseURL: 'http://0.0.0.0:4000'
+  });
+
+  try {
+    const response = await client.chat.completions.create({
+      model: 'gpt-3.5-turbo',
+      messages: [
+        {
+          role: 'user',
+          content: "this is a test request, write a short poem"
+        },
+      ]
+    }, {
+      headers: {
+        'x-litellm-spend-logs-metadata': '{"user_id": "12345", "project_id": "proj_abc", "request_type": "chat_completion"}'
+      }
+    });
+    console.log(response);
+  } catch (error) {
+    console.log("got this exception from server");
+    console.error(error);
+  }
+}
+
+// Call the asynchronous function
+runOpenAI();
+```
+
 </TabItem>

 <TabItem value="Curl" label="Curl Request">
@@ -502,6 +566,29 @@ curl --location 'http://0.0.0.0:4000/chat/completions' \
     }
 }'
 ```
+
+</TabItem>
+
+<TabItem value="headers" label="Using Headers">
+
+Pass `x-litellm-spend-logs-metadata` as a request header with a JSON string:
+
+```shell
+curl --location 'http://0.0.0.0:4000/chat/completions' \
+    --header 'Content-Type: application/json' \
+    --header 'Authorization: Bearer sk-1234' \
+    --header 'x-litellm-spend-logs-metadata: {"user_id": "12345", "project_id": "proj_abc", "request_type": "chat_completion"}' \
+    --data '{
+        "model": "gpt-3.5-turbo",
+        "messages": [
+            {
+                "role": "user",
+                "content": "what llm are you"
+            }
+        ]
+    }'
+```
+
 </TabItem>
 <TabItem value="langchain" label="Langchain">
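One practical note on these examples: the header value must be a JSON string, so in Python it is safer to build it with `json.dumps` than to hand-write it. A minimal sketch, reusing the placeholder key and base URL from above:

```python
import json

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

# json.dumps guarantees correct quoting/escaping for the header value.
metadata = {"user_id": "12345", "project_id": "proj_abc", "request_type": "chat_completion"}

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "this is a test request, write a short poem"}],
    extra_headers={"x-litellm-spend-logs-metadata": json.dumps(metadata)},
)
print(response)
```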

docs/my-website/docs/proxy/request_headers.md

Lines changed: 2 additions & 0 deletions

@@ -14,6 +14,8 @@ Special headers that are supported by LiteLLM.

 `x-litellm-num-retries`: Optional[int]: The number of retries for the request.

+`x-litellm-spend-logs-metadata`: Optional[str]: JSON string containing custom metadata to include in spend logs. Example: `{"user_id": "12345", "project_id": "proj_abc", "request_type": "chat_completion"}`. [Learn More](../proxy/enterprise#tracking-spend-with-custom-metadata)
+
 ## Anthropic Headers

 `anthropic-version` Optional[str]: The version of the Anthropic API to use.
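For illustration, a minimal sketch sending both LiteLLM headers documented in this file through the OpenAI SDK's `extra_headers`; header values are strings, so the retry count is serialized as `"3"`, and the key and base URL are placeholders:

```python
import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hi"}],
    extra_headers={
        "x-litellm-num-retries": "3",  # retry count, sent as a string header value
        "x-litellm-spend-logs-metadata": '{"user_id": "12345", "project_id": "proj_abc"}',
    },
)
print(response)
```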

litellm/__init__.py

Lines changed: 5 additions & 34 deletions

@@ -67,6 +67,7 @@
     bedrock_embedding_models,
     known_tokenizer_config,
     BEDROCK_INVOKE_PROVIDERS_LITERAL,
+    BEDROCK_CONVERSE_MODELS,
     DEFAULT_MAX_TOKENS,
     DEFAULT_SOFT_BUDGET,
     DEFAULT_ALLOWED_FAILS,
@@ -432,40 +433,6 @@ def identify(event_details):
 project = None
 config_path = None
 vertex_ai_safety_settings: Optional[dict] = None
-BEDROCK_CONVERSE_MODELS = [
-    "openai.gpt-oss-20b-1:0",
-    "openai.gpt-oss-120b-1:0",
-    "anthropic.claude-opus-4-1-20250805-v1:0",
-    "anthropic.claude-opus-4-20250514-v1:0",
-    "anthropic.claude-sonnet-4-20250514-v1:0",
-    "anthropic.claude-3-7-sonnet-20250219-v1:0",
-    "anthropic.claude-3-5-haiku-20241022-v1:0",
-    "anthropic.claude-3-5-sonnet-20241022-v2:0",
-    "anthropic.claude-3-5-sonnet-20240620-v1:0",
-    "anthropic.claude-3-opus-20240229-v1:0",
-    "anthropic.claude-3-sonnet-20240229-v1:0",
-    "anthropic.claude-3-haiku-20240307-v1:0",
-    "anthropic.claude-v2",
-    "anthropic.claude-v2:1",
-    "anthropic.claude-v1",
-    "anthropic.claude-instant-v1",
-    "ai21.jamba-instruct-v1:0",
-    "ai21.jamba-1-5-mini-v1:0",
-    "ai21.jamba-1-5-large-v1:0",
-    "meta.llama3-70b-instruct-v1:0",
-    "meta.llama3-8b-instruct-v1:0",
-    "meta.llama3-1-8b-instruct-v1:0",
-    "meta.llama3-1-70b-instruct-v1:0",
-    "meta.llama3-1-405b-instruct-v1:0",
-    "meta.llama3-70b-instruct-v1:0",
-    "mistral.mistral-large-2407-v1:0",
-    "mistral.mistral-large-2402-v1:0",
-    "mistral.mistral-small-2402-v1:0",
-    "meta.llama3-2-1b-instruct-v1:0",
-    "meta.llama3-2-3b-instruct-v1:0",
-    "meta.llama3-2-11b-instruct-v1:0",
-    "meta.llama3-2-90b-instruct-v1:0",
-]

 ####### COMPLETION MODELS ###################
 from typing import Set
@@ -491,6 +458,7 @@ def identify(event_details):
 vertex_deepseek_models: Set = set()
 vertex_ai_ai21_models: Set = set()
 vertex_mistral_models: Set = set()
+vertex_openai_models: Set = set()
 ai21_models: Set = set()
 ai21_chat_models: Set = set()
 nlp_cloud_models: Set = set()
@@ -637,6 +605,9 @@ def add_known_models():
     elif value.get("litellm_provider") == "vertex_ai-image-models":
         key = key.replace("vertex_ai/", "")
         vertex_ai_image_models.add(key)
+    elif value.get("litellm_provider") == "vertex_ai-openai_models":
+        key = key.replace("vertex_ai/", "")
+        vertex_openai_models.add(key)
     elif value.get("litellm_provider") == "ai21":
         if value.get("mode") == "chat":
             ai21_chat_models.add(key)
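To make the last hunk concrete, here is a standalone sketch of the classification pattern `add_known_models()` uses: entries are bucketed by their `litellm_provider` field, with the `vertex_ai/` route prefix stripped before registration. The `model_cost` dict below is a two-entry stub, not LiteLLM's real cost map:

```python
# Stub of the metadata map; LiteLLM's real `model_cost` is loaded from its
# model-cost JSON and is much larger.
model_cost = {
    "vertex_ai/openai/gpt-oss-20b-maas": {"litellm_provider": "vertex_ai-openai_models"},
    "vertex_ai/imagegeneration@006": {"litellm_provider": "vertex_ai-image-models"},
}

vertex_openai_models: set = set()
vertex_ai_image_models: set = set()

for key, value in model_cost.items():
    provider = value.get("litellm_provider")
    if provider == "vertex_ai-image-models":
        # strip the route prefix so the bare model id is registered
        vertex_ai_image_models.add(key.replace("vertex_ai/", ""))
    elif provider == "vertex_ai-openai_models":
        vertex_openai_models.add(key.replace("vertex_ai/", ""))

print(vertex_openai_models)    # {'openai/gpt-oss-20b-maas'}
print(vertex_ai_image_models)  # {'imagegeneration@006'}
```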
