
Commit c821f1d

[Feature]: Support GPT-OSS models on vertex ai (#14184)
* add VertexAIGPTOSSTransformation
* fix: optional_params
* fix: is_vertex_partner_model
* test_partner_models_httpx
* docs GPT oss docs
* test_vertex_ai_gpt_oss_reasoning_effort
* add vertex ai models
1 parent 4b7c114 commit c821f1d

File tree: 10 files changed, +455 −2 lines changed


docs/my-website/docs/providers/vertex_partner.md

Lines changed: 136 additions & 0 deletions

@@ -15,6 +15,7 @@ import TabItem from '@theme/TabItem';
| Mistral | `vertex_ai/mistral-*` | [Vertex AI - Mistral Models](https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/mistral) |
| AI21 (Jamba) | `vertex_ai/jamba-*` | [Vertex AI - AI21 Models](https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/ai21) |
| Qwen | `vertex_ai/qwen/*` | [Vertex AI - Qwen Models](https://cloud.google.com/vertex-ai/generative-ai/docs/maas/qwen) |
| OpenAI (GPT-OSS) | `vertex_ai/openai/gpt-oss-*` | [Vertex AI - GPT-OSS Models](https://console.cloud.google.com/vertex-ai/publishers/openai/model-garden/) |
| Model Garden | `vertex_ai/openai/{MODEL_ID}` or `vertex_ai/{MODEL_ID}` | [Vertex Model Garden](https://cloud.google.com/model-garden?hl=en) |

## Vertex AI - Anthropic (Claude)
@@ -658,6 +659,141 @@ curl --location 'http://0.0.0.0:4000/chat/completions' \
</Tabs>

## VertexAI GPT-OSS Models

| Property | Details |
|----------|---------|
| Provider Route | `vertex_ai/openai/{MODEL}` |
| Vertex Documentation | [Vertex AI - GPT-OSS Models](https://console.cloud.google.com/vertex-ai/publishers/openai/model-garden/) |

**LiteLLM supports all Vertex AI GPT-OSS models.** Ensure you use the `vertex_ai/openai/` prefix for all Vertex AI GPT-OSS models.

| Model Name | Usage |
|------------------|------------------------------|
| vertex_ai/openai/gpt-oss-20b-maas | `completion('vertex_ai/openai/gpt-oss-20b-maas', messages)` |

#### Usage

<Tabs>
<TabItem value="sdk" label="SDK">

```python
from litellm import completion
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""

model = "openai/gpt-oss-20b-maas"

vertex_ai_project = "your-vertex-project"  # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location"  # can also set this as os.environ["VERTEXAI_LOCATION"]

response = completion(
    model="vertex_ai/" + model,
    messages=[{"role": "user", "content": "hi"}],
    vertex_ai_project=vertex_ai_project,
    vertex_ai_location=vertex_ai_location,
)
print("\nModel Response", response)
```

</TabItem>
<TabItem value="proxy" label="Proxy">

**1. Add to config**

```yaml
model_list:
  - model_name: gpt-oss
    litellm_params:
      model: vertex_ai/openai/gpt-oss-20b-maas
      vertex_ai_project: "my-test-project"
      vertex_ai_location: "us-central1"
```

**2. Start proxy**

```bash
litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000
```

**3. Test it!**

```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
    "model": "gpt-oss",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
```

`"model"` is the `model_name` you set in the proxy config.
737+
738+
</TabItem>
739+
</Tabs>
740+
741+
#### Usage - `reasoning_effort`
742+
743+
GPT-OSS models support the `reasoning_effort` parameter for enhanced reasoning capabilities.
744+
745+
<Tabs>
746+
<TabItem value="sdk" label="SDK">
747+
748+
```python
749+
from litellm import completion
750+
751+
response = completion(
752+
model="vertex_ai/openai/gpt-oss-20b-maas",
753+
messages=[{"role": "user", "content": "Solve this complex problem step by step"}],
754+
reasoning_effort="low", # Options: "minimal", "low", "medium", "high"
755+
vertex_ai_project="your-vertex-project",
756+
vertex_ai_location="us-central1",
757+
)
758+
```
759+
760+
</TabItem>
761+
762+
<TabItem value="proxy" label="PROXY">
763+
764+
1. Setup config.yaml
765+
766+
```yaml
767+
model_list:
768+
- model_name: gpt-oss
769+
litellm_params:
770+
model: vertex_ai/openai/gpt-oss-20b-maas
771+
vertex_ai_project: "my-test-project"
772+
vertex_ai_location: "us-central1"
773+
```
774+
775+
2. Start proxy
776+
777+
```bash
778+
litellm --config /path/to/config.yaml
779+
```
780+
781+
3. Test it!
782+
783+
```bash
784+
curl http://0.0.0.0:4000/v1/chat/completions \
785+
-H "Content-Type: application/json" \
786+
-H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
787+
-d '{
788+
"model": "gpt-oss",
789+
"messages": [{"role": "user", "content": "Solve this complex problem step by step"}],
790+
"reasoning_effort": "low"
791+
}'
792+
```
793+
794+
</TabItem>
795+
</Tabs>
796+
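A minimal client-side guard for these `reasoning_effort` values can be sketched as follows; the allowed set is taken from the SDK example's comment, and the `check_reasoning_effort` helper is illustrative only, not part of litellm:

```python
# Guard for reasoning_effort before sending a request.
# Allowed values come from the example above; this helper is a sketch, not litellm API.
ALLOWED_REASONING_EFFORT = {"minimal", "low", "medium", "high"}

def check_reasoning_effort(value: str) -> str:
    if value not in ALLOWED_REASONING_EFFORT:
        raise ValueError(f"unsupported reasoning_effort: {value!r}")
    return value

print(check_reasoning_effort("low"))
```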
## Model Garden

:::tip

litellm/__init__.py

Lines changed: 4 additions & 0 deletions

@@ -458,6 +458,7 @@ def identify(event_details):
vertex_deepseek_models: Set = set()
vertex_ai_ai21_models: Set = set()
vertex_mistral_models: Set = set()
vertex_openai_models: Set = set()
ai21_models: Set = set()
ai21_chat_models: Set = set()
nlp_cloud_models: Set = set()

@@ -604,6 +605,9 @@ def add_known_models():
        elif value.get("litellm_provider") == "vertex_ai-image-models":
            key = key.replace("vertex_ai/", "")
            vertex_ai_image_models.add(key)
        elif value.get("litellm_provider") == "vertex_ai-openai_models":
            key = key.replace("vertex_ai/", "")
            vertex_openai_models.add(key)
        elif value.get("litellm_provider") == "ai21":
            if value.get("mode") == "chat":
                ai21_chat_models.add(key)
litellm/llms/vertex_ai/vertex_ai_partner_models/gpt_oss/transformation.py

Lines changed: 27 additions & 0 deletions

@@ -0,0 +1,27 @@
import litellm
from litellm.llms.openai.chat.gpt_transformation import OpenAIGPTConfig


class VertexAIGPTOSSTransformation(OpenAIGPTConfig):
    """
    Transformation for GPT-OSS models on Vertex AI.

    https://console.cloud.google.com/vertex-ai/publishers/openai/model-garden/gpt-oss-120b-maas
    """
    def __init__(self):
        super().__init__()

    def get_supported_openai_params(self, model: str) -> list:
        base_gpt_series_params = super().get_supported_openai_params(model=model)
        gpt_oss_only_params = ["reasoning_effort"]
        base_gpt_series_params.extend(gpt_oss_only_params)

        #########################################################
        # VertexAI - GPT-OSS does not support tool calls
        #########################################################
        if litellm.supports_function_calling(model=model) is False:
            TOOL_CALLING_PARAMS_TO_REMOVE = ["tool", "tool_choice", "function_call", "functions"]
            base_gpt_series_params = [param for param in base_gpt_series_params if param not in TOOL_CALLING_PARAMS_TO_REMOVE]

        return base_gpt_series_params
litellm/llms/vertex_ai/vertex_ai_partner_models/main.py

Lines changed: 2 additions & 0 deletions

@@ -49,6 +49,7 @@ def is_vertex_partner_model(model: str):
        or model.startswith("jamba")
        or model.startswith("claude")
        or model.startswith("qwen")
        or model.startswith("openai")
    ):
        return True
    return False

@@ -59,6 +60,7 @@ def should_use_openai_handler(model: str):
        "llama",
        "deepseek-ai",
        "qwen",
        "openai",
    ]
    if any(provider in model for provider in OPENAI_LIKE_VERTEX_PROVIDERS):
        return True

litellm/model_prices_and_context_window_backup.json

Lines changed: 22 additions & 0 deletions

@@ -9884,6 +9884,28 @@
    "supports_tool_choice": true,
    "supports_prompt_caching": true
},
"vertex_ai/openai/gpt-oss-20b-maas": {
    "max_tokens": 32768,
    "max_input_tokens": 131072,
    "max_output_tokens": 32768,
    "input_cost_per_token": 0.075e-06,
    "output_cost_per_token": 0.30e-06,
    "litellm_provider": "vertex_ai-openai_models",
    "mode": "chat",
    "supports_reasoning": true,
    "source": "https://console.cloud.google.com/vertex-ai/publishers/openai/model-garden/gpt-oss-120b-maas"
},
"vertex_ai/openai/gpt-oss-120b-maas": {
    "max_tokens": 32768,
    "max_input_tokens": 131072,
    "max_output_tokens": 32768,
    "input_cost_per_token": 0.15e-06,
    "output_cost_per_token": 0.60e-06,
    "litellm_provider": "vertex_ai-openai_models",
    "mode": "chat",
    "supports_reasoning": true,
    "source": "https://console.cloud.google.com/vertex-ai/publishers/openai/model-garden/gpt-oss-120b-maas"
},
"vertex_ai/qwen/qwen3-coder-480b-a35b-instruct-maas": {
    "max_tokens": 32768,
    "max_input_tokens": 262144,
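With the per-token prices above, request cost is a straight multiply-and-add. A quick sketch using the `gpt-oss-20b-maas` rates:

```python
# Cost sketch using the gpt-oss-20b-maas prices above (USD per token).
INPUT_COST_PER_TOKEN = 0.075e-06
OUTPUT_COST_PER_TOKEN = 0.30e-06

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_COST_PER_TOKEN + output_tokens * OUTPUT_COST_PER_TOKEN

# 1M input + 1M output tokens: $0.075 + $0.30 = $0.375
print(round(request_cost(1_000_000, 1_000_000), 6))
```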

litellm/proxy/proxy_config.yaml

Lines changed: 0 additions & 1 deletion

@@ -3,4 +3,3 @@ model_list:
    litellm_params:
      model: openai/*
      api_base: https://exampleopenaiendpoint-production-0ee2.up.railway.app/
-     mock_response: "hi"

litellm/utils.py

Lines changed: 16 additions & 0 deletions

@@ -3601,6 +3601,17 @@ def _check_valid_arg(supported_params: List[str]):
                    else False
                ),
            )
        elif provider_config is not None:
            optional_params = provider_config.map_openai_params(
                non_default_params=non_default_params,
                optional_params=optional_params,
                model=model,
                drop_params=(
                    drop_params
                    if drop_params is not None and isinstance(drop_params, bool)
                    else False
                ),
            )
        else:  # use generic openai-like param mapping
            optional_params = litellm.VertexAILlama3Config().map_openai_params(
                non_default_params=non_default_params,

@@ -6864,6 +6875,11 @@ def get_provider_chat_config(  # noqa: PLR0915
        return litellm.VertexGeminiConfig()
    elif "claude" in model:
        return litellm.VertexAIAnthropicConfig()
    elif "gpt-oss" in model:
        from litellm.llms.vertex_ai.vertex_ai_partner_models.gpt_oss.transformation import (
            VertexAIGPTOSSTransformation,
        )
        return VertexAIGPTOSSTransformation()
    elif model in litellm.vertex_mistral_models:
        if "codestral" in model:
            return litellm.CodestralTextCompletionConfig()
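The `get_provider_chat_config` change follows the existing substring-dispatch pattern. A sketch with stand-in config classes (the real ones are litellm's `VertexGeminiConfig`, `VertexAIAnthropicConfig`, and `VertexAIGPTOSSTransformation`):

```python
# Sketch of the substring dispatch in get_provider_chat_config.
# Class names are stand-ins for litellm's actual config classes.
class GeminiConfig: ...
class AnthropicConfig: ...
class GPTOSSConfig: ...

def pick_config(model: str):
    if "gemini" in model:
        return GeminiConfig()
    elif "claude" in model:
        return AnthropicConfig()
    elif "gpt-oss" in model:  # branch added by this commit
        return GPTOSSConfig()
    raise ValueError(f"no config for {model}")

print(type(pick_config("openai/gpt-oss-120b-maas")).__name__)  # GPTOSSConfig
```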

model_prices_and_context_window.json

Lines changed: 22 additions & 0 deletions

@@ -9884,6 +9884,28 @@
    "supports_tool_choice": true,
    "supports_prompt_caching": true
},
"vertex_ai/openai/gpt-oss-20b-maas": {
    "max_tokens": 32768,
    "max_input_tokens": 131072,
    "max_output_tokens": 32768,
    "input_cost_per_token": 0.075e-06,
    "output_cost_per_token": 0.30e-06,
    "litellm_provider": "vertex_ai-openai_models",
    "mode": "chat",
    "supports_reasoning": true,
    "source": "https://console.cloud.google.com/vertex-ai/publishers/openai/model-garden/gpt-oss-120b-maas"
},
"vertex_ai/openai/gpt-oss-120b-maas": {
    "max_tokens": 32768,
    "max_input_tokens": 131072,
    "max_output_tokens": 32768,
    "input_cost_per_token": 0.15e-06,
    "output_cost_per_token": 0.60e-06,
    "litellm_provider": "vertex_ai-openai_models",
    "mode": "chat",
    "supports_reasoning": true,
    "source": "https://console.cloud.google.com/vertex-ai/publishers/openai/model-garden/gpt-oss-120b-maas"
},
"vertex_ai/qwen/qwen3-coder-480b-a35b-instruct-maas": {
    "max_tokens": 32768,
    "max_input_tokens": 262144,

tests/local_testing/test_amazing_vertex_completion.py

Lines changed: 3 additions & 1 deletion

@@ -840,7 +840,8 @@ async def test_gemini_pro_function_calling_httpx(model, sync_mode):
    [
        ("vertex_ai/mistral-large-2411", "us-central1"),
        ("vertex_ai/mistral-nemo@2407", "us-central1"),
-       ("vertex_ai/qwen/qwen3-coder-480b-a35b-instruct-maas", "us-south1")
+       ("vertex_ai/qwen/qwen3-coder-480b-a35b-instruct-maas", "us-south1"),
+       ("vertex_ai/openai/gpt-oss-20b-maas", "us-central1"),
    ],
)
@pytest.mark.parametrize(

@@ -911,6 +912,7 @@ async def test_partner_models_httpx(model, region, sync_mode):
        ("vertex_ai/meta/llama-4-scout-17b-16e-instruct-maas", "us-east5"),
        ("vertex_ai/qwen/qwen3-coder-480b-a35b-instruct-maas", "us-south1"),
        ("vertex_ai/mistral-large-2411", "us-central1"),  # critical - we had this issue: https://github.com/BerriAI/litellm/issues/13888
        ("vertex_ai/openai/gpt-oss-20b-maas", "us-central1"),
    ],
)
@pytest.mark.parametrize(

0 commit comments

Comments
 (0)