Changes from all commits (37 commits)
cc3e5ba
Fix cache_control hook to support role+index filtering and negative i…
muneerusman25 Feb 11, 2026
cd710fe
fix(factory): handle list content in map_system_message_pt
voidborne-d Mar 16, 2026
5892143
fix(proxy): instantiate OTEL callback at startup instead of deferring…
Harshit28j Mar 17, 2026
e465014
fix: add role key to satisfy AllMessageValues type contract
voidborne-d Mar 17, 2026
1970157
fix: handle None content in _get_content_as_str
voidborne-d Mar 17, 2026
6eaa48a
fix: req changes from greptile
Harshit28j Mar 21, 2026
6ef2086
[Fix]: streaming_handler.py:197 - Error in _route_streaming_logging_t…
FBIKKIBF Mar 21, 2026
1f573d0
fix: copy dict before mutation + use filter(None, ...) to avoid trail…
voidborne-d Mar 21, 2026
f4904e5
fix: greptile feedback
Harshit28j Mar 21, 2026
5c93fc2
fix: address review comments + black formatting
voidborne-d Mar 21, 2026
52cdf94
fix: req changes on test case
Harshit28j Mar 21, 2026
2f3b395
Update litellm/llms/anthropic/chat/handler.py for exception logging
FBIKKIBF Mar 21, 2026
eabac04
Fix logging import and format
FBIKKIBF Mar 21, 2026
4c65ed9
Add customizable CORS settings
emerzon Dec 19, 2025
e779a85
Apply suggestions from code review
emerzon Dec 19, 2025
881da92
Address greptile CORS review feedback
emerzon Mar 21, 2026
b22dc5c
Address additional CORS review feedback
emerzon Mar 21, 2026
ea5502b
Address latest CORS review feedback
emerzon Mar 21, 2026
13f08c9
Address CORS env precedence and wildcard feedback
emerzon Mar 21, 2026
a62cc78
Address final CORS review feedback
emerzon Mar 21, 2026
803ae26
Address empty CORS methods and headers feedback
emerzon Mar 21, 2026
aab385a
Address CORS credentials validation feedback
emerzon Mar 21, 2026
abdc20c
Merge branch 'main' into cors_settings
emerzon Mar 21, 2026
db4cb3c
Fix CORS config startup ordering
emerzon Mar 21, 2026
48d42b4
Fix CORS include loading edge cases
emerzon Mar 21, 2026
6393b36
Resolve list-based CORS env refs
emerzon Mar 21, 2026
5952f02
Tighten CORS test cleanup and docs
emerzon Mar 21, 2026
11aa499
Merge pull request #24288 from FBIKKIBF/main
krrishdholakia Mar 22, 2026
d559073
Merge pull request #18265 from emerzon/cors_settings
krrishdholakia Mar 22, 2026
cdb06fd
Merge pull request #20955 from muneerusman25/fix_cache_control_problem
krrishdholakia Mar 22, 2026
680b1bb
Merge pull request #23802 from Harshit28j/litellm_otel_callback
krrishdholakia Mar 22, 2026
4c12967
Adding pricing and model data for Bedrock Z.AI GLM 5 model. (#24240)
cmbaatz Mar 22, 2026
13b893d
fix(otel): use completion_tokens_details for Chat Completions API rea…
AtharvaJaiswal005 Mar 22, 2026
abc6536
fix(key_rotation): add distributed lock to prevent concurrent rotatio…
michelligabriele Mar 22, 2026
486a752
docs(minimax): update model descriptions and add new M2.5 models (#23…
bugparty Mar 22, 2026
053e923
fix(ui): guard PriceDataManagementTab TabPanel with admin role check …
xykong Mar 22, 2026
c49ef4d
Merge pull request #23782 from voidborne-d/fix/map-system-message-lis…
krrishdholakia Mar 22, 2026
8 changes: 6 additions & 2 deletions docs/my-website/docs/providers/minimax.md
@@ -11,12 +11,16 @@ Litellm provides anthropic specs compatible support for minmax

## Supported Models

MiniMax offers three models through their Anthropic-compatible API:
MiniMax offers the following models through their Anthropic-compatible API:

| Model | Description | Input Cost | Output Cost | Prompt Caching Read | Prompt Caching Write |
|-------|-------------|------------|-------------|---------------------|----------------------|
| **MiniMax-M2.1** | Powerful Multi-Language Programming with Enhanced Programming Experience (~60 tps) | $0.3/M tokens | $1.2/M tokens | $0.03/M tokens | $0.375/M tokens |
| **MiniMax-M2.1-lightning** | Faster and More Agile (~100 tps) | $0.3/M tokens | $2.4/M tokens | $0.03/M tokens | $0.375/M tokens |
| **MiniMax-M2.1-lightning** | Deprecated model name. Use `MiniMax-M2.1-highspeed` for new integrations. | $0.3/M tokens | $2.4/M tokens | $0.03/M tokens | $0.375/M tokens |
| **MiniMax-M2.1-highspeed** | High-speed variant of MiniMax M2.1 | $0.6/M tokens | $2.4/M tokens | $0.03/M tokens | $0.375/M tokens |
| **MiniMax-M2.5** | MiniMax M2.5 general-purpose model | $0.3/M tokens | $1.2/M tokens | $0.03/M tokens | $0.375/M tokens |
| **MiniMax-M2.5-lightning** | Deprecated model name. Use `MiniMax-M2.5-highspeed` for new integrations. | $0.3/M tokens | $2.4/M tokens | $0.03/M tokens | $0.375/M tokens |
| **MiniMax-M2.5-highspeed** | High-speed variant of MiniMax M2.5 | $0.6/M tokens | $2.4/M tokens | $0.03/M tokens | $0.375/M tokens |
| **MiniMax-M2** | Agentic capabilities, Advanced reasoning | $0.3/M tokens | $1.2/M tokens | $0.03/M tokens | $0.375/M tokens |
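
As a quick sanity check on the table above, a request's cost follows from the per-million-token rates. A minimal sketch (token counts here are made up for illustration):

```python
# Cost of a hypothetical MiniMax-M2.1 request, using the table's rates:
# $0.3 per million input tokens, $1.2 per million output tokens.
INPUT_PER_M = 0.3
OUTPUT_PER_M = 1.2

prompt_tokens = 10_000
completion_tokens = 2_000

cost_usd = (prompt_tokens / 1e6) * INPUT_PER_M + (completion_tokens / 1e6) * OUTPUT_PER_M
```

This yields $0.0054 for the example request; prompt-caching reads would be billed at the separate $0.03/M rate.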


4 changes: 4 additions & 0 deletions docs/my-website/docs/proxy/config_settings.md
@@ -218,6 +218,10 @@ router_settings:
| enforce_user_param | boolean | If true, requires all OpenAI endpoint requests to have a 'user' param. [Doc on call hooks](call_hooks)|
| reject_clientside_metadata_tags | boolean | If true, rejects requests that contain client-side 'metadata.tags' to prevent users from influencing budgets by sending different tags. Tags can only be inherited from the API key metadata. |
| allowed_routes | array of strings | List of allowed proxy API routes a user can access [Doc on controlling allowed routes](enterprise#control-available-public-private-routes)|
| cors_allow_origins | Union[str, List[str]] | CORS allowlist origins for the proxy. Defaults to `["*"]` when unset. Set this to `[]` to disable CORS for all origins, or provide explicit origins to restrict access. Existing `LITELLM_CORS_*` env vars take precedence over config values. Restart the proxy after changing any CORS setting. |
| cors_allow_credentials | boolean | Allow CORS credentials. Defaults to `false` when `cors_allow_origins` is explicitly configured and this setting is unset. Otherwise it preserves the proxy's existing default behavior. Wildcard origins or patterns disable credentials. |
| cors_allow_methods | Union[str, List[str]] | CORS allowlist methods for the proxy. Defaults to `"*"` when unset. |
| cors_allow_headers | Union[str, List[str]] | CORS allowlist headers for the proxy. Defaults to `"*"` when unset. |
| key_management_system | string | Specifies the key management system. [Doc Secret Managers](../secret) |
| master_key | string | The master key for the proxy [Set up Virtual Keys](virtual_keys) |
| database_url | string | The URL for the database connection [Set up Virtual Keys](virtual_keys) |
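
The CORS precedence rules documented above can be sketched in Python. This is a hypothetical helper (`resolve_cors_origins` is not LiteLLM's actual implementation); it only illustrates the documented behavior: `LITELLM_CORS_*` env vars win over config, unset falls back to `["*"]`, and `[]` disables CORS:

```python
from typing import List, Optional, Union

def resolve_cors_origins(
    config_value: Optional[Union[str, List[str]]],
    env_value: Optional[List[str]],
) -> List[str]:
    # Existing LITELLM_CORS_* env vars take precedence over config values.
    if env_value is not None:
        return env_value
    # Unset -> allow all origins (the documented default).
    if config_value is None:
        return ["*"]
    # A single string is treated as a one-element allowlist.
    if isinstance(config_value, str):
        return [config_value]
    # [] explicitly disables CORS; any other list is used as-is.
    return config_value
```

Restarting the proxy after changing any of these values is still required, since the middleware is configured at startup.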
29 changes: 26 additions & 3 deletions litellm/integrations/anthropic_cache_control_hook.py
@@ -98,8 +98,31 @@ def _process_message_injection(

targetted_role = point.get("role", None)

# Case 1: Target by specific index
if targetted_index is not None:
# Case 1: Target by role + index (e.g., index=-1 among assistant messages)
if targetted_index is not None and targetted_role is not None:
role_indices = [
i
for i, msg in enumerate(messages)
if msg.get("role") == targetted_role
]
if role_indices:
try:
# Negative indices handled by Python's native list indexing (e.g., -1 = last)
actual_idx = role_indices[targetted_index]
except IndexError:
verbose_logger.warning(
f"AnthropicCacheControlHook: Index {targetted_index} is out of bounds "
f"for {len(role_indices)} messages with role '{targetted_role}'. "
f"Skipping cache control injection for this point."
)
else:
messages[actual_idx] = (
AnthropicCacheControlHook._safe_insert_cache_control_in_message(
messages[actual_idx], control
)
)
# Case 2: Target by index only
elif targetted_index is not None:
original_index = targetted_index
# Handle negative indices (convert to positive)
if targetted_index < 0:
@@ -116,7 +139,7 @@
f"AnthropicCacheControlHook: Provided index {original_index} is out of bounds for message list of length {len(messages)}. "
f"Targeted index was {targetted_index}. Skipping cache control injection for this point."
)
# Case 2: Target by role
# Case 3: Target by role only
elif targetted_role is not None:
for msg in messages:
if msg.get("role") == targetted_role:
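
The role+index targeting this hunk adds can be illustrated standalone. The helper below is a hypothetical mirror of the hook's Case 1 logic, not the hook itself; `index=-1` selects the last message with the given role via Python's native negative list indexing:

```python
def select_by_role_and_index(messages, role, index):
    # Collect positions of messages with the target role, then index
    # into that sub-list; negative indices count from the end.
    role_indices = [i for i, m in enumerate(messages) if m.get("role") == role]
    try:
        return role_indices[index]
    except IndexError:
        return None  # out of bounds -> skip, as the hook logs and does

messages = [
    {"role": "system", "content": "s"},
    {"role": "user", "content": "u1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "u2"},
]
```

Here `select_by_role_and_index(messages, "user", -1)` returns 3 (the last user message), while an out-of-range index returns `None` rather than raising.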
29 changes: 22 additions & 7 deletions litellm/integrations/arize/_utils.py
@@ -236,13 +236,28 @@ def _set_usage_outputs(span: "Span", response_obj, span_attrs):
prompt_tokens = usage.get("prompt_tokens") or usage.get("input_tokens")
if prompt_tokens:
safe_set_attribute(span, span_attrs.LLM_TOKEN_COUNT_PROMPT, prompt_tokens)
reasoning_tokens = usage.get("output_tokens_details", {}).get("reasoning_tokens")
if reasoning_tokens:
safe_set_attribute(
span,
span_attrs.LLM_TOKEN_COUNT_COMPLETION_DETAILS_REASONING,
reasoning_tokens,
)
completion_tokens_details = usage.get("completion_tokens_details") or usage.get(
"output_tokens_details"
)
if completion_tokens_details is not None:
reasoning_tokens = getattr(completion_tokens_details, "reasoning_tokens", None)
Review comment (Contributor) on lines +239 to +243:

P2: `or` short-circuits on falsy objects and may skip a valid completion_tokens_details.

    completion_tokens_details = usage.get("completion_tokens_details") or usage.get(
        "output_tokens_details"
    )

If completion_tokens_details is present but evaluates as falsy (e.g. a CompletionTokensDetailsWrapper() instance with all-zero fields where __bool__ returns False, or an empty dict {}), Python's `or` will silently fall through to output_tokens_details. This is unlikely in practice today, but a safer pattern would be an explicit None check:

Suggested change:

    completion_tokens_details = usage.get("completion_tokens_details")
    if completion_tokens_details is None:
        completion_tokens_details = usage.get("output_tokens_details")

The same applies to prompt_tokens_details / input_tokens_details a few lines below.

if reasoning_tokens:
safe_set_attribute(
span,
span_attrs.LLM_TOKEN_COUNT_COMPLETION_DETAILS_REASONING,
reasoning_tokens,
)
prompt_tokens_details = usage.get("prompt_tokens_details") or usage.get(
"input_tokens_details"
)
if prompt_tokens_details is not None:
cached_tokens = getattr(prompt_tokens_details, "cached_tokens", None)
if cached_tokens:
safe_set_attribute(
span,
span_attrs.LLM_TOKEN_COUNT_PROMPT_DETAILS_CACHE_READ,
cached_tokens,
)


def _infer_open_inference_span_kind(call_type: Optional[str]) -> str:
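
The reviewer's falsy-`or` concern can be reproduced with a minimal standalone sketch. `Details` here is a hypothetical stand-in for a usage-details wrapper; whether the real `CompletionTokensDetailsWrapper` defines `__bool__` this way is an assumption, which is exactly why the explicit `None` check is the safer pattern:

```python
class Details:
    # Stand-in for a usage-details wrapper whose truthiness depends on
    # its fields (hypothetical; real wrappers may always be truthy).
    def __init__(self, reasoning_tokens: int = 0):
        self.reasoning_tokens = reasoning_tokens

    def __bool__(self) -> bool:
        return bool(self.reasoning_tokens)


usage = {"completion_tokens_details": Details(0), "output_tokens_details": None}

# `or` silently discards the valid-but-falsy object:
picked_with_or = usage.get("completion_tokens_details") or usage.get("output_tokens_details")

# An explicit None check keeps it:
picked = usage.get("completion_tokens_details")
if picked is None:
    picked = usage.get("output_tokens_details")
```

`picked_with_or` ends up `None` while `picked` is the all-zero `Details` instance — the silent skip the review flags.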
73 changes: 42 additions & 31 deletions litellm/litellm_core_utils/prompt_templates/factory.py
@@ -86,6 +86,17 @@ def prompt_injection_detection_default_pt():
) # similar to autogen. Only used if `litellm.modify_params=True`.


def _get_content_as_str(content: Union[str, list, None]) -> str:
"""Extract text from content that may be a string, a list of content blocks, or None."""
if content is None:
return ""
if isinstance(content, str):
return content
if isinstance(content, list):
return convert_content_list_to_str({"role": "user", "content": content})
return ""


def map_system_message_pt(messages: list) -> list:
"""
Convert 'system' message to 'user' message if provider doesn't support 'system' role.
@@ -100,20 +111,24 @@ def map_system_message_pt(messages: list) -> list:
new_messages = []
for i, m in enumerate(messages):
if m["role"] == "system":
system_text = _get_content_as_str(m["content"])
if i < len(messages) - 1: # Not the last message
next_m = messages[i + 1]
next_role = next_m["role"]
if (
next_role == "user" or next_role == "assistant"
): # Next message is a user or assistant message
# Merge system prompt into the next message
next_m["content"] = m["content"] + " " + next_m["content"]
# Copy to avoid mutating the caller's original dict
next_m = messages[i + 1] = {**next_m}
next_text = _get_content_as_str(next_m["content"])
next_m["content"] = " ".join(filter(None, [system_text, next_text]))
elif next_role == "system": # Next message is a system message
# Append a user message instead of the system message
new_message = {"role": "user", "content": m["content"]}
new_message = {"role": "user", "content": system_text}
new_messages.append(new_message)
else: # Last message
new_message = {"role": "user", "content": m["content"]}
new_message = {"role": "user", "content": system_text}
new_messages.append(new_message)
else: # Not a system message
new_messages.append(m)
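
A standalone sketch of the behavior this hunk adds (simplified stand-ins, not the factory's actual helpers): `None` and list content normalize to strings, and `filter(None, ...)` keeps the merged message free of a trailing space when the neighboring text is empty:

```python
def content_as_str(content):
    # Simplified stand-in for _get_content_as_str: None -> "", str passes
    # through, list content blocks are joined by their "text" fields.
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return " ".join(
            block.get("text", "") for block in content if isinstance(block, dict)
        )
    return ""


system_text = content_as_str([{"type": "text", "text": "You are terse."}])
merged = " ".join(filter(None, [system_text, content_as_str(None)]))
```

`merged` is `"You are terse."` with no trailing space, where plain `a + " " + b` concatenation would have left one.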
@@ -1393,10 +1408,10 @@ def convert_to_gemini_tool_call_invoke(
if tool_calls is not None:
for idx, tool in enumerate(tool_calls):
if "function" in tool:
gemini_function_call: Optional[
VertexFunctionCall
] = _gemini_tool_call_invoke_helper(
function_call_params=tool["function"]
gemini_function_call: Optional[VertexFunctionCall] = (
_gemini_tool_call_invoke_helper(
function_call_params=tool["function"]
)
)
if gemini_function_call is not None:
part_dict: VertexPartType = {
@@ -1540,9 +1555,7 @@ def convert_to_gemini_tool_call_result( # noqa: PLR0915
file_data = (
file_content.get("file_data", "")
if isinstance(file_content, dict)
else file_content
if isinstance(file_content, str)
else ""
else file_content if isinstance(file_content, str) else ""
)

if file_data:
@@ -2046,9 +2059,9 @@ def _sanitize_empty_text_content(
if isinstance(content, str):
if not content or not content.strip():
message = cast(AllMessageValues, dict(message)) # Make a copy
message[
"content"
] = "[System: Empty message content sanitised to satisfy protocol]"
message["content"] = (
"[System: Empty message content sanitised to satisfy protocol]"
)
verbose_logger.debug(
f"_sanitize_empty_text_content: Replaced empty text content in {message.get('role')} message"
)
@@ -2388,9 +2401,9 @@ def anthropic_messages_pt( # noqa: PLR0915
# Convert ChatCompletionImageUrlObject to dict if needed
image_url_value = m["image_url"]
if isinstance(image_url_value, str):
image_url_input: Union[
str, dict[str, Any]
] = image_url_value
image_url_input: Union[str, dict[str, Any]] = (
image_url_value
)
else:
# ChatCompletionImageUrlObject or dict case - convert to dict
image_url_input = {
@@ -2417,9 +2430,9 @@ def anthropic_messages_pt( # noqa: PLR0915
)

if "cache_control" in _content_element:
_anthropic_content_element[
"cache_control"
] = _content_element["cache_control"]
_anthropic_content_element["cache_control"] = (
_content_element["cache_control"]
)
user_content.append(_anthropic_content_element)
elif m.get("type", "") == "text":
m = cast(ChatCompletionTextObject, m)
@@ -2479,9 +2492,9 @@ def anthropic_messages_pt( # noqa: PLR0915
)

if "cache_control" in _content_element:
_anthropic_content_text_element[
"cache_control"
] = _content_element["cache_control"]
_anthropic_content_text_element["cache_control"] = (
_content_element["cache_control"]
)

user_content.append(_anthropic_content_text_element)

@@ -2614,9 +2627,9 @@ def anthropic_messages_pt( # noqa: PLR0915
original_content_element=dict(assistant_content_block),
)
if "cache_control" in _content_element:
_anthropic_text_content_element[
"cache_control"
] = _content_element["cache_control"]
_anthropic_text_content_element["cache_control"] = (
_content_element["cache_control"]
)
text_element = _anthropic_text_content_element

# Interleave: each thinking block precedes its server tool group.
@@ -2776,9 +2789,9 @@ def anthropic_messages_pt( # noqa: PLR0915
)

if "cache_control" in _content_element:
_anthropic_text_content_element[
"cache_control"
] = _content_element["cache_control"]
_anthropic_text_content_element["cache_control"] = (
_content_element["cache_control"]
)

assistant_content.append(_anthropic_text_content_element)

@@ -5220,9 +5233,7 @@ def default_response_schema_prompt(response_schema: dict) -> str:
prompt_str = """Use this JSON schema:
```json
{}
```""".format(
response_schema
)
```""".format(response_schema)
return prompt_str


17 changes: 13 additions & 4 deletions litellm/llms/anthropic/chat/handler.py
@@ -23,6 +23,7 @@
import litellm.litellm_core_utils
import litellm.types
import litellm.types.utils
from litellm._logging import verbose_logger
from litellm.anthropic_beta_headers_manager import (
update_request_with_filtered_beta,
)
@@ -1245,7 +1246,15 @@ def convert_str_chunk_to_generic_chunk(self, chunk: str) -> ModelResponseStream:
str_line = str_line[index:]

if str_line.startswith("data:"):
data_json = json.loads(str_line[5:])
return self.chunk_parser(chunk=data_json)
else:
return ModelResponseStream(id=self.response_id)
chunk_str = str_line[5:].strip()
# Models like Deepseek might return "data: [DONE]" here which is not a
# valid JSON input. We can just ignore these chunks.
try:
data_json = json.loads(chunk_str)
return self.chunk_parser(chunk=data_json)
except json.JSONDecodeError:
verbose_logger.debug(
f"Non-JSON SSE chunk received, ignoring: {chunk_str!r}"
)

return ModelResponseStream(id=self.response_id)
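
The fix above can be distilled into a standalone sketch. `parse_sse_data_line` is a hypothetical helper, not the handler's actual method; it mirrors the new tolerance for non-JSON SSE payloads such as `data: [DONE]`:

```python
import json
from typing import Any, Optional

def parse_sse_data_line(str_line: str) -> Optional[Any]:
    # Non-JSON payloads (e.g. "data: [DONE]" from Deepseek-style streams)
    # yield None rather than raising json.JSONDecodeError.
    if not str_line.startswith("data:"):
        return None
    chunk_str = str_line[5:].strip()
    try:
        return json.loads(chunk_str)
    except json.JSONDecodeError:
        return None
```

In the handler itself, the `None` path corresponds to returning an empty `ModelResponseStream` and logging the ignored chunk at debug level.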
7 changes: 6 additions & 1 deletion litellm/llms/minimax/chat/transformation.py
@@ -16,9 +16,14 @@ class MinimaxChatConfig(OpenAIGPTConfig):
- International: https://api.minimax.io/v1
- China: https://api.minimaxi.com/v1

Note: MiniMax's Claude-compatible `/anthropic/v1/messages` support is implemented
separately in `litellm/llms/minimax/messages/transformation.py`.

Supported models:
- MiniMax-M2.1
- MiniMax-M2.1-lightning
- MiniMax-M2.1-highspeed
- MiniMax-M2.5
- MiniMax-M2.5-highspeed
- MiniMax-M2
"""

9 changes: 7 additions & 2 deletions litellm/llms/minimax/messages/transformation.py
@@ -1,5 +1,8 @@
"""
MiniMax Anthropic transformation config - extends AnthropicConfig for MiniMax's Anthropic-compatible API
MiniMax Anthropic-compatible Messages API transformation config.

MiniMax exposes Claude-compatible `/anthropic/v1/messages` endpoints separately from
its OpenAI-compatible `/v1/chat/completions` endpoint.
"""
from typing import Optional

@@ -19,7 +22,9 @@ class MinimaxMessagesConfig(AnthropicMessagesConfig):

Supported models:
- MiniMax-M2.1
- MiniMax-M2.1-lightning
- MiniMax-M2.1-highspeed
- MiniMax-M2.5
- MiniMax-M2.5-highspeed
- MiniMax-M2
"""

14 changes: 14 additions & 0 deletions litellm/model_prices_and_context_window_backup.json
@@ -32542,6 +32542,20 @@
"supports_vision": true,
"supports_web_search": true
},
"zai.glm-5": {
"input_cost_per_token": 1e-06,
"litellm_provider": "bedrock_converse",
"max_input_tokens": 200000,
"max_output_tokens": 128000,
"max_tokens": 128000,
"mode": "chat",
"output_cost_per_token": 3.2e-06,
"supports_function_calling": true,
"supports_reasoning": true,
"supports_system_messages": true,
"supports_tool_choice": true,
"source": "https://aws.amazon.com/bedrock/pricing/"
},
"zai.glm-4.7": {
"input_cost_per_token": 6e-07,
"litellm_provider": "bedrock_converse",
16 changes: 16 additions & 0 deletions litellm/proxy/_types.py
@@ -2236,6 +2236,22 @@ class ConfigGeneralSettings(LiteLLMPydanticObjectBase):
allowed_routes: Optional[List] = Field(
None, description="Proxy API Endpoints you want users to be able to access"
)
cors_allow_origins: Optional[Union[str, List[str]]] = Field(
None,
description='CORS allowlist origins for the proxy. Defaults to `["*"]` when unset. Set this to `[]` to disable CORS for all origins, or provide explicit origins to restrict access. Existing `LITELLM_CORS_*` env vars take precedence over config values. Restart the proxy after changing any CORS setting.',
)
cors_allow_credentials: Optional[bool] = Field(
None,
description="Allow CORS credentials. Defaults to False when cors_allow_origins is explicitly configured and this setting is unset. Otherwise it preserves the proxy's existing default behavior. Wildcard origins or patterns disable credentials.",
)
cors_allow_methods: Optional[Union[str, List[str]]] = Field(
None,
description='CORS allowlist methods for the proxy. Defaults to `"*"` when unset.',
)
cors_allow_headers: Optional[Union[str, List[str]]] = Field(
None,
description='CORS allowlist headers for the proxy. Defaults to `"*"` when unset.',
)
reject_clientside_metadata_tags: Optional[bool] = Field(
None,
description="When set to True, rejects requests that contain client-side 'metadata.tags' to prevent users from influencing budgets by sending different tags. Tags can only be inherited from the API key metadata.",