
Commit 63c26d7

Merge branch 'litellm_contributor_prs_09_18_2025_p2' into fix/issue-14685-bedrock-titan-v2-encoding-format
2 parents: 978cd80 + 114d077

27 files changed: +877 additions, −364 deletions

.circleci/config.yml

Lines changed: 1 addition & 0 deletions
@@ -1458,6 +1458,7 @@ jobs:
       # - run: python ./tests/documentation_tests/test_general_setting_keys.py
       - run: python ./tests/code_coverage_tests/check_licenses.py
       - run: python ./tests/code_coverage_tests/router_code_coverage.py
+      - run: python ./tests/code_coverage_tests/test_chat_completion_imports.py
       - run: python ./tests/code_coverage_tests/info_log_check.py
       - run: python ./tests/code_coverage_tests/test_ban_set_verbose.py
       - run: python ./tests/code_coverage_tests/code_qa_check_tests.py

docs/my-website/docs/providers/bedrock.md

Lines changed: 1 addition & 0 deletions
@@ -1821,6 +1821,7 @@ Here's an example of using a bedrock model with LiteLLM. For a complete list, re
 | Mistral 7B Instruct | `completion(model='bedrock/mistral.mistral-7b-instruct-v0:2', messages=messages)` | `os.environ['AWS_ACCESS_KEY_ID']`, `os.environ['AWS_SECRET_ACCESS_KEY']`, `os.environ['AWS_REGION_NAME']` |
 | Mixtral 8x7B Instruct | `completion(model='bedrock/mistral.mixtral-8x7b-instruct-v0:1', messages=messages)` | `os.environ['AWS_ACCESS_KEY_ID']`, `os.environ['AWS_SECRET_ACCESS_KEY']`, `os.environ['AWS_REGION_NAME']` |
 
+
 ## Bedrock Embedding
 
 ### API keys

docs/my-website/docs/providers/bedrock_embedding.md

Lines changed: 95 additions & 0 deletions

@@ -0,0 +1,95 @@
+## Bedrock Embedding
+
+## Supported Embedding Models
+
+| Provider | LiteLLM Route | AWS Documentation |
+|----------|---------------|-------------------|
+| Amazon Titan | `bedrock/amazon.*` | [Amazon Titan Embeddings](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html) |
+| Cohere | `bedrock/cohere.*` | [Cohere Embeddings](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-cohere-embed.html) |
+| TwelveLabs | `bedrock/us.twelvelabs.*` | [TwelveLabs](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-twelvelabs.html) |
+
+### API keys
+These can be set as env variables or passed as **params to litellm.embedding()**
+```python
+import os
+os.environ["AWS_ACCESS_KEY_ID"] = ""  # Access key
+os.environ["AWS_SECRET_ACCESS_KEY"] = ""  # Secret access key
+os.environ["AWS_REGION_NAME"] = ""  # us-east-1, us-east-2, us-west-1, us-west-2
+```
+
+## Usage
+### LiteLLM Python SDK
+```python
+from litellm import embedding
+response = embedding(
+    model="bedrock/amazon.titan-embed-text-v1",
+    input=["good morning from litellm"],
+)
+print(response)
+```
+
+### LiteLLM Proxy Server
+
+#### 1. Setup config.yaml
+```yaml
+model_list:
+  - model_name: titan-embed-v1
+    litellm_params:
+      model: bedrock/amazon.titan-embed-text-v1
+      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
+      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
+      aws_region_name: us-east-1
+  - model_name: titan-embed-v2
+    litellm_params:
+      model: bedrock/amazon.titan-embed-text-v2:0
+      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
+      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
+      aws_region_name: us-east-1
+```
+
+#### 2. Start Proxy
+```bash
+litellm --config /path/to/config.yaml
+```
+
+#### 3. Use with OpenAI Python SDK
+```python
+import openai
+client = openai.OpenAI(
+    api_key="anything",
+    base_url="http://0.0.0.0:4000"
+)
+
+response = client.embeddings.create(
+    input=["good morning from litellm"],
+    model="titan-embed-v1"
+)
+print(response)
+```
+
+#### 4. Use with LiteLLM Python SDK
+```python
+import litellm
+response = litellm.embedding(
+    model="titan-embed-v1",  # model alias from config.yaml
+    input=["good morning from litellm"],
+    api_base="http://0.0.0.0:4000",
+    api_key="anything"
+)
+print(response)
+```
+
+## Supported AWS Bedrock Embedding Models
+
+| Model Name | Usage | Supported Additional OpenAI params |
+|----------------------|---------------------------------------------|-----|
+| Titan Embeddings V2 | `embedding(model="bedrock/amazon.titan-embed-text-v2:0", input=input)` | [here](https://github.com/BerriAI/litellm/blob/f5905e100068e7a4d61441d7453d7cf5609c2121/litellm/llms/bedrock/embed/amazon_titan_v2_transformation.py#L59) |
+| Titan Embeddings V1 | `embedding(model="bedrock/amazon.titan-embed-text-v1", input=input)` | [here](https://github.com/BerriAI/litellm/blob/f5905e100068e7a4d61441d7453d7cf5609c2121/litellm/llms/bedrock/embed/amazon_titan_g1_transformation.py#L53) |
+| Titan Multimodal Embeddings | `embedding(model="bedrock/amazon.titan-embed-image-v1", input=input)` | [here](https://github.com/BerriAI/litellm/blob/f5905e100068e7a4d61441d7453d7cf5609c2121/litellm/llms/bedrock/embed/amazon_titan_multimodal_transformation.py#L28) |
+| TwelveLabs Marengo Embed 2.7 | `embedding(model="bedrock/us.twelvelabs.marengo-embed-2-7-v1:0", input=input)` | Supports multimodal input (text, video, audio, image) |
+| Cohere Embeddings - English | `embedding(model="bedrock/cohere.embed-english-v3", input=input)` | [here](https://github.com/BerriAI/litellm/blob/f5905e100068e7a4d61441d7453d7cf5609c2121/litellm/llms/bedrock/embed/cohere_transformation.py#L18) |
+| Cohere Embeddings - Multilingual | `embedding(model="bedrock/cohere.embed-multilingual-v3", input=input)` | [here](https://github.com/BerriAI/litellm/blob/f5905e100068e7a4d61441d7453d7cf5609c2121/litellm/llms/bedrock/embed/cohere_transformation.py#L18) |
+
+### Advanced - [Drop Unsupported Params](https://docs.litellm.ai/docs/completion/drop_params#openai-proxy-usage)
+
+### Advanced - [Pass model/provider-specific Params](https://docs.litellm.ai/docs/completion/provider_specific_params#proxy-usage)
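
Related to this merge's target branch (fix/issue-14685-bedrock-titan-v2-encoding-format): a minimal sketch of passing the OpenAI `encoding_format` param to Titan V2 through `litellm.embedding()`, assuming the standard OpenAI-compatible signature and the env credentials shown in the new doc above; the `dimensions` value is illustrative (AWS documents 256, 512, and 1024 output sizes for Titan V2).

```python
from litellm import embedding

response = embedding(
    model="bedrock/amazon.titan-embed-text-v2:0",
    input=["good morning from litellm"],
    encoding_format="float",  # or "base64"; the OpenAI param the fix branch targets
    dimensions=512,           # Titan V2 output size: 256, 512, or 1024 per AWS docs
)
print(len(response.data[0]["embedding"]))
```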

docs/my-website/sidebars.js

Lines changed: 1 addition & 0 deletions
@@ -411,6 +411,7 @@ const sidebars = {
         label: "Bedrock",
         items: [
           "providers/bedrock",
+          "providers/bedrock_embedding",
           "providers/bedrock_agents",
           "providers/bedrock_batches",
           "providers/bedrock_vector_store",

litellm/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -67,6 +67,7 @@
     bedrock_embedding_models,
     known_tokenizer_config,
     BEDROCK_INVOKE_PROVIDERS_LITERAL,
+    BEDROCK_EMBEDDING_PROVIDERS_LITERAL,
     BEDROCK_CONVERSE_MODELS,
     DEFAULT_MAX_TOKENS,
     DEFAULT_SOFT_BUDGET,

litellm/constants.py

Lines changed: 10 additions & 1 deletion
@@ -769,6 +769,12 @@
     "deepseek_r1",
 ]
 
+BEDROCK_EMBEDDING_PROVIDERS_LITERAL = Literal[
+    "cohere",
+    "amazon",
+    "twelvelabs",
+]
+
 BEDROCK_CONVERSE_MODELS = [
     "openai.gpt-oss-20b-1:0",
     "openai.gpt-oss-120b-1:0",
@@ -822,6 +828,7 @@
         "amazon.titan-embed-text-v1",
         "cohere.embed-english-v3",
         "cohere.embed-multilingual-v3",
+        "twelvelabs.marengo-embed-2-7-v1:0",
     ]
 )
 
@@ -1065,4 +1072,6 @@
 ]
 
 # CoroutineChecker cache configuration
-COROUTINE_CHECKER_MAX_SIZE_IN_MEMORY = int(os.getenv("COROUTINE_CHECKER_MAX_SIZE_IN_MEMORY", 1000))
+COROUTINE_CHECKER_MAX_SIZE_IN_MEMORY = int(
+    os.getenv("COROUTINE_CHECKER_MAX_SIZE_IN_MEMORY", 1000)
+)
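
A hedged sketch of how such a provider literal is typically consumed: the helper `get_bedrock_embedding_provider` below is hypothetical, not part of this diff, and only illustrates prefix-based routing over the literal's members.

```python
from typing import Literal, Optional, get_args

BEDROCK_EMBEDDING_PROVIDERS_LITERAL = Literal["cohere", "amazon", "twelvelabs"]

def get_bedrock_embedding_provider(model: str) -> Optional[str]:
    """Hypothetical helper: infer the embedding provider from a model ID."""
    model_id = model.split("/")[-1]  # strip the "bedrock/" route prefix
    for provider in get_args(BEDROCK_EMBEDDING_PROVIDERS_LITERAL):
        # match both "amazon.titan-..." and region-prefixed "us.twelvelabs...."
        if model_id.startswith(provider + ".") or f".{provider}." in model_id:
            return provider
    return None

assert get_bedrock_embedding_provider("bedrock/amazon.titan-embed-text-v2:0") == "amazon"
assert get_bedrock_embedding_provider("bedrock/us.twelvelabs.marengo-embed-2-7-v1:0") == "twelvelabs"
```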

litellm/integrations/datadog/datadog_llm_obs.py

Lines changed: 26 additions & 0 deletions
@@ -498,6 +498,7 @@ def _get_dd_llm_obs_payload_metadata(
             "guardrail_information": standard_logging_payload.get(
                 "guardrail_information", None
             ),
+            "is_streamed_request": self._get_stream_value_from_payload(standard_logging_payload),
         }
 
         #########################################################
@@ -561,6 +562,31 @@ def _get_latency_metrics(
 
         return latency_metrics
 
+    def _get_stream_value_from_payload(self, standard_logging_payload: StandardLoggingPayload) -> bool:
+        """
+        Extract the stream value from the standard logging payload.
+
+        The stream field in StandardLoggingPayload is only set to True for completed streaming responses.
+        For non-streaming requests, it's None. The original stream parameter is in model_parameters.
+
+        Returns:
+            bool: True if this was a streaming request, False otherwise
+        """
+        # Check top-level stream field first (only True for completed streaming)
+        stream_value = standard_logging_payload.get("stream")
+        if stream_value is True:
+            return True
+
+        # Fall back to model_parameters.stream for the original request parameters
+        model_params = standard_logging_payload.get("model_parameters", {})
+        if isinstance(model_params, dict):
+            stream_value = model_params.get("stream")
+            if stream_value is True:
+                return True
+
+        # Default to False for non-streaming requests
+        return False
+
     def _get_spend_metrics(
         self, standard_logging_payload: StandardLoggingPayload
     ) -> DDLLMObsSpendMetrics:
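
The fallback order the new helper implements is easy to see with stand-in dicts (real `StandardLoggingPayload` objects carry many more fields); this mirrors the method body above:

```python
def stream_value(payload: dict) -> bool:
    if payload.get("stream") is True:  # set only for completed streaming responses
        return True
    params = payload.get("model_parameters", {})  # original request params
    return isinstance(params, dict) and params.get("stream") is True

assert stream_value({"stream": True, "model_parameters": {}}) is True                  # completed stream
assert stream_value({"stream": None, "model_parameters": {"stream": True}}) is True    # original request param
assert stream_value({"stream": None, "model_parameters": {"temperature": 0.2}}) is False
```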
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+"""
+Cached imports module for LiteLLM.
+
+This module provides cached import functionality to avoid repeated imports
+inside functions that are critical to performance.
+"""
+
+from typing import TYPE_CHECKING, Callable, Optional, Type
+
+# Type annotations for cached imports
+if TYPE_CHECKING:
+    from litellm.litellm_core_utils.litellm_logging import Logging
+    from litellm.litellm_core_utils.coroutine_checker import CoroutineChecker
+
+# Global cache variables
+_LiteLLMLogging: Optional[Type["Logging"]] = None
+_coroutine_checker: Optional["CoroutineChecker"] = None
+_set_callbacks: Optional[Callable] = None
+
+
+def get_litellm_logging_class() -> Type["Logging"]:
+    """Get the cached LiteLLM Logging class, initializing if needed."""
+    global _LiteLLMLogging
+    if _LiteLLMLogging is not None:
+        return _LiteLLMLogging
+    from litellm.litellm_core_utils.litellm_logging import Logging
+    _LiteLLMLogging = Logging
+    return _LiteLLMLogging
+
+
+def get_coroutine_checker() -> "CoroutineChecker":
+    """Get the cached coroutine checker instance, initializing if needed."""
+    global _coroutine_checker
+    if _coroutine_checker is not None:
+        return _coroutine_checker
+    from litellm.litellm_core_utils.coroutine_checker import coroutine_checker
+    _coroutine_checker = coroutine_checker
+    return _coroutine_checker
+
+
+def get_set_callbacks() -> Callable:
+    """Get the cached set_callbacks function, initializing if needed."""
+    global _set_callbacks
+    if _set_callbacks is not None:
+        return _set_callbacks
+    from litellm.litellm_core_utils.litellm_logging import set_callbacks
+    _set_callbacks = set_callbacks
+    return _set_callbacks
+
+
+def clear_cached_imports() -> None:
+    """Clear all cached imports. Useful for testing or memory management."""
+    global _LiteLLMLogging, _coroutine_checker, _set_callbacks
+    _LiteLLMLogging = None
+    _coroutine_checker = None
+    _set_callbacks = None
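
The new file above (its path is not shown in this view) is a memoized lazy-import pattern: pay the import cost once on first use, then serve the cached object from a module-level global. A self-contained sketch of the same idea, using a stdlib class so it runs anywhere; the names here are illustrative, not LiteLLM's:

```python
from typing import Optional, Type

_cached_json_encoder: Optional[Type] = None

def get_json_encoder() -> Type:
    """First call performs the import; later calls are a cheap global lookup."""
    global _cached_json_encoder
    if _cached_json_encoder is not None:
        return _cached_json_encoder
    from json import JSONEncoder  # deferred until actually needed
    _cached_json_encoder = JSONEncoder
    return _cached_json_encoder

assert get_json_encoder() is get_json_encoder()  # same cached object both times
```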

litellm/litellm_core_utils/exception_mapping_utils.py

Lines changed: 9 additions & 1 deletion
@@ -556,7 +556,7 @@ def exception_type(  # type: ignore # noqa: PLR0915
                     model=model,
                     llm_provider="anthropic",
                 )
-            elif "overloaded_error" in error_str:
+            elif "overloaded_error" in error_str or "Overloaded" in error_str:
                 exception_mapping_worked = True
                 raise InternalServerError(
                     message="AnthropicError - {}".format(error_str),
@@ -1449,6 +1449,14 @@ def exception_type(  # type: ignore # noqa: PLR0915
                     model=model,
                     response=getattr(original_exception, "response", None),
                 )
+            elif "invalid type: parameter" in error_str:
+                exception_mapping_worked = True
+                raise BadRequestError(
+                    message=f"CohereException - {original_exception.message}",
+                    llm_provider="cohere",
+                    model=model,
+                    response=getattr(original_exception, "response", None),
+                )
             elif "too many tokens" in error_str:
                 exception_mapping_worked = True
                 raise ContextWindowExceededError(
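
From the caller's side, the Anthropic change means bare "Overloaded" messages now surface as `litellm.InternalServerError` like other 500-class errors, so existing retry and fallback policies cover them. A hedged sketch (requires Anthropic credentials; the model name is illustrative):

```python
import litellm

try:
    litellm.completion(
        model="anthropic/claude-3-5-sonnet-20240620",
        messages=[{"role": "user", "content": "hi"}],
    )
except litellm.InternalServerError as e:
    # Anthropic "Overloaded" responses now land here and can be retried.
    print(f"retryable server error: {e}")
```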

litellm/litellm_core_utils/prompt_templates/factory.py

Lines changed: 4 additions & 7 deletions
@@ -3079,7 +3079,6 @@ def _initial_message_setup(
             messages.append(DEFAULT_USER_CONTINUE_MESSAGE)
         return messages
 
-
     @staticmethod
     async def _bedrock_converse_messages_pt_async(  # noqa: PLR0915
         messages: List,
@@ -3124,9 +3123,9 @@ async def _bedrock_converse_messages_pt_async(  # noqa: PLR0915
                         _part = BedrockContentBlock(text=element["text"])
                         _parts.append(_part)
                     elif element["type"] == "guarded_text":
-                        # Wrap guarded_text in guardrailConverseContent block
+                        # Wrap guarded_text in guardContent block
                         _part = BedrockContentBlock(
-                            guardrailConverseContent={"text": element["text"]}
+                            guardContent={"text": {"text": element["text"]}}
                         )
                         _parts.append(_part)
                     elif element["type"] == "image_url":
@@ -3171,7 +3170,6 @@ async def _bedrock_converse_messages_pt_async(  # noqa: PLR0915
 
             msg_i += 1
         if user_content:
-
             if len(contents) > 0 and contents[-1]["role"] == "user":
                 if (
                     assistant_continue_message is not None
@@ -3506,9 +3504,9 @@ def _bedrock_converse_messages_pt(  # noqa: PLR0915
                         _part = BedrockContentBlock(text=element["text"])
                         _parts.append(_part)
                     elif element["type"] == "guarded_text":
-                        # Wrap guarded_text in guardrailConverseContent block
+                        # Wrap guarded_text in guardContent block
                        _part = BedrockContentBlock(
-                            guardrailConverseContent={"text": element["text"]}
+                            guardContent={"text": {"text": element["text"]}}
                         )
                         _parts.append(_part)
                     elif element["type"] == "image_url":
@@ -3554,7 +3552,6 @@ def _bedrock_converse_messages_pt(  # noqa: PLR0915
 
             msg_i += 1
         if user_content:
-
            if len(contents) > 0 and contents[-1]["role"] == "user":
                 if (
                     assistant_continue_message is not None
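
For context on the factory.py fix: the Converse API expects guarded text inside a `guardContent` block, with the string nested one level deeper than the old `guardrailConverseContent` form produced. A plain-dict sketch of the block the corrected code emits (treating `BedrockContentBlock` as a dict for illustration):

```python
element = {"type": "guarded_text", "text": "Evaluate only this span with the guardrail."}

# Old shape (not a valid Converse content-block key):
#   {"guardrailConverseContent": {"text": "..."}}
# New shape (Converse API contract):
block = {"guardContent": {"text": {"text": element["text"]}}}
```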
