
Commit 63c26d7

Merge branch 'litellm_contributor_prs_09_18_2025_p2' into fix/issue-14685-bedrock-titan-v2-encoding-format
2 parents: 978cd80 + 114d077

27 files changed: +877 additions, −364 deletions

.circleci/config.yml

Lines changed: 1 addition & 0 deletions
@@ -1458,6 +1458,7 @@ jobs:
       # - run: python ./tests/documentation_tests/test_general_setting_keys.py
       - run: python ./tests/code_coverage_tests/check_licenses.py
       - run: python ./tests/code_coverage_tests/router_code_coverage.py
+      - run: python ./tests/code_coverage_tests/test_chat_completion_imports.py
       - run: python ./tests/code_coverage_tests/info_log_check.py
       - run: python ./tests/code_coverage_tests/test_ban_set_verbose.py
       - run: python ./tests/code_coverage_tests/code_qa_check_tests.py

docs/my-website/docs/providers/bedrock.md

Lines changed: 1 addition & 0 deletions
@@ -1821,6 +1821,7 @@ Here's an example of using a bedrock model with LiteLLM. For a complete list, re
 | Mistral 7B Instruct | `completion(model='bedrock/mistral.mistral-7b-instruct-v0:2', messages=messages)` | `os.environ['AWS_ACCESS_KEY_ID']`, `os.environ['AWS_SECRET_ACCESS_KEY']`, `os.environ['AWS_REGION_NAME']` |
 | Mixtral 8x7B Instruct | `completion(model='bedrock/mistral.mixtral-8x7b-instruct-v0:1', messages=messages)` | `os.environ['AWS_ACCESS_KEY_ID']`, `os.environ['AWS_SECRET_ACCESS_KEY']`, `os.environ['AWS_REGION_NAME']` |
 
+
 ## Bedrock Embedding
 
 ### API keys

docs/my-website/docs/providers/bedrock_embedding.md

Lines changed: 95 additions & 0 deletions

@@ -0,0 +1,95 @@
+## Bedrock Embedding
+
+## Supported Embedding Models
+
+| Provider | LiteLLM Route | AWS Documentation |
+|----------|---------------|-------------------|
+| Amazon Titan | `bedrock/amazon.*` | [Amazon Titan Embeddings](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html) |
+| Cohere | `bedrock/cohere.*` | [Cohere Embeddings](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-cohere-embed.html) |
+| TwelveLabs | `bedrock/us.twelvelabs.*` | [TwelveLabs](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-twelvelabs.html) |
+
+### API keys
+These can be set as env variables or passed as **params to litellm.embedding()**
+```python
+import os
+os.environ["AWS_ACCESS_KEY_ID"] = ""  # Access key
+os.environ["AWS_SECRET_ACCESS_KEY"] = ""  # Secret access key
+os.environ["AWS_REGION_NAME"] = ""  # us-east-1, us-east-2, us-west-1, us-west-2
+```
+
+## Usage
+### LiteLLM Python SDK
+```python
+from litellm import embedding
+response = embedding(
+    model="bedrock/amazon.titan-embed-text-v1",
+    input=["good morning from litellm"],
+)
+print(response)
+```
+
+### LiteLLM Proxy Server
+
+#### 1. Setup config.yaml
+```yaml
+model_list:
+  - model_name: titan-embed-v1
+    litellm_params:
+      model: bedrock/amazon.titan-embed-text-v1
+      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
+      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
+      aws_region_name: us-east-1
+  - model_name: titan-embed-v2
+    litellm_params:
+      model: bedrock/amazon.titan-embed-text-v2:0
+      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
+      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
+      aws_region_name: us-east-1
+```
+
+#### 2. Start Proxy
+```bash
+litellm --config /path/to/config.yaml
+```
+
+#### 3. Use with OpenAI Python SDK
+```python
+import openai
+client = openai.OpenAI(
+    api_key="anything",
+    base_url="http://0.0.0.0:4000"
+)
+
+response = client.embeddings.create(
+    input=["good morning from litellm"],
+    model="titan-embed-v1"
+)
+print(response)
+```
+
+#### 4. Use with LiteLLM Python SDK
+```python
+import litellm
+response = litellm.embedding(
+    model="titan-embed-v1",  # model alias from config.yaml
+    input=["good morning from litellm"],
+    api_base="http://0.0.0.0:4000",
+    api_key="anything"
+)
+print(response)
+```
+
+## Supported AWS Bedrock Embedding Models
+
+| Model Name | Usage | Supported Additional OpenAI params |
+|----------------------|---------------------------------------------|-----|
+| Titan Embeddings V2 | `embedding(model="bedrock/amazon.titan-embed-text-v2:0", input=input)` | [here](https://github.com/BerriAI/litellm/blob/f5905e100068e7a4d61441d7453d7cf5609c2121/litellm/llms/bedrock/embed/amazon_titan_v2_transformation.py#L59) |
+| Titan Embeddings V1 | `embedding(model="bedrock/amazon.titan-embed-text-v1", input=input)` | [here](https://github.com/BerriAI/litellm/blob/f5905e100068e7a4d61441d7453d7cf5609c2121/litellm/llms/bedrock/embed/amazon_titan_g1_transformation.py#L53) |
+| Titan Multimodal Embeddings | `embedding(model="bedrock/amazon.titan-embed-image-v1", input=input)` | [here](https://github.com/BerriAI/litellm/blob/f5905e100068e7a4d61441d7453d7cf5609c2121/litellm/llms/bedrock/embed/amazon_titan_multimodal_transformation.py#L28) |
+| TwelveLabs Marengo Embed 2.7 | `embedding(model="bedrock/us.twelvelabs.marengo-embed-2-7-v1:0", input=input)` | Supports multimodal input (text, video, audio, image) |
+| Cohere Embeddings - English | `embedding(model="bedrock/cohere.embed-english-v3", input=input)` | [here](https://github.com/BerriAI/litellm/blob/f5905e100068e7a4d61441d7453d7cf5609c2121/litellm/llms/bedrock/embed/cohere_transformation.py#L18) |
+| Cohere Embeddings - Multilingual | `embedding(model="bedrock/cohere.embed-multilingual-v3", input=input)` | [here](https://github.com/BerriAI/litellm/blob/f5905e100068e7a4d61441d7453d7cf5609c2121/litellm/llms/bedrock/embed/cohere_transformation.py#L18) |
+
+### Advanced - [Drop Unsupported Params](https://docs.litellm.ai/docs/completion/drop_params#openai-proxy-usage)
+
+### Advanced - [Pass model/provider-specific Params](https://docs.litellm.ai/docs/completion/provider_specific_params#proxy-usage)
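
Related to this merge's target branch (fix/issue-14685-bedrock-titan-v2-encoding-format): a minimal sketch of passing the OpenAI `encoding_format` param to Titan V2 through `litellm.embedding()`, assuming the standard OpenAI-compatible signature and the env credentials shown in the new doc above; the `dimensions` value is illustrative (AWS documents 256, 512, and 1024 output sizes for Titan V2).

```python
from litellm import embedding

response = embedding(
    model="bedrock/amazon.titan-embed-text-v2:0",
    input=["good morning from litellm"],
    encoding_format="float",  # or "base64"; the OpenAI param the fix branch targets
    dimensions=512,           # Titan V2 output size: 256, 512, or 1024 per AWS docs
)
print(len(response.data[0]["embedding"]))
```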

docs/my-website/sidebars.js

Lines changed: 1 addition & 0 deletions
@@ -411,6 +411,7 @@ const sidebars = {
         label: "Bedrock",
         items: [
           "providers/bedrock",
+          "providers/bedrock_embedding",
           "providers/bedrock_agents",
           "providers/bedrock_batches",
           "providers/bedrock_vector_store",

litellm/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -67,6 +67,7 @@
     bedrock_embedding_models,
     known_tokenizer_config,
     BEDROCK_INVOKE_PROVIDERS_LITERAL,
+    BEDROCK_EMBEDDING_PROVIDERS_LITERAL,
     BEDROCK_CONVERSE_MODELS,
     DEFAULT_MAX_TOKENS,
     DEFAULT_SOFT_BUDGET,

litellm/constants.py

Lines changed: 10 additions & 1 deletion
@@ -769,6 +769,12 @@
     "deepseek_r1",
 ]
 
+BEDROCK_EMBEDDING_PROVIDERS_LITERAL = Literal[
+    "cohere",
+    "amazon",
+    "twelvelabs",
+]
+
 BEDROCK_CONVERSE_MODELS = [
     "openai.gpt-oss-20b-1:0",
     "openai.gpt-oss-120b-1:0",
@@ -822,6 +828,7 @@
         "amazon.titan-embed-text-v1",
         "cohere.embed-english-v3",
         "cohere.embed-multilingual-v3",
+        "twelvelabs.marengo-embed-2-7-v1:0",
     ]
 )
 
@@ -1065,4 +1072,6 @@
 ]
 
 # CoroutineChecker cache configuration
-COROUTINE_CHECKER_MAX_SIZE_IN_MEMORY = int(os.getenv("COROUTINE_CHECKER_MAX_SIZE_IN_MEMORY", 1000))
+COROUTINE_CHECKER_MAX_SIZE_IN_MEMORY = int(
+    os.getenv("COROUTINE_CHECKER_MAX_SIZE_IN_MEMORY", 1000)
+)
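
A hedged sketch of how such a provider literal is typically consumed: the helper `get_bedrock_embedding_provider` below is hypothetical, not part of this diff, and only illustrates prefix-based routing over the literal's members.

```python
from typing import Literal, Optional, get_args

BEDROCK_EMBEDDING_PROVIDERS_LITERAL = Literal["cohere", "amazon", "twelvelabs"]

def get_bedrock_embedding_provider(model: str) -> Optional[str]:
    """Hypothetical helper: infer the embedding provider from a model ID."""
    model_id = model.split("/")[-1]  # strip the "bedrock/" route prefix
    for provider in get_args(BEDROCK_EMBEDDING_PROVIDERS_LITERAL):
        # match both "amazon.titan-..." and region-prefixed "us.twelvelabs...."
        if model_id.startswith(provider + ".") or f".{provider}." in model_id:
            return provider
    return None

assert get_bedrock_embedding_provider("bedrock/amazon.titan-embed-text-v2:0") == "amazon"
assert get_bedrock_embedding_provider("bedrock/us.twelvelabs.marengo-embed-2-7-v1:0") == "twelvelabs"
```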

litellm/integrations/datadog/datadog_llm_obs.py

Lines changed: 26 additions & 0 deletions
@@ -498,6 +498,7 @@ def _get_dd_llm_obs_payload_metadata(
             "guardrail_information": standard_logging_payload.get(
                 "guardrail_information", None
             ),
+            "is_streamed_request": self._get_stream_value_from_payload(standard_logging_payload),
         }
 
         #########################################################
@@ -561,6 +562,31 @@ def _get_latency_metrics(
 
         return latency_metrics
 
+    def _get_stream_value_from_payload(self, standard_logging_payload: StandardLoggingPayload) -> bool:
+        """
+        Extract the stream value from the standard logging payload.
+
+        The stream field in StandardLoggingPayload is only set to True for completed streaming responses.
+        For non-streaming requests, it's None. The original stream parameter is in model_parameters.
+
+        Returns:
+            bool: True if this was a streaming request, False otherwise
+        """
+        # Check top-level stream field first (only True for completed streaming)
+        stream_value = standard_logging_payload.get("stream")
+        if stream_value is True:
+            return True
+
+        # Fall back to model_parameters.stream for the original request parameters
+        model_params = standard_logging_payload.get("model_parameters", {})
+        if isinstance(model_params, dict):
+            stream_value = model_params.get("stream")
+            if stream_value is True:
+                return True
+
+        # Default to False for non-streaming requests
+        return False
+
     def _get_spend_metrics(
         self, standard_logging_payload: StandardLoggingPayload
     ) -> DDLLMObsSpendMetrics:
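
The fallback order the new helper implements is easy to see with stand-in dicts (real `StandardLoggingPayload` objects carry many more fields); this mirrors the method body above:

```python
def stream_value(payload: dict) -> bool:
    if payload.get("stream") is True:  # set only for completed streaming responses
        return True
    params = payload.get("model_parameters", {})  # original request params
    return isinstance(params, dict) and params.get("stream") is True

assert stream_value({"stream": True, "model_parameters": {}}) is True                  # completed stream
assert stream_value({"stream": None, "model_parameters": {"stream": True}}) is True    # original request param
assert stream_value({"stream": None, "model_parameters": {"temperature": 0.2}}) is False
```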
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+"""
+Cached imports module for LiteLLM.
+
+This module provides cached import functionality to avoid repeated imports
+inside functions that are critical to performance.
+"""
+
+from typing import TYPE_CHECKING, Callable, Optional, Type
+
+# Type annotations for cached imports
+if TYPE_CHECKING:
+    from litellm.litellm_core_utils.litellm_logging import Logging
+    from litellm.litellm_core_utils.coroutine_checker import CoroutineChecker
+
+# Global cache variables
+_LiteLLMLogging: Optional[Type["Logging"]] = None
+_coroutine_checker: Optional["CoroutineChecker"] = None
+_set_callbacks: Optional[Callable] = None
+
+
+def get_litellm_logging_class() -> Type["Logging"]:
+    """Get the cached LiteLLM Logging class, initializing if needed."""
+    global _LiteLLMLogging
+    if _LiteLLMLogging is not None:
+        return _LiteLLMLogging
+    from litellm.litellm_core_utils.litellm_logging import Logging
+    _LiteLLMLogging = Logging
+    return _LiteLLMLogging
+
+
+def get_coroutine_checker() -> "CoroutineChecker":
+    """Get the cached coroutine checker instance, initializing if needed."""
+    global _coroutine_checker
+    if _coroutine_checker is not None:
+        return _coroutine_checker
+    from litellm.litellm_core_utils.coroutine_checker import coroutine_checker
+    _coroutine_checker = coroutine_checker
+    return _coroutine_checker
+
+
+def get_set_callbacks() -> Callable:
+    """Get the cached set_callbacks function, initializing if needed."""
+    global _set_callbacks
+    if _set_callbacks is not None:
+        return _set_callbacks
+    from litellm.litellm_core_utils.litellm_logging import set_callbacks
+    _set_callbacks = set_callbacks
+    return _set_callbacks
+
+
+def clear_cached_imports() -> None:
+    """Clear all cached imports. Useful for testing or memory management."""
+    global _LiteLLMLogging, _coroutine_checker, _set_callbacks
+    _LiteLLMLogging = None
+    _coroutine_checker = None
+    _set_callbacks = None
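
The new file above (its path is not shown in this view) is a memoized lazy-import pattern: pay the import cost once on first use, then serve the cached object from a module-level global. A self-contained sketch of the same idea, using a stdlib class so it runs anywhere; the names here are illustrative, not LiteLLM's:

```python
from typing import Optional, Type

_cached_json_encoder: Optional[Type] = None

def get_json_encoder() -> Type:
    """First call performs the import; later calls are a cheap global lookup."""
    global _cached_json_encoder
    if _cached_json_encoder is not None:
        return _cached_json_encoder
    from json import JSONEncoder  # deferred until actually needed
    _cached_json_encoder = JSONEncoder
    return _cached_json_encoder

assert get_json_encoder() is get_json_encoder()  # same cached object both times
```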

litellm/litellm_core_utils/exception_mapping_utils.py

Lines changed: 9 additions & 1 deletion
@@ -556,7 +556,7 @@ def exception_type(  # type: ignore # noqa: PLR0915
                     model=model,
                     llm_provider="anthropic",
                 )
-            elif "overloaded_error" in error_str:
+            elif "overloaded_error" in error_str or "Overloaded" in error_str:
                 exception_mapping_worked = True
                 raise InternalServerError(
                     message="AnthropicError - {}".format(error_str),
@@ -1449,6 +1449,14 @@ def exception_type(  # type: ignore # noqa: PLR0915
                     model=model,
                     response=getattr(original_exception, "response", None),
                 )
+            elif "invalid type: parameter" in error_str:
+                exception_mapping_worked = True
+                raise BadRequestError(
+                    message=f"CohereException - {original_exception.message}",
+                    llm_provider="cohere",
+                    model=model,
+                    response=getattr(original_exception, "response", None),
+                )
             elif "too many tokens" in error_str:
                 exception_mapping_worked = True
                 raise ContextWindowExceededError(
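
From the caller's side, the Anthropic change means bare "Overloaded" messages now surface as `litellm.InternalServerError` like other 500-class errors, so existing retry and fallback policies cover them. A hedged sketch (requires Anthropic credentials; the model name is illustrative):

```python
import litellm

try:
    litellm.completion(
        model="anthropic/claude-3-5-sonnet-20240620",
        messages=[{"role": "user", "content": "hi"}],
    )
except litellm.InternalServerError as e:
    # Anthropic "Overloaded" responses now land here and can be retried.
    print(f"retryable server error: {e}")
```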

litellm/litellm_core_utils/prompt_templates/factory.py

Lines changed: 4 additions & 7 deletions
@@ -3079,7 +3079,6 @@ def _initial_message_setup(
             messages.append(DEFAULT_USER_CONTINUE_MESSAGE)
         return messages
 
-
     @staticmethod
     async def _bedrock_converse_messages_pt_async(  # noqa: PLR0915
         messages: List,
@@ -3124,9 +3123,9 @@ async def _bedrock_converse_messages_pt_async(  # noqa: PLR0915
                         _part = BedrockContentBlock(text=element["text"])
                         _parts.append(_part)
                     elif element["type"] == "guarded_text":
-                        # Wrap guarded_text in guardrailConverseContent block
+                        # Wrap guarded_text in guardContent block
                         _part = BedrockContentBlock(
-                            guardrailConverseContent={"text": element["text"]}
+                            guardContent={"text": {"text": element["text"]}}
                         )
                         _parts.append(_part)
                     elif element["type"] == "image_url":
@@ -3171,7 +3170,6 @@ async def _bedrock_converse_messages_pt_async(  # noqa: PLR0915
 
             msg_i += 1
         if user_content:
-
             if len(contents) > 0 and contents[-1]["role"] == "user":
                 if (
                     assistant_continue_message is not None
@@ -3506,9 +3504,9 @@ def _bedrock_converse_messages_pt(  # noqa: PLR0915
                         _part = BedrockContentBlock(text=element["text"])
                         _parts.append(_part)
                     elif element["type"] == "guarded_text":
-                        # Wrap guarded_text in guardrailConverseContent block
+                        # Wrap guarded_text in guardContent block
                        _part = BedrockContentBlock(
-                            guardrailConverseContent={"text": element["text"]}
+                            guardContent={"text": {"text": element["text"]}}
                         )
                         _parts.append(_part)
                     elif element["type"] == "image_url":
@@ -3554,7 +3552,6 @@ def _bedrock_converse_messages_pt(  # noqa: PLR0915
 
             msg_i += 1
         if user_content:
-
            if len(contents) > 0 and contents[-1]["role"] == "user":
                 if (
                     assistant_continue_message is not None
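
For context on the factory.py fix: the Converse API expects guarded text inside a `guardContent` block, with the string nested one level deeper than the old `guardrailConverseContent` form produced. A plain-dict sketch of the block the corrected code emits (treating `BedrockContentBlock` as a dict for illustration):

```python
element = {"type": "guarded_text", "text": "Evaluate only this span with the guardrail."}

# Old shape (not a valid Converse content-block key):
#   {"guardrailConverseContent": {"text": "..."}}
# New shape (Converse API contract):
block = {"guardContent": {"text": {"text": element["text"]}}}
```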
