Commit 664c83c

Merge branch 'litellm_contributor_prs_09_18_2025_p2' into litellm_dev_09_17_2025_p2_v2
2 parents: c620d76 + 9238e1d

File tree

50 files changed: +2345 −810 lines


.circleci/config.yml

Lines changed: 1 addition & 0 deletions
@@ -1458,6 +1458,7 @@ jobs:
       # - run: python ./tests/documentation_tests/test_general_setting_keys.py
       - run: python ./tests/code_coverage_tests/check_licenses.py
       - run: python ./tests/code_coverage_tests/router_code_coverage.py
+      - run: python ./tests/code_coverage_tests/test_chat_completion_imports.py
       - run: python ./tests/code_coverage_tests/info_log_check.py
       - run: python ./tests/code_coverage_tests/test_ban_set_verbose.py
       - run: python ./tests/code_coverage_tests/code_qa_check_tests.py

docs/my-website/docs/providers/bedrock.md

Lines changed: 20 additions & 1 deletion
@@ -1821,6 +1821,7 @@ Here's an example of using a bedrock model with LiteLLM. For a complete list, re
 | Mistral 7B Instruct | `completion(model='bedrock/mistral.mistral-7b-instruct-v0:2', messages=messages)` | `os.environ['AWS_ACCESS_KEY_ID']`, `os.environ['AWS_SECRET_ACCESS_KEY']`, `os.environ['AWS_REGION_NAME']` |
 | Mixtral 8x7B Instruct | `completion(model='bedrock/mistral.mixtral-8x7b-instruct-v0:1', messages=messages)` | `os.environ['AWS_ACCESS_KEY_ID']`, `os.environ['AWS_SECRET_ACCESS_KEY']`, `os.environ['AWS_REGION_NAME']` |
 
+
 ## Bedrock Embedding
 
 ### API keys

@@ -1842,11 +1843,29 @@ response = embedding(
 print(response)
 ```
 
+#### Titan V2 - encoding_format support
+```python
+from litellm import embedding
+# Float format (default)
+response = embedding(
+    model="bedrock/amazon.titan-embed-text-v2:0",
+    input=["good morning from litellm"],
+    encoding_format="float"  # Returns float array
+)
+
+# Binary format
+response = embedding(
+    model="bedrock/amazon.titan-embed-text-v2:0",
+    input=["good morning from litellm"],
+    encoding_format="base64"  # Returns base64 encoded binary
+)
+```
+
 ## Supported AWS Bedrock Embedding Models
 
 | Model Name | Usage | Supported Additional OpenAI params |
 |----------------------|---------------------------------------------|-----|
-| Titan Embeddings V2 | `embedding(model="bedrock/amazon.titan-embed-text-v2:0", input=input)` | [here](https://github.com/BerriAI/litellm/blob/f5905e100068e7a4d61441d7453d7cf5609c2121/litellm/llms/bedrock/embed/amazon_titan_v2_transformation.py#L59) |
+| Titan Embeddings V2 | `embedding(model="bedrock/amazon.titan-embed-text-v2:0", input=input)` | `dimensions`, `encoding_format` |
 | Titan Embeddings - V1 | `embedding(model="bedrock/amazon.titan-embed-text-v1", input=input)` | [here](https://github.com/BerriAI/litellm/blob/f5905e100068e7a4d61441d7453d7cf5609c2121/litellm/llms/bedrock/embed/amazon_titan_g1_transformation.py#L53) |
 | Titan Multimodal Embeddings | `embedding(model="bedrock/amazon.titan-embed-image-v1", input=input)` | [here](https://github.com/BerriAI/litellm/blob/f5905e100068e7a4d61441d7453d7cf5609c2121/litellm/llms/bedrock/embed/amazon_titan_multimodal_transformation.py#L28) |
 | Cohere Embeddings - English | `embedding(model="bedrock/cohere.embed-english-v3", input=input)` | [here](https://github.com/BerriAI/litellm/blob/f5905e100068e7a4d61441d7453d7cf5609c2121/litellm/llms/bedrock/embed/cohere_transformation.py#L18) |
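A practical note on the `encoding_format="base64"` option documented above: the embedding comes back as an encoded buffer rather than a list of floats. Below is a hedged decoding sketch, assuming the common OpenAI-style convention of a base64-encoded little-endian float32 buffer; Titan's binary embedding type may pack differently, so verify against a real response.

```python
import base64

import numpy as np  # assumption: numpy is available for buffer decoding


def decode_base64_embedding(b64_data: str) -> "np.ndarray":
    """Decode a base64 embedding string into a float vector.

    Assumes a little-endian float32 buffer (the OpenAI-style convention);
    Titan's binary embedding type may use a different packing, so check an
    actual Bedrock response before relying on this.
    """
    return np.frombuffer(base64.b64decode(b64_data), dtype="<f4")


# Hypothetical usage against the response from the snippet above:
# vec = decode_base64_embedding(response.data[0]["embedding"])
```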
docs/my-website/docs/providers/bedrock_embedding.md (new file, path inferred from the sidebars.js change below)

Lines changed: 95 additions & 0 deletions

@@ -0,0 +1,95 @@
## Bedrock Embedding

## Supported Embedding Models

| Provider | LiteLLM Route | AWS Documentation |
|----------|---------------|-------------------|
| Amazon Titan | `bedrock/amazon.*` | [Amazon Titan Embeddings](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html) |
| Cohere | `bedrock/cohere.*` | [Cohere Embeddings](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-cohere-embed.html) |
| TwelveLabs | `bedrock/us.twelvelabs.*` | [TwelveLabs](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-twelvelabs.html) |

### API keys
These can be set as env variables or passed as **params to litellm.embedding()**
```python
import os
os.environ["AWS_ACCESS_KEY_ID"] = ""        # Access key
os.environ["AWS_SECRET_ACCESS_KEY"] = ""    # Secret access key
os.environ["AWS_REGION_NAME"] = ""          # us-east-1, us-east-2, us-west-1, us-west-2
```

## Usage
### LiteLLM Python SDK
```python
from litellm import embedding
response = embedding(
    model="bedrock/amazon.titan-embed-text-v1",
    input=["good morning from litellm"],
)
print(response)
```

### LiteLLM Proxy Server

#### 1. Setup config.yaml
```yaml
model_list:
  - model_name: titan-embed-v1
    litellm_params:
      model: bedrock/amazon.titan-embed-text-v1
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1
  - model_name: titan-embed-v2
    litellm_params:
      model: bedrock/amazon.titan-embed-text-v2:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1
```

#### 2. Start Proxy
```bash
litellm --config /path/to/config.yaml
```

#### 3. Use with OpenAI Python SDK
```python
import openai
client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

response = client.embeddings.create(
    input=["good morning from litellm"],
    model="titan-embed-v1"
)
print(response)
```

#### 4. Use with LiteLLM Python SDK
```python
import litellm
response = litellm.embedding(
    model="titan-embed-v1",  # model alias from config.yaml
    input=["good morning from litellm"],
    api_base="http://0.0.0.0:4000",
    api_key="anything"
)
print(response)
```

## Supported AWS Bedrock Embedding Models

| Model Name | Usage | Supported Additional OpenAI params |
|----------------------|---------------------------------------------|-----|
| Titan Embeddings V2 | `embedding(model="bedrock/amazon.titan-embed-text-v2:0", input=input)` | [here](https://github.com/BerriAI/litellm/blob/f5905e100068e7a4d61441d7453d7cf5609c2121/litellm/llms/bedrock/embed/amazon_titan_v2_transformation.py#L59) |
| Titan Embeddings - V1 | `embedding(model="bedrock/amazon.titan-embed-text-v1", input=input)` | [here](https://github.com/BerriAI/litellm/blob/f5905e100068e7a4d61441d7453d7cf5609c2121/litellm/llms/bedrock/embed/amazon_titan_g1_transformation.py#L53) |
| Titan Multimodal Embeddings | `embedding(model="bedrock/amazon.titan-embed-image-v1", input=input)` | [here](https://github.com/BerriAI/litellm/blob/f5905e100068e7a4d61441d7453d7cf5609c2121/litellm/llms/bedrock/embed/amazon_titan_multimodal_transformation.py#L28) |
| TwelveLabs Marengo Embed 2.7 | `embedding(model="bedrock/us.twelvelabs.marengo-embed-2-7-v1:0", input=input)` | Supports multimodal input (text, video, audio, image) |
| Cohere Embeddings - English | `embedding(model="bedrock/cohere.embed-english-v3", input=input)` | [here](https://github.com/BerriAI/litellm/blob/f5905e100068e7a4d61441d7453d7cf5609c2121/litellm/llms/bedrock/embed/cohere_transformation.py#L18) |
| Cohere Embeddings - Multilingual | `embedding(model="bedrock/cohere.embed-multilingual-v3", input=input)` | [here](https://github.com/BerriAI/litellm/blob/f5905e100068e7a4d61441d7453d7cf5609c2121/litellm/llms/bedrock/embed/cohere_transformation.py#L18) |

### Advanced - [Drop Unsupported Params](https://docs.litellm.ai/docs/completion/drop_params#openai-proxy-usage)

### Advanced - [Pass model/provider-specific Params](https://docs.litellm.ai/docs/completion/provider_specific_params#proxy-usage)
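Since the new doc links out for dropping unsupported params, a quick inline illustration may help; `litellm.drop_params` is a documented LiteLLM setting, while the claim that Titan V1 rejects `encoding_format` is an assumption based on the params table above.

```python
import litellm

litellm.drop_params = True  # silently drop OpenAI params a provider doesn't support

response = litellm.embedding(
    model="bedrock/amazon.titan-embed-text-v1",
    input=["good morning from litellm"],
    encoding_format="float",  # assumption: V1 may not support this; it gets dropped
)
print(response)
```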

docs/my-website/docs/proxy/config_settings.md

Lines changed: 1 addition & 0 deletions
@@ -772,3 +772,4 @@ router_settings:
 | WEBHOOK_URL | URL for receiving webhooks from external services |
 | SPEND_LOG_RUN_LOOPS | How many runs of 1000-batch deletes the spend_log_cleanup task should perform |
 | SPEND_LOG_CLEANUP_BATCH_SIZE | Number of logs deleted per batch during cleanup. Default is 1000 |
+| COROUTINE_CHECKER_MAX_SIZE_IN_MEMORY | Maximum size of the CoroutineChecker in-memory cache. Default is 1000 |

docs/my-website/sidebars.js

Lines changed: 1 addition & 0 deletions
@@ -411,6 +411,7 @@ const sidebars = {
       label: "Bedrock",
       items: [
         "providers/bedrock",
+        "providers/bedrock_embedding",
         "providers/bedrock_agents",
         "providers/bedrock_batches",
         "providers/bedrock_vector_store",

litellm/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -67,6 +67,7 @@
     bedrock_embedding_models,
     known_tokenizer_config,
     BEDROCK_INVOKE_PROVIDERS_LITERAL,
+    BEDROCK_EMBEDDING_PROVIDERS_LITERAL,
     BEDROCK_CONVERSE_MODELS,
     DEFAULT_MAX_TOKENS,
     DEFAULT_SOFT_BUDGET,

@@ -1045,7 +1046,6 @@ def add_known_models():
 from .llms.databricks.embed.transformation import DatabricksEmbeddingConfig
 from .llms.predibase.chat.transformation import PredibaseConfig
 from .llms.replicate.chat.transformation import ReplicateConfig
-from .llms.cohere.completion.transformation import CohereTextConfig as CohereConfig
 from .llms.snowflake.chat.transformation import SnowflakeConfig
 from .llms.cohere.rerank.transformation import CohereRerankConfig
 from .llms.cohere.rerank_v2.transformation import CohereRerankV2Config

litellm/caching/redis_cache.py

Lines changed: 2 additions & 1 deletion
@@ -19,6 +19,7 @@
 import litellm
 from litellm._logging import print_verbose, verbose_logger
 from litellm.litellm_core_utils.core_helpers import _get_parent_otel_span_from_kwargs
+from litellm.litellm_core_utils.coroutine_checker import coroutine_checker
 from litellm.types.caching import RedisPipelineIncrementOperation
 from litellm.types.services import ServiceTypes

@@ -138,7 +139,7 @@ def __init__(
         self.redis_flush_size = redis_flush_size
         self.redis_version = "Unknown"
         try:
-            if not inspect.iscoroutinefunction(self.redis_client):
+            if not coroutine_checker.is_async_callable(self.redis_client):
                 self.redis_version = self.redis_client.info()["redis_version"]  # type: ignore
         except Exception:
             pass
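The swap matters because `inspect.iscoroutinefunction` only recognizes plain `async def` functions, missing callables such as `functools.partial` wrappers or objects whose `__call__` is async. The commit's actual `CoroutineChecker` body is not shown in this diff; the following is a minimal sketch of the general technique, with all internals illustrative:

```python
import inspect
from functools import partial


class CoroutineChecker:
    """Illustrative sketch only: detect async callables beyond plain
    `async def` functions. Not the implementation from this commit."""

    def is_async_callable(self, obj) -> bool:
        # Unwrap functools.partial layers to reach the underlying callable
        while isinstance(obj, partial):
            obj = obj.func
        # Plain `async def` function or bound method
        if inspect.iscoroutinefunction(obj):
            return True
        # Callable instance whose __call__ is an `async def`
        return inspect.iscoroutinefunction(getattr(obj, "__call__", None))


coroutine_checker = CoroutineChecker()
```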

litellm/constants.py

Lines changed: 13 additions & 1 deletion
@@ -179,7 +179,7 @@
     os.getenv("NON_LLM_CONNECTION_TIMEOUT", 15)
 )  # timeout for adjacent services (e.g. jwt auth)
 MAX_EXCEPTION_MESSAGE_LENGTH = int(os.getenv("MAX_EXCEPTION_MESSAGE_LENGTH", 2000))
-MAX_STRING_LENGTH_PROMPT_IN_DB = int(os.getenv("MAX_STRING_LENGTH_PROMPT_IN_DB", 1000))
+MAX_STRING_LENGTH_PROMPT_IN_DB = int(os.getenv("MAX_STRING_LENGTH_PROMPT_IN_DB", 2048))
 BEDROCK_MAX_POLICY_SIZE = int(os.getenv("BEDROCK_MAX_POLICY_SIZE", 75))
 REPLICATE_POLLING_DELAY_SECONDS = float(
     os.getenv("REPLICATE_POLLING_DELAY_SECONDS", 0.5)

@@ -769,6 +769,12 @@
     "deepseek_r1",
 ]
 
+BEDROCK_EMBEDDING_PROVIDERS_LITERAL = Literal[
+    "cohere",
+    "amazon",
+    "twelvelabs",
+]
+
 BEDROCK_CONVERSE_MODELS = [
     "openai.gpt-oss-20b-1:0",
     "openai.gpt-oss-120b-1:0",

@@ -822,6 +828,7 @@
     "amazon.titan-embed-text-v1",
     "cohere.embed-english-v3",
     "cohere.embed-multilingual-v3",
+    "twelvelabs.marengo-embed-2-7-v1:0",
     ]
 )

@@ -1063,3 +1070,8 @@
     "SMTP_SENDER_EMAIL",
     "TEST_EMAIL_ADDRESS",
 ]
+
+# CoroutineChecker cache configuration
+COROUTINE_CHECKER_MAX_SIZE_IN_MEMORY = int(
+    os.getenv("COROUTINE_CHECKER_MAX_SIZE_IN_MEMORY", 1000)
+)
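For context on the new `BEDROCK_EMBEDDING_PROVIDERS_LITERAL`: a `Literal` like this is typically consumed to constrain routing values at type-check time. A hedged sketch of that pattern follows; `infer_embedding_provider` is a hypothetical helper, not a function from this commit.

```python
from typing import Literal, Optional

# Mirrors the Literal added above, copied here so the sketch is self-contained
BedrockEmbeddingProvider = Literal["cohere", "amazon", "twelvelabs"]


def infer_embedding_provider(model: str) -> Optional[BedrockEmbeddingProvider]:
    """Hypothetical helper: pull the provider out of a Bedrock model id.

    e.g. "bedrock/amazon.titan-embed-text-v2:0"            -> "amazon"
         "bedrock/us.twelvelabs.marengo-embed-2-7-v1:0"    -> "twelvelabs"
    """
    model_id = model.split("/")[-1]  # drop the "bedrock/" route prefix
    # inference profiles like "us.twelvelabs.*" carry a region prefix,
    # so scan the dot-separated segments rather than only the first one
    for part in model_id.split("."):
        if part in ("cohere", "amazon", "twelvelabs"):
            return part  # type: ignore[return-value]
    return None


print(infer_embedding_provider("bedrock/us.twelvelabs.marengo-embed-2-7-v1:0"))
```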

litellm/integrations/datadog/datadog_llm_obs.py

Lines changed: 26 additions & 0 deletions
@@ -498,6 +498,7 @@ def _get_dd_llm_obs_payload_metadata(
             "guardrail_information": standard_logging_payload.get(
                 "guardrail_information", None
             ),
+            "is_streamed_request": self._get_stream_value_from_payload(standard_logging_payload),
         }

@@ -561,6 +562,31 @@ def _get_latency_metrics(
 
         return latency_metrics
 
+    def _get_stream_value_from_payload(self, standard_logging_payload: StandardLoggingPayload) -> bool:
+        """
+        Extract the stream value from the standard logging payload.
+
+        The `stream` field in StandardLoggingPayload is only set to True for
+        completed streaming responses; for non-streaming requests it is None.
+        The original `stream` request parameter lives in `model_parameters`.
+
+        Returns:
+            bool: True if this was a streaming request, False otherwise
+        """
+        # Check the top-level stream field first (only True for completed streaming)
+        stream_value = standard_logging_payload.get("stream")
+        if stream_value is True:
+            return True
+
+        # Fall back to model_parameters.stream for the original request parameters
+        model_params = standard_logging_payload.get("model_parameters", {})
+        if isinstance(model_params, dict):
+            stream_value = model_params.get("stream")
+            if stream_value is True:
+                return True
+
+        # Default to False for non-streaming requests
+        return False
+
     def _get_spend_metrics(
         self, standard_logging_payload: StandardLoggingPayload
     ) -> DDLLMObsSpendMetrics:
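To see why both checks are needed, here is a minimal standalone version of the same fallback logic run against stand-in payloads (StandardLoggingPayload is a TypedDict in LiteLLM, so plain dicts approximate it here):

```python
def is_streamed_request(payload: dict) -> bool:
    # Same two-step fallback as _get_stream_value_from_payload above
    if payload.get("stream") is True:
        return True
    model_params = payload.get("model_parameters", {})
    return isinstance(model_params, dict) and model_params.get("stream") is True


# Completed streaming response: the top-level flag is set
assert is_streamed_request({"stream": True, "model_parameters": {}})
# Streaming request where only the original request parameter is recorded
assert is_streamed_request({"stream": None, "model_parameters": {"stream": True}})
# Plain non-streaming request
assert not is_streamed_request({"stream": None, "model_parameters": {"temperature": 0.2}})
print("fallback logic behaves as documented")
```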
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
"""
Cached imports module for LiteLLM.

This module provides cached import functionality to avoid repeated imports
inside functions that are critical to performance.
"""

from typing import TYPE_CHECKING, Callable, Optional, Type

# Type annotations for cached imports
if TYPE_CHECKING:
    from litellm.litellm_core_utils.litellm_logging import Logging
    from litellm.litellm_core_utils.coroutine_checker import CoroutineChecker

# Global cache variables
_LiteLLMLogging: Optional[Type["Logging"]] = None
_coroutine_checker: Optional["CoroutineChecker"] = None
_set_callbacks: Optional[Callable] = None


def get_litellm_logging_class() -> Type["Logging"]:
    """Get the cached LiteLLM Logging class, initializing if needed."""
    global _LiteLLMLogging
    if _LiteLLMLogging is not None:
        return _LiteLLMLogging
    from litellm.litellm_core_utils.litellm_logging import Logging
    _LiteLLMLogging = Logging
    return _LiteLLMLogging


def get_coroutine_checker() -> "CoroutineChecker":
    """Get the cached coroutine checker instance, initializing if needed."""
    global _coroutine_checker
    if _coroutine_checker is not None:
        return _coroutine_checker
    from litellm.litellm_core_utils.coroutine_checker import coroutine_checker
    _coroutine_checker = coroutine_checker
    return _coroutine_checker


def get_set_callbacks() -> Callable:
    """Get the cached set_callbacks function, initializing if needed."""
    global _set_callbacks
    if _set_callbacks is not None:
        return _set_callbacks
    from litellm.litellm_core_utils.litellm_logging import set_callbacks
    _set_callbacks = set_callbacks
    return _set_callbacks


def clear_cached_imports() -> None:
    """Clear all cached imports. Useful for testing or memory management."""
    global _LiteLLMLogging, _coroutine_checker, _set_callbacks
    _LiteLLMLogging = None
    _coroutine_checker = None
    _set_callbacks = None
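The accessors above pay the import cost once and turn later hot-path lookups into a cheap global read. A self-contained sketch of the same pattern, using `json` as a stand-in for an expensive import (names here are illustrative, not from the module above):

```python
from typing import Any, Optional

_json_module: Optional[Any] = None


def get_json_module() -> Any:
    """Import json on first use, then serve the cached module object."""
    global _json_module
    if _json_module is not None:
        return _json_module
    import json
    _json_module = json
    return _json_module


# Hot path: after the first call this is a global read plus an
# `is not None` check, with no repeated import machinery.
payload = get_json_module().dumps({"ok": True})
print(payload)
```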
