
Commit 2070474

feat: async llm support
feat: introduce async contract to BaseLLM
feat: add async call support for: Azure provider, Anthropic provider, OpenAI provider, Gemini provider, Bedrock provider, LiteLLM provider
chore: expand scrubbed header fields (conftest, anthropic, bedrock)
chore: update docs to cover async functionality
chore: update and harden tests to support acall; re-add uri for cassette compatibility
chore: generate missing cassette
fix: ensure acall is non-abstract and set supports_tools = true for supported Anthropic models
chore: improve Bedrock async docstring and general test robustness
1 parent 59180e9 commit 2070474
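
Per the commit message, `acall` is part of the BaseLLM contract and is deliberately non-abstract. A minimal sketch of what such a contract could look like, assuming providers without a native async client fall back to running the synchronous `call` in a worker thread (the fallback strategy and signatures are assumptions, not confirmed by this diff):

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from typing import Any


class BaseLLM(ABC):
    """Sketch of a sync/async LLM contract."""

    @abstractmethod
    def call(self, messages: str | list[dict[str, Any]], **kwargs: Any) -> str:
        """Synchronous completion; every provider implements this."""

    async def acall(self, messages: str | list[dict[str, Any]], **kwargs: Any) -> str:
        # Non-abstract on purpose (per the commit's fix note): providers with
        # native async clients override this; the rest inherit a
        # thread-offload fallback so awaiting never blocks the event loop.
        return await asyncio.to_thread(self.call, messages, **kwargs)
```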

File tree

70 files changed: +8589 additions, −154 deletions


conftest.py

Lines changed: 26 additions & 0 deletions
```diff
@@ -96,6 +96,30 @@ def setup_test_environment() -> Generator[None, Any, None]:
         "x-ratelimit-reset-requests": "X-RATELIMIT-RESET-REQUESTS-XXX",
         "x-ratelimit-reset-tokens": "X-RATELIMIT-RESET-TOKENS-XXX",
         "x-goog-api-key": "X-GOOG-API-KEY-XXX",
+        "api-key": "X-API-KEY-XXX",
+        "User-Agent": "X-USER-AGENT-XXX",
+        "apim-request-id:": "X-API-CLIENT-REQUEST-ID-XXX",
+        "azureml-model-session": "AZUREML-MODEL-SESSION-XXX",
+        "x-ms-client-request-id": "X-MS-CLIENT-REQUEST-ID-XXX",
+        "x-ms-region": "X-MS-REGION-XXX",
+        "apim-request-id": "APIM-REQUEST-ID-XXX",
+        "x-api-key": "X-API-KEY-XXX",
+        "anthropic-organization-id": "ANTHROPIC-ORGANIZATION-ID-XXX",
+        "request-id": "REQUEST-ID-XXX",
+        "anthropic-ratelimit-input-tokens-limit": "ANTHROPIC-RATELIMIT-INPUT-TOKENS-LIMIT-XXX",
+        "anthropic-ratelimit-input-tokens-remaining": "ANTHROPIC-RATELIMIT-INPUT-TOKENS-REMAINING-XXX",
+        "anthropic-ratelimit-input-tokens-reset": "ANTHROPIC-RATELIMIT-INPUT-TOKENS-RESET-XXX",
+        "anthropic-ratelimit-output-tokens-limit": "ANTHROPIC-RATELIMIT-OUTPUT-TOKENS-LIMIT-XXX",
+        "anthropic-ratelimit-output-tokens-remaining": "ANTHROPIC-RATELIMIT-OUTPUT-TOKENS-REMAINING-XXX",
+        "anthropic-ratelimit-output-tokens-reset": "ANTHROPIC-RATELIMIT-OUTPUT-TOKENS-RESET-XXX",
+        "anthropic-ratelimit-tokens-limit": "ANTHROPIC-RATELIMIT-TOKENS-LIMIT-XXX",
+        "anthropic-ratelimit-tokens-remaining": "ANTHROPIC-RATELIMIT-TOKENS-REMAINING-XXX",
+        "anthropic-ratelimit-tokens-reset": "ANTHROPIC-RATELIMIT-TOKENS-RESET-XXX",
+        "x-amz-date": "X-AMZ-DATE-XXX",
+        "amz-sdk-invocation-id": "AMZ-SDK-INVOCATION-ID-XXX",
+        "accept-encoding": "ACCEPT-ENCODING-XXX",
+        "x-amzn-requestid": "X-AMZN-REQUESTID-XXX",
+        "x-amzn-RequestId": "X-AMZN-REQUESTID-XXX",
     }
@@ -105,6 +129,8 @@ def _filter_request_headers(request: Request) -> Request:  # type: ignore[no-any
         for variant in [header_name, header_name.upper(), header_name.title()]:
             if variant in request.headers:
                 request.headers[variant] = [replacement]
+
+    request.method = request.method.upper()
     return request
```
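
For context on how these mappings get used: a request filter like `_filter_request_headers` is typically registered via VCR's `before_record_request` hook so headers are scrubbed before a cassette is written. A minimal sketch of that wiring (the `SCRUBBED_HEADERS` name and its two sample entries are illustrative, not the repository's actual constants):

```python
import vcr

# Illustrative subset of the replacement map shown in the diff above.
SCRUBBED_HEADERS = {
    "x-api-key": "X-API-KEY-XXX",
    "x-amz-date": "X-AMZ-DATE-XXX",
}


def _filter_request_headers(request):
    # Replace secret-bearing headers, checking common case variants.
    for header_name, replacement in SCRUBBED_HEADERS.items():
        for variant in (header_name, header_name.upper(), header_name.title()):
            if variant in request.headers:
                request.headers[variant] = [replacement]
    # Normalize method casing before recording/matching.
    request.method = request.method.upper()
    return request


# Every outgoing request passes through the filter before being recorded.
recorder = vcr.VCR(before_record_request=_filter_request_headers)
```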

docs/en/concepts/llms.mdx

Lines changed: 44 additions & 0 deletions
````diff
@@ -1089,6 +1089,50 @@ CrewAI supports streaming responses from LLMs, allowing your application to rece
   </Tab>
 </Tabs>
 
+## Async LLM Calls
+
+CrewAI supports asynchronous LLM calls for improved performance and concurrency in your AI workflows. Async calls allow you to run multiple LLM requests concurrently without blocking, making them ideal for high-throughput applications and parallel agent operations.
+
+<Tabs>
+  <Tab title="Basic Usage">
+    Use the `acall` method for asynchronous LLM requests:
+
+    ```python
+    import asyncio
+    from crewai import LLM
+
+    async def main():
+        llm = LLM(model="openai/gpt-4o")
+
+        # Single async call
+        response = await llm.acall("What is the capital of France?")
+        print(response)
+
+    asyncio.run(main())
+    ```
+
+    The `acall` method supports all the same parameters as the synchronous `call` method, including messages, tools, and callbacks.
+  </Tab>
+
+  <Tab title="With Streaming">
+    Combine async calls with streaming for real-time concurrent responses:
+
+    ```python
+    import asyncio
+    from crewai import LLM
+
+    async def stream_async():
+        llm = LLM(model="openai/gpt-4o", stream=True)
+
+        response = await llm.acall("Write a short story about AI")
+
+        print(response)
+
+    asyncio.run(stream_async())
+    ```
+  </Tab>
+</Tabs>
+
 ## Structured LLM Calls
 
 CrewAI supports structured responses from LLM calls by allowing you to define a `response_format` using a Pydantic model. This enables the framework to automatically parse and validate the output, making it easier to integrate the response into your application without manual post-processing.
````
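
The new docs stress running requests concurrently but stop short of a fan-out example. A sketch of that pattern with `asyncio.gather`, using the same `LLM` API the diff documents (the questions are placeholders):

```python
import asyncio
from crewai import LLM

async def fan_out():
    llm = LLM(model="openai/gpt-4o")
    questions = [
        "What is the capital of France?",
        "What is the capital of Japan?",
        "What is the capital of Brazil?",
    ]
    # Launch all requests concurrently; gather preserves input order.
    answers = await asyncio.gather(*(llm.acall(q) for q in questions))
    for question, answer in zip(questions, answers):
        print(f"{question} -> {answer}")

asyncio.run(fan_out())
```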

lib/crewai/pyproject.toml

Lines changed: 1 addition & 0 deletions
```diff
@@ -68,6 +68,7 @@ qdrant = [
 ]
 aws = [
     "boto3~=1.40.38",
+    "aiobotocore~=2.25.2",
 ]
 watson = [
     "ibm-watsonx-ai~=1.3.39",
```
