You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Agents can communicate with AI models hosted on any provider, including [Workers AI](/workers-ai/), OpenAI, Anthropic, and Google's Gemini, and use the model routing features in [AI Gateway](/ai-gateway/) to route across providers, eval responses, and manage AI provider rate limits.
12
12
@@ -16,9 +16,69 @@ A user can disconnect during a long-running response from a modern reasoning mod
16
16
17
17
## Calling AI Models
18
18
19
+
You can call models from any method within an Agent, including from HTTP requests using the [`onRequest`](/agents/api-reference/sdk/) handler, when a [scheduled task](/agents/examples/schedule-tasks/) runs, when handling a WebSocket message in the [`onMessage`](/agents/examples/websockets/) handler, or from any of your own methods.
20
+
21
+
Importantly, Agents can call AI models on their own — autonomously — and can handle long-running responses that can take minutes (or longer) to respond in full.
22
+
23
+
### Long-running model requests {/*long-running-model-requests*/}
24
+
25
+
Modern [reasoning models](https://platform.openai.com/docs/guides/reasoning) or "thinking" model can take some time to both generate a response _and_ stream the response back to the client.
26
+
27
+
Instead of buffering the entire response, or risking the client disconecting, you can stream the response back to the client by using the [WebSocket API](/agents/examples/websockets/).
You can also persist AI model responses back to [Agent's internal state](/agents/examples/manage-and-sync-state/) by using the `this.setState` method. For example, if you run a [scheduled task](/agents/examples/scheduling-tasks/), you can store the output of the task and read it later. Or, if a user disconnects, read the message history back and send it to the user when they reconnect.
78
+
19
79
### Workers AI
20
80
21
-
### Inference endpoints
81
+
### Hosted models
22
82
23
83
You can use [any of the models available in Workers AI](/workers-ai/models/) within your Agent by [configuring a binding](/workers-ai/configuration/bindings/).
24
84
@@ -63,7 +123,6 @@ binding = "AI"
63
123
64
124
</WranglerConfig>
65
125
66
-
67
126
### Model routing
68
127
69
128
You can also use the model routing features in [AI Gateway](/ai-gateway/) directly from an Agent by specifying a [`gateway` configuration](/ai-gateway/providers/workersai/) when calling the AI binding.
@@ -149,11 +208,11 @@ export class MyAgent extends Agent<Env> {
149
208
150
209
</TypeScriptExample>
151
210
152
-
### OpenAI SDK
211
+
### OpenAI compatible endpoints
153
212
154
213
Agents can call models across any service, including those that support the OpenAI API. For example, you can use the OpenAI SDK to use one of [Google's Gemini models](https://ai.google.dev/gemini-api/docs/openai#node.js) directly from your Agent.
155
214
156
-
Agents can stream responses back over HTTP using Server Sent Events (SSE) from within an `onRequest` handler, or by using the native [WebSockets](/agents/examples/websockets/) API to responses back to a a clientover a long running WebSocket.
215
+
Agents can stream responses back over HTTP using Server Sent Events (SSE) from within an `onRequest` handler, or by using the native [WebSockets](/agents/examples/websockets/) API in your Agent to responses back to a client, which is especially useful for larger models that can take over 30+ seconds to reply.
description: Implement human-in-the-loop functionality using Cloudflare Agents, allowing AI agents to request human approval before executing certain actions
0 commit comments