Commit 68928f6

ai models

1 parent 4b9864a commit 68928f6

File tree

1 file changed: +151 −10 lines changed


src/content/docs/agents/examples/using-ai-models.mdx

Lines changed: 151 additions & 10 deletions
@@ -8,7 +8,7 @@ sidebar:

import { MetaInfo, Render, Type, TypeScriptExample, WranglerConfig } from "~/components";

Agents can communicate with AI models hosted on any provider, including [Workers AI](/workers-ai/), OpenAI, Anthropic, and Google's Gemini, and use the model routing features in [AI Gateway](/ai-gateway/) to route requests across providers, evaluate responses, and manage AI provider rate limits.

Because Agents are built on top of [Durable Objects](/durable-objects/), each Agent or chat session is associated with a stateful compute instance. Traditional serverless architectures often present challenges for the persistent connections needed in real-time applications like chat.

@@ -18,16 +18,116 @@ A user can disconnect during a long-running response from a modern reasoning mod

### Workers AI

### Inference endpoints

You can use [any of the models available in Workers AI](/workers-ai/models/) within your Agent by [configuring a binding](/workers-ai/configuration/bindings/).

Workers AI supports streaming responses out-of-the-box by setting `stream: true`, and we strongly recommend using streaming to avoid buffering and delaying responses, especially for larger models or reasoning models that require more time to generate a response.

<TypeScriptExample file="src/index.ts">

```ts
import { Agent } from "@cloudflare/agents";

interface Env {
  AI: Ai;
}

export class MyAgent extends Agent<Env> {
  async onRequest(request: Request) {
    const response = await this.env.AI.run(
      "@cf/deepseek-ai/deepseek-r1-distill-qwen-32b",
      {
        prompt: "Build me a Cloudflare Worker that returns JSON.",
        stream: true, // Stream the response and don't block the client!
      },
    );

    // Return the stream directly to the client
    return new Response(response, {
      headers: { "content-type": "text/event-stream" },
    });
  }
}
```

</TypeScriptExample>

Your wrangler configuration will need an `ai` binding added:

<WranglerConfig>

```toml
[ai]
binding = "AI"
```

</WranglerConfig>
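
On the client side, the streamed response should be read incrementally rather than awaited as a whole. Below is a minimal sketch of a consumer, assuming the Agent is served at a hypothetical `/agent` route; it loosely parses Workers AI's SSE frames (`data: {"response": "..."}`), and a production parser should buffer partial lines across chunks:

```ts
// Minimal sketch: read the Agent's streamed response incrementally.
// The /agent route is hypothetical; adjust to however you route to your Agent.
const res = await fetch("/agent");
const reader = res.body!.getReader();
const decoder = new TextDecoder();
let output = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Workers AI streams SSE frames like `data: {"response": "..."}`;
  // this loose parser skips anything else, including the final `data: [DONE]`.
  for (const line of decoder.decode(value, { stream: true }).split("\n")) {
    if (!line.startsWith("data:") || line.includes("[DONE]")) continue;
    output += JSON.parse(line.slice(5)).response ?? "";
  }
}
console.log(output);
```
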
### Model routing

You can also use the model routing features in [AI Gateway](/ai-gateway/) directly from an Agent by specifying a [`gateway` configuration](/ai-gateway/providers/workersai/) when calling the AI binding.

:::note

Model routing allows you to route requests to different AI models based on whether a provider is reachable, whether it is rate limiting your client, and/or whether you've exceeded your cost budget for a specific provider.

:::

<TypeScriptExample file="src/index.ts">

```ts
import { Agent } from "@cloudflare/agents";

interface Env {
  AI: Ai;
}

export class MyAgent extends Agent<Env> {
  async onRequest(request: Request) {
    const response = await this.env.AI.run(
      "@cf/deepseek-ai/deepseek-r1-distill-qwen-32b",
      {
        prompt: "Build me a Cloudflare Worker that returns JSON.",
      },
      {
        gateway: {
          id: "{gateway_id}", // Specify your AI Gateway ID here
          skipCache: false,
          cacheTtl: 3360,
        },
      },
    );

    return Response.json(response);
  }
}
```

</TypeScriptExample>

Your wrangler configuration will need an `ai` binding added. This is shared across both Workers AI and AI Gateway.

<WranglerConfig>

```toml
[ai]
binding = "AI"
```

</WranglerConfig>

Visit the [AI Gateway documentation](/ai-gateway/) to learn how to configure a gateway and retrieve a gateway ID.
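
If you'd prefer not to hardcode the gateway ID, one option is to supply it via an environment variable. A small sketch, where `AI_GATEWAY_ID` is a hypothetical var you would define in your wrangler configuration:

```ts
import { Agent } from "@cloudflare/agents";

interface Env {
  AI: Ai;
  AI_GATEWAY_ID: string; // hypothetical var, e.g. [vars] AI_GATEWAY_ID = "my-gateway"
}

export class MyAgent extends Agent<Env> {
  async onRequest(request: Request) {
    const response = await this.env.AI.run(
      "@cf/deepseek-ai/deepseek-r1-distill-qwen-32b",
      { prompt: "Build me a Cloudflare Worker that returns JSON." },
      // Same gateway options as above, with the ID read from the environment
      { gateway: { id: this.env.AI_GATEWAY_ID } },
    );

    return Response.json(response);
  }
}
```
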

### AI SDK

The [AI SDK](https://sdk.vercel.ai/docs/introduction) provides a unified API for using AI models, including for text generation, tool calling, structured responses, image generation, and more.

To use the AI SDK, install the `ai` package and use it within your Agent. The example below shows how to use it to generate text on request, but you can use it from any method within your Agent, including WebSocket handlers, as part of a scheduled task, or even when the Agent is initialized.

```sh
npm install ai @ai-sdk/openai
```

<TypeScriptExample file="src/index.ts">

```ts
@@ -36,10 +136,6 @@ import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

export class MyAgent extends Agent<Env> {
  async onRequest(request: Request): Promise<Response> {
    const { text } = await generateText({
      model: openai("o3-mini"),
@@ -55,7 +151,52 @@ export class MyAgent extends Agent<Env> {

### OpenAI SDK

Agents can call models across any service, including those that support the OpenAI API. For example, you can use the OpenAI SDK to use one of [Google's Gemini models](https://ai.google.dev/gemini-api/docs/openai#node.js) directly from your Agent.

Agents can stream responses back over HTTP using Server-Sent Events (SSE) from within an `onRequest` handler, or use the native [WebSockets](/agents/examples/websockets/) API to stream responses back to a client over a long-running WebSocket (see the WebSocket sketch after the example below).

<TypeScriptExample file="src/index.ts">

```ts
import { Agent } from "@cloudflare/agents";
import { OpenAI } from "openai";

interface Env {
  GEMINI_API_KEY: string;
}

export class MyAgent extends Agent<Env> {
  async onRequest(request: Request): Promise<Response> {
    const openai = new OpenAI({
      apiKey: this.env.GEMINI_API_KEY,
      baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/",
    });

    // Create a TransformStream to handle streaming data
    let { readable, writable } = new TransformStream();
    let writer = writable.getWriter();
    const textEncoder = new TextEncoder();

    // Use this.ctx.waitUntil to run the async function in the background
    // so that it doesn't block the streaming response
    this.ctx.waitUntil(
      (async () => {
        const stream = await openai.chat.completions.create({
          model: "gemini-2.0-flash",
          messages: [{ role: "user", content: "Write me a Cloudflare Worker." }],
          stream: true,
        });

        // Loop over the data as it is streamed and write to the writable side
        for await (const part of stream) {
          writer.write(
            textEncoder.encode(part.choices[0]?.delta?.content || ""),
          );
        }
        writer.close();
      })(),
    );

    // Return the readable stream back to the client
    return new Response(readable);
  }
}
```

</TypeScriptExample>
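
As mentioned above, the same streaming call also works over a long-running WebSocket. A minimal sketch, assuming the `onMessage`/`connection.send` API from the [WebSockets example](/agents/examples/websockets/); the `Connection` import and `MyChatAgent` name are illustrative, not part of this commit:

```ts
import { Agent, Connection } from "@cloudflare/agents";
import { OpenAI } from "openai";

interface Env {
  GEMINI_API_KEY: string;
}

export class MyChatAgent extends Agent<Env> {
  // Called for each message received over an open WebSocket connection
  async onMessage(connection: Connection, message: string) {
    const openai = new OpenAI({
      apiKey: this.env.GEMINI_API_KEY,
      baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/",
    });

    const stream = await openai.chat.completions.create({
      model: "gemini-2.0-flash",
      messages: [{ role: "user", content: message }],
      stream: true,
    });

    // Forward each token over the socket as it arrives, instead of
    // buffering the full completion
    for await (const part of stream) {
      connection.send(part.choices[0]?.delta?.content || "");
    }
  }
}
```
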
