Commit 611ed22

feat(): Create workers AI article for image detection
1 parent 81edf34 commit 611ed22

1 file changed: 212 additions, 0 deletions

---
reviewed: 2025-11-01
difficulty: Beginner
pcx_content_type: tutorial
title: Build a cat breed detector with Llama 3.2 Vision
products:
  - Workers AI
tags:
  - AI
  - Vision Models
  - JSON Schema
sidebar:
  order: 5
description: Learn how to use Meta's Llama 3.2 11B Vision Instruct model to analyze images and return structured JSON responses at the edge.
---

import { PackageManagers } from "~/components";

In this tutorial, you will build a cat breed detector using Meta's Llama 3.2 11B Vision Instruct model on Cloudflare Workers AI.

The Llama 3.2 11B Vision Instruct model offers a good balance of accuracy and performance for focused vision tasks. For a task like breed identification, where latency and predictable cost matter more than complex multi-step reasoning, a smaller vision model running at the edge is often a better fit than a larger one.

By the end of this tutorial, you will learn how to:

- Set up a Cloudflare Worker with Workers AI bindings
- Process images with vision models using the Messages API format
- Implement JSON Schema mode for structured outputs
- Handle large images efficiently to avoid memory issues
- Deploy your vision AI application to Cloudflare's edge network

## Prerequisites

Before you begin, make sure you have:

1. **A Cloudflare account**. If you do not have one already, sign up at [cloudflare.com](https://cloudflare.com).
2. **Workers AI access**. Accept the Workers AI Terms of Service in your Cloudflare dashboard.
3. **Llama 3.2 Vision model access**. Accept the terms of service for the `@cf/meta/llama-3.2-11b-vision-instruct` model in the Workers AI catalog.
4. **Node.js installed**. Download from [nodejs.org](https://nodejs.org). The LTS version is recommended.

## 1. Create a new Worker project

Create a new Worker project named `cat-breed-detector` by running:

<PackageManagers pkg="wrangler@latest" subcommand="init cat-breed-detector" />

When prompted:

- Select **Hello World Example** for the template
- Choose **Worker only** for the deployment target
- Select **TypeScript** for the language

This will create a new directory with a basic Worker project structure.

## 2. Configure the AI binding

To use Workers AI in your Worker, you need to add an AI binding to your `wrangler.jsonc` configuration file.

Open `wrangler.jsonc` and add the following binding below the `observability` section:

```jsonc
"ai": {
	"binding": "AI"
}
```
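
In context, the top level of your `wrangler.jsonc` might look something like the following. The surrounding values are illustrative; keep the values that `wrangler init` generated for your project and only add the `"ai"` block:

```jsonc
{
	// Values generated by `wrangler init` for your project may differ.
	"name": "cat-breed-detector",
	"main": "src/index.ts",
	"compatibility_date": "2025-11-01",
	"observability": {
		"enabled": true
	},
	// The new Workers AI binding.
	"ai": {
		"binding": "AI"
	}
}
```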

This binding allows your Worker to access the Workers AI runtime through the `env.AI` object.

Next, generate TypeScript types for your Worker bindings:

```sh
npx wrangler types
```

This command creates a `worker-configuration.d.ts` file with type definitions for your environment bindings.
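
The generated file declares an `Env` interface that mirrors your bindings. The exact output depends on your Wrangler version and configuration, but with only the AI binding above it will contain something along these lines:

```typescript
// worker-configuration.d.ts (generated — do not edit by hand).
// Shown here only to illustrate the shape; your generated file may differ.
interface Env {
	AI: Ai;
}
```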

## 3. Install dependencies and start the development server

Navigate to your project directory, install dependencies, and start the development server:

```sh
cd cat-breed-detector
npm install
npm run start
```

Open the localhost URL shown in your terminal (typically `http://localhost:8787`). You should see "Hello World!" displayed in your browser.

## 4. Implement the cat breed detector

Now you will implement the core functionality. Replace the contents of `src/index.ts` with the following code:

```typescript
export interface Env {
	AI: Ai;
}

export default {
	async fetch(request: Request, env: Env): Promise<Response> {
		const res = await fetch('https://cataas.com/cat');
		const blob = await res.arrayBuffer();

		// Convert array buffer to base64 data URL (in chunks to avoid stack overflow)
		const uint8Array = new Uint8Array(blob);
		let binaryString = '';
		const chunkSize = 8192;
		for (let i = 0; i < uint8Array.length; i += chunkSize) {
			const chunk = uint8Array.slice(i, i + chunkSize);
			binaryString += String.fromCharCode(...chunk);
		}
		const base64 = btoa(binaryString);
		const dataUrl = `data:image/jpeg;base64,${base64}`;

		const messages = [
			{
				role: 'system',
				content: 'You are a cat breed expert assistant. You must respond with valid JSON only, matching the provided schema exactly.',
			},
			{
				role: 'user',
				content: [
					{
						type: 'text',
						text: 'Analyze this image and identify the cat breed. Respond with a JSON object containing: breed (string), confidence (one of: high, medium, low), and description (string with brief description of the cat).',
					},
					{ type: 'image_url', image_url: { url: dataUrl } },
				],
			},
		];

		const response = await env.AI.run('@cf/meta/llama-3.2-11b-vision-instruct', {
			messages,
			max_tokens: 512,
			response_format: {
				type: 'json_schema',
				json_schema: {
					name: 'cat_breed_analysis',
					strict: true,
					schema: {
						type: 'object',
						properties: {
							breed: {
								type: 'string',
								description: 'The cat breed identified in the image',
							},
							confidence: {
								type: 'string',
								enum: ['high', 'medium', 'low'],
								description: 'Confidence level of the breed identification',
							},
							description: {
								type: 'string',
								description: 'A brief description of the cat in the image',
							},
						},
						required: ['breed', 'confidence', 'description'],
						additionalProperties: false,
					},
				},
			},
		});

		return Response.json(response);
	},
} satisfies ExportedHandler<Env>;
```

### Understanding the code

This Worker implementation has several key components:

**Image fetching and processing**: The Worker fetches a random cat image from `cataas.com/cat` and converts it to a base64-encoded data URL. The conversion happens in chunks of 8192 bytes to avoid stack overflow errors that can occur when processing large images.
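
If you want to reuse this conversion elsewhere, the chunked logic above can be pulled out into a small helper. This is only a refactoring sketch of the code already shown; the function name is illustrative:

```typescript
// Illustrative helper (same logic as in the Worker above): convert an
// ArrayBuffer to a base64 data URL in 8 KB chunks so that
// String.fromCharCode never receives more arguments than the call stack allows.
function arrayBufferToDataUrl(buffer: ArrayBuffer, mimeType = 'image/jpeg'): string {
	const bytes = new Uint8Array(buffer);
	const chunkSize = 8192;
	let binaryString = '';
	for (let i = 0; i < bytes.length; i += chunkSize) {
		const chunk = bytes.slice(i, i + chunkSize);
		binaryString += String.fromCharCode(...chunk);
	}
	return `data:${mimeType};base64,${btoa(binaryString)}`;
}
```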

**Messages API format**: The Llama 3.2 Vision model uses a chat-based Messages API format, similar to other modern LLMs. The `messages` array includes:
- A `system` message that sets the model's behavior
- A `user` message containing both text and image content

**JSON Schema mode**: The `response_format` parameter enforces structured output using JSON Schema. This ensures the model returns data in a predictable format with:
- `name`: An identifier for the schema
- `strict: true`: Enables strict schema validation
- `schema`: The actual JSON Schema definition with properties, types, and constraints

**Structured output**: By defining `breed`, `confidence`, and `description` fields with specific types and constraints, you guarantee the response will always match your expected format.
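
On the consuming side, you can mirror this schema with a TypeScript type and narrow unknown data to it before use. This is a sketch for client code; the type and guard names are illustrative and not part of the Worker above:

```typescript
// Illustrative types mirroring the JSON Schema defined in the Worker.
type Confidence = 'high' | 'medium' | 'low';

interface CatBreedAnalysis {
	breed: string;
	confidence: Confidence;
	description: string;
}

// Narrow an unknown value (for example, a parsed fetch response) to the schema shape.
function isCatBreedAnalysis(value: unknown): value is CatBreedAnalysis {
	if (typeof value !== 'object' || value === null) return false;
	const v = value as Record<string, unknown>;
	return (
		typeof v.breed === 'string' &&
		(v.confidence === 'high' || v.confidence === 'medium' || v.confidence === 'low') &&
		typeof v.description === 'string'
	);
}
```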

## 5. Test your Worker locally

Save the file and refresh your browser. You should see a JSON response similar to:

```json
{
	"breed": "Domestic Shorthair",
	"confidence": "medium",
	"description": "An orange tabby cat with distinctive striped markings"
}
```

Each time you refresh, the Worker fetches and analyzes a different cat image.

## 6. Deploy to Cloudflare

Once you have tested your Worker locally, deploy it to Cloudflare's global network:

```sh
npx wrangler deploy
```

After deployment, Wrangler will provide a URL where your Worker is live (for example, `https://cat-breed-detector.your-subdomain.workers.dev`).
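
You can also verify the deployed Worker from the command line, substituting your own `workers.dev` URL for the example one:

```sh
curl https://cat-breed-detector.your-subdomain.workers.dev
```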

## Related resources

- [Workers AI documentation](/workers-ai/)
- [Llama 3.2 Vision model reference](/workers-ai/models/llama-3.2-11b-vision-instruct/)
- [JSON Mode documentation](/workers-ai/features/json-mode/)
- [Messages API format](/workers-ai/configuration/open-ai-compatibility/)
