---
updated: 2025-03-04
difficulty: Beginner
pcx_content_type: tutorial
title: Whisper-large-v3-turbo with Cloudflare Workers AI
tags:
  - AI
---

In this tutorial you will learn how to:

- **Transcribe large audio files:** Use the [Whisper-large-v3-turbo](/workers-ai/models/whisper-large-v3-turbo/) model from Cloudflare Workers AI to perform automatic speech recognition (ASR) or translation.
- **Handle large files:** Split large audio files into smaller chunks for processing, which helps overcome memory and execution time limitations.
- **Deploy using Cloudflare Workers:** Create a scalable, low-latency transcription pipeline in a serverless environment.

## Step 1: Create a New Cloudflare Worker Project

import { Render, PackageManagers, WranglerConfig } from "~/components";

This guide will instruct you through setting up and deploying your first Workers AI project. You will use [Workers](/workers/), a Workers AI binding, and the Whisper-large-v3-turbo speech recognition model to deploy an AI-powered transcription application on the Cloudflare global network.

<Render file="prereqs" product="workers" />

You will create a new Worker project using the `create-cloudflare` CLI (C3). [C3](https://github.com/cloudflare/workers-sdk/tree/main/packages/create-cloudflare) is a command-line tool designed to help you set up and deploy new applications to Cloudflare.

Create a new project named `whisper-tutorial` by running:

<PackageManagers type="create" pkg="cloudflare@latest" args={"whisper-tutorial"} />

Running `npm create cloudflare@latest` will prompt you to install the [`create-cloudflare` package](https://www.npmjs.com/package/create-cloudflare), and lead you through setup. C3 will also install [Wrangler](/workers/wrangler/), the Cloudflare Developer Platform CLI.

<Render
  file="c3-post-run-steps"
  product="workers"
  params={{
    category: "hello-world",
    type: "Worker only",
    lang: "TypeScript",
  }}
/>

This will create a new `whisper-tutorial` directory, which will include:

- A `"Hello World"` [Worker](/workers/get-started/guide/#3-write-code) at `src/index.ts`.
- A [`wrangler.jsonc`](/workers/wrangler/configuration/) configuration file.

Go to your application directory:

```sh
cd whisper-tutorial
```

## Step 2: Configure Wrangler

1. **Enable Node.js Compatibility:**

   In your Wrangler file, add or update the following settings to enable Node.js APIs and polyfills (with a compatibility date of 2024-09-23 or later):

   <WranglerConfig>

   ```toml
   compatibility_date = "2024-09-23"
   compatibility_flags = ["nodejs_compat"]
   ```

   </WranglerConfig>

2. **Add the AI Binding:**

   You must create an AI binding for your Worker to connect to Workers AI. [Bindings](/workers/runtime-apis/bindings/) allow your Workers to interact with resources, like Workers AI, on the Cloudflare Developer Platform. To bind Workers AI to your Worker, add the following to the end of your Wrangler file:

   <WranglerConfig>

   ```toml
   [ai]
   binding = "AI"
   ```

   </WranglerConfig>

   Your binding is [available in your Worker code](/workers/reference/migrate-to-module-workers/#bindings-in-es-modules-format) on [`env.AI`](/workers/runtime-apis/handlers/fetch/).

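Taken together, the relevant parts of your Wrangler file should now look roughly like the following. This is a sketch rather than generated output: the `name` and `main` values assume the defaults C3 created for this project.

<WranglerConfig>

```toml
# Assumed C3 defaults for this tutorial's project name and entry point.
name = "whisper-tutorial"
main = "src/index.ts"

# Node.js APIs and polyfills (the transcription code uses Buffer).
compatibility_date = "2024-09-23"
compatibility_flags = ["nodejs_compat"]

# Workers AI binding, exposed to the Worker as env.AI.
[ai]
binding = "AI"
```

</WranglerConfig>
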
## Step 3: Full TypeScript Code – Handling Large Audio Files with Chunking

Replace the contents of your `src/index.ts` file with the following integrated code. This sample demonstrates how to:

- Extract an audio file URL from the query parameters.
- Fetch the audio file while explicitly following redirects.
- Split the audio file into smaller chunks (e.g., 1 MB chunks).
- Transcribe each chunk using the Whisper-large-v3-turbo model via the Cloudflare AI binding.
- Return the aggregated transcription as plain text.

```ts
import { Buffer } from "node:buffer";
import type { Ai } from "workers-ai";

export interface Env {
  AI: Ai;
  // If needed, add your KV namespace for storing transcripts.
  // MY_KV_NAMESPACE: KVNamespace;
}

/**
 * Fetches the audio file from the provided URL and splits it into chunks.
 * This function explicitly follows redirects.
 *
 * @param audioUrl - The URL of the audio file.
 * @returns An array of ArrayBuffers, each representing a chunk of the audio.
 */
async function getAudioChunks(audioUrl: string): Promise<ArrayBuffer[]> {
  const response = await fetch(audioUrl, { redirect: "follow" });
  if (!response.ok) {
    throw new Error(`Failed to fetch audio: ${response.status}`);
  }
  const arrayBuffer = await response.arrayBuffer();

  // Example: Split the audio into 1MB chunks.
  const chunkSize = 1024 * 1024; // 1MB
  const chunks: ArrayBuffer[] = [];
  for (let i = 0; i < arrayBuffer.byteLength; i += chunkSize) {
    const chunk = arrayBuffer.slice(i, i + chunkSize);
    chunks.push(chunk);
  }
  return chunks;
}

/**
 * Transcribes a single audio chunk using the Whisper-large-v3-turbo model.
 * The function converts the audio chunk to a Base64-encoded string and
 * sends it to the model via the AI binding.
 *
 * @param chunkBuffer - The audio chunk as an ArrayBuffer.
 * @param env - The Cloudflare Worker environment, including the AI binding.
 * @returns The transcription text from the model.
 */
async function transcribeChunk(
  chunkBuffer: ArrayBuffer,
  env: Env,
): Promise<string> {
  const base64 = Buffer.from(chunkBuffer).toString("base64");
  const res = await env.AI.run("@cf/openai/whisper-large-v3-turbo", {
    audio: base64,
    // Optional parameters (uncomment and set if needed):
    // task: "transcribe", // or "translate"
    // language: "en",
    // vad_filter: "false",
    // initial_prompt: "Provide context if needed.",
    // prefix: "Transcription:",
  });
  return res.text; // Assumes the transcription result includes a "text" property.
}

/**
 * The main fetch handler. It extracts the 'url' query parameter, fetches the audio,
 * processes it in chunks, and returns the full transcription.
 */
export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    // Extract the audio URL from the query parameters.
    const { searchParams } = new URL(request.url);
    const audioUrl = searchParams.get("url");

    if (!audioUrl) {
      return new Response("Missing 'url' query parameter", { status: 400 });
    }

    // Get the audio chunks.
    const audioChunks: ArrayBuffer[] = await getAudioChunks(audioUrl);
    let fullTranscript = "";

    // Process each chunk and build the full transcript.
    for (const chunk of audioChunks) {
      try {
        const transcript = await transcribeChunk(chunk, env);
        fullTranscript += transcript + "\n";
      } catch (error) {
        fullTranscript += "[Error transcribing chunk]\n";
      }
    }

    return new Response(fullTranscript, {
      headers: { "Content-Type": "text/plain" },
    });
  },
} satisfies ExportedHandler<Env>;
```
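
The commented-out `MY_KV_NAMESPACE` binding in the `Env` interface above hints at a natural extension: caching transcripts in Workers KV so that repeated requests for the same audio URL skip the model calls. Below is a minimal sketch, assuming you have created a KV namespace, bound it as `MY_KV_NAMESPACE` in your Wrangler file, and uncommented the corresponding line in `Env`; it reuses `getAudioChunks` and `transcribeChunk` from the code above.

```ts
/**
 * Sketch: transcribe with a KV-backed cache, keyed by the audio URL.
 * Assumes a KV namespace bound as MY_KV_NAMESPACE in the Wrangler config.
 */
async function transcribeWithCache(audioUrl: string, env: Env): Promise<string> {
  // Return the stored transcript, if any, without re-running the model.
  const cached = await env.MY_KV_NAMESPACE.get(audioUrl);
  if (cached !== null) {
    return cached;
  }

  const chunks = await getAudioChunks(audioUrl);
  let transcript = "";
  for (const chunk of chunks) {
    transcript += (await transcribeChunk(chunk, env)) + "\n";
  }

  // Cache the transcript for a day to bound storage growth.
  await env.MY_KV_NAMESPACE.put(audioUrl, transcript, { expirationTtl: 86400 });
  return transcript;
}
```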

---

## Step 4: Develop, Test, and Deploy

1. **Run the Worker Locally:**

   Use Wrangler's development mode to test your Worker locally:

   ```sh
   npx wrangler dev --remote
   ```

   Open your browser and visit [http://localhost:8787](http://localhost:8787), passing the audio URL as a query parameter, or use curl:

   ```sh
   curl "http://localhost:8787?url=https://raw.githubusercontent.com/your-username/your-repo/main/your-audio-file.mp3"
   ```

   Replace the URL query parameter with the direct link to your audio file. (For GitHub-hosted files, ensure you use the raw file URL.)

2. **Deploy the Worker:**

   Once testing is complete, deploy your Worker with:

   ```sh
   npx wrangler deploy
   ```

3. **Test the Deployed Worker:**

   After deployment, test your Worker by passing the audio URL as a query parameter:

   ```sh
   curl "https://<your-worker-subdomain>.workers.dev?url=https://raw.githubusercontent.com/your-username/your-repo/main/your-audio-file.mp3"
   ```

   Make sure to replace `<your-worker-subdomain>`, `your-username`, `your-repo`, and `your-audio-file.mp3` with your actual details.

   If successful, the Worker will return a transcript of the audio file:

   ```sh
   This is the transcript of the audio...
   ```
