---
updated: 2025-03-04
difficulty: Beginner
pcx_content_type: tutorial
title: Whisper-large-v3-turbo with Cloudflare Workers AI
tags:
  - AI
---

In this tutorial you will learn how to:

- **Transcribe large audio files:** Use the [Whisper-large-v3-turbo](/workers-ai/models/whisper-large-v3-turbo/) model from Cloudflare Workers AI to perform automatic speech recognition (ASR) or translation.
- **Handle large files:** Split large audio files into smaller chunks for processing, which helps overcome memory and execution time limitations.
- **Deploy using Cloudflare Workers:** Create a scalable, low-latency transcription pipeline in a serverless environment.
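
The chunking approach can be sketched up front in plain TypeScript, independent of any Workers APIs (the 1 MB chunk size mirrors the Worker code used later in this tutorial):

```typescript
// Split an ArrayBuffer into fixed-size chunks; the final chunk may be smaller.
function splitIntoChunks(
	buffer: ArrayBuffer,
	chunkSize = 1024 * 1024, // 1 MB
): ArrayBuffer[] {
	const chunks: ArrayBuffer[] = [];
	for (let i = 0; i < buffer.byteLength; i += chunkSize) {
		chunks.push(buffer.slice(i, i + chunkSize));
	}
	return chunks;
}

// A 2.5 MB buffer yields two full 1 MB chunks plus a 0.5 MB remainder.
const chunks = splitIntoChunks(new ArrayBuffer(2.5 * 1024 * 1024));
console.log(chunks.length); // 3
console.log(chunks[2].byteLength); // 524288
```

Keeping each request to the model small like this is what sidesteps per-request memory and execution-time limits.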

## Step 1: Create a New Cloudflare Worker Project

import { Render, PackageManagers, WranglerConfig } from "~/components";

This guide will instruct you through setting up and deploying your first Workers AI project. You will use [Workers](/workers/), a Workers AI binding, and the Whisper-large-v3-turbo model to deploy an AI-powered transcription application on the Cloudflare global network.

<Render file="prereqs" product="workers" />

### 1. Create a Worker project

You will create a new Worker project using the `create-cloudflare` CLI (C3). [C3](https://github.com/cloudflare/workers-sdk/tree/main/packages/create-cloudflare) is a command-line tool designed to help you set up and deploy new applications to Cloudflare.

Create a new project named `whisper-tutorial` by running:

<PackageManagers type="create" pkg="cloudflare@latest" args={"whisper-tutorial"} />

Running `npm create cloudflare@latest` will prompt you to install the [`create-cloudflare` package](https://www.npmjs.com/package/create-cloudflare), and lead you through setup. C3 will also install [Wrangler](/workers/wrangler/), the Cloudflare Developer Platform CLI.

<Render
	file="c3-post-run-steps"
	product="workers"
	params={{
		category: "hello-world",
		type: "Worker only",
		lang: "TypeScript",
	}}
/>
| 43 | + |
| 44 | +This will create a new `hello-ai` directory. Your new `hello-ai` directory will include: |
| 45 | + |
| 46 | +- A `"Hello World"` [Worker](/workers/get-started/guide/#3-write-code) at `src/index.ts`. |
| 47 | +- A [`wrangler.jsonc`](/workers/wrangler/configuration/) configuration file. |
| 48 | + |
| 49 | +Go to your application directory: |
| 50 | + |
| 51 | +```sh |
| 52 | +cd hello-ai |
| 53 | +``` |

### 2. Connect your Worker to Workers AI

You must create an AI binding for your Worker to connect to Workers AI. [Bindings](/workers/runtime-apis/bindings/) allow your Workers to interact with resources, like Workers AI, on the Cloudflare Developer Platform.

To bind Workers AI to your Worker, add the following to the end of your Wrangler file:

<WranglerConfig>

```toml
[ai]
binding = "AI"
```

</WranglerConfig>

Your binding is [available in your Worker code](/workers/reference/migrate-to-module-workers/#bindings-in-es-modules-format) on [`env.AI`](/workers/runtime-apis/handlers/fetch/).
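
The call pattern your Worker will use through `env.AI` can be sketched with a stubbed binding. The `AiBinding` interface and `stubAI` object below are illustrative stand-ins, not part of the tutorial's code; in a deployed Worker the runtime injects the real binding, so you never construct it yourself:

```typescript
// Minimal, illustrative shape of the AI binding as used in this tutorial.
interface AiBinding {
	run(model: string, input: Record<string, unknown>): Promise<{ text: string }>;
}

// Stub standing in for env.AI during local experimentation.
const stubAI: AiBinding = {
	async run(model, _input) {
		return { text: `stub transcription from ${model}` };
	},
};

// The same call your Worker will make: model name plus Base64 audio.
async function transcribe(ai: AiBinding, base64Audio: string): Promise<string> {
	const res = await ai.run("@cf/openai/whisper-large-v3-turbo", {
		audio: base64Audio,
	});
	return res.text;
}

console.log(await transcribe(stubAI, "QUJD"));
// stub transcription from @cf/openai/whisper-large-v3-turbo
```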

## Step 2: Configure Wrangler

1. **Enable Node.js Compatibility:**

   In your Wrangler file, add or update the following settings to enable Node.js APIs and polyfills. Node.js compatibility is enabled through the `nodejs_compat` compatibility flag, with a compatibility date of 2024-09-23 or later:

<WranglerConfig>

```toml
compatibility_date = "2024-09-23"
compatibility_flags = ["nodejs_compat"]
```

</WranglerConfig>

2. **Confirm the AI Binding:**

   In the same file, confirm the AI binding you added in the previous section is present, so that you can use Cloudflare's AI models in your Worker:

<WranglerConfig>

```toml
[ai]
binding = "AI"
```

</WranglerConfig>

## Step 3: Full TypeScript Code – Handling Large Audio Files with Chunking

Replace the contents of your `src/index.ts` file with the following integrated code. This sample demonstrates how to:

- Extract an audio file URL from the query parameters.
- Fetch the audio file while explicitly following redirects.
- Split the audio file into smaller chunks (e.g., 1 MB chunks).
- Transcribe each chunk using the Whisper-large-v3-turbo model via the Cloudflare AI binding.
- Return the aggregated transcription as plain text.

```ts
import { Buffer } from "node:buffer";
import type { Ai } from "workers-ai";

export interface Env {
	AI: Ai;
	// If needed, add your KV namespace for storing transcripts.
	// MY_KV_NAMESPACE: KVNamespace;
}

/**
 * Fetches the audio file from the provided URL and splits it into chunks.
 * This function explicitly follows redirects.
 *
 * @param audioUrl - The URL of the audio file.
 * @returns An array of ArrayBuffers, each representing a chunk of the audio.
 */
async function getAudioChunks(audioUrl: string): Promise<ArrayBuffer[]> {
	const response = await fetch(audioUrl, { redirect: "follow" });
	if (!response.ok) {
		throw new Error(`Failed to fetch audio: ${response.status}`);
	}
	const arrayBuffer = await response.arrayBuffer();

	// Example: Split the audio into 1MB chunks.
	const chunkSize = 1024 * 1024; // 1MB
	const chunks: ArrayBuffer[] = [];
	for (let i = 0; i < arrayBuffer.byteLength; i += chunkSize) {
		chunks.push(arrayBuffer.slice(i, i + chunkSize));
	}
	return chunks;
}

/**
 * Transcribes a single audio chunk using the Whisper-large-v3-turbo model.
 * The function converts the audio chunk to a Base64-encoded string and
 * sends it to the model via the AI binding.
 *
 * @param chunkBuffer - The audio chunk as an ArrayBuffer.
 * @param env - The Cloudflare Worker environment, including the AI binding.
 * @returns The transcription text from the model.
 */
async function transcribeChunk(
	chunkBuffer: ArrayBuffer,
	env: Env,
): Promise<string> {
	const base64 = Buffer.from(chunkBuffer).toString("base64");
	const res = await env.AI.run("@cf/openai/whisper-large-v3-turbo", {
		audio: base64,
		// Optional parameters (uncomment and set if needed):
		// task: "transcribe", // or "translate"
		// language: "en",
		// vad_filter: false,
		// initial_prompt: "Provide context if needed.",
		// prefix: "Transcription:",
	});
	return res.text; // The transcription result includes a "text" property.
}

/**
 * The main fetch handler. It extracts the 'url' query parameter, fetches the audio,
 * processes it in chunks, and returns the full transcription.
 */
export default {
	async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
		// Extract the audio URL from the query parameters.
		const { searchParams } = new URL(request.url);
		const audioUrl = searchParams.get("url");

		if (!audioUrl) {
			return new Response("Missing 'url' query parameter", { status: 400 });
		}

		// Get the audio chunks.
		const audioChunks: ArrayBuffer[] = await getAudioChunks(audioUrl);
		let fullTranscript = "";

		// Process each chunk and build the full transcript.
		for (const chunk of audioChunks) {
			try {
				const transcript = await transcribeChunk(chunk, env);
				fullTranscript += transcript + "\n";
			} catch (error) {
				fullTranscript += "[Error transcribing chunk]\n";
			}
		}

		return new Response(fullTranscript, {
			headers: { "Content-Type": "text/plain" },
		});
	},
} satisfies ExportedHandler<Env>;
```
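
The per-chunk `try`/`catch` in the handler means one failed chunk degrades the transcript instead of failing the whole request. That behavior can be exercised in isolation with a deliberately flaky transcriber (the stub below is illustrative, not part of the Worker):

```typescript
type Transcriber = (chunk: ArrayBuffer) => Promise<string>;

// Mirrors the aggregation loop in the fetch handler: append each chunk's
// transcript, substituting a marker line when a chunk fails.
async function transcribeAll(
	chunks: ArrayBuffer[],
	transcribe: Transcriber,
): Promise<string> {
	let full = "";
	for (const chunk of chunks) {
		try {
			full += (await transcribe(chunk)) + "\n";
		} catch {
			full += "[Error transcribing chunk]\n";
		}
	}
	return full;
}

// Stub that fails on the second chunk only.
let calls = 0;
const flaky: Transcriber = async () => {
	calls += 1;
	if (calls === 2) throw new Error("model error");
	return `chunk ${calls}`;
};

const out = await transcribeAll(
	[new ArrayBuffer(1), new ArrayBuffer(1), new ArrayBuffer(1)],
	flaky,
);
console.log(out); // "chunk 1\n[Error transcribing chunk]\nchunk 3\n"
```

Whether to surface partial results like this or fail fast is a design choice; for long recordings, a partial transcript is usually more useful than an error page.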

---

## Step 4: Develop, Test, and Deploy

1. **Run the Worker Locally:**

   Use Wrangler's development mode to test your Worker. The `--remote` flag runs your Worker on Cloudflare's network so the AI binding is available during development:

```sh
npx wrangler dev --remote
```

   Open your browser and visit [http://localhost:8787](http://localhost:8787), or use curl:

```sh
curl "http://localhost:8787?url=https://raw.githubusercontent.com/your-username/your-repo/main/your-audio-file.mp3"
```

   Replace the URL query parameter with the direct link to your audio file. (For GitHub-hosted files, ensure you use the raw file URL.)
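
If your audio URL itself carries query parameters (a signed URL, for example), encode it before appending it to the Worker URL; otherwise everything after the first `&` is parsed as a separate parameter and the Worker sees a truncated URL. A quick sketch (the URLs are placeholders):

```typescript
const workerBase = "http://localhost:8787"; // local dev URL from wrangler dev
const audioUrl = "https://example.com/audio.mp3?token=abc&expires=123";

// encodeURIComponent protects the embedded "?", "&", and "=" characters.
const requestUrl = `${workerBase}/?url=${encodeURIComponent(audioUrl)}`;
console.log(requestUrl);
// http://localhost:8787/?url=https%3A%2F%2Fexample.com%2Faudio.mp3%3Ftoken%3Dabc%26expires%3D123

// The Worker's searchParams.get("url") decodes it back to the original.
const recovered = new URL(requestUrl).searchParams.get("url");
console.log(recovered === audioUrl); // true
```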
2. **Deploy the Worker:**

   Once testing is complete, deploy your Worker with:

```sh
npx wrangler deploy
```

3. **Test the Deployed Worker:**

   After deployment, test your Worker by passing the audio URL as a query parameter:

```sh
curl "https://<your-worker-subdomain>.workers.dev?url=https://raw.githubusercontent.com/your-username/your-repo/main/your-audio-file.mp3"
```

   Make sure to replace `<your-worker-subdomain>`, `your-username`, `your-repo`, and `your-audio-file.mp3` with your actual details.

If successful, the Worker will return a transcript of the audio file:

```sh
This is the transcript of the audio...
```