---
updated: 2025-04-03
difficulty: Beginner
pcx_content_type: tutorial
title: Whisper-large-v3-turbo with Cloudflare Workers AI
tags:
  - AI
---

In this tutorial you will learn how to:

- **Transcribe large audio files:** Use the [Whisper-large-v3-turbo](/workers-ai/models/whisper-large-v3-turbo/) model from Cloudflare Workers AI to perform automatic speech recognition (ASR) or translation.
- **Handle large files:** Split large audio files into smaller chunks for processing, which helps overcome memory and execution time limitations.
- **Deploy using Cloudflare Workers:** Create a scalable, low-latency transcription pipeline in a serverless environment.

## 1. Create a new Cloudflare Worker project

import { Render, PackageManagers, WranglerConfig } from "~/components";

<Render file="prereqs" product="workers" />

You will create a new Worker project using the `create-cloudflare` CLI (C3). [C3](https://github.com/cloudflare/workers-sdk/tree/main/packages/create-cloudflare) is a command-line tool designed to help you set up and deploy new applications to Cloudflare.

Create a new project named `whisper-tutorial` by running:

<PackageManagers
	type="create"
	pkg="cloudflare@latest"
	args={"whisper-tutorial"}
/>

Running `npm create cloudflare@latest` will prompt you to install the [`create-cloudflare` package](https://www.npmjs.com/package/create-cloudflare) and lead you through setup. C3 will also install [Wrangler](/workers/wrangler/), the Cloudflare Developer Platform CLI.

<Render
	file="c3-post-run-steps"
	product="workers"
	params={{
		category: "hello-world",
		type: "Worker only",
		lang: "TypeScript",
	}}
/>

This will create a new `whisper-tutorial` directory, which will include:

- A `"Hello World"` [Worker](/workers/get-started/guide/#3-write-code) at `src/index.ts`.
- A [`wrangler.jsonc`](/workers/wrangler/configuration/) configuration file.

Go to your application directory:

```sh
cd whisper-tutorial
```

## 2. Connect your Worker to Workers AI

You must create an AI binding for your Worker to connect to Workers AI. [Bindings](/workers/runtime-apis/bindings/) allow your Workers to interact with resources, like Workers AI, on the Cloudflare Developer Platform.

To bind Workers AI to your Worker, add the following to the end of your Wrangler configuration file:

<WranglerConfig>

```toml
[ai]
binding = "AI"
```

</WranglerConfig>

Your binding is [available in your Worker code](/workers/reference/migrate-to-module-workers/#bindings-in-es-modules-format) on [`env.AI`](/workers/runtime-apis/handlers/fetch/).

## 3. Configure Wrangler

In your Wrangler file, add or update the following settings to enable Node.js APIs and polyfills (with a compatibility date of 2024-09-23 or later):

<WranglerConfig>

```toml title="wrangler.toml"
compatibility_flags = ["nodejs_compat"]
compatibility_date = "2024-09-23"
```

</WranglerConfig>

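The `nodejs_compat` flag is what makes Node's `Buffer` API available inside the Worker; the transcription step below relies on it to Base64-encode each audio chunk before sending it to the model. As a quick standalone illustration (plain Node.js TypeScript, not Worker code):

```typescript
import { Buffer } from "node:buffer";

// Base64-encode raw audio bytes, as the transcription step does for each chunk.
function toBase64(buf: ArrayBuffer): string {
	return Buffer.from(buf).toString("base64");
}

// "hello" as raw bytes, standing in for a slice of audio data.
const bytes = new Uint8Array([104, 101, 108, 108, 111]).buffer;
console.log(toBase64(bytes)); // "aGVsbG8="
```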
## 4. Handle large audio files with chunking

Replace the contents of your `src/index.ts` file with the following integrated code. This sample demonstrates how to:

1. Extract an audio file URL from the query parameters.
2. Fetch the audio file while explicitly following redirects.
3. Split the audio file into smaller chunks (for example, 1 MB chunks).
4. Transcribe each chunk using the Whisper-large-v3-turbo model via the Cloudflare AI binding.
5. Return the aggregated transcription as plain text.

```ts
import { Buffer } from "node:buffer";
import type { Ai } from "workers-ai";

export interface Env {
	AI: Ai;
	// If needed, add your KV namespace for storing transcripts.
	// MY_KV_NAMESPACE: KVNamespace;
}

/**
 * Fetches the audio file from the provided URL and splits it into chunks.
 * This function explicitly follows redirects.
 *
 * @param audioUrl - The URL of the audio file.
 * @returns An array of ArrayBuffers, each representing a chunk of the audio.
 */
async function getAudioChunks(audioUrl: string): Promise<ArrayBuffer[]> {
	const response = await fetch(audioUrl, { redirect: "follow" });
	if (!response.ok) {
		throw new Error(`Failed to fetch audio: ${response.status}`);
	}
	const arrayBuffer = await response.arrayBuffer();

	// Example: Split the audio into 1 MB chunks.
	const chunkSize = 1024 * 1024; // 1 MB
	const chunks: ArrayBuffer[] = [];
	for (let i = 0; i < arrayBuffer.byteLength; i += chunkSize) {
		const chunk = arrayBuffer.slice(i, i + chunkSize);
		chunks.push(chunk);
	}
	return chunks;
}

/**
 * Transcribes a single audio chunk using the Whisper-large-v3-turbo model.
 * The function converts the audio chunk to a Base64-encoded string and
 * sends it to the model via the AI binding.
 *
 * @param chunkBuffer - The audio chunk as an ArrayBuffer.
 * @param env - The Cloudflare Worker environment, including the AI binding.
 * @returns The transcription text from the model.
 */
async function transcribeChunk(
	chunkBuffer: ArrayBuffer,
	env: Env,
): Promise<string> {
	const base64 = Buffer.from(chunkBuffer).toString("base64");
	const res = await env.AI.run("@cf/openai/whisper-large-v3-turbo", {
		audio: base64,
		// Optional parameters (uncomment and set if needed):
		// task: "transcribe", // or "translate"
		// language: "en",
		// vad_filter: "false",
		// initial_prompt: "Provide context if needed.",
		// prefix: "Transcription:",
	});
	return res.text; // The transcription result includes a "text" property.
}

/**
 * The main fetch handler. It extracts the 'url' query parameter, fetches the audio,
 * processes it in chunks, and returns the full transcription.
 */
export default {
	async fetch(
		request: Request,
		env: Env,
		ctx: ExecutionContext,
	): Promise<Response> {
		// Extract the audio URL from the query parameters.
		const { searchParams } = new URL(request.url);
		const audioUrl = searchParams.get("url");

		if (!audioUrl) {
			return new Response("Missing 'url' query parameter", { status: 400 });
		}

		// Get the audio chunks.
		const audioChunks: ArrayBuffer[] = await getAudioChunks(audioUrl);
		let fullTranscript = "";

		// Process each chunk and build the full transcript.
		for (const chunk of audioChunks) {
			try {
				const transcript = await transcribeChunk(chunk, env);
				fullTranscript += transcript + "\n";
			} catch (error) {
				fullTranscript += "[Error transcribing chunk]\n";
			}
		}

		return new Response(fullTranscript, {
			headers: { "Content-Type": "text/plain" },
		});
	},
} satisfies ExportedHandler<Env>;
```

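The chunking loop in `getAudioChunks` is plain arithmetic, so it can be sanity-checked outside the Worker. A minimal sketch (the `splitBuffer` helper name is ours, not part of the tutorial code):

```typescript
// Split an ArrayBuffer into fixed-size chunks; the last chunk may be smaller.
// This mirrors the loop inside getAudioChunks.
function splitBuffer(buf: ArrayBuffer, chunkSize: number): ArrayBuffer[] {
	const chunks: ArrayBuffer[] = [];
	for (let i = 0; i < buf.byteLength; i += chunkSize) {
		chunks.push(buf.slice(i, i + chunkSize));
	}
	return chunks;
}

const audio = new ArrayBuffer(2_500_000); // pretend 2.5 MB download
const chunks = splitBuffer(audio, 1024 * 1024);
console.log(chunks.length); // 3
console.log(chunks[2].byteLength); // 402848 (the remainder after two full 1 MB chunks)
```

Note that slicing raw bytes this way can cut an encoded audio frame in half at a chunk boundary, which may slightly degrade transcription quality near the seams; for a tutorial-scale pipeline this simple approach is usually acceptable.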
## 5. Deploy your Worker

1. **Run the Worker locally:**

   Use Wrangler's development mode to test your Worker locally:

   ```sh
   npx wrangler dev
   ```

   Open your browser and go to [http://localhost:8787](http://localhost:8787), or use curl:

   ```sh
   curl "http://localhost:8787?url=https://raw.githubusercontent.com/your-username/your-repo/main/your-audio-file.mp3"
   ```

   Replace the `url` query parameter with the direct link to your audio file. (For GitHub-hosted files, ensure you use the raw file URL.)

2. **Deploy the Worker:**

   Once testing is complete, deploy your Worker with:

   ```sh
   npx wrangler deploy
   ```

3. **Test the deployed Worker:**

   After deployment, test your Worker by passing the audio URL as a query parameter:

   ```sh
   curl "https://<your-worker-subdomain>.workers.dev?url=https://raw.githubusercontent.com/your-username/your-repo/main/your-audio-file.mp3"
   ```

   Make sure to replace `<your-worker-subdomain>`, `your-username`, `your-repo`, and `your-audio-file.mp3` with your actual details.

   If successful, the Worker returns a transcript of the audio file:

   ```txt
   This is the transcript of the audio...
   ```
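Each of the curl commands above passes the audio location in the `url` query parameter, which the Worker's fetch handler reads via `URL` and `searchParams`. The extraction behaves like this standalone sketch (the hostname is a placeholder):

```typescript
// Parse a request URL the way the Worker's fetch handler does.
const requestUrl = new URL(
	"https://whisper-tutorial.example.workers.dev/?url=https://example.com/audio.mp3",
);
const audioUrl = requestUrl.searchParams.get("url");
console.log(audioUrl); // "https://example.com/audio.mp3"
```

Because the audio URL itself contains `://` and `/`, quoting the whole request URL in the shell (as the curl examples do) keeps it intact; `searchParams.get` then returns the value as-is.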
