[Workers AI]Whisper Tutorial #21715
Merged
454a9d3
Whisper Tutorial
daisyfaithauma 0790eeb
minor fixed
daisyfaithauma 264199e
Update src/content/docs/workers-ai/guides/tutorials/build-a-workers-a…
daisyfaithauma 384d7f0
Update src/content/docs/workers-ai/guides/tutorials/build-a-workers-a…
daisyfaithauma
238 changes: 238 additions & 0 deletions
...t/docs/workers-ai/guides/tutorials/build-a-workers-ai-whisper-with-chunking.mdx
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,238 @@ | ||
| --- | ||
| updated: 2025-04-03 | ||
| difficulty: Beginner | ||
| pcx_content_type: tutorial | ||
| title: Whisper-large-v3-turbo with Cloudflare Workers AI | ||
| tags: | ||
| - AI | ||
| --- | ||
|
|
||
| In this tutorial you will learn how to: | ||
|
|
||
| - **Transcribe large audio files:** Use the [Whisper-large-v3-turbo](/workers-ai/models/whisper-large-v3-turbo/) model from Cloudflare Workers AI to perform automatic speech recognition (ASR) or translation. | ||
| - **Handle large files:** Split large audio files into smaller chunks for processing, which helps overcome memory and execution time limitations. | ||
| - **Deploy using Cloudflare Workers:** Create a scalable, low-latency transcription pipeline in a serverless environment. | ||
|
|
||
| ## 1. Create a new Cloudflare Worker project | ||
|
|
||
| import { Render, PackageManagers, WranglerConfig } from "~/components"; | ||
|
|
||
| <Render file="prereqs" product="workers" /> | ||
|
|
||
| You will create a new Worker project using the `create-cloudflare` CLI (C3). [C3](https://github.com/cloudflare/workers-sdk/tree/main/packages/create-cloudflare) is a command-line tool designed to help you set up and deploy new applications to Cloudflare. | ||
|
|
||
| Create a new project named `whisper-tutorial` by running: | ||
|
|
||
| <PackageManagers | ||
| type="create" | ||
| pkg="cloudflare@latest" | ||
| args={"whisper-tutorial"} | ||
| /> | ||
|
|
||
| Running `npm create cloudflare@latest` will prompt you to install the [`create-cloudflare` package](https://www.npmjs.com/package/create-cloudflare), and lead you through setup. C3 will also install [Wrangler](/workers/wrangler/), the Cloudflare Developer Platform CLI. | ||
|
|
||
| <Render | ||
| file="c3-post-run-steps" | ||
| product="workers" | ||
| params={{ | ||
| category: "hello-world", | ||
| type: "Worker only", | ||
| lang: "TypeScript", | ||
| }} | ||
| /> | ||
|
|
||
| This will create a new `whisper-tutorial` directory. Your new `whisper-tutorial` directory will include: | ||
|
|
||
| - A `"Hello World"` [Worker](/workers/get-started/guide/#3-write-code) at `src/index.ts`. | ||
| - A [`wrangler.jsonc`](/workers/wrangler/configuration/) configuration file. | ||
|
|
||
| Go to your application directory: | ||
|
|
||
| ```sh | ||
| cd whisper-tutorial | ||
| ``` | ||
|
|
||
| ## 2. Connect your Worker to Workers AI | ||
|
|
||
| You must create an AI binding for your Worker to connect to Workers AI. [Bindings](/workers/runtime-apis/bindings/) allow your Workers to interact with resources, like Workers AI, on the Cloudflare Developer Platform. | ||
|
|
||
| To bind Workers AI to your Worker, add the following to the end of your Wrangler file: | ||
|
|
||
| <WranglerConfig> | ||
|
|
||
| ```toml | ||
| [ai] | ||
| binding = "AI" | ||
| ``` | ||
|
|
||
| </WranglerConfig> | ||
|
|
||
| Your binding is [available in your Worker code](/workers/reference/migrate-to-module-workers/#bindings-in-es-modules-format) on [`env.AI`](/workers/runtime-apis/handlers/fetch/). | ||
|
|
||
| ## 3. Configure Wrangler | ||
|
|
||
| In your Wrangler file, add or update the following settings to enable Node.js APIs and polyfills. This requires a compatibility date of 2024-09-23 or later: | ||
|
|
||
| <WranglerConfig> | ||
|
|
||
| ```toml title="wrangler.toml" | ||
| compatibility_flags = [ "nodejs_compat" ] | ||
| compatibility_date = "2024-09-23" | ||
| ``` | ||
|
|
||
| </WranglerConfig> | ||
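With `nodejs_compat` enabled, the Worker can import `Buffer` from `node:buffer`, which the code in the next step relies on to Base64-encode audio bytes. The following is a minimal sketch of that encoding step on its own. `toBase64` is a hypothetical helper name used only for illustration; it is not part of the tutorial code or of any Cloudflare API.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical helper: Base64-encode raw bytes held in an ArrayBuffer.
function toBase64(bytes: ArrayBuffer): string {
  // Buffer.from(arrayBuffer) wraps the same memory; no encoding argument
  // is needed (or accepted) when the input is an ArrayBuffer.
  return Buffer.from(bytes).toString("base64");
}

// Example: the three ASCII bytes of "abc" encode to "YWJj".
const sample = new Uint8Array([97, 98, 99]).buffer;
console.log(toBase64(sample)); // "YWJj"
```

Without the `nodejs_compat` flag, the `node:buffer` import would likely fail to resolve at deploy time, which is why this configuration step comes before the Worker code.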
|
|
||
| ## 4. Handle large audio files with chunking | ||
|
|
||
| Replace the contents of your `src/index.ts` file with the following integrated code. This sample demonstrates how to: | ||
|
|
||
| (1) Extract an audio file URL from the query parameters. | ||
|
|
||
| (2) Fetch the audio file while explicitly following redirects. | ||
|
|
||
| (3) Split the audio file into smaller chunks (for example, 1 MB chunks). | ||
|
||
|
|
||
| (4) Transcribe each chunk using the Whisper-large-v3-turbo model via the Cloudflare AI binding. | ||
|
|
||
| (5) Return the aggregated transcription as plain text. | ||
|
|
||
| ```ts | ||
| import { Buffer } from "node:buffer"; | ||
| import type { Ai } from "@cloudflare/workers-types"; | ||
|
|
||
| export interface Env { | ||
| AI: Ai; | ||
| // If needed, add your KV namespace for storing transcripts. | ||
| // MY_KV_NAMESPACE: KVNamespace; | ||
| } | ||
|
|
||
| /** | ||
| * Fetches the audio file from the provided URL and splits it into chunks. | ||
| * This function explicitly follows redirects. | ||
| * | ||
| * @param audioUrl - The URL of the audio file. | ||
| * @returns An array of ArrayBuffers, each representing a chunk of the audio. | ||
| */ | ||
| async function getAudioChunks(audioUrl: string): Promise<ArrayBuffer[]> { | ||
| const response = await fetch(audioUrl, { redirect: "follow" }); | ||
| if (!response.ok) { | ||
| throw new Error(`Failed to fetch audio: ${response.status}`); | ||
| } | ||
| const arrayBuffer = await response.arrayBuffer(); | ||
|
|
||
| // Example: Split the audio into 1MB chunks. | ||
| const chunkSize = 1024 * 1024; // 1MB | ||
| const chunks: ArrayBuffer[] = []; | ||
| for (let i = 0; i < arrayBuffer.byteLength; i += chunkSize) { | ||
| const chunk = arrayBuffer.slice(i, i + chunkSize); | ||
| chunks.push(chunk); | ||
| } | ||
| return chunks; | ||
| } | ||
|
|
||
| /** | ||
| * Transcribes a single audio chunk using the Whisper-large-v3-turbo model. | ||
| * The function converts the audio chunk to a Base64-encoded string and | ||
| * sends it to the model via the AI binding. | ||
| * | ||
| * @param chunkBuffer - The audio chunk as an ArrayBuffer. | ||
| * @param env - The Cloudflare Worker environment, including the AI binding. | ||
| * @returns The transcription text from the model. | ||
| */ | ||
| async function transcribeChunk( | ||
| chunkBuffer: ArrayBuffer, | ||
| env: Env, | ||
| ): Promise<string> { | ||
| const base64 = Buffer.from(chunkBuffer).toString("base64"); | ||
| const res = await env.AI.run("@cf/openai/whisper-large-v3-turbo", { | ||
| audio: base64, | ||
| // Optional parameters (uncomment and set if needed): | ||
| // task: "transcribe", // or "translate" | ||
| // language: "en", | ||
| // vad_filter: "false", | ||
| // initial_prompt: "Provide context if needed.", | ||
| // prefix: "Transcription:", | ||
| }); | ||
| return res.text; // Assumes the transcription result includes a "text" property. | ||
| } | ||
|
|
||
| /** | ||
| * The main fetch handler. It extracts the 'url' query parameter, fetches the audio, | ||
| * processes it in chunks, and returns the full transcription. | ||
| */ | ||
| export default { | ||
| async fetch( | ||
| request: Request, | ||
| env: Env, | ||
| ctx: ExecutionContext, | ||
| ): Promise<Response> { | ||
| // Extract the audio URL from the query parameters. | ||
| const { searchParams } = new URL(request.url); | ||
| const audioUrl = searchParams.get("url"); | ||
|
|
||
| if (!audioUrl) { | ||
| return new Response("Missing 'url' query parameter", { status: 400 }); | ||
| } | ||
|
|
||
| // Get the audio chunks. | ||
| const audioChunks: ArrayBuffer[] = await getAudioChunks(audioUrl); | ||
| let fullTranscript = ""; | ||
|
|
||
| // Process each chunk and build the full transcript. | ||
| for (const chunk of audioChunks) { | ||
| try { | ||
| const transcript = await transcribeChunk(chunk, env); | ||
| fullTranscript += transcript + "\n"; | ||
| } catch (error) { | ||
| console.error("Failed to transcribe chunk:", error); | ||
| fullTranscript += "[Error transcribing chunk]\n"; | ||
| } | ||
| } | ||
|
|
||
| return new Response(fullTranscript, { | ||
| headers: { "Content-Type": "text/plain" }, | ||
| }); | ||
| }, | ||
| } satisfies ExportedHandler<Env>; | ||
| ``` | ||
|
|
||
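The chunking loop in `getAudioChunks` can be isolated and checked on its own. The sketch below uses a hypothetical `splitIntoChunks` helper (not part of the tutorial code) to show the arithmetic: a buffer of `n` bytes yields `Math.ceil(n / chunkSize)` chunks, and only the last one can be shorter. One assumption worth stating: the split is by byte offset, not by audio frame or silence, so a word that straddles a chunk boundary may be transcribed imperfectly, and later chunks do not carry any header found at the start of the file.

```typescript
// Hypothetical helper mirroring the loop inside getAudioChunks.
function splitIntoChunks(buffer: ArrayBuffer, chunkSize: number): ArrayBuffer[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: ArrayBuffer[] = [];
  for (let offset = 0; offset < buffer.byteLength; offset += chunkSize) {
    // slice() clamps the end index, so the final chunk may be shorter.
    chunks.push(buffer.slice(offset, offset + chunkSize));
  }
  return chunks;
}

// A 2.5 MB buffer split into 1 MB chunks gives sizes 1 MB, 1 MB, 0.5 MB.
const MB = 1024 * 1024;
const parts = splitIntoChunks(new ArrayBuffer(2.5 * MB), MB);
console.log(parts.map((p) => p.byteLength / MB)); // [ 1, 1, 0.5 ]
```

Because the handler transcribes chunks one after another, the number of chunks also sets the number of sequential model calls per request, which is a reason to pick the chunk size deliberately rather than make it as small as possible.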
| ## 5. Deploy your Worker | ||
|
|
||
| 1. **Run the Worker locally:** | ||
|
|
||
| Use Wrangler's development mode to test your Worker locally: | ||
|
|
||
| ```sh | ||
| npx wrangler dev | ||
| ``` | ||
|
|
||
| Open [http://localhost:8787](http://localhost:8787) in your browser with a `url` query parameter appended (without it, the Worker returns a 400 error), or use curl: | ||
|
|
||
| ```sh | ||
| curl "http://localhost:8787?url=https://raw.githubusercontent.com/your-username/your-repo/main/your-audio-file.mp3" | ||
| ``` | ||
|
|
||
| Replace the value of the `url` query parameter with the direct link to your audio file. (For GitHub-hosted files, use the raw file URL.) | ||
|
|
||
| 2. **Deploy the Worker:** | ||
|
|
||
| Once testing is complete, deploy your Worker with: | ||
|
|
||
| ```sh | ||
| npx wrangler deploy | ||
| ``` | ||
|
|
||
| 3. **Test the deployed Worker:** | ||
|
|
||
| After deployment, test your Worker by passing the audio URL as a query parameter: | ||
|
|
||
| ```sh | ||
| curl "https://<your-worker-subdomain>.workers.dev?url=https://raw.githubusercontent.com/your-username/your-repo/main/your-audio-file.mp3" | ||
| ``` | ||
|
|
||
| Make sure to replace `<your-worker-subdomain>`, `your-username`, `your-repo`, and `your-audio-file.mp3` with your actual details. | ||
|
|
||
| If successful, the Worker will return a transcript of the audio file: | ||
|
|
||
| ```txt | ||
| This is the transcript of the audio... | ||
| ``` | ||
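The curl examples above pass the audio URL unencoded. That works for simple links, but if the audio URL has its own query string, its `&` is read as a separator in the Worker's request URL and the value gets truncated. A minimal sketch of building the request URL from a TypeScript client follows. `buildTranscribeUrl` and both hostnames are hypothetical and used only for illustration.

```typescript
// Hypothetical client-side helper: builds the Worker request URL with the
// audio URL percent-encoded into the `url` query parameter.
function buildTranscribeUrl(workerBase: string, audioUrl: string): string {
  const target = new URL(workerBase);
  // searchParams.set() encodes reserved characters such as "&" and "?".
  target.searchParams.set("url", audioUrl);
  return target.toString();
}

const requestUrl = buildTranscribeUrl(
  "https://whisper-tutorial.example.workers.dev",
  "https://example.com/audio.mp3?token=abc&v=2",
);

// The Worker reads the value back with searchParams.get("url"), as the
// tutorial's fetch handler does, and recovers the original audio URL.
console.log(new URL(requestUrl).searchParams.get("url"));
```

With curl, a comparable effect can be had with `-G --data-urlencode "url=..."`, which encodes the parameter before appending it to the request URL.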