[Workers AI]Whisper-tutorial #21372
Changes from 5 commits
3da1424
85ade05
d554465
e32ee0a
addff6a
54c299b
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,226 @@ | ||
| --- | ||
| updated: 2025-04-03 | ||
| difficulty: Beginner | ||
| pcx_content_type: tutorial | ||
| title: Whisper-large-v3-turbo with Cloudflare Workers AI | ||
| tags: | ||
| - AI | ||
| --- | ||
|
|
||
| In this tutorial you will learn how to: | ||
|
|
||
| - **Transcribe large audio files:** Use the [Whisper-large-v3-turbo](/workers-ai/models/whisper-large-v3-turbo/) model from Cloudflare Workers AI to perform automatic speech recognition (ASR) or translation. | ||
| - **Handle large files:** Split large audio files into smaller chunks for processing, which helps overcome memory and execution time limitations. | ||
| - **Deploy using Cloudflare Workers:** Create a scalable, low-latency transcription pipeline in a serverless environment. | ||
|
|
||
| ## 1. Create a new Cloudflare Worker project | ||
|
|
||
| import { Render, PackageManagers, WranglerConfig } from "~/components"; | ||
|
|
||
| <Render file="prereqs" product="workers" /> | ||
|
|
||
| You will create a new Worker project using the `create-cloudflare` CLI (C3). [C3](https://github.com/cloudflare/workers-sdk/tree/main/packages/create-cloudflare) is a command-line tool designed to help you set up and deploy new applications to Cloudflare. | ||
|
|
||
| Create a new project named `whisper-tutorial` by running: | ||
|
|
||
| <PackageManagers type="create" pkg="cloudflare@latest" args={"whisper-tutorial"} /> | ||
|
|
||
| Running `npm create cloudflare@latest` will prompt you to install the [`create-cloudflare` package](https://www.npmjs.com/package/create-cloudflare), and lead you through setup. C3 will also install [Wrangler](/workers/wrangler/), the Cloudflare Developer Platform CLI. | ||
|
|
||
| <Render | ||
| file="c3-post-run-steps" | ||
| product="workers" | ||
| params={{ | ||
| category: "hello-world", | ||
| type: "Worker only", | ||
| lang: "TypeScript", | ||
| }} | ||
| /> | ||
|
|
||
| This will create a new `whisper-tutorial` directory. Your new `whisper-tutorial` directory will include: | ||
|
|
||
| - A `"Hello World"` [Worker](/workers/get-started/guide/#3-write-code) at `src/index.ts`. | ||
| - A [`wrangler.jsonc`](/workers/wrangler/configuration/) configuration file. | ||
|
|
||
| Go to your application directory: | ||
|
|
||
| ```sh | ||
| cd whisper-tutorial | ||
| ``` | ||
|
|
||
| ## 2. Connect your Worker to Workers AI | ||
|
|
||
| You must create an AI binding for your Worker to connect to Workers AI. [Bindings](/workers/runtime-apis/bindings/) allow your Workers to interact with resources, like Workers AI, on the Cloudflare Developer Platform. | ||
|
|
||
| To bind Workers AI to your Worker, add the following to the end of your Wrangler file: | ||
|
|
||
| <WranglerConfig> | ||
|
|
||
| ```toml | ||
| [ai] | ||
| binding = "AI" | ||
| ``` | ||
|
|
||
| </WranglerConfig> | ||
|
|
||
| Your binding is [available in your Worker code](/workers/reference/migrate-to-module-workers/#bindings-in-es-modules-format) on [`env.AI`](/workers/runtime-apis/handlers/fetch/). | ||
|
|
||
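As a rough illustration of how the binding surfaces in code, the sketch below reads `env.AI` inside a request handler. `AiLike`, `handleRequest`, and `stubEnv` are hypothetical names introduced so the snippet can run outside the Workers runtime; in a deployed Worker the `AI` object is supplied by the platform, and its real type comes from the Workers tooling rather than from this simplified interface.

```ts
// Simplified structural type for the binding (an assumption for this sketch).
interface AiLike {
  run(model: string, inputs: Record<string, unknown>): Promise<{ text?: string }>;
}

interface Env {
  AI: AiLike;
}

// Hypothetical handler: the binding is reached through `env`, not through a global.
async function handleRequest(_request: Request, env: Env): Promise<Response> {
  const result = await env.AI.run("@cf/openai/whisper-large-v3-turbo", { audio: "" });
  return new Response(result.text ?? "", {
    headers: { "Content-Type": "text/plain" },
  });
}

// Stub binding for experimentation only; it does not call Workers AI.
const stubEnv: Env = {
  AI: { run: async () => ({ text: "stub transcript" }) },
};
```

Passing the binding in through `env` also makes the handler easy to exercise with a stub like `stubEnv` before any real model call is involved.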
| ## 3. Configure Wrangler | ||
|
|
||
| In your Wrangler file, add or update the following settings to enable Node.js APIs and polyfills (with a compatibility date of 2024-09-23 or later): | ||
|
|
||
| <WranglerConfig> | ||
|
|
||
| ```toml title="wrangler.toml" | ||
| compatibility_flags = [ "nodejs_compat" ] | ||
| compatibility_date = "2024-09-23" | ||
| ``` | ||
|
|
||
| </WranglerConfig> | ||
|
|
||
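The `nodejs_compat` flag matters for this tutorial because the Worker code in the next step imports `Buffer` from `node:buffer`. The sketch below shows the kind of call that import enables: encoding raw bytes as a Base64 string. `toBase64` is an illustrative helper name, not part of the tutorial code.

```ts
import { Buffer } from "node:buffer";

// Illustrative helper: encode the bytes of an ArrayBuffer as Base64 text.
function toBase64(data: ArrayBuffer): string {
  return Buffer.from(data).toString("base64");
}

// Three bytes spelling "Hi!" in ASCII.
const data = new ArrayBuffer(3);
new Uint8Array(data).set([72, 105, 33]);

console.log(toBase64(data)); // prints "SGkh"
```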
| ## 4. Handle large audio files with chunking | ||
|
|
||
| Replace the contents of your `src/index.ts` file with the following integrated code. This sample demonstrates how to: | ||
|
|
||
| - Extract an audio file URL from the query parameters. | ||
| - Fetch the audio file while explicitly following redirects. | ||
| - Split the audio file into smaller chunks (for example, 1 MB chunks). | ||
| - Transcribe each chunk using the Whisper-large-v3-turbo model via the Cloudflare AI binding. | ||
| - Return the aggregated transcription as plain text. | ||
|
|
||
| ```ts | ||
|
|
||
| import { Buffer } from "node:buffer"; | ||
| import type { Ai } from "workers-ai"; | ||
|
|
||
| export interface Env { | ||
| AI: Ai; | ||
| // If needed, add your KV namespace for storing transcripts. | ||
| // MY_KV_NAMESPACE: KVNamespace; | ||
| } | ||
|
|
||
| /** | ||
| * Fetches the audio file from the provided URL and splits it into chunks. | ||
| * This function explicitly follows redirects. | ||
| * | ||
| * @param audioUrl - The URL of the audio file. | ||
| * @returns An array of ArrayBuffers, each representing a chunk of the audio. | ||
| */ | ||
| async function getAudioChunks(audioUrl: string): Promise<ArrayBuffer[]> { | ||
| const response = await fetch(audioUrl, { redirect: "follow" }); | ||
| if (!response.ok) { | ||
| throw new Error(`Failed to fetch audio: ${response.status}`); | ||
| } | ||
| const arrayBuffer = await response.arrayBuffer(); | ||
|
|
||
| // Example: Split the audio into 1MB chunks. | ||
| const chunkSize = 1024 * 1024; // 1MB | ||
| const chunks: ArrayBuffer[] = []; | ||
| for (let i = 0; i < arrayBuffer.byteLength; i += chunkSize) { | ||
| const chunk = arrayBuffer.slice(i, i + chunkSize); | ||
| chunks.push(chunk); | ||
| } | ||
| return chunks; | ||
| } | ||
|
|
||
| /** | ||
| * Transcribes a single audio chunk using the Whisper-large-v3-turbo model. | ||
| * The function converts the audio chunk to a Base64-encoded string and | ||
| * sends it to the model via the AI binding. | ||
| * | ||
| * @param chunkBuffer - The audio chunk as an ArrayBuffer. | ||
| * @param env - The Cloudflare Worker environment, including the AI binding. | ||
| * @returns The transcription text from the model. | ||
| */ | ||
| async function transcribeChunk(chunkBuffer: ArrayBuffer, env: Env): Promise<string> { | ||
| // Buffer.from(arrayBuffer) takes no encoding argument; the bytes are used as-is. | ||
| const base64 = Buffer.from(chunkBuffer).toString("base64"); | ||
| const res = await env.AI.run("@cf/openai/whisper-large-v3-turbo", { | ||
| audio: base64, | ||
| // Optional parameters (uncomment and set if needed): | ||
| // task: "transcribe", // or "translate" | ||
| // language: "en", | ||
| // vad_filter: "false", | ||
| // initial_prompt: "Provide context if needed.", | ||
| // prefix: "Transcription:", | ||
| }); | ||
| return res.text; // Assumes the transcription result includes a "text" property. | ||
| } | ||
|
|
||
| /** | ||
| * The main fetch handler. It extracts the 'url' query parameter, fetches the audio, | ||
| * processes it in chunks, and returns the full transcription. | ||
| */ | ||
| export default { | ||
| async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> { | ||
| // Extract the audio URL from the query parameters. | ||
| const { searchParams } = new URL(request.url); | ||
| const audioUrl = searchParams.get("url"); | ||
|
|
||
| if (!audioUrl) { | ||
| return new Response("Missing 'url' query parameter", { status: 400 }); | ||
| } | ||
|
|
||
| // Get the audio chunks. | ||
| const audioChunks: ArrayBuffer[] = await getAudioChunks(audioUrl); | ||
| let fullTranscript = ""; | ||
|
|
||
| // Process each chunk and build the full transcript. | ||
| for (const chunk of audioChunks) { | ||
| try { | ||
| const transcript = await transcribeChunk(chunk, env); | ||
| fullTranscript += transcript + "\n"; | ||
| } catch (error) { | ||
| fullTranscript += "[Error transcribing chunk]\n"; | ||
| } | ||
| } | ||
|
|
||
| return new Response(fullTranscript, { | ||
| headers: { "Content-Type": "text/plain" }, | ||
| }); | ||
| }, | ||
| } satisfies ExportedHandler<Env>; | ||
| ``` | ||
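To make the chunking arithmetic concrete, here is a standalone sketch of the same slicing loop used in `getAudioChunks`. `splitIntoChunks` is a hypothetical helper name with a configurable chunk size. Like the tutorial code, it cuts at fixed byte offsets, so chunk boundaries are not aligned to anything in the audio format.

```ts
// Hypothetical standalone version of the slicing loop.
function splitIntoChunks(data: ArrayBuffer, chunkSize: number): ArrayBuffer[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: ArrayBuffer[] = [];
  for (let offset = 0; offset < data.byteLength; offset += chunkSize) {
    // slice() clamps the end index, so the final chunk may be shorter.
    chunks.push(data.slice(offset, offset + chunkSize));
  }
  return chunks;
}

const ONE_MB = 1024 * 1024;
const sample = new ArrayBuffer(2.5 * ONE_MB); // 2,621,440 zero bytes
const parts = splitIntoChunks(sample, ONE_MB);

console.log(parts.map((p) => p.byteLength)); // [ 1048576, 1048576, 524288 ]
```

A 2.5 MB input therefore produces three model calls: two full 1 MB chunks and one 0.5 MB remainder.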
|
|
||
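The aggregation loop in the fetch handler can also be tried in isolation. The sketch below mirrors its behavior (sequential calls, a placeholder line when a chunk fails) but takes the transcription function as a parameter so it runs without the AI binding. `Transcriber`, `transcribeAll`, and `fakeTranscribe` are hypothetical names, and logging the error is an addition that the tutorial code does not include.

```ts
type Transcriber = (chunk: ArrayBuffer) => Promise<string>;

// Transcribe chunks one at a time and join the results, one line per chunk.
async function transcribeAll(
  chunks: ArrayBuffer[],
  transcribe: Transcriber,
): Promise<string> {
  let fullTranscript = "";
  for (const chunk of chunks) {
    try {
      fullTranscript += (await transcribe(chunk)) + "\n";
    } catch (error) {
      // Keep going so that one bad chunk does not discard the whole transcript.
      console.error("Chunk transcription failed:", error);
      fullTranscript += "[Error transcribing chunk]\n";
    }
  }
  return fullTranscript;
}

// Fake transcriber for demonstration only: rejects empty chunks.
const fakeTranscribe: Transcriber = async (chunk) => {
  if (chunk.byteLength === 0) throw new Error("empty chunk");
  return `chunk of ${chunk.byteLength} bytes`;
};
```

Calling `transcribeAll([new ArrayBuffer(2), new ArrayBuffer(0)], fakeTranscribe)` resolves to one transcript line followed by one error placeholder line.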
| ## 5. Develop, test, and deploy | ||
daisyfaithauma marked this conversation as resolved.
|
||
|
|
||
| 1. **Run the Worker locally:** | ||
|
|
||
| Use Wrangler's development mode to test your Worker. The `--remote` flag runs the Worker on Cloudflare's network rather than in the local simulator: | ||
|
|
||
| ```sh | ||
|
Collaborator
Can you explain the thought behind using the
Contributor
Author
If you want to run the Worker locally: https://batch-api.preview.developers.cloudflare.com/workers-ai/guides/tutorials/build-a-retrieval-augmented-generation-ai/#2-develop-with-wrangler-cli |
||
| npx wrangler dev --remote | ||
| ``` | ||
|
|
||
| Open your browser and go to [http://localhost:8787](http://localhost:8787), or use curl: | ||
|
|
||
| ```sh | ||
| curl "http://localhost:8787?url=https://raw.githubusercontent.com/your-username/your-repo/main/your-audio-file.mp3" | ||
| ``` | ||
|
|
||
| Replace the URL query parameter with the direct link to your audio file. (For GitHub-hosted files, ensure you use the raw file URL.) | ||
|
|
||
|
|
||
|
|
||
| 2. **Deploy the Worker:** | ||
|
|
||
| Once testing is complete, deploy your Worker with: | ||
|
|
||
| ```sh | ||
| npx wrangler deploy | ||
| ``` | ||
|
|
||
| 3. **Test the deployed Worker:** | ||
|
|
||
| After deployment, test your Worker by passing the audio URL as a query parameter: | ||
|
|
||
| ```sh | ||
| curl "https://<your-worker-subdomain>.workers.dev?url=https://raw.githubusercontent.com/your-username/your-repo/main/your-audio-file.mp3" | ||
| ``` | ||
|
|
||
| Make sure to replace `<your-worker-subdomain>`, `your-username`, `your-repo`, and `your-audio-file.mp3` with your actual details. | ||
|
|
||
| If successful, the Worker will return a transcript of the audio file: | ||
|
|
||
| ```txt | ||
| This is the transcript of the audio... | ||
| ``` | ||
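One detail worth checking when testing: if the audio URL carries its own query string (for example, a signed download link), an unencoded `&` in it would be parsed as a separator in the Worker's query string and the `url` value would be cut short. A minimal sketch of building the request URL safely follows. `buildTranscribeUrl` and the `example.com` address are hypothetical.

```ts
// Hypothetical helper: build the Worker request URL with the audio URL percent-encoded.
function buildTranscribeUrl(workerOrigin: string, audioUrl: string): string {
  const target = new URL(workerOrigin);
  // searchParams.set() encodes reserved characters such as "&", "?" and "=".
  target.searchParams.set("url", audioUrl);
  return target.toString();
}

const audioUrl = "https://example.com/audio.mp3?token=abc&v=2";
const requestUrl = buildTranscribeUrl("http://localhost:8787", audioUrl);

// The Worker reads the value back with searchParams.get("url").
console.log(new URL(requestUrl).searchParams.get("url") === audioUrl); // true
```

With curl, passing `-G --data-urlencode "url=<audio URL>"` achieves the same encoding.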