# Audio-Scribe: Transcribe audio messages with Chrome's multimodal Prompt API
|
This sample demonstrates how to use Chrome's built-in AI APIs to transcribe audio messages directly in the browser. It uses:

- **[Prompt API](https://developer.chrome.com/docs/extensions/ai/prompt-api)** with multimodal audio input (Gemini Nano) for on-device speech-to-text transcription
|
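The transcription step can be sketched as follows. This is a minimal sketch, not the sample's source. It assumes the current Prompt API surface in Chrome: a global `LanguageModel` with `availability()`, a `create()` that takes `expectedInputs`, and a `promptStreaming()` that accepts `{ type: "audio", value: blob }` content. These names have changed across Chrome releases and origin trials, so check them against the Prompt API documentation. `PromptSession`, `createAudioSession`, `transcribe`, and the prompt text are hypothetical. The session is passed into `transcribe` so the streaming logic can run without Chrome.

```typescript
// Minimal subset of a Prompt API session used by this sketch (assumed shape).
interface PromptSession {
  promptStreaming(input: unknown): AsyncIterable<string>;
}

// Ambient declaration for Chrome's Prompt API global (assumed shape; only exists in Chrome).
declare const LanguageModel: {
  availability(options?: object): Promise<string>;
  create(options?: object): Promise<PromptSession>;
};

// Hypothetical helper: create a session that accepts audio input, or return null if unsupported.
async function createAudioSession(): Promise<PromptSession | null> {
  const options = { expectedInputs: [{ type: "audio" }] };
  if ((await LanguageModel.availability(options)) === "unavailable") {
    return null;
  }
  return LanguageModel.create(options);
}

// Hypothetical helper: stream a transcription of one audio Blob,
// reporting the accumulated text after every chunk.
async function transcribe(
  session: PromptSession,
  audio: Blob,
  onPartial: (textSoFar: string) => void,
): Promise<string> {
  const stream = session.promptStreaming([
    {
      role: "user",
      content: [
        { type: "text", value: "Transcribe this audio message verbatim." },
        { type: "audio", value: audio },
      ],
    },
  ]);
  let text = "";
  // Assumes each chunk is a delta, so chunks are concatenated.
  for await (const chunk of stream) {
    text += chunk;
    onPartial(text);
  }
  return text;
}
```

Passing a callback keeps the UI update separate from the API call; a side panel could use it to render partial text while the rest of the transcription is still arriving.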
## Overview
|
Audio-Scribe adds a side panel that automatically transcribes audio messages from chat applications. When activated, it:

1. Monitors the page for audio blobs created via `URL.createObjectURL`
2. Detects audio content and sends it to Gemini Nano for transcription
3. Streams the transcribed text in real time to the side panel
4. Works with messaging apps such as WhatsApp Web that use blob URLs for audio messages
|
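Step 1 can be approximated by wrapping `URL.createObjectURL` in the page so that every audio `Blob` is reported as soon as the app creates a URL for it. This is a minimal sketch under that assumption, not the sample's actual content script. `installAudioBlobMonitor` and `AudioBlobListener` are hypothetical names. In an extension, the wrapper would have to run in the page's main world and forward the blob to the side panel, which is not shown here.

```typescript
// Hypothetical callback type: receives the blob URL and the Blob whenever an audio blob URL is created.
type AudioBlobListener = (url: string, blob: Blob) => void;

// Hypothetical helper: wraps URL.createObjectURL so every audio Blob is reported.
// Returns a function that restores the original URL.createObjectURL.
function installAudioBlobMonitor(onAudio: AudioBlobListener): () => void {
  const original = URL.createObjectURL;
  URL.createObjectURL = (obj: Blob): string => {
    // Delegate to the real implementation so the page still gets a working URL.
    const url = original.call(URL, obj);
    // At runtime a page may also pass a MediaSource, so only report real Blobs
    // whose MIME type starts with "audio/".
    if (obj instanceof Blob && obj.type.startsWith("audio/")) {
      onAudio(url, obj);
    }
    return url;
  };
  return () => {
    URL.createObjectURL = original;
  };
}
```

Because only the `audio/` prefix is checked, MIME types with parameters (for example `audio/ogg; codecs=opus`) still match, while image or video blobs pass through unreported.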
## Running this extension
|
1. Clone this repository.
2. Load this directory in Chrome as an [unpacked extension](https://developer.chrome.com/docs/extensions/get-started/tutorial/hello-world#load-unpacked).
3. Open a chat app in the browser, for example https://web.whatsapp.com/. You can also run the included demo chat app:
   ```
   npx serve demo-chat-app
   ```
4. Open the Audio-Scribe side panel by clicking the extension icon or pressing `Alt+A`.
5. Play or load audio messages in the chat; they are transcribed automatically in the side panel.
|