Improve Gemini samples #1611
Changes from 3 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,13 +1,22 @@ | ||
| # Alt-texter: On-device multimodal AI with Gemini Nano - image understanding | ||
| # Alt Texter: Generate accessible image descriptions with Chrome's multimodal Prompt API | ||
|
|
||
| This sample demonstrates how to use the image understanding capabilities of the multi-modal Gemini Nano API preview together with [Chrome's translation API](https://developer.chrome.com/docs/ai/translator-api). To learn more about the API and how to sign-up for the origin trial, head over to [Built-in AI on developer.chrome.com](https://developer.chrome.com/docs/extensions/ai/prompt-api). | ||
| This sample demonstrates how to use Chrome's built-in AI APIs to generate alt text for images, making web content more accessible. It combines two on-device AI capabilities: | ||
|
|
||
| - **[Prompt API](https://developer.chrome.com/docs/extensions/ai/prompt-api)** with multimodal input (Gemini Nano) for image understanding | ||
| - **[Translator API](https://developer.chrome.com/docs/ai/translator-api)** for translating descriptions into multiple languages | ||
|
|
||
| ## Overview | ||
|
|
||
| This extension adds a context menu entry for images on the web to generate an alt text description that is displayed in a popup window. | ||
| Alt Texter adds a context menu entry for images on the web. When activated, it: | ||
|
|
||
| 1. Analyzes the image using Gemini Nano's multimodal capabilities | ||
| 2. Generates a concise, functional description following accessibility best practices (object-action-context framework) | ||
| 3. Displays the description in a popup where you can optionally translate it | ||
| 4. Lets you copy the alt text to your clipboard for use elsewhere | ||
|
|
||
| ## Running this extension | ||
|
|
||
| 1. Clone this repository. | ||
| 1. Load this directory in Chrome as an [unpacked extension](https://developer.chrome.com/docs/extensions/get-started/tutorial/hello-world#load-unpacked). | ||
| 1. Right click an image on a webpage and select "Generate alt text" | ||
| 2. Load this directory in Chrome as an [unpacked extension](https://developer.chrome.com/docs/extensions/get-started/tutorial/hello-world#load-unpacked). | ||
| 3. Right-click an image on a webpage and select "Generate alt text". | ||
| 4. Wait for the description to be generated, then optionally translate it or copy it to your clipboard. | ||
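
The image-analysis step described in the Alt Texter README can be sketched as a small message builder. This is a minimal sketch, not the sample's actual code: `buildAltTextMessages` and `ALT_TEXT_INSTRUCTION` are hypothetical names, the instruction wording is invented for illustration, and the `{ role, content: [{ type, value }] }` message shape is an assumption based on the Prompt API documentation that may change between Chrome versions.

```typescript
// One part of a multimodal prompt: either instruction text or the image itself.
// T stands in for the image type (in a browser, likely a Blob or ImageBitmap).
interface PromptPart<T> {
  type: "text" | "image";
  value: string | T;
}

interface PromptMessage<T> {
  role: "user";
  content: PromptPart<T>[];
}

// Assumed instruction text following the object-action-context idea;
// the sample's real prompt may differ.
const ALT_TEXT_INSTRUCTION =
  "Describe this image as concise alt text. " +
  "State the main object, what it is doing, and the surrounding context.";

// Hypothetical helper: builds a single user message that pairs the
// instruction with the image to describe.
function buildAltTextMessages<T>(image: T): PromptMessage<T>[] {
  return [
    {
      role: "user",
      content: [
        { type: "text", value: ALT_TEXT_INSTRUCTION },
        { type: "image", value: image },
      ],
    },
  ];
}
```

In an extension, the returned array would presumably be passed to `session.prompt(...)` on a session created with image input enabled. Keeping the builder free of browser APIs makes the prompt wording easy to test and adjust on its own.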
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,20 +1,27 @@ | ||
| # Audio-Scribe: On-device multimodal AI with Gemini Nano - audio transcription | ||
| # Audio-Scribe: Transcribe audio messages with Chrome's multimodal Prompt API | ||
|
|
||
| This sample demonstrates how to use the audio transcription capabilities of the multi-modal Gemini Nano API preview. To learn more about the API and how to sign-up for the origin trial, head over to [Built-in AI on developer.chrome.com](https://developer.chrome.com/docs/extensions/ai/prompt-api). | ||
| This sample demonstrates how to use Chrome's built-in AI APIs to transcribe audio messages directly in the browser. It uses: | ||
|
|
||
| - **[Prompt API](https://developer.chrome.com/docs/extensions/ai/prompt-api)** with multimodal audio input (Gemini Nano) for on-device speech-to-text transcription | ||
|
|
||
| ## Overview | ||
|
|
||
| This extension adds a sidepanel that will, when opened, display a transcription of all audio files on a web page (currently it looks only for audio files created using `URL.createObjectUrl`). | ||
| Audio-Scribe adds a side panel that automatically transcribes audio messages from chat applications. When activated, it: | ||
|
|
||
| 1. Monitors the page for audio blobs created via `URL.createObjectURL` | ||
| 2. Detects audio content and sends it to Gemini Nano for transcription | ||
| 3. Streams the transcribed text to the side panel in real time | ||
| 4. Works with messaging apps like WhatsApp Web that use blob URLs for audio messages | ||
|
|
||
| ## Running this extension | ||
|
|
||
| 1. Clone this repository. | ||
| 1. Load this directory in Chrome as an [unpacked extension](https://developer.chrome.com/docs/extensions/get-started/tutorial/hello-world#load-unpacked). | ||
| 1. Open the audio-scribe sidepanel by clicking the audio-scribe action or by pressing the `ALT + A` keyboard shortcut. | ||
| 1. Open a chat app in the browser, for example https://web.whatsapp.com/. You can also run the demo chat app via: | ||
| 2. Load this directory in Chrome as an [unpacked extension](https://developer.chrome.com/docs/extensions/get-started/tutorial/hello-world#load-unpacked). | ||
| 3. Open a chat app in the browser, for example https://web.whatsapp.com/. You can also run the included demo chat app: | ||
| ``` | ||
| npx serve demo-chat-app | ||
| ``` | ||
| 1. All audio messages in the current chat will be transcribed in the side panel. | ||
| 4. Open the Audio-Scribe side panel by clicking the extension icon or pressing `Alt+A`. | ||
| 5. Play or load audio messages in the chat; they are transcribed automatically in the side panel. | ||
|
|
||
|  |
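
One way to monitor audio blobs created via `URL.createObjectURL`, as the Audio-Scribe README describes, is to wrap that function and inspect each blob's MIME type. The sketch below assumes this wrapping approach; the sample itself may detect audio differently. `watchAudioBlobs`, `BlobLike`, and `ObjectUrlFactory` are hypothetical names, and the factory is passed in as a parameter so the logic can run outside a browser.

```typescript
// Minimal stand-ins for the browser's Blob and URL objects, so the sketch
// does not depend on DOM typings.
interface BlobLike {
  type: string;
}

interface ObjectUrlFactory {
  createObjectURL(obj: BlobLike): string;
}

// Hypothetical helper: wraps factory.createObjectURL so that every blob whose
// MIME type starts with "audio/" is reported to onAudio. The original return
// value is preserved. Returns a function that restores the original method.
function watchAudioBlobs(
  factory: ObjectUrlFactory,
  onAudio: (blob: BlobLike, url: string) => void
): () => void {
  const original = factory.createObjectURL;

  factory.createObjectURL = function (obj: BlobLike): string {
    const url = original.call(factory, obj);
    // Non-blob arguments may lack a string `type`, so guard before checking.
    if (obj && typeof obj.type === "string" && obj.type.indexOf("audio/") === 0) {
      onAudio(obj, url);
    }
    return url;
  };

  return function restore(): void {
    factory.createObjectURL = original;
  };
}
```

In a real page, the factory would be the global `URL` object. Because content scripts run in an isolated world, a wrapper like this would likely need to be injected into the page's main world to observe blobs that the chat app creates.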
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,20 +1,31 @@ | ||
| # Calendar Mate: On-device AI with Gemini Nano | ||
|
|
||
| This sample demonstrates how to use the Gemini Nano prompt API for Chrome Extensions. To learn more about the API, head over to [Built-in AI on developer.chrome.com](https://developer.chrome.com/docs/extensions/ai/prompt-api). | ||
| This sample demonstrates how to use Chrome's built-in Gemini Nano Language Model API in an extension to extract calendar event details from natural language text. To learn more about the API, see [Built-in AI on developer.chrome.com](https://developer.chrome.com/docs/extensions/ai/prompt-api). | ||
|
||
|
|
||
| ## Overview | ||
|
|
||
| The extension provides a chat interface using the Prompt API with Chrome's built-in Gemini Nano model. | ||
| Calendar Mate creates Google Calendar events from text selected on a webpage. Highlight text describing an event (for example, "Team meeting on Friday at 3pm in Conference Room A"), right-click, and select "Create Calendar Event". The extension uses Gemini Nano to extract: | ||
|
|
||
| - Event title | ||
| - Start and end date/time | ||
| - Location | ||
| - Description | ||
| - Timezone | ||
|
|
||
| The extracted details are used to pre-populate a new Google Calendar event. | ||
|
|
||
| ## Running this extension | ||
|
|
||
| 1. Clone this repository. | ||
| 1. Run `npm install` in the project directory. | ||
| 1. Run `npm run build` in the project directory to build the extension. | ||
| 1. Load the newly created `dist` directory in Chrome as an [unpacked extension](https://developer.chrome.com/docs/extensions/get-started/tutorial/hello-world#load-unpacked). | ||
| 1. Click the extension icon. | ||
| 1. Interact with the Prompt API in the sidebar. | ||
| 2. Run `npm install` in the project directory. | ||
| 3. Run `npm run build` to build the extension. | ||
| 4. Load the `dist` directory in Chrome as an [unpacked extension](https://developer.chrome.com/docs/extensions/get-started/tutorial/hello-world#load-unpacked). | ||
| 5. Select any text on a webpage that describes an event. | ||
| 6. Right-click and choose "Create Calendar Event" from the context menu. | ||
|
|
||
| ## Creating your own extension | ||
| ## How it works | ||
|
|
||
| If you use this sample as the foundation for your own extension, be sure to update the `"trial_tokens"` field [with your own origin trial token](https://developer.chrome.com/docs/web-platform/origin-trials#extensions) and to remove the `"key"` field in `manifest.json`. | ||
| 1. **Context Menu**: The extension adds a "Create Calendar Event" option to Chrome's right-click context menu when text is selected. | ||
| 2. **AI Extraction**: When triggered, the selected text is sent to Gemini Nano with a prompt to extract event details as structured JSON. | ||
| 3. **Date Parsing**: The extracted date/time strings are parsed using the [any-date-parser](https://www.npmjs.com/package/any-date-parser) library. | ||
| 4. **Calendar Integration**: A Google Calendar URL is generated with the extracted details and opened in a new tab. | ||
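
The extraction and calendar steps in "How it works" can be sketched as two small functions. This is a rough sketch under stated assumptions, not the sample's implementation: `extractJsonObject`, `toCalendarTimestamp`, `buildCalendarUrl`, and `EventDetails` are hypothetical names. The `calendar.google.com/calendar/render?action=TEMPLATE` parameters (`text`, `dates`, `details`, `location`, `ctz`) follow a widely used but informally documented URL pattern. Dates are assumed to be parsed into `Date` objects already, which the README says the sample does with any-date-parser.

```typescript
interface EventDetails {
  title: string;
  start: Date;
  end: Date;
  location?: string;
  description?: string;
  timezone?: string;
}

// Model output may wrap the JSON in prose or a code fence, so take the span
// from the first "{" to the last "}" and try to parse it.
// Returns null when no JSON object can be recovered.
function extractJsonObject(modelOutput: string): { [key: string]: unknown } | null {
  const first = modelOutput.indexOf("{");
  const last = modelOutput.lastIndexOf("}");
  if (first === -1 || last <= first) return null;
  try {
    const parsed = JSON.parse(modelOutput.slice(first, last + 1));
    const isObject = parsed !== null && typeof parsed === "object" && !Array.isArray(parsed);
    return isObject ? parsed : null;
  } catch (e) {
    return null;
  }
}

// Formats a Date as a compact UTC timestamp, e.g. 20250117T150000Z.
function toCalendarTimestamp(date: Date): string {
  return date.toISOString().replace(/[-:]/g, "").replace(/\.\d{3}/, "");
}

// Builds a pre-populated "new event" URL (assumed parameter names, see above).
function buildCalendarUrl(event: EventDetails): string {
  const params: string[][] = [
    ["action", "TEMPLATE"],
    ["text", event.title],
    ["dates", toCalendarTimestamp(event.start) + "/" + toCalendarTimestamp(event.end)],
  ];
  if (event.location) params.push(["location", event.location]);
  if (event.description) params.push(["details", event.description]);
  if (event.timezone) params.push(["ctz", event.timezone]);

  const query = params
    .map(function (pair) {
      return encodeURIComponent(pair[0]) + "=" + encodeURIComponent(pair[1]);
    })
    .join("&");
  return "https://calendar.google.com/calendar/render?" + query;
}
```

Returning `null` from the parser instead of throwing lets the caller decide how to handle a response that contains no usable JSON, for example by retrying or showing an error.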