feat: Add Gemini CUA template #59
Merged: dprevoznik merged 7 commits into main from tembo/kernel-429-create-kernel-app-add-gemini-computer-use-example-to-create on Oct 21, 2025
Changes from 1 commit
Commits
cc51343 feat(create-kernel-app): Add Gemini CUA template (tembo[bot])
3fa6bef fix(gemini-cua): remove redundant comments in runStagehandTask function (tembo[bot])
92a36ff Update Clean Up Process (dprevoznik)
52959c1 Remove unused variable (dprevoznik)
37d81b7 Make browserOptions more re-usable (dprevoznik)
0315e05 Update dependencies (dprevoznik)
e570dea Merge branch 'main' into tembo/kernel-429-create-kernel-app-add-gemin… (dprevoznik)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| # Kernel TypeScript SDK + Stagehand + Gemini Computer Use Agent | ||
|
|
||
| A Kernel application that demonstrates Computer Use Agent (CUA) capabilities using Google's Gemini 2.5 model with Stagehand for browser automation. | ||
|
|
||
| ## What It Does | ||
|
|
||
| This app uses the [Gemini 2.5 Computer Use model](https://blog.google/technology/google-deepmind/gemini-computer-use-model/) to autonomously navigate websites and complete tasks. The example task finds Kernel's company page on Y Combinator and writes a blog post about its product. | ||
|
|
||
| ## Setup | ||
|
|
||
| 1. **Add your API keys as environment variables:** | ||
| - `KERNEL_API_KEY` - Get from [Kernel dashboard](https://dashboard.onkernel.com/sign-in) | ||
| - `GOOGLE_API_KEY` - Get from [Google AI Studio](https://aistudio.google.com/apikey) | ||
| - `OPENAI_API_KEY` - Get from [OpenAI platform](https://platform.openai.com/api-keys) | ||
|
|
||
| ## Running Locally | ||
|
|
||
| Execute the script directly with tsx: | ||
|
|
||
| ```bash | ||
| npx tsx index.ts | ||
| ``` | ||
|
|
||
| This runs the agent without a Kernel invocation context and provides the browser live view URL for debugging. | ||
|
|
||
| ## Deploying to Kernel | ||
|
|
||
| 1. **Deploy the application:** | ||
| ```bash | ||
| kernel deploy index.ts --env GOOGLE_API_KEY=XXX --env OPENAI_API_KEY=XXX | ||
| ``` | ||
|
|
||
| 2. **Invoke the action:** | ||
| ```bash | ||
| kernel invoke ts-gemini-cua gemini-cua-task | ||
| ``` | ||
|
|
||
| The action creates a Kernel-managed browser and associates it with the invocation for tracking and monitoring. | ||
|
|
||
| ## Documentation | ||
|
|
||
| - [Kernel Documentation](https://docs.onkernel.com/quickstart) | ||
| - [Kernel Stagehand Guide](https://www.onkernel.com/docs/integrations/stagehand) | ||
| - [Gemini 2.5 Computer Use](https://blog.google/technology/google-deepmind/gemini-computer-use-model/) |
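Reviewer note: the README's Setup section lists three environment variables, while `index.ts` only checks two of them at load time. A minimal sketch of a single preflight check follows, assuming the three names from the README. `REQUIRED_ENV_VARS` and `missingEnvVars` are hypothetical names introduced here for illustration; they are not part of the template or of any SDK.

```typescript
// Hypothetical helper, not part of the PR: the variable names come from the
// README's Setup section; everything else is illustrative.
const REQUIRED_ENV_VARS = ["KERNEL_API_KEY", "GOOGLE_API_KEY", "OPENAI_API_KEY"] as const;

// Returns the names of required variables that are unset or blank,
// in the same order as the `required` list.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_ENV_VARS,
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Report every missing key at once instead of failing on the first one.
const missing = missingEnvVars(process.env);
if (missing.length > 0) {
  console.error(`Missing environment variables: ${missing.join(", ")}`);
}
```

Reporting all missing keys in one message may save a round trip when setting up a fresh environment; whether to throw or only log is left to the template author.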
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| # Dependencies | ||
| node_modules/ | ||
| package-lock.json | ||
|
|
||
| # TypeScript | ||
| *.tsbuildinfo | ||
| dist/ | ||
| build/ | ||
|
|
||
| # Environment | ||
| .env | ||
| .env.local | ||
| .env.*.local | ||
|
|
||
| # IDE | ||
| .vscode/ | ||
| .idea/ | ||
| *.swp | ||
| *.swo | ||
|
|
||
| # OS | ||
| .DS_Store | ||
| Thumbs.db | ||
|
|
||
| # Logs | ||
| logs/ | ||
| *.log | ||
| npm-debug.log* | ||
| yarn-debug.log* | ||
| yarn-error.log* | ||
|
|
||
| # Testing | ||
| coverage/ | ||
| .nyc_output/ | ||
|
|
||
| # Misc | ||
| .cache/ | ||
| .temp/ | ||
| .tmp/ |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,131 @@ | ||
| import { Stagehand } from "@browserbasehq/stagehand"; | ||
| import { Kernel, type KernelContext } from '@onkernel/sdk'; | ||
|
|
||
| const kernel = new Kernel({ | ||
| apiKey: process.env.KERNEL_API_KEY | ||
| }); | ||
|
|
||
| const app = kernel.app('ts-gemini-cua'); | ||
|
|
||
| interface SearchQueryOutput { | ||
| success: boolean; | ||
| result: string; | ||
| } | ||
|
|
||
| // API Keys for LLM providers | ||
| // - GOOGLE_API_KEY: Required for Gemini 2.5 Computer Use Agent | ||
| // - OPENAI_API_KEY: Required for Stagehand's GPT-4o model | ||
| // Set via environment variables or `kernel deploy <filename> --env-file .env` | ||
| // See https://docs.onkernel.com/launch/deploy#environment-variables | ||
| const GOOGLE_API_KEY = process.env.GOOGLE_API_KEY; | ||
| const OPENAI_API_KEY = process.env.OPENAI_API_KEY; | ||
|
|
||
| if (!OPENAI_API_KEY) { | ||
| throw new Error('OPENAI_API_KEY is not set'); | ||
| } | ||
|
|
||
| if (!GOOGLE_API_KEY) { | ||
| throw new Error('GOOGLE_API_KEY is not set'); | ||
| } | ||
|
|
||
| async function runStagehandTask(invocationId?: string): Promise<SearchQueryOutput> { | ||
| // Executes a Computer Use Agent (CUA) task using Gemini 2.5 and Stagehand | ||
| // | ||
| // This function supports dual execution modes: | ||
| // - Action Handler Mode: Called with invocation_id from Kernel app action context | ||
| // - Local Mode: Called without invocation_id for direct script execution | ||
| // | ||
| // Args: | ||
| // invocationId: Optional Kernel invocation ID to associate browser with action | ||
| // | ||
| // App Actions Returns: | ||
| // SearchQueryOutput: Success status and result message from the agent | ||
| // Local Execution Returns: | ||
| // Logs the result of the agent execution | ||
|
|
||
| const browserOptions = invocationId | ||
| ? { invocation_id: invocationId, stealth: true } | ||
| : { stealth: true }; | ||
|
|
||
| const kernelBrowser = await kernel.browsers.create(browserOptions); | ||
|
|
||
| console.log("Kernel browser live view url: ", kernelBrowser.browser_live_view_url); | ||
|
|
||
| const stagehand = new Stagehand({ | ||
| env: "LOCAL", | ||
| verbose: 1, | ||
| domSettleTimeoutMs: 30_000, | ||
| modelName: "gpt-4o", | ||
| modelClientOptions: { | ||
| apiKey: OPENAI_API_KEY | ||
| }, | ||
| localBrowserLaunchOptions: { | ||
| cdpUrl: kernelBrowser.cdp_ws_url | ||
| } | ||
| }); | ||
| await stagehand.init(); | ||
|
|
||
| ///////////////////////////////////// | ||
| // Your Stagehand implementation here | ||
| ///////////////////////////////////// | ||
| try { | ||
| const page = stagehand.page; | ||
|
|
||
| const agent = stagehand.agent({ | ||
| provider: "google", | ||
| model: "gemini-2.5-computer-use-preview-10-2025", | ||
| instructions: `You are a helpful assistant that can use a web browser. | ||
| You are currently on the following page: ${page.url()}. | ||
| Do not ask follow up questions, the user will trust your judgement.`, | ||
| options: { | ||
| apiKey: GOOGLE_API_KEY, | ||
| } | ||
| }); | ||
|
|
||
| // Navigate to YCombinator's website | ||
| await page.goto("https://www.ycombinator.com/companies"); | ||
|
|
||
| // Define the instructions for the CUA agent | ||
| const instruction = "Find Kernel's company page on the YCombinator website and write a blog post about their product offering."; | ||
|
|
||
| // Execute the instruction | ||
| const result = await agent.execute({ | ||
| instruction, | ||
| maxSteps: 20, | ||
| }); | ||
|
|
||
| console.log("result: ", result); | ||
|
|
||
| return { success: true, result: result.message }; | ||
| } catch (error) { | ||
| console.error(error); | ||
| return { success: false, result: "" }; | ||
| } finally { | ||
| // Always release the browser session, whether the task succeeded or failed | ||
| console.log("Deleting browser and closing stagehand..."); | ||
| await stagehand.close(); | ||
| await kernel.browsers.deleteByID(kernelBrowser.session_id); | ||
| } | ||
| } | ||
|
|
||
| // Register Kernel action handler for remote invocation | ||
| // Invoked via: kernel invoke ts-gemini-cua gemini-cua-task | ||
| app.action<void, SearchQueryOutput>( | ||
| 'gemini-cua-task', | ||
| async (ctx: KernelContext): Promise<SearchQueryOutput> => { | ||
| return runStagehandTask(ctx.invocation_id); | ||
| }, | ||
| ); | ||
|
|
||
| // Run locally if executed directly (not imported as a module) | ||
| // Execute via: npx tsx index.ts | ||
| if (import.meta.url === `file://${process.argv[1]}`) { | ||
| runStagehandTask().then(result => { | ||
| console.log('Local execution result:', result); | ||
| process.exit(result.success ? 0 : 1); | ||
| }).catch(error => { | ||
| console.error('Local execution failed:', error); | ||
| process.exit(1); | ||
| }); | ||
| } | ||
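Reviewer note: the direct-execution guard above compares `import.meta.url` with a hand-built `file://` string. That comparison can fail when the script path contains characters that are percent-encoded in URLs (a space, for example) or on Windows-style paths. A possible alternative is sketched below using Node's `pathToFileURL`. `isMainModule` is a hypothetical helper name, not something the PR or the Kernel SDK provides, and the sketch does not handle symlinked entry paths.

```typescript
import { pathToFileURL } from "node:url";

// Hypothetical helper (not in the PR): converts the entry script path to a
// properly encoded file URL before comparing it with the module's URL.
function isMainModule(moduleUrl: string, entryPath: string | undefined): boolean {
  if (!entryPath) return false;
  return moduleUrl === pathToFileURL(entryPath).href;
}

// Assumed usage inside index.ts (an ES module):
//   if (isMainModule(import.meta.url, process.argv[1])) {
//     runStagehandTask().then(/* ... */);
//   }
```

The helper takes both values as parameters so it can be exercised without relying on `import.meta`, which is only available in ES modules.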
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| { | ||
| "name": "ts-gemini-cua", | ||
| "module": "index.ts", | ||
| "type": "module", | ||
| "private": true, | ||
| "peerDependencies": { | ||
| "typescript": "^5" | ||
| }, | ||
| "dependencies": { | ||
| "@browserbasehq/stagehand": "^2.5.2", | ||
| "@onkernel/sdk": "^0.14.0", | ||
| "zod": "^3.25.7" | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,23 @@ | ||
| { | ||
| "compilerOptions": { | ||
| "lib": ["ESNext", "DOM"], | ||
| "target": "ESNext", | ||
| "module": "ESNext", | ||
| "moduleDetection": "force", | ||
| "jsx": "react-jsx", | ||
| "allowJs": true, | ||
| "moduleResolution": "bundler", | ||
| "allowImportingTsExtensions": true, | ||
| "verbatimModuleSyntax": true, | ||
| "noEmit": true, | ||
| "strict": true, | ||
| "skipLibCheck": true, | ||
| "noFallthroughCasesInSwitch": true, | ||
| "noUncheckedIndexedAccess": true, | ||
| "noUnusedLocals": false, | ||
| "noUnusedParameters": false, | ||
| "noPropertyAccessFromIndexSignature": false | ||
| }, | ||
| "include": ["./**/*.ts", "./**/*.tsx"], | ||
| "exclude": ["node_modules", "dist"] | ||
| } |
@tembo this section of comments is too long. Can you leave the top comment in and remove "This function..." through "...Logs the result of the agent execution"?