Commit b1a3793

Danny/kernel 716 cli gemini cua template action broken (#58)
## Fix: Update Gemini Computer Use template for Stagehand v3 compatibility

### Description

Updates the Gemini Computer Use template to work with Stagehand v3 API, which introduced breaking changes from v2.

### Changes

#### 🔧 Stagehand v3 Migration (`index.ts`)

- **Removed OpenAI dependency**: The template now only requires `GOOGLE_API_KEY` (Stagehand v3 no longer requires a separate OpenAI key for its internal operations)
- **Updated Stagehand initialization**:
  - `domSettleTimeoutMs` → `domSettleTimeout`
  - Removed `modelName` and `modelClientOptions` from constructor
- **Updated page access**: `stagehand.page` → `stagehand.context.pages()[0]`
- **Updated agent configuration** to v3 format:

      // v2 (old)
      stagehand.agent({ provider: "google", model: "gemini-2.5-...", instructions: "...", options: { apiKey: ... } })

      // v3 (new)
      stagehand.agent({ cua: true, model: { modelName: "google/gemini-2.5-...", apiKey: ... }, systemPrompt: "..." })

#### ✨ Template Improvements

- Made task configurable via `startingUrl` and `instruction` payload parameters
- Changed default example to use magnitasks.com Kanban board demo (more reliable than YCombinator search)
- Added `CuaTaskInput` interface for typed payload support

#### 📚 Documentation Updates

- Removed `OPENAI_API_KEY` from setup requirements
- Added "Alternative Model Providers" section showing how to switch to OpenAI or Anthropic
- Improved deployment instructions to use `--env-file .env` approach
- Updated `.env.example` to remove unnecessary OpenAI key

### Testing

- [x] `make build` passes
- [x] `make test` passes
- [x] `/qa` cursor command passes

### Related

Fixes KERNEL-716
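The agent-configuration change in the description is a mechanical field mapping, which can be sketched as a small function. This is illustrative only: `V2AgentOptions`, `V3AgentOptions`, and `toV3AgentOptions` are hypothetical names and are not types exported by Stagehand; the field names are copied from the v2/v3 snippets in the description.

```typescript
// Hypothetical shapes mirroring the v2 and v3 snippets in the commit
// description. These are NOT Stagehand's exported types.
interface V2AgentOptions {
  provider: string;
  model: string;
  instructions: string;
  options: { apiKey: string };
}

interface V3AgentOptions {
  cua: true;
  model: { modelName: string; apiKey: string };
  systemPrompt: string;
}

// Maps the v2 field layout onto the v3 layout:
//   provider + model -> model.modelName ("provider/model")
//   options.apiKey   -> model.apiKey
//   instructions     -> systemPrompt
function toV3AgentOptions(v2: V2AgentOptions): V3AgentOptions {
  return {
    cua: true,
    model: {
      modelName: `${v2.provider}/${v2.model}`,
      apiKey: v2.options.apiKey,
    },
    systemPrompt: v2.instructions,
  };
}
```

The point to notice is that the provider moves from a separate field into a prefix on the model name, and the API key moves from `options` into `model`.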
1 parent b0321ad commit b1a3793

5 files changed (+61 -44 lines changed)

.cursor/commands/qa.md

Lines changed: 4 additions & 10 deletions
@@ -56,7 +56,7 @@ Here are all valid language + template combinations:
 | typescript | anthropic-computer-use | ts-anthropic-cua | ts-anthropic-cua | Yes | ANTHROPIC_API_KEY |
 | typescript | magnitude | ts-magnitude | ts-magnitude | Yes | ANTHROPIC_API_KEY |
 | typescript | openai-computer-use | ts-openai-cua | ts-openai-cua | Yes | OPENAI_API_KEY |
-| typescript | gemini-computer-use | ts-gemini-cua | ts-gemini-cua | Yes | GOOGLE_API_KEY, OPENAI_API_KEY |
+| typescript | gemini-computer-use | ts-gemini-cua | ts-gemini-cua | Yes | GOOGLE_API_KEY |
 | python | sample-app | py-sample-app | python-basic | No | - |
 | python | captcha-solver | py-captcha-solver | python-captcha-solver | No | - |
 | python | browser-use | py-browser-use | python-bu | Yes | OPENAI_API_KEY |
@@ -154,14 +154,11 @@ echo "OPENAI_API_KEY=<value from human>" > .env
 cd ..
 ```

-**ts-gemini-cua** (needs GOOGLE_API_KEY and OPENAI_API_KEY):
+**ts-gemini-cua** (needs GOOGLE_API_KEY):

 ```bash
 cd ts-gemini-cua
-cat > .env << EOF
-GOOGLE_API_KEY=<value from human>
-OPENAI_API_KEY=<value from human>
-EOF
+echo "GOOGLE_API_KEY=<value from human>" > .env
 ../bin/kernel deploy index.ts --env-file .env
 cd ..
 ```
@@ -214,7 +211,7 @@ kernel invoke ts-stagehand teamsize-task --payload '{"company": "Kernel"}'
 kernel invoke ts-anthropic-cua cua-task --payload '{"query": "Return the first url of a search result for NYC restaurant reviews Pete Wells"}'
 kernel invoke ts-magnitude mag-url-extract --payload '{"url": "https://en.wikipedia.org/wiki/Special:Random"}'
 kernel invoke ts-openai-cua cua-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 5 articles"}'
-kernel invoke ts-gemini-cua gemini-cua-task
+kernel invoke ts-gemini-cua gemini-cua-task --payload '{"startingUrl": "https://www.magnitasks.com/", "instruction": "Click the Tasks option in the left-side bar, and move the 5 items in the To Do and In Progress items to the Done section of the Kanban board? You are done successfully when the items are moved."}'

 # Python apps
 kernel invoke python-basic get-page-title --payload '{"url": "https://www.google.com"}'
@@ -232,8 +229,6 @@ kernel invoke python-openagi-cua openagi-default-task -p '{"instruction": "Navig
 If the human agrees, invoke each template and collect results. Present findings in this format:

 ### Testing Guidelines
-
-- **Timeout:** Cancel each invocation after 90 seconds if it has not completed. Mark the status as `TIMEOUT` in the results table.
 - **Parallel execution:** You may run multiple invocations in parallel to speed up testing.
 - **Error handling:** Capture any runtime errors and include them in the Notes column.

@@ -258,7 +253,6 @@ If the human agrees, invoke each template and collect results. Present findings
 Status values:
 - **SUCCESS**: App started and returned a result
 - **FAILED**: App encountered a runtime error
-- **TIMEOUT**: App did not complete within 90 seconds (cancelled)

 Notes should include brief error messages for failures or confirmation of successful output.
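The `kernel invoke ... --payload '<json>'` commands in this file pass JSON inside a single-quoted shell argument, so any single quote inside the payload would need escaping. A minimal sketch of building such a command programmatically follows; `shellSingleQuote` and `buildInvokeCommand` are hypothetical helpers, not part of the Kernel CLI, and the quoting assumes a POSIX-style shell.

```typescript
// Hypothetical helpers (not part of the Kernel CLI).

// Wraps a string in single quotes for a POSIX shell. An embedded single
// quote is emitted as '\'' (close quote, escaped quote, reopen quote).
function shellSingleQuote(value: string): string {
  return "'" + value.split("'").join("'\\''") + "'";
}

// Builds a `kernel invoke <app> <action> [--payload '<json>']` command line.
function buildInvokeCommand(
  app: string,
  action: string,
  payload?: Record<string, unknown>,
): string {
  const base = `kernel invoke ${app} ${action}`;
  if (payload === undefined) {
    return base;
  }
  return `${base} --payload ${shellSingleQuote(JSON.stringify(payload))}`;
}
```

With this approach an instruction such as the `index.ts` default, which contains `'To Do'` in single quotes, could be passed through unchanged instead of being reworded for the command line.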

README.md

Lines changed: 1 addition & 1 deletion
@@ -121,7 +121,7 @@ Create an API key from the [Kernel dashboard](https://dashboard.onkernel.com).
 - `browser-use` - Template with Browser Use SDK (Python only)
 - `anthropic-computer-use` - Anthropic Computer Use prompt loop
 - `openai-computer-use` - OpenAI Computer Use Agent sample
-- `gemini-computer-use` - Gemini Computer Use Agent sample (TypeScript only)
+- `gemini-computer-use` - Implements a Gemini computer use agent (TypeScript only)
 - `openagi-computer-use` - OpenAGI Lux computer-use models (Python only)
 - `magnitude` - Magnitude framework sample (TypeScript only)

Lines changed: 0 additions & 1 deletion
@@ -1,3 +1,2 @@
 # Copy this file to .env and fill in your API keys
 GOOGLE_API_KEY=your_google_api_key_here
-OPENAI_API_KEY=your_openai_api_key_here
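Since the template now needs only `GOOGLE_API_KEY`, a pre-deploy check of the `.env` contents can be short. The sketch below is an assumption-laden illustration: `parseEnvText` and `missingKeys` are hypothetical helpers, and the parser handles only plain `KEY=VALUE` lines and `#` comments (no quoting, no `export` prefix, no multi-line values), which may differ from how `kernel deploy --env-file` reads the file.

```typescript
// Simplified dotenv-style parser. Assumption: plain KEY=VALUE lines and
// `#` comment lines only; quoting and `export` prefixes are not handled.
function parseEnvText(text: string): Map<string, string> {
  const vars = new Map<string, string>();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) {
      continue; // skip blanks and comments
    }
    const eq = line.indexOf("=");
    if (eq <= 0) {
      continue; // no key before "=", or no "=" at all
    }
    vars.set(line.slice(0, eq).trim(), line.slice(eq + 1).trim());
  }
  return vars;
}

// Returns the required keys that are absent or set to an empty value.
function missingKeys(text: string, required: string[]): string[] {
  const vars = parseEnvText(text);
  return required.filter((key) => (vars.get(key) ?? "") === "");
}
```

A caller could read `.env` from disk, pass the text to `missingKeys(text, ["GOOGLE_API_KEY"])`, and refuse to deploy if the result is non-empty.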

pkg/templates/typescript/gemini-computer-use/README.md

Lines changed: 28 additions & 4 deletions
@@ -4,14 +4,13 @@ A Kernel application that demonstrates Computer Use Agent (CUA) capabilities usi

 ## What It Does

-This app uses [Gemini 2.5's computer use model](https://blog.google/technology/google-deepmind/gemini-computer-use-model/) capabilities to autonomously navigate websites and complete tasks. The example task searches for Kernel's company page on YCombinator and writes a blog post about their product.
+This app uses [Gemini 2.5's computer use model](https://blog.google/technology/google-deepmind/gemini-computer-use-model/) capabilities to autonomously navigate websites and complete tasks. The agent can interact with web pages just like a human would - clicking, typing, scrolling, and extracting information.

 ## Setup

 1. **Add your API keys as environment variables:**
    - `KERNEL_API_KEY` - Get from [Kernel dashboard](https://dashboard.onkernel.com/sign-in)
    - `GOOGLE_API_KEY` - Get from [Google AI Studio](https://aistudio.google.com/apikey)
-   - `OPENAI_API_KEY` - Get from [OpenAI platform](https://platform.openai.com/api-keys)

 ## Running Locally

@@ -25,9 +24,10 @@ This runs the agent without a Kernel invocation context and provides the browser

 ## Deploying to Kernel

-1. **Deploy the application:**
+1. **Copy the example env file, add your API keys, and deploy:**
    ```bash
-   kernel deploy index.ts --env GOOGLE_API_KEY=XXX --env OPENAI_API_KEY=XXX
+   cp .example.env .env
+   kernel deploy index.ts --env-file .env
    ```

 2. **Invoke the action:**
@@ -37,6 +37,30 @@ This runs the agent without a Kernel invocation context and provides the browser

 The action creates a Kernel-managed browser and associates it with the invocation for tracking and monitoring.

+## Alternative Model Providers
+
+Stagehand's CUA agent supports multiple model providers. You can switch from Gemini to OpenAI or Anthropic by changing the model configuration in `index.ts` and redeploying your Kernel app:
+
+**OpenAI Computer Use:**
+```typescript
+model: {
+  modelName: "openai/computer-use-preview",
+  apiKey: process.env.OPENAI_API_KEY
+}
+```
+
+**Anthropic Claude Sonnet:**
+```typescript
+model: {
+  modelName: "anthropic/claude-sonnet-4-20250514",
+  apiKey: process.env.ANTHROPIC_API_KEY
+}
+```
+
+When using alternative providers, make sure to:
+1. Add the corresponding API key to your environment variables
+2. Update the deploy command to include the new API key (e.g., `--env OPENAI_API_KEY=XXX`)
+
 ## Documentation

 - [Kernel Documentation](https://docs.onkernel.com/quickstart)
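The provider switch described in the "Alternative Model Providers" section added by this hunk could also be driven by a lookup table instead of editing the `model` block by hand. This is a sketch under assumptions: `Provider`, `PROVIDERS`, and `modelConfigFor` are hypothetical names, the model names are the ones quoted in this README, and the environment is passed in as a plain object so the function stays testable.

```typescript
// Hypothetical lookup table; model names are copied from this README.
type Provider = "google" | "openai" | "anthropic";

interface ModelConfig {
  modelName: string;
  apiKey: string;
}

const PROVIDERS: Record<Provider, { modelName: string; envVar: string }> = {
  google: {
    modelName: "google/gemini-2.5-computer-use-preview-10-2025",
    envVar: "GOOGLE_API_KEY",
  },
  openai: {
    modelName: "openai/computer-use-preview",
    envVar: "OPENAI_API_KEY",
  },
  anthropic: {
    modelName: "anthropic/claude-sonnet-4-20250514",
    envVar: "ANTHROPIC_API_KEY",
  },
};

// Returns the `model` object for the chosen provider, failing fast when
// the matching API key is missing (same style as the template's own check).
function modelConfigFor(
  provider: Provider,
  env: Record<string, string | undefined>,
): ModelConfig {
  const { modelName, envVar } = PROVIDERS[provider];
  const apiKey = env[envVar];
  if (!apiKey) {
    throw new Error(`${envVar} is not set`);
  }
  return { modelName, apiKey };
}
```

In the template, the result of something like `modelConfigFor("openai", process.env)` would be passed as the `model` field of `stagehand.agent({ cua: true, ... })`; the matching key still has to be present in the deployed environment.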

pkg/templates/typescript/gemini-computer-use/index.ts

Lines changed: 28 additions & 28 deletions
@@ -7,29 +7,32 @@ const kernel = new Kernel({

 const app = kernel.app('ts-gemini-cua');

+interface CuaTaskInput {
+  startingUrl?: string;
+  instruction?: string;
+}
+
 interface SearchQueryOutput {
   success: boolean;
   result: string;
   error?: string;
 }

-// API Keys for LLM providers
+// API Key for LLM provider
 // - GOOGLE_API_KEY: Required for Gemini 2.5 Computer Use Agent
-// - OPENAI_API_KEY: Required for Stagehand's GPT-4o model
 // Set via environment variables or `kernel deploy <filename> --env-file .env`
 // See https://docs.onkernel.com/launch/deploy#environment-variables
 const GOOGLE_API_KEY = process.env.GOOGLE_API_KEY;
-const OPENAI_API_KEY = process.env.OPENAI_API_KEY;
-
-if (!OPENAI_API_KEY) {
-  throw new Error('OPENAI_API_KEY is not set');
-}

 if (!GOOGLE_API_KEY) {
   throw new Error('GOOGLE_API_KEY is not set');
 }

-async function runStagehandTask(invocationId?: string): Promise<SearchQueryOutput> {
+async function runStagehandTask(
+  invocationId?: string,
+  startingUrl: string = "https://www.magnitasks.com/",
+  instruction: string = "Click the Tasks option in the left-side bar, and move the 5 items in the 'To Do' and 'In Progress' items to the 'Done' section of the Kanban board? You are done successfully when the items are moved."
+): Promise<SearchQueryOutput> {
   // Executes a Computer Use Agent (CUA) task using Gemini 2.5 and Stagehand

   const browserOptions = {
@@ -49,11 +52,7 @@ async function runStagehandTask(invocationId?: string): Promise<SearchQueryOutpu
   const stagehand = new Stagehand({
     env: "LOCAL",
     verbose: 1,
-    domSettleTimeoutMs: 30_000,
-    modelName: "gpt-4o",
-    modelClientOptions: {
-      apiKey: OPENAI_API_KEY
-    },
+    domSettleTimeout: 30_000,
     localBrowserLaunchOptions: {
       cdpUrl: kernelBrowser.cdp_ws_url
     }
@@ -64,24 +63,21 @@ async function runStagehandTask(invocationId?: string): Promise<SearchQueryOutpu
   // Your Stagehand implementation here
   /////////////////////////////////////
   try {
-    const page = stagehand.page;
+    const page = stagehand.context.pages()[0];

     const agent = stagehand.agent({
-      provider: "google",
-      model: "gemini-2.5-computer-use-preview-10-2025",
-      instructions: `You are a helpful assistant that can use a web browser.
+      cua: true,
+      model: {
+        modelName: "google/gemini-2.5-computer-use-preview-10-2025",
+        apiKey: GOOGLE_API_KEY,
+      },
+      systemPrompt: `You are a helpful assistant that can use a web browser.
       You are currently on the following page: ${page.url()}.
       Do not ask follow up questions, the user will trust your judgement.`,
-      options: {
-        apiKey: GOOGLE_API_KEY,
-      }
     });

-    // Navigate to YCombinator's website
-    await page.goto("https://www.ycombinator.com/companies");
-
-    // Define the instructions for the CUA agent
-    const instruction = "Find Kernel's company page on the YCombinator website and write a blog post about their product offering.";
+    // Navigate to the starting website
+    await page.goto(startingUrl);

     // Execute the instruction
     const result = await agent.execute({
@@ -105,10 +101,14 @@ async function runStagehandTask(invocationId?: string): Promise<SearchQueryOutpu

 // Register Kernel action handler for remote invocation
 // Invoked via: kernel invoke ts-gemini-cua gemini-cua-task
-app.action<void, SearchQueryOutput>(
+app.action<CuaTaskInput, SearchQueryOutput>(
   'gemini-cua-task',
-  async (ctx: KernelContext): Promise<SearchQueryOutput> => {
-    return runStagehandTask(ctx.invocation_id);
+  async (ctx: KernelContext, payload?: CuaTaskInput): Promise<SearchQueryOutput> => {
+    return runStagehandTask(
+      ctx.invocation_id,
+      payload?.startingUrl,
+      payload?.instruction
+    );
   },
 );
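The new handler forwards `payload?.startingUrl` and `payload?.instruction` straight into parameters that have defaults, relying on the language rule that a default applies only when the argument is `undefined`. The sketch below isolates that behavior; `resolveTask`, `fromPayload`, and `DEFAULT_INSTRUCTION` are hypothetical stand-ins, and the instruction text is shortened here.

```typescript
interface CuaTaskInput {
  startingUrl?: string;
  instruction?: string;
}

const DEFAULT_URL = "https://www.magnitasks.com/";
// Placeholder standing in for the template's longer default instruction.
const DEFAULT_INSTRUCTION = "Move the Kanban items to Done.";

// Default parameter values apply only when the argument is `undefined`.
function resolveTask(
  startingUrl: string = DEFAULT_URL,
  instruction: string = DEFAULT_INSTRUCTION,
): { startingUrl: string; instruction: string } {
  return { startingUrl, instruction };
}

// Mirrors the handler: optional chaining yields `undefined` for a missing
// payload or a missing field, which in turn triggers the defaults above.
function fromPayload(payload?: CuaTaskInput): { startingUrl: string; instruction: string } {
  return resolveTask(payload?.startingUrl, payload?.instruction);
}
```

One consequence is that an empty string is not `undefined`: a payload of `{"startingUrl": ""}` bypasses the default and the empty value is used as-is. A handler that wants to treat empty strings as missing could pass `payload?.startingUrl || undefined` instead.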
