
Commit 6966201

tkattkat, sameelarif, greptile-apps[bot], and miguelg719 authored
replace operator agent with base of new agent (#1014)
# Why

Replace operator agent with new agent handler

The operator agent was an older implementation that did not use tool calling and used a single model for both high-level reasoning and low-level action execution.

# What Changed

- **Removed operator agent** (`StagehandOperatorHandler`)
- **Added new agent handler** (`StagehandAgentHandler`)
  - Leverages AI SDK for proper tool call handling
  - **New `executionModel` option** for dual-model architecture
  - Better error handling and retry mechanisms
  - Structured tool system with Zod schema validation
- **ExecutionModel feature:**
  - Use a powerful model (like claude 4 sonnet) for reasoning and planning
  - Use a faster model (like gemini 2.0 flash) for Stagehand operations like `act()` and `extract()`
  - Enables cost and performance optimization

# Test Plan

- Tested locally with various agent tasks
- Verified backward compatibility
- Tested dual-model execution with different model combinations
- Installed package from branch, for additional local testing to catch any additional edge cases

---------

Co-authored-by: Sameel <[email protected]>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: miguel <[email protected]>
1 parent 1788ee9 commit 6966201

28 files changed, +915 -649 lines changed

.changeset/pink-snakes-sneeze.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+"@browserbasehq/stagehand": patch
+---
+
+Replace operator handler with base of new agent

.changeset/tired-cats-repeat.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+"@browserbasehq/stagehand": patch
+---
+
+replace operator agent with scaffold for new stagehand agent

docs/basics/agent.mdx

Lines changed: 17 additions & 1 deletion
@@ -26,7 +26,11 @@ agent.execute("apply for a job at browserbase")
 
 ## Using `agent()`
 
-Here is how you can use `agent()` to create an agent.
+There are two ways to create agents in Stagehand:
+
+### Computer Use Agents
+
+Use computer use agents with specialized models from OpenAI or Anthropic:
 
 <CodeGroup>
 ```typescript TypeScript
@@ -54,6 +58,18 @@ await agent.execute("apply for a job at Browserbase")
 ```
 </CodeGroup>
 
+### Use Stagehand Agent with Any LLM
+
+Use the agent without specifying a provider to utilize any model or LLM provider:
+
+<Note>Non CUA agents are currently only supported in TypeScript</Note>
+
+```typescript TypeScript
+const agent = stagehand.agent();
+await agent.execute("apply for a job at Browserbase")
+```
+
+
 ## MCP Integrations
 
 Agents can be enhanced with external tools and services through MCP (Model Context Protocol) integrations. This allows your agent to access external APIs and data sources beyond just browser interactions.

evals/index.eval.ts

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ import { CustomOpenAIClient } from "@/examples/external_clients/customOpenAI";
 import OpenAI from "openai";
 import { initStagehand } from "./initStagehand";
 import { AgentProvider } from "@/lib/agent/AgentProvider";
-import { AISdkClient } from "@/examples/external_clients/aisdk";
+import { AISdkClient } from "@/lib/llm/aisdk";
 import { getAISDKLanguageModel } from "@/lib/llm/LLMProvider";
 import { loadApiKeyFromEnv } from "@/lib/utils";
 import { LogLine } from "@/types/log";

evals/initStagehand.ts

Lines changed: 5 additions & 0 deletions
@@ -114,6 +114,11 @@ export const initStagehand = async ({
       model: modelName,
       provider: modelName.startsWith("claude") ? "anthropic" : "openai",
     } as AgentConfig;
+  } else {
+    agentConfig = {
+      model: modelName,
+      executionModel: "google/gemini-2.5-flash",
+    } as AgentConfig;
   }
 
   const agent = stagehand.agent(agentConfig);
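
The branch added in this hunk can be read as choosing between two config shapes: a computer-use config that carries a `provider`, and a non-CUA config that carries an `executionModel`. The following is a minimal sketch of that selection, not the eval's actual code. `SketchAgentConfig`, `buildAgentConfig`, and the `isCUA` flag are hypothetical names introduced for illustration; the condition that guards the first branch is outside the hunk shown above, so it is modeled here as a plain boolean.

```typescript
// Hypothetical, simplified stand-in for Stagehand's AgentConfig.
type SketchAgentConfig = {
  model: string;
  provider?: "anthropic" | "openai";
  executionModel?: string;
};

// Value taken from the hunk above; treated here as an assumed default.
const SKETCH_EXECUTION_MODEL = "google/gemini-2.5-flash";

// isCUA is an assumption: the real condition is not visible in the diff.
function buildAgentConfig(modelName: string, isCUA: boolean): SketchAgentConfig {
  if (isCUA) {
    // Computer-use agents name a provider, inferred from the model prefix.
    return {
      model: modelName,
      provider: modelName.startsWith("claude") ? "anthropic" : "openai",
    };
  }
  // Non-CUA agents pair the reasoning model with a separate execution model.
  return { model: modelName, executionModel: SKETCH_EXECUTION_MODEL };
}
```

With this shape, `buildAgentConfig("anthropic/claude-sonnet-4-20250514", false)` yields a config with an `executionModel` and no `provider`, which matches the `else` branch added in the commit.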

evals/taskConfig.ts

Lines changed: 5 additions & 1 deletion
@@ -106,7 +106,11 @@ const DEFAULT_EVAL_MODELS = process.env.EVAL_MODELS
 
 const DEFAULT_AGENT_MODELS = process.env.EVAL_AGENT_MODELS
   ? process.env.EVAL_AGENT_MODELS.split(",")
-  : ["computer-use-preview-2025-03-11", "claude-sonnet-4-20250514"];
+  : [
+      "computer-use-preview-2025-03-11",
+      "claude-sonnet-4-20250514",
+      "anthropic/claude-sonnet-4-20250514",
+    ];
 
 /**
  * getModelList:
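
The hunk above splits `EVAL_AGENT_MODELS` on commas and otherwise falls back to a hard-coded list. A slightly more defensive variant, sketched below, also trims whitespace and drops empty entries, so that a value such as `"a, b,"` does not produce a model named `" b"` or `""`. This is a suggestion rather than what the commit does; `parseModelList` and `FALLBACK_AGENT_MODELS` are hypothetical names.

```typescript
// Fallback list copied from the hunk above.
const FALLBACK_AGENT_MODELS: string[] = [
  "computer-use-preview-2025-03-11",
  "claude-sonnet-4-20250514",
  "anthropic/claude-sonnet-4-20250514",
];

// Hypothetical helper: parse a comma-separated env value into model names.
function parseModelList(raw: string | undefined, fallback: string[]): string[] {
  if (!raw) return fallback;
  const models = raw
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  // An input made only of commas or spaces falls back as well.
  return models.length > 0 ? models : fallback;
}

// Intended call site (shown as a comment to keep the sketch environment-free):
// const agentModels = parseModelList(process.env.EVAL_AGENT_MODELS, FALLBACK_AGENT_MODELS);
```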

evals/tasks/agent/sf_library_card.ts

Lines changed: 2 additions & 6 deletions
@@ -10,19 +10,15 @@ export const sf_library_card: EvalFunction = async ({
 }) => {
   try {
     await stagehand.page.goto("https://sflib1.sfpl.org/selfreg");
-
     const agentResult = await agent.execute({
-      instruction:
-        "Fill in the 'Residential Address' field with '166 Geary St'",
+      instruction: "Fill in the 'street Address' field with '166 Geary St'",
       maxSteps: Number(process.env.AGENT_EVAL_MAX_STEPS) || 3,
     });
     logger.log(agentResult);
-
-    await stagehand.page.mouse.wheel(0, -1000);
     const evaluator = new Evaluator(stagehand);
     const result = await evaluator.ask({
       question:
-        "Does the page show the 'Residential Address' field filled with '166 Geary St'?",
+        "Does the page show the 'street Address' field filled with '166 Geary St'?",
     });
 
     if (result.evaluation !== "YES" && result.evaluation !== "NO") {

lib/agent/tools/act.ts

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+import { tool } from "ai";
+import { z } from "zod/v3";
+import { StagehandPage } from "../../StagehandPage";
+
+export const createActTool = (
+  stagehandPage: StagehandPage,
+  executionModel?: string,
+) =>
+  tool({
+    description: "Perform an action on the page (click, type)",
+    parameters: z.object({
+      action: z.string()
+        .describe(`Describe what to click, or type within in a short, specific phrase that mentions the element type.
+          Examples:
+          - "click the Login button"
+          - "click the language dropdown"
+          - type "John" into the first name input
+          - type "Doe" into the last name input`),
+    }),
+    execute: async ({ action }) => {
+      try {
+        let result;
+        if (executionModel) {
+          result = await stagehandPage.page.act({
+            action,
+            modelName: executionModel,
+          });
+        } else {
+          result = await stagehandPage.page.act(action);
+        }
+        const isIframeAction = result.action === "an iframe";
+
+        if (isIframeAction) {
+          const fallback = await stagehandPage.page.act(
+            executionModel
+              ? { action, modelName: executionModel, iframes: true }
+              : { action, iframes: true },
+          );
+          return {
+            success: fallback.success,
+            action: fallback.action,
+            isIframe: true,
+          };
+        }
+
+        return {
+          success: result.success,
+          action: result.action,
+          isIframe: false,
+        };
+      } catch (error) {
+        return { success: false, error: error.message };
+      }
+    },
+  });
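
Two details of the tool above may be worth a second look in review. First, the `catch (error)` binding is typed `unknown` when TypeScript's `useUnknownInCatchVariables` is on (it is part of `strict`), so `error.message` would need narrowing in that case. Second, the options object is built twice, once for the first attempt and once for the iframe retry. The sketch below shows one way to handle both; it is not the commit's implementation. `ActFn`, `ActOptions`, `ToolOutcome`, `errorMessage`, and `actWithIframeFallback` are hypothetical names, and the injected `act` function stands in for `stagehandPage.page.act`. Unlike the original, the sketch always passes an options object and never a bare string.

```typescript
// Hypothetical, simplified shapes for illustration only.
type ActOptions = { action: string; modelName?: string; iframes?: boolean };
type ActResult = { success: boolean; action: string };
type ActFn = (options: ActOptions) => Promise<ActResult>;
type ToolOutcome = {
  success: boolean;
  action?: string;
  isIframe?: boolean;
  error?: string;
};

// Narrow an unknown thrown value to a printable message.
function errorMessage(error: unknown): string {
  return error instanceof Error ? error.message : String(error);
}

async function actWithIframeFallback(
  act: ActFn,
  action: string,
  executionModel?: string,
): Promise<ToolOutcome> {
  try {
    // Build the base options once; add modelName only when one is configured.
    const base: ActOptions = executionModel
      ? { action, modelName: executionModel }
      : { action };
    const first = await act(base);
    // Same sentinel check as the diff: the first attempt reports "an iframe".
    const isIframe = first.action === "an iframe";
    const result = isIframe ? await act({ ...base, iframes: true }) : first;
    return { success: result.success, action: result.action, isIframe };
  } catch (error) {
    return { success: false, error: errorMessage(error) };
  }
}
```

Injecting `act` as a parameter keeps the fallback logic testable without a browser; whether that trade-off suits the real handler is a design choice for the authors.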

lib/agent/tools/ariaTree.ts

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+import { tool } from "ai";
+import { z } from "zod/v3";
+import { StagehandPage } from "../../StagehandPage";
+
+export const createAriaTreeTool = (stagehandPage: StagehandPage) =>
+  tool({
+    description:
+      "gets the accessibility (ARIA) tree from the current page. this is useful for understanding the page structure and accessibility features. it should provide full context of what is on the page",
+    parameters: z.object({}),
+    execute: async () => {
+      const { page_text } = await stagehandPage.page.extract();
+      const pageUrl = stagehandPage.page.url();
+
+      let content = page_text;
+      const MAX_CHARACTERS = 70000;
+
+      const estimatedTokens = Math.ceil(content.length / 4);
+
+      if (estimatedTokens > MAX_CHARACTERS) {
+        const maxCharacters = MAX_CHARACTERS * 4;
+        content =
+          content.substring(0, maxCharacters) +
+          "\n\n[CONTENT TRUNCATED: Exceeded 70,000 token limit]";
+      }
+
+      return {
+        content,
+        pageUrl,
+      };
+    },
+    experimental_toToolResultContent: (result) => {
+      const content = typeof result === "string" ? result : result.content;
+      return [{ type: "text", text: `Accessibility Tree:\n${content}` }];
+    },
+  });
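
In the tool above, `MAX_CHARACTERS` is compared against an estimated token count and then multiplied by 4 to get a character limit, so it behaves as a token budget despite its name. The sketch below keeps the same arithmetic (roughly 4 characters per token, 70,000 tokens) but names the quantities for what they are. `truncateToTokenBudget`, `MAX_TOKENS`, and `CHARS_PER_TOKEN` are hypothetical names, and the 4-characters-per-token ratio is the diff's heuristic, not a measured value.

```typescript
// Budget and ratio taken from the diff; the names are illustrative.
const MAX_TOKENS = 70000;
const CHARS_PER_TOKEN = 4; // rough heuristic, not an exact tokenizer count

function truncateToTokenBudget(
  content: string,
  maxTokens: number = MAX_TOKENS,
): string {
  const estimatedTokens = Math.ceil(content.length / CHARS_PER_TOKEN);
  if (estimatedTokens <= maxTokens) return content;
  // Convert the token budget back to a character limit before cutting.
  const maxCharacters = maxTokens * CHARS_PER_TOKEN;
  return (
    content.substring(0, maxCharacters) +
    `\n\n[CONTENT TRUNCATED: Exceeded ${maxTokens} token limit]`
  );
}
```

Pulling the truncation into a pure function also makes the boundary easy to check: content of exactly `maxTokens * 4` characters passes through unchanged, and one more character triggers the cut.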

lib/agent/tools/close.ts

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+import { tool } from "ai";
+import { z } from "zod/v3";
+
+export const createCloseTool = () =>
+  tool({
+    description: "Complete the task and close",
+    parameters: z.object({
+      reasoning: z.string().describe("Summary of what was accomplished"),
+      taskComplete: z
+        .boolean()
+        .describe("Whether the task was completed successfully"),
+    }),
+    execute: async ({ reasoning, taskComplete }) => {
+      return { success: true, reasoning, taskComplete };
+    },
+  });
