Commit 8ab3dde

Evals (#9)
* update readme
* work on evals
* add sample evals
* update
1 parent a3070bd commit 8ab3dde

8 files changed (+708, -82 lines changed)

README.md

Lines changed: 128 additions & 77 deletions
@@ -34,47 +34,57 @@
3434
</a>
3535
</p>
3636

37-
38-
<div class="note" style="background-color: #808096; border-left: 5px solid #ffeb3b; padding: 15px; margin: 10px 0; color: white;">
39-
<strong>NOTE:</strong> This is a Python SDK for Stagehand. Original implementation is in TypeScript and is available <a href="https://github.com/browserbase/stagehand" style="color: blue;">here</a>.
40-
</div>
37+
<div class="note" style="background-color: #808096; border-left: 5px solid #ffeb3b; padding: 15px; margin: 10px 0; color: white;">
38+
<strong>NOTE:</strong> This is a Python SDK for Stagehand. The original implementation is in TypeScript and is available <a href="https://github.com/browserbase/stagehand" style="color: blue;">here</a>.
39+
</div>
4140

4241
---
4342

44-
A Python SDK for [Stagehand](https://stagehand.dev), enabling automated browser control and data extraction.
45-
46-
Stagehand is the easiest way to build browser automations. It is fully compatible with Playwright, offering three simple AI APIs (act, extract, and observe) on top of the base Playwright Page class that provide the building blocks for web automation via natural language.
47-
48-
You can write all of your Playwright commands as you normally would, while offloading the AI-powered `act/extract/observe` operations to Stagehand hosted on our Stagehand API.
49-
50-
51-
Here's a sample of what you can do with Stagehand:
52-
53-
```python
54-
import asyncio
55-
56-
async def main():
57-
# Keep your existing Playwright code unchanged
58-
await page.goto("https://docs.stagehand.dev");
59-
60-
# Stagehand AI: Act on the page via Stagehand API
61-
await page.act("click on the 'Quickstart'");
62-
63-
# Stagehand AI: Extract data from the page
64-
from pydantic import BaseModel
65-
66-
class DescriptionSchema(BaseModel):
67-
description: str
68-
69-
data = await page.extract(
70-
instruction="extract the description of the page",
71-
schema=DescriptionSchema
72-
)
73-
description = data.description
74-
75-
if __name__ == "__main__":
76-
asyncio.run(main())
77-
```
43+
Stagehand is the easiest way to build browser automations with AI-powered interactions. It extends the Playwright API with three powerful AI primitives:
44+
45+
- **act**: Instruct the AI to perform actions (e.g. click a button or scroll).
46+
- **extract**: Extract and validate data from a page using a JSON schema (written manually or generated from a Pydantic model).
47+
- **observe**: Get natural-language interpretations of the page, for example to identify selectors or elements in the DOM.
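
As a rough illustration of the "manual" route for extract, a hand-written JSON schema might look like the sketch below. The field name `description` mirrors the `DescriptionSchema` model used in this README; the exact set of schema keywords that Stagehand accepts is an assumption here, not something the README specifies.

```python
import json

# Hand-written JSON schema describing an object with one string field.
# Assumption: Stagehand accepts a plain JSON Schema dict like this one.
description_schema = {
    "type": "object",
    "properties": {
        "description": {"type": "string"},
    },
    "required": ["description"],
}

# A schema sent to a remote API has to survive JSON serialization,
# so a quick round trip is a cheap sanity check.
serialized = json.dumps(description_schema)
assert json.loads(serialized) == description_schema
```

A Pydantic model's `.model_json_schema()` produces a dict of the same general shape, which is why either form can be passed as the schema.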
48+
## Pydantic Schemas
49+
50+
Stagehand uses Pydantic models to define the options for AI commands:
51+
52+
- **ActOptions**
53+
The `ActOptions` model takes an `action` field that tells the AI what to do on the page, plus optional fields such as `useVision` and `variables`:
54+
```python
55+
from stagehand.schemas import ActOptions
56+
57+
# Example:
58+
await page.act(ActOptions(action="click on the 'Quickstart' button"))
59+
```
60+
61+
- **ObserveOptions**
62+
The `ObserveOptions` model lets you find elements on the page using natural language. The `onlyVisible` option limits the results to visible elements:
63+
```python
64+
from stagehand.schemas import ObserveOptions
65+
66+
# Example:
67+
await page.observe(ObserveOptions(instruction="find the button labeled 'News'", onlyVisible=True))
68+
```
69+
70+
- **ExtractOptions**
71+
The `ExtractOptions` model extracts structured data from the page. Pass your instructions and a schema defining your expected data format. **Note:** If you are using a Pydantic model for the schema, call its `.model_json_schema()` method to ensure JSON serializability.
72+
```python
73+
from stagehand.schemas import ExtractOptions
74+
from pydantic import BaseModel
75+
76+
class DescriptionSchema(BaseModel):
77+
    description: str
78+
79+
# Example:
80+
data = await page.extract(
81+
    ExtractOptions(
82+
        instruction="extract the description of the page",
83+
        schemaDefinition=DescriptionSchema.model_json_schema()
84+
    )
85+
)
86+
description = data.get("description") if isinstance(data, dict) else data.description
87+
```
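
The last line of the example above reads the field from either a dict or an attribute-style object. One way to factor that out is a small helper like the sketch below. `get_field` is a hypothetical name and not part of the Stagehand SDK, and `SimpleNamespace` only stands in for a model-like result.

```python
from types import SimpleNamespace
from typing import Any


def get_field(data: Any, name: str, default: Any = None) -> Any:
    """Read `name` from a dict result or from an attribute-style result."""
    if isinstance(data, dict):
        return data.get(name, default)
    return getattr(data, name, default)


# Stand-ins for the two result shapes the README example allows for.
as_dict = {"description": "Stagehand docs"}
as_object = SimpleNamespace(description="Stagehand docs")

print(get_field(as_dict, "description"))
print(get_field(as_object, "description"))
```

Unlike the inline expression, this helper returns `default` instead of raising `AttributeError` when the attribute is missing, which may or may not be the behavior you want.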
7888

7989
## Why?
8090
**Stagehand adds determinism to otherwise unpredictable agents.**
@@ -87,99 +97,140 @@ While there's no limit to what you could instruct Stagehand to do, our primitive
8797

8898
## Installation
8999

100+
Install the Python package via pip:
101+
90102
```bash
91103
pip install stagehand-py
92104
```
93105

94-
## Quickstart
106+
## Environment Variables
95107

96-
Before running your script, make sure you have exported the necessary environment variables:
108+
Before running your script, set the following environment variables:
97109

98110
```bash
99111
export BROWSERBASE_API_KEY="your-api-key"
100112
export BROWSERBASE_PROJECT_ID="your-project-id"
101-
export OPENAI_API_KEY="your-openai-api-key" # or other model
102-
export STAGEHAND_SERVER_URL="url-of-stagehand-server"
113+
export OPENAI_API_KEY="your-openai-api-key" # or your preferred model's API key
114+
export STAGEHAND_SERVER_URL="url-of-stagehand-server"
103115
```
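
A script could check for these variables up front and fail early with a clear message. The following is a minimal sketch using only the standard library; `missing_env_vars` and `REQUIRED_VARS` are hypothetical names, and treating `OPENAI_API_KEY` as required is an assumption (the README notes that another model's key may be used instead).

```python
import os
from typing import Iterable, List, Mapping

# Variable names taken from the export lines above.
# Assumption: all four are required; swap OPENAI_API_KEY for your model's key.
REQUIRED_VARS = (
    "BROWSERBASE_API_KEY",
    "BROWSERBASE_PROJECT_ID",
    "OPENAI_API_KEY",
    "STAGEHAND_SERVER_URL",
)


def missing_env_vars(
    required: Iterable[str] = REQUIRED_VARS,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_missing = missing_env_vars(env={"BROWSERBASE_API_KEY": "your-api-key"})
print(example_missing)
```

Passing the environment as a parameter keeps the check easy to test without touching the real process environment.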
104116

105-
## Usage
117+
## Quickstart
106118

107-
Here is a minimal example to get started:
119+
Below is a minimal example to get started with Stagehand using the new schema-based options:
108120

109121
```python
110122
import asyncio
111123
import os
112124
from stagehand.client import Stagehand
125+
from stagehand.schemas import ActOptions, ExtractOptions
126+
from pydantic import BaseModel
113127
from dotenv import load_dotenv
114128

115129
load_dotenv()
116130

131+
class DescriptionSchema(BaseModel):
132+
    description: str
133+
117134
async def main():
118-
    # Create a Stagehand client - it will create a new session automatically
135+
    # Create a Stagehand client - it will automatically create a new session if needed
119136
    stagehand = Stagehand(
120-
        model_name="gpt-4o", # optional - defaults to server's default
137+
        model_name="gpt-4o", # Optional: defaults are available from the server
121138
    )
122139

123-
    # Initialize - this will create a new session
124-
    await stagehand.page.init()
140+
    # Initialize Stagehand and create a new session
141+
    await stagehand.init()
125142
    print(f"Created new session: {stagehand.session_id}")
126143

127-
    # Example: navigate to google.com - from Playwright in Python
128-
    await stagehand.page.goto("https://www.google.com")
144+
    # Navigate to a webpage using local Playwright controls
145+
    await stagehand.page.goto("https://www.example.com")
129146
    print("Navigation complete.")
130147

131-
    # Example: ACT to do something like 'search for openai'
132-
    # executes remote on a Typescript server and logs are streamed back
133-
    await stagehand.page.act("search for openai")
148+
    # Perform an action using the AI (e.g. simulate a button click)
149+
    await stagehand.page.act(ActOptions(action="click on the 'Quickstart' button"))
134150

135-
    # Pure client side Playwright - after searching for OpenAI, click on the News tab
136-
    await stagehand.page.get_by_role("link", name="News", exact=True).first.click()
137-
    print("Clicked on News tab")
151+
    # Extract data from the page with schema validation
152+
    data = await stagehand.page.extract(
153+
        ExtractOptions(
154+
            instruction="extract the description of the page",
155+
            schemaDefinition=DescriptionSchema.model_json_schema()
156+
        )
157+
    )
158+
    description = data.get("description") if isinstance(data, dict) else data.description
159+
    print("Extracted description:", description)
138160

139-
    # Close the session (if needed)
140161
    await stagehand.close()
141162

142163
if __name__ == "__main__":
143164
    asyncio.run(main())
144165
```
145166

146167

147-
## More Examples
168+
## Running Evaluations
148169

149-
For further examples, you can check out the scripts in the “examples/” directory:
170+
To run all evaluations, execute the following command in your terminal:
150171

151-
1. “examples/example.py”: Demonstrates combined server-side/page navigation and AI-based actions.
152-
2. “examples/extract-example.py”: Shows how to use the “extract” functionality with JSON schema or a pydantic model.
153-
3. “examples/observe-example.py”: Demonstrates the “observe” functionality to get natural-language readings of the page.
154172

173+
```bash
174+
python evals/run_all_evals.py
175+
```
176+
177+
This script will dynamically discover and execute every evaluation module within the `evals` directory and print the results for each.
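
The contents of `evals/run_all_evals.py` are not shown in this README, but dynamic discovery of modules in a directory can be sketched with the standard library as below. This is an illustration under assumptions, not the script's actual implementation: `discover_modules`, the skip rule, and the `RESULT` attribute in the demo files are all hypothetical.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType
from typing import List


def discover_modules(directory: str, skip: str = "run_all_evals.py") -> List[ModuleType]:
    """Import every .py file in `directory`, except the runner and private files."""
    modules = []
    for path in sorted(Path(directory).glob("*.py")):
        if path.name == skip or path.name.startswith("_"):
            continue
        spec = importlib.util.spec_from_file_location(path.stem, str(path))
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the module's top-level code
        modules.append(module)
    return modules


# Demo against a throwaway directory standing in for `evals/`.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("eval_b.py", "eval_a.py", "run_all_evals.py"):
        Path(tmp, name).write_text("RESULT = 'ok'\n")
    found = [module.__name__ for module in discover_modules(tmp)]

print(found)
```

Sorting the paths keeps the execution order stable between runs, which makes printed results easier to compare.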
178+
179+
180+
## More Examples
181+
182+
For further examples, check out the scripts in the `examples/` directory:
183+
184+
1. **examples/example.py**: Demonstrates combined server-side/page navigation with AI-based actions.
185+
2. **examples/extract-example.py**: Shows how to use the extract functionality with a JSON schema or a Pydantic model.
186+
3. **examples/observe-example.py**: Demonstrates the observe functionality to get natural-language readings of the page.
155187

156188
## Configuration
157189

158-
- `stagehand_server_url`: The Stagehand API server URL
159-
- `browserbase_api_key`: Your BrowserBase API key (can also be set via BROWSERBASE_API_KEY environment variable)
160-
- `browserbase_project_id`: Your BrowserBase project ID (can also be set via BROWSERBASE_PROJECT_ID environment variable)
161-
- `model_api_key`: Your model API key (e.g. OpenAI, Anthropic, etc) (can also be set via MODEL_API_KEY environment variable)
162-
- `verbose`: Verbosity level (default: 1)
163-
- `model_name`: (optional) Model name to use for the conversation
164-
- `dom_settle_timeout_ms`: (optional) Additional time for the DOM to settle
165-
- `debug_dom`: (optional) Whether or not to enable DOM debug mode
190+
Stagehand can be configured via environment variables or through a `StagehandConfig` object. Available configuration options include:
191+
192+
- `stagehand_server_url`: URL of the Stagehand API server.
193+
- `browserbase_api_key`: Your Browserbase API key (`BROWSERBASE_API_KEY`).
194+
- `browserbase_project_id`: Your Browserbase project ID (`BROWSERBASE_PROJECT_ID`).
195+
- `model_api_key`: Your model API key (e.g. OpenAI, Anthropic, etc.) (`MODEL_API_KEY`).
196+
- `verbose`: Verbosity level (default: 1).
197+
- `model_name`: Optional model name for the AI.
198+
- `dom_settle_timeout_ms`: Additional time (in ms) to wait for the DOM to settle.
199+
- `debug_dom`: Enable debug mode for DOM operations.
200+
201+
Example using a unified configuration:
202+
203+
```python
204+
from stagehand.config import StagehandConfig
205+
import os
206+
207+
config = StagehandConfig(
208+
    env="BROWSERBASE" if os.getenv("BROWSERBASE_API_KEY") and os.getenv("BROWSERBASE_PROJECT_ID") else "LOCAL",
209+
    api_key=os.getenv("BROWSERBASE_API_KEY"),
210+
    project_id=os.getenv("BROWSERBASE_PROJECT_ID"),
211+
    debug_dom=True,
212+
    headless=False,
213+
    dom_settle_timeout_ms=3000,
214+
    model_name="gpt-4o-mini",
215+
    model_client_options={"apiKey": os.getenv("MODEL_API_KEY")}
216+
)
217+
```
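
The `env` expression in the example above picks `"BROWSERBASE"` only when both Browserbase credentials are present, and falls back to `"LOCAL"` otherwise. Pulled out into a function, that selection might look like the sketch below. `select_env` is a hypothetical helper name, not part of the SDK.

```python
import os
from typing import Mapping


def select_env(env: Mapping[str, str] = os.environ) -> str:
    """Return "BROWSERBASE" if both credentials are set and non-empty, else "LOCAL"."""
    has_credentials = bool(env.get("BROWSERBASE_API_KEY")) and bool(
        env.get("BROWSERBASE_PROJECT_ID")
    )
    return "BROWSERBASE" if has_credentials else "LOCAL"


# Explicit mappings stand in for the real process environment.
print(select_env({"BROWSERBASE_API_KEY": "key", "BROWSERBASE_PROJECT_ID": "proj"}))
print(select_env({"BROWSERBASE_API_KEY": "key"}))
```

An empty string counts as missing here, which matches the truthiness check in the inline expression.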
166218

167219
## Features
168220

169-
- Automated browser control with natural language commands
170-
- Data extraction with schema validation (either pydantic or JSON schema)
171-
- Async/await support
172-
- Extension of Playwright - run playwright commands normally, with act/extract/observe offloaded to an API
221+
- **AI-powered Browser Control**: Execute natural language instructions over a running browser.
222+
- **Validated Data Extraction**: Use JSON schemas (or Pydantic models) to extract and validate information from pages.
223+
- **Async/Await Support**: Built using Python's asyncio, making it easy to build scalable web automation workflows.
224+
- **Extensible**: Seamlessly extend Playwright functionality with AI enrichments.
173225

174226
## Requirements
175227

176228
- Python 3.7+
177229
- httpx
178230
- asyncio
179231
- pydantic
180-
- python-dotenv (optional if using a .env file)
232+
- python-dotenv (optional, for .env support)
181233

182234
## License
183235

184-
MIT License (c) Browserbase, Inc.
185-
236+
MIT License (c) 2025 Browserbase, Inc.
