Open
Description
Environment Information
Stagehand:
- Language/SDK: Python
- Stagehand version: 0.5.4
AI Provider:
- Provider: OpenAI
- Model: gpt-4.1-mini (doesn't work). It seems to work when no model is specified, in which case Stagehand probably falls back to gpt-4o.
Issue Description
As described in the documentation, Stagehand uses internal IDs to map URLs to the ExtractedResult. My final result, however, doesn't include the URLs for all items found on the page.
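To quantify how many items are affected, a small check like the following can be run on the collected results. It is only a sketch: it assumes each item is a plain dict, as produced by model_dump(mode='json') in the reproduction code, and that a missing URL shows up as an absent, None, or empty "url" value. find_missing_urls is a hypothetical helper name, not part of Stagehand, and the sample data is made up.

```python
from typing import Any


def find_missing_urls(items: list[dict[str, Any]]) -> list[str]:
    """Return the names of items whose "url" field is absent, None, or empty."""
    return [item.get("name", "<unnamed>") for item in items if not item.get("url")]


# Made-up sample data, for illustration only.
sample = [
    {"name": "Agency A", "url": "https://example.com/a"},
    {"name": "Agency B", "url": None},
    {"name": "Agency C"},
]

missing = find_missing_urls(sample)
print(f"{len(missing)} of {len(sample)} items have no URL: {missing}")
```

Running the same check on the output with and without model_name set would show whether the difference between the two runs is limited to the URL field.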
Steps to Reproduce
- Run the code below as-is (will not work: URLs are missing for some items)
- Comment out the line that specifies the model (will work)
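Note that the reproduction code calls has_next_page, which is not defined in this report. To run the steps above, a stand-in is needed. A minimal sketch, assuming page.observe returns a list that is empty when no matching element is found (the real helper may have done something different):

```python
from typing import Sequence


def has_next_page(observe_results: Sequence[object]) -> bool:
    """Hypothetical stand-in: a non-empty observe() result means a next page exists."""
    return len(observe_results) > 0
```

With this definition the pagination loop stops as soon as observe finds no next page button.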
Minimal Reproduction Code
from stagehand import Stagehand
import asyncio
from dotenv import load_dotenv
from pydantic import BaseModel, HttpUrl
from typing import Optional
import json

load_dotenv()

class Agency(BaseModel):
    name: str
    url: HttpUrl
    description: Optional[str] = None
    location: Optional[str] = None

class Agencies(BaseModel):
    agencies: list[Agency]

async def main():
    stagehand = Stagehand(
        env="LOCAL",
        headless=False,
        model_name="gpt-4.1-mini"
    )
    all_agencies = []
    await stagehand.init()
    page = stagehand.page
    await page.goto("<url I want to scrape>")
    should_continue = True
    while should_continue:
        agencies = await page.extract("Extract all agencies from the page", schema=Agencies)
        all_agencies.extend([agency.model_dump(mode='json') for agency in agencies.agencies])
        next_page_button = await page.observe("Find the next page button")
        should_continue = has_next_page(next_page_button)
        if should_continue:
            await page.click(next_page_button[0].selector)
    print(all_agencies)
    with open("agencies.json", "w") as f:
        json.dump(all_agencies, f)

if __name__ == "__main__":
    asyncio.run(main())

Error Messages / Log trace
Screenshots / Videos
Related Issues
No related PRs found (also none for the TypeScript version)