Can't pass JSON schema to the extract function #193

@kushalpatil07

Environment Information

Stagehand:

  • Language/SDK: Python
  • Stagehand version: 0.5.1

AI Provider:

  • Provider: OpenAI
  • Model: gpt-4o

Issue Description

Based on the code, I expected to be able to pass a JSON schema as an argument to extract via ExtractOptions.schema_definition, which also accepts a dict.

But in ExtractHandler.extract() the JSON schema is treated as a BaseModel type, and the code fails in the validation path.
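
To make the mismatch concrete, here is a minimal sketch of a guard that would accept either form. It is not Stagehand's actual code: FakeModel, schema_label, and validate_extracted are hypothetical names, and FakeModel only stands in for a Pydantic model class so the snippet runs without dependencies.

```python
class FakeModel:
    """Hypothetical stand-in for a Pydantic model class (stdlib only)."""

    @classmethod
    def model_validate(cls, data):
        # A real model would validate and coerce; this stub returns data as-is.
        return data


def schema_label(schema):
    """Return a printable name for a model class or a JSON schema dict."""
    if isinstance(schema, dict):
        # Dicts have no __name__, so fall back to the schema's title if present.
        return schema.get("title", "<JSON schema dict>")
    return getattr(schema, "__name__", repr(schema))


def validate_extracted(schema, raw_data):
    """Validate only when schema is a class exposing model_validate."""
    if isinstance(schema, type) and hasattr(schema, "model_validate"):
        return schema.model_validate(raw_data)
    # For a plain JSON schema dict, keep the raw data untouched.
    return raw_data
```

With a guard like this, a dict schema would skip the model_validate call and the log message would no longer need schema.__name__.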

Steps to Reproduce

Minimal Reproduction Code

This is what I am using to try it out. The schema might be wrong; I'll fix that once I get past this issue.

    # Define the JSON schema directly, without Pydantic classes
    json_schema = {
        "type": "object",
        "properties": {
            "list_of_links": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "link": {
                            "type": "string",
                            "format": "uri"
                        }
                    },
                    "required": ["link"]
                }
            }
        },
        "required": ["list_of_links"]
    }

    extraction = await stagehand.page.extract(
        options_or_instruction=ExtractOptions(
            schema_definition=json_schema,
            instruction="extract the links of all videos that are present in this page.",
        )
    )

Error Messages / Log trace

Traceback (most recent call last):
  File "/Users/kushal/Desktop/muscle-mem/.venv/lib/python3.11/site-packages/stagehand/handlers/extract_handler.py", line 155, in extract
    validated_model_instance = schema.model_validate(raw_data_dict)
                               ^^^^^^
  File "/Users/kushal/.local/share/uv/python/cpython-3.11.11-macos-aarch64-none/lib/python3.11/bdb.py", line 90, in trace_dispatch
    return self.dispatch_line(frame)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kushal/.local/share/uv/python/cpython-3.11.11-macos-aarch64-none/lib/python3.11/bdb.py", line 115, in dispatch_line
    if self.quitting: raise BdbQuit
                      ^^^^^^^^^^^^^
bdb.BdbQuit

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/kushal/Desktop/muscle-mem/test.py", line 75, in <module>
    asyncio.run(run())
  File "/Users/kushal/.local/share/uv/python/cpython-3.11.11-macos-aarch64-none/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/Users/kushal/.local/share/uv/python/cpython-3.11.11-macos-aarch64-none/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kushal/.local/share/uv/python/cpython-3.11.11-macos-aarch64-none/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/kushal/Desktop/muscle-mem/test.py", line 61, in run
    extraction = await stagehand.page.extract(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kushal/Desktop/muscle-mem/.venv/lib/python3.11/site-packages/stagehand/main.py", line 86, in wrapped
    return await attr(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kushal/Desktop/muscle-mem/.venv/lib/python3.11/site-packages/stagehand/page.py", line 384, in extract
    result = await self._extract_handler.extract(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kushal/Desktop/muscle-mem/.venv/lib/python3.11/site-packages/stagehand/handlers/extract_handler.py", line 159, in extract
    f"Failed to validate extracted data against schema {schema.__name__}: {e}. Keeping raw data dict in .data field."
                                                        ^^^^^^^^^^^^^^^
AttributeError: 'dict' object has no attribute '__name__'
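
The final AttributeError can be reproduced in isolation, without Stagehand. The sketch below assumes only what the traceback shows: an error-handling path that formats schema.__name__ into a message. The function name format_validation_failure and the ValueError used as the first exception are hypothetical.

```python
def format_validation_failure(schema, error):
    # Mirrors the f-string shown in the traceback; fails when schema is a dict.
    return f"Failed to validate extracted data against schema {schema.__name__}: {error}."


message = None
chained = None
try:
    try:
        # Stand-in for whatever goes wrong during validation.
        raise ValueError("validation failed")
    except ValueError as e:
        # Building the log message raises a second exception for dict schemas.
        format_validation_failure({"type": "object"}, e)
except AttributeError as outer:
    message = str(outer)
    # The original exception is kept as implicit context of the new one.
    chained = type(outer.__context__).__name__

print(message)
print(chained)
```

This is why the trace has two parts: the first exception is raised around the model_validate call, and the second comes from the handler that tries to report it.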
