Computer Use needs the full GUI screenshot, not what's provided by Playwright #14

@juecd

Currently, the Anthropic Computer Use implementations (TypeScript now, Python coming soon) rely on Playwright's page.screenshot() and pass its output to the LLM so it can decide on its next instructions. However, this screenshot captures only the page content, not the full browser window, so it omits browser chrome such as the address bar. As a result, Claude cannot see the address bar it types into when it uses cmd-l to focus it.

We should consider using a different mechanism to take screenshots for Claude. The original reference implementation uses gnome-screenshot.
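As a starting point for discussion, here is a minimal sketch of what a gnome-screenshot based capture could look like in the TypeScript implementation. It assumes a headed browser running on a Linux desktop where `gnome-screenshot` is installed; `-f` (write to file) and `-p` (include the pointer) are the flags the reference implementation passes. The helper names `buildScreenshotArgs` and `captureFullScreenBase64` are hypothetical and not part of this repo.

```typescript
import { execFile } from "node:child_process";
import { readFile, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical helper: builds the argument list for gnome-screenshot.
// -f writes the capture to the given file; -p includes the mouse pointer.
function buildScreenshotArgs(outPath: string, includePointer = true): string[] {
  const args = ["-f", outPath];
  if (includePointer) args.push("-p");
  return args;
}

// Hypothetical helper: captures the whole desktop (browser chrome included)
// and returns the PNG as a base64 string, ready to send to the model.
async function captureFullScreenBase64(): Promise<string> {
  const outPath = join(tmpdir(), `screenshot-${process.pid}-${Date.now()}.png`);
  try {
    await execFileAsync("gnome-screenshot", buildScreenshotArgs(outPath));
    const png = await readFile(outPath);
    return png.toString("base64");
  } finally {
    // Remove the temporary file whether or not the capture succeeded.
    await rm(outPath, { force: true });
  }
}
```

The result of `captureFullScreenBase64()` could replace the base64 output of page.screenshot() wherever the screenshot is passed to Claude. Note that this only works when the browser is actually rendered on a display; a headless Playwright session has no window for gnome-screenshot to capture.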
