@rgarcia
Contributor

@rgarcia rgarcia commented Oct 15, 2025

Note

Adds /computer/screenshot (PNG via ffmpeg, optional region) and /computer/type, tightens mouse APIs with screen-bounds checks and STZ guards, makes resolution queries error-aware, updates OpenAPI/client, and adds e2e screenshot tests.

  • API Endpoints:
    • POST /computer/screenshot: capture PNG (ffmpeg x11grab), optional region crop; streams image.
    • POST /computer/type: type arbitrary text with optional per-keystroke delay.
  • Mouse APIs:
    • MoveMouse/ClickMouse: add display-bounds validation (via getCurrentResolution), STZ disable/enable guards, and improved logging; return 400 on out-of-bounds coordinates.
  • Display:
    • getCurrentResolution now returns (w,h,rate,error); PatchDisplay handles errors.
  • OpenAPI/Client:
    • New schemas: ScreenshotRequest/ScreenshotRegion, TypeTextRequest.
    • Generated server routes, responses, and client helpers for screenshot/typing; swagger spec updated.
  • Tests:
    • E2E: add headless/headful screenshot tests; PNG validation helper (isPNG).
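
The bounds validation described under Mouse APIs could look roughly like the sketch below. It is an illustration only, not the code from this PR: the names `validateCoords` and `ErrOutOfBounds` are hypothetical, and it assumes valid pixels run from 0 to width-1 and 0 to height-1, which the summary does not state.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrOutOfBounds marks coordinates outside the display.
// A handler could map it to an HTTP 400 response (hypothetical name).
var ErrOutOfBounds = errors.New("coordinates out of bounds")

// validateCoords checks that (x, y) lies inside a width x height display.
// Assumption: valid pixels are 0..width-1 and 0..height-1.
func validateCoords(x, y, width, height int) error {
	if width <= 0 || height <= 0 {
		// A failed resolution query should surface as its own error,
		// not as an out-of-bounds click.
		return fmt.Errorf("invalid resolution %dx%d", width, height)
	}
	if x < 0 || y < 0 || x >= width || y >= height {
		return fmt.Errorf("%w: (%d,%d) not within %dx%d", ErrOutOfBounds, x, y, width, height)
	}
	return nil
}

func main() {
	// Width and height would come from a resolution query such as
	// getCurrentResolution; fixed values are used here for illustration.
	fmt.Println(validateCoords(100, 100, 1024, 768))
	fmt.Println(validateCoords(1024, 0, 1024, 768))
}
```

Wrapping the sentinel error with `%w` lets a caller use `errors.Is(err, ErrOutOfBounds)` to decide between a 400 and a 500 response.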

Written by Cursor Bugbot for commit 4094aca. This will update automatically on new commits.

@mesa-dot-dev

mesa-dot-dev bot commented Oct 15, 2025

Mesa Description

This PR introduces the capability to take OS-level screenshots, capturing the entire desktop rather than just the browser content.

This is useful for debugging and support, allowing users to capture system dialogs and other applications running within the environment.

Changes

  • Added a new API endpoint to trigger and retrieve OS-level screenshots.
  • Implemented backend logic to capture the full content of the virtual display.
  • The screenshot is returned as a PNG image.
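
Because the endpoint returns a PNG, a caller can run a cheap sanity check on the response body before using it. The sketch below compares the leading bytes against the fixed 8-byte PNG signature. The function name `looksLikePNG` is hypothetical and is not the helper used in the PR's tests.

```go
package main

import (
	"bytes"
	"fmt"
)

// pngSignature is the fixed 8-byte header that starts every PNG file.
var pngSignature = []byte{0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n'}

// looksLikePNG reports whether data starts with the PNG signature.
// It does not decode the image, so a truncated or corrupt file can still pass.
func looksLikePNG(data []byte) bool {
	return bytes.HasPrefix(data, pngSignature)
}

func main() {
	fmt.Println(looksLikePNG(pngSignature))
	fmt.Println(looksLikePNG([]byte("not an image")))
}
```

For a stricter check, the standard library's `image/png` package can decode the full image, at the cost of reading the whole payload.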

Testing

  • Manually triggered the new endpoint.
  • Verified that the resulting image correctly captures the entire desktop, including the browser and other OS elements.
  • Tested with different running applications to ensure they are captured correctly.

Description generated by Mesa.

@mesa-dot-dev mesa-dot-dev bot left a comment

Performed full review of a730519...b51a533

Analysis

  1. The implementation hardcodes display :1, which may cause issues in environments with different display configurations or multiple displays.

  2. The use of external dependencies (ffmpeg, xdpyinfo) creates potential points of failure if these tools are unavailable or have version incompatibilities on the target system.

  3. While PNG format provides lossless compression, it may result in large response payloads for high-resolution screenshots, potentially causing network or performance issues for clients.

  4. The current implementation may not handle concurrent screenshot requests efficiently, as resource-intensive ffmpeg operations could impact overall system performance under load.

  5. The design might lack proper authentication or authorization controls specific to screenshot functionality, which could expose sensitive visual information.
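
One way to address point 1 is to read the display from the environment and fall back to `:1` only when nothing is set. The sketch below shows that idea together with a plausible ffmpeg x11grab argument list for a single-frame region capture. The PR's actual ffmpeg invocation is not shown in this conversation, so the helper names `displayName` and `screenshotArgs` and the exact flag combination are assumptions.

```go
package main

import (
	"fmt"
	"os"
)

// displayName returns the X display to capture: the DISPLAY environment
// variable when set, otherwise the given fallback (for example ":1").
func displayName(fallback string) string {
	if d := os.Getenv("DISPLAY"); d != "" {
		return d
	}
	return fallback
}

// screenshotArgs builds ffmpeg arguments that grab one frame from a w x h
// region at offset (x, y) of an X11 display and write it as PNG to stdout.
func screenshotArgs(display string, x, y, w, h int) []string {
	return []string{
		"-f", "x11grab", // X11 screen-capture input device
		"-video_size", fmt.Sprintf("%dx%d", w, h), // size of the grabbed region
		"-i", fmt.Sprintf("%s+%d,%d", display, x, y), // display plus region offset
		"-frames:v", "1", // stop after a single video frame
		"-f", "image2pipe", // stream the image instead of writing a file
		"-vcodec", "png",
		"-", // write to stdout
	}
}

func main() {
	args := screenshotArgs(displayName(":1"), 0, 0, 1024, 768)
	fmt.Println("ffmpeg", args)
}
```

The resulting slice can be passed to `exec.Command("ffmpeg", args...)`, with the command's stdout streamed to the HTTP response.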

This review was generated by Mesa.

4 files reviewed | 0 comments

cursor[bot]

This comment was marked as outdated.

@rgarcia rgarcia requested a review from hiroTamada October 15, 2025 17:48

Contributor

@hiroTamada hiroTamada left a comment

LGTM.

What is the use case for the click-mouse and type endpoints? Are they just a nice-to-have for abstracting OS-level interaction?

@rgarcia
Contributor Author

rgarcia commented Oct 15, 2025

@hiroTamada Some people need higher-fidelity computer actions than what Playwright does (synthetic DOM events).

@rgarcia rgarcia merged commit 0260b79 into main Oct 15, 2025
4 checks passed
@rgarcia rgarcia deleted the raf/kernel-149-os-level-screenshots branch October 15, 2025 18:06