
@raiden-staging
Contributor

@raiden-staging raiden-staging commented Aug 16, 2025

  • Extends the server with a /screenshot endpoint
```bash
# Screenshot

# | GET /screenshot
curl http://localhost:10001/screenshot -o screenshot.png

# | POST /screenshot
curl -X POST -H "Content-Type: application/json" \
  --data '{"x":0,"y":0,"width":200,"height":200}' \
  -o screenshot_region.png http://localhost:10001/screenshot
```

TL;DR

Added new /screenshot API endpoints to capture full or partial screen images, and an uptime health check.

Why we made these changes

To extend the server's API with programmatic screenshot capture for integrations and automation, and to expose server health and uptime information.

What changed?

  • server/cmd/api/api/screenshot.go: New file introducing GET /screenshot for full-screen captures and POST /screenshot for region-specific captures, leveraging ffmpeg.
  • server/cmd/api/api/api.go: Implemented server uptime tracking and exposed it via a new GetHealth endpoint.
  • server/README.md & server/openapi.yaml: Updated documentation to reflect the new /screenshot API routes and their usage.
  • server/cmd/api/api/screenshot_test.go: Added unit tests for the health check and screenshot capture, including input validation.
  • server/go.mod & server/go.sum: Updated Go module dependencies.

Description generated by Mesa.


@mesa-dot-dev mesa-dot-dev bot left a comment


Performed full review of fe02e69...0377a23

7 files reviewed | 0 comments

```go
if err != nil {
	// Fallback to default dimensions if xdpyinfo fails
	return 1024, 768, nil
}
```

Bug: Screen Dimensions Fallback Hides Errors

The getScreenDimensions function returns a nil error even when xdpyinfo fails, so callers silently receive the fallback dimensions. This prevents captureScreenshot from logging a warning and leads to incorrect bounds validation, potentially rejecting valid screenshot regions or allowing invalid ones that cause ffmpeg to fail.


@matthewjmarangoni
Contributor

This feature will be nice to use!

  1. At least one agent uses a base64-encoded PNG (see Computer Use). Offering a base64 output format would be useful, since users could relay it directly.
  2. Should the screenshot endpoint live under /screen to group it with the other screen operations?
  3. Instead of splitting region capture across different actions (GET vs. POST), consider an optional parameter on the POST action or a separate endpoint.
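For the base64 suggestion in item 1, here is a sketch of a JSON response that carries the PNG as standard padded base64. The struct, its field names, and encodeScreenshot are hypothetical, since the thread does not settle on a response shape.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// base64Screenshot is a hypothetical JSON body for a base64 response format.
type base64Screenshot struct {
	MediaType string `json:"media_type"`
	Data      string `json:"data"` // standard (padded) base64 of the PNG bytes
}

// encodeScreenshot wraps raw PNG bytes in the JSON body above.
func encodeScreenshot(png []byte) ([]byte, error) {
	return json.Marshal(base64Screenshot{
		MediaType: "image/png",
		Data:      base64.StdEncoding.EncodeToString(png),
	})
}

func main() {
	// The 8-byte PNG signature stands in for real image data.
	sig := []byte{0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n'}
	body, err := encodeScreenshot(sig)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```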

```go
}

// Default dimensions if parsing fails
return 1024, 768, nil
```
Contributor


If the fallback is needed, would it be useful to read fallback dimensions from a source such as environment variables before resorting to fixed values?

Contributor

@rgarcia rgarcia left a comment


thanks for this @raiden-staging!

```diff
@@ -1,6 +1,6 @@
 // Package oapi provides primitives to interact with the openapi HTTP API.
 //
-// Code generated by github.com/oapi-codegen/oapi-codegen/v2 version v2.5.0 DO NOT EDIT.
+// Code generated by github.com/deepmap/oapi-codegen version v1.12.4 DO NOT EDIT.
```
Contributor


this doesn't look right

Contributor Author


the oapi/go version conflict fix induced some slop 🙃 sorry for the mess, will fix

```yaml
"400":
  $ref: "#/components/responses/BadRequestError"
"500":
  $ref: "#/components/responses/InternalError"
```
Contributor


I think I'd prefer merging these into one POST request

```go
}

// Read the screenshot data
data, err := io.ReadAll(screenshot)
```
Contributor


not sure how oapi handles response body reading under the hood, but I'd assume it reads from the reader anyway, so this feels like a duplicate / unnecessary read of the screenshot into memory

Contributor Author


should we serve it as a base64 string [@matthewjmarangoni] or as binary (or add an optional ?format={} parameter)?

Contributor


If I understand the question correctly, what I had in mind is offering both, with a way for the client to choose either the original binary or a base64-encoded string.

@rgarcia
Contributor

rgarcia commented Oct 17, 2025

thanks again for this @raiden-staging -- screenshot functionality has been added as part of #76

@rgarcia rgarcia closed this Oct 17, 2025