Feature: Add image recognition capabilities #15

@NakaokaRei

Description

SwiftAutoGUI provides image recognition features that can locate images on the screen. These features are not currently exposed through the MCP interface; exposing them would enable visual-based automation workflows.

Implementation Details

Add new tools to enable image recognition:

  1. findImage - Locate a single image on screen
  2. findImageCenter - Find the center coordinates of an image
  3. findAllImages - Find all occurrences of an image

SwiftAutoGUI Methods to Use:

  • locateOnScreen(imageName: String, confidence?: Double, region?: CGRect) - Find first match
  • locateCenterOnScreen(imageName: String, confidence?: Double, region?: CGRect) - Find center of first match
  • locateAllOnScreen(imageName: String, confidence?: Double, region?: CGRect) - Find all matches

Parameters

Common Parameters

  • imagePath: Path to the reference image file
  • confidence: Optional confidence threshold (0.0-1.0, default: 0.9)
  • region: Optional search region (x, y, width, height)
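As a rough illustration of how the common parameters might be decoded on the server side, here is a minimal sketch using only Foundation. The type names `SearchRegion` and `FindImageArguments` and the helper `parseFindImageArguments` are hypothetical and not part of SwiftAutoGUI or any MCP SDK; the range check on `confidence` is an assumption based on the 0.0-1.0 range listed above.

```swift
import Foundation

// Hypothetical types for illustration; not part of SwiftAutoGUI or an MCP SDK.
struct SearchRegion: Codable, Equatable {
    let x: Double
    let y: Double
    let width: Double
    let height: Double
}

struct FindImageArguments: Decodable {
    let imagePath: String
    let confidence: Double
    let region: SearchRegion?

    enum CodingKeys: String, CodingKey {
        case imagePath, confidence, region
    }

    enum ValidationError: Error {
        case confidenceOutOfRange(Double)
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        imagePath = try container.decode(String.self, forKey: .imagePath)
        // Fall back to the default threshold of 0.9 when confidence is omitted.
        confidence = try container.decodeIfPresent(Double.self, forKey: .confidence) ?? 0.9
        // A missing region is kept as nil (no search region was given).
        region = try container.decodeIfPresent(SearchRegion.self, forKey: .region)
        // Assumed validation: reject thresholds outside 0.0...1.0.
        guard (0.0...1.0).contains(confidence) else {
            throw ValidationError.confidenceOutOfRange(confidence)
        }
    }
}

// Decodes the "arguments" object of a tool call from a JSON string.
func parseFindImageArguments(_ json: String) throws -> FindImageArguments {
    try JSONDecoder().decode(FindImageArguments.self, from: Data(json.utf8))
}
```

Decoding the defaults in one place would let all three tools share the same argument handling, though the actual implementation may choose a different structure.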

Return Values

findImage

Returns the bounding box of the found image:

{
  "found": true,
  "x": 100,
  "y": 200,
  "width": 50,
  "height": 30
}

findImageCenter

Returns the center coordinates:

{
  "found": true,
  "x": 125,
  "y": 215
}
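The example values for findImage and findImageCenter are consistent with the center being the midpoint of the bounding box. A minimal sketch of that relationship, assuming this is how the center is derived (`BoundingBox`, `Point`, and `center(of:)` are hypothetical names used only for illustration):

```swift
// Hypothetical types for illustration only.
struct BoundingBox {
    let x, y, width, height: Double
}

struct Point: Equatable {
    let x, y: Double
}

// Midpoint of the box: top-left corner plus half the size on each axis.
func center(of box: BoundingBox) -> Point {
    Point(x: box.x + box.width / 2, y: box.y + box.height / 2)
}

let box = BoundingBox(x: 100, y: 200, width: 50, height: 30)
let mid = center(of: box)
print(mid.x, mid.y) // prints "125.0 215.0"
```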

findAllImages

Returns an array of all matches:

{
  "matches": [
    {"x": 100, "y": 200, "width": 50, "height": 30},
    {"x": 300, "y": 400, "width": 50, "height": 30}
  ]
}
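One possible way to produce this shape is with `Codable` and `JSONEncoder`, sketched below. The names `ImageMatch`, `FindAllImagesResult`, and `encodeMatches` are hypothetical, and returning an empty `matches` array when nothing is found is an assumption; the issue does not specify the no-match case.

```swift
import Foundation

// Hypothetical result types mirroring the JSON shape shown above.
struct ImageMatch: Codable, Equatable {
    let x: Int
    let y: Int
    let width: Int
    let height: Int
}

struct FindAllImagesResult: Codable, Equatable {
    let matches: [ImageMatch]
}

// Serializes the matches; an empty array is assumed to mean "nothing found".
func encodeMatches(_ matches: [ImageMatch]) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys] // stable key order for readable diffs
    let data = try encoder.encode(FindAllImagesResult(matches: matches))
    return String(decoding: data, as: UTF8.self)
}
```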

Use Cases

  • GUI test automation
  • Finding and clicking buttons based on appearance
  • Waiting for visual elements to appear
  • Verifying UI states
  • Automating applications without accessible APIs
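The "waiting for visual elements to appear" use case could be built on top of a single-match lookup by polling. The sketch below takes the lookup as a closure so that it does not depend on any particular SwiftAutoGUI signature; `ScreenRect`, `waitForImage`, and the default `pollInterval` are hypothetical choices, not part of the proposal.

```swift
import Foundation

// Hypothetical match type for illustration only.
struct ScreenRect: Equatable {
    let x, y, width, height: Int
}

/// Calls `locate` repeatedly until it returns a match or `timeout` elapses.
/// `locate` stands in for whatever performs the actual image search.
func waitForImage(
    timeout: TimeInterval,
    pollInterval: TimeInterval = 0.5,
    locate: () -> ScreenRect?
) -> ScreenRect? {
    let deadline = Date().addingTimeInterval(timeout)
    repeat {
        if let match = locate() {
            return match
        }
        // Pause between attempts so the search does not run back to back.
        Thread.sleep(forTimeInterval: pollInterval)
    } while Date() < deadline
    return nil // timed out without finding the image
}
```

Because each attempt may involve a full screen capture and search, the poll interval trades responsiveness against the performance cost noted under Priority.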

Example Usage

{
  "tool": "findImage",
  "arguments": {
    "imagePath": "/path/to/button.png",
    "confidence": 0.95,
    "region": {
      "x": 0,
      "y": 0,
      "width": 1920,
      "height": 1080
    }
  }
}

Priority

Medium - While powerful, image recognition is more advanced than basic automation features and may have performance implications.
