Add screenshot as context option for SpeakMCP input #217

@aj47

Feature Request: Screenshot as Context Option

Description

Add screenshot functionality as a context option for SpeakMCP input. This should include:

  1. Input UI Enhancement: Add a checkbox in the input UI to enable screenshot capture
  2. Agent Settings: Add settings for agents to configure screenshot behavior
  3. Multimodal Support: Research and implement proper data transmission for multimodal models

Technical Requirements

UI Components

  • Add checkbox in input UI for screenshot option
  • Integrate with system screenshot capture
  • Provide visual feedback when screenshot is captured

Agent Settings

  • Add screenshot configuration options in agent settings
  • Allow agents to enable/disable screenshot context
  • Configure screenshot quality/format preferences
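
One possible shape for the per-agent settings listed above, as a minimal sketch. Every name here (`ScreenshotSettings`, `resolveScreenshotSettings`, the default values) is an assumption for illustration and not taken from the SpeakMCP codebase.

```typescript
// Hypothetical settings shape; SpeakMCP's real schema may differ.
type ScreenshotFormat = "png" | "jpeg";

interface ScreenshotSettings {
  enabled: boolean;         // whether this agent may receive screenshot context
  format: ScreenshotFormat; // encoding used for the captured image
  quality: number;          // 1-100, only meaningful for JPEG
}

// Assumed defaults: screenshots are opt-in.
const DEFAULT_SCREENSHOT_SETTINGS: ScreenshotSettings = {
  enabled: false,
  format: "png",
  quality: 80,
};

// Merge partial per-agent overrides over the defaults and clamp quality to 1-100.
function resolveScreenshotSettings(
  overrides: Partial<ScreenshotSettings> = {},
): ScreenshotSettings {
  const merged = { ...DEFAULT_SCREENSHOT_SETTINGS, ...overrides };
  const rawQuality = Number.isFinite(merged.quality)
    ? merged.quality
    : DEFAULT_SCREENSHOT_SETTINGS.quality;
  const quality = Math.min(100, Math.max(1, Math.round(rawQuality)));
  return { ...merged, quality };
}
```

Keeping `enabled` off by default means an agent only sees screenshots after someone turns the option on, which fits the privacy consideration later in this issue.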

Multimodal Model Integration

  • Research the standard request format for sending images to multimodal models through an OpenAI-compatible base URL
  • Implement proper image encoding/formatting
  • Ensure compatibility with various multimodal models
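
As a starting point for the encoding work above, here is a sketch of a user message in the content-parts shape that the OpenAI Chat Completions API accepts for image input: a `text` part plus an `image_url` part carrying a base64 data URL. The helper names (`toBase64`, `buildUserMessage`) are hypothetical, and whether a given OpenAI-compatible server accepts this shape still needs to be checked per provider.

```typescript
// Hypothetical helpers; only the message shape follows the
// OpenAI Chat Completions image input format.
type TextPart = { type: "text"; text: string };
type ImagePart = {
  type: "image_url";
  image_url: { url: string; detail?: "low" | "high" | "auto" };
};
type UserMessage = { role: "user"; content: Array<TextPart | ImagePart> };

// Encode raw bytes as base64 using the global btoa.
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Build a user message, attaching the screenshot only when one is provided.
function buildUserMessage(
  text: string,
  image?: { bytes: Uint8Array; mimeType: "image/png" | "image/jpeg" },
): UserMessage {
  const content: Array<TextPart | ImagePart> = [{ type: "text", text }];
  if (image) {
    const url = `data:${image.mimeType};base64,${toBase64(image.bytes)}`;
    content.push({ type: "image_url", image_url: { url } });
  }
  return { role: "user", content };
}
```

When the checkbox is unchecked the caller simply omits `image`, and the message degrades to a text-only request.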

Research Questions

  1. Standard Formats: What is the standard format for sending image data to multimodal models over OpenAI-compatible APIs?
  2. Encoding Methods: Should we use base64 encoding or direct binary transmission?
  3. Size Limits: What are the typical size limits for image data in API requests?
  4. Model Compatibility: How do different multimodal models (GPT-4V, Claude, Llama) handle image input?
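
Some partial groundwork for questions 2 to 4, as a hedged sketch. Anthropic's Messages API takes images as an `image` content block with a base64 `source`, not as an `image_url` part, so a per-provider mapping step is likely needed. Padded base64 also inflates the payload to 4 characters per 3 input bytes, which matters for any size limit. The helper names and the `maxEncodedBytes` parameter are assumptions; actual limits have to come from each provider's documentation.

```typescript
// Hypothetical helpers; only the block shape mirrors Anthropic's
// Messages API image input.
type ImageMime = "image/png" | "image/jpeg";

interface AnthropicImageBlock {
  type: "image";
  source: { type: "base64"; media_type: ImageMime; data: string };
}

// Wrap already-encoded base64 data in an Anthropic-style image block.
function toAnthropicImageBlock(
  base64Data: string,
  mimeType: ImageMime,
): AnthropicImageBlock {
  return {
    type: "image",
    source: { type: "base64", media_type: mimeType, data: base64Data },
  };
}

// Padded base64 emits 4 characters for every 3 input bytes, rounded up.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// maxEncodedBytes is a caller-supplied assumption, not a documented limit.
function fitsEncodedLimit(byteLength: number, maxEncodedBytes: number): boolean {
  return base64Length(byteLength) <= maxEncodedBytes;
}
```

For Llama-family models the answer depends on the serving layer; servers that expose an OpenAI-compatible endpoint may accept the `image_url` shape, but that should be verified for each one.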

Implementation Considerations

  • Performance: Optimize screenshot capture and transmission
  • Privacy: Ensure user consent and data security
  • Compatibility: Support across different platforms and models
  • User Experience: Make the feature intuitive and seamless
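
The privacy point above could be enforced with a single gate that runs before any capture. This is a sketch only: `CaptureContext`, `decideCapture`, and the three flags are hypothetical names, and the order of checks is one reasonable choice among several.

```typescript
// Hypothetical gate; field names are illustrative and not taken
// from the SpeakMCP codebase.
interface CaptureContext {
  userConsented: boolean;          // one-time consent to screen capture
  agentAllowsScreenshots: boolean; // per-agent setting
  checkboxChecked: boolean;        // per-message opt-in from the input UI
}

type CaptureDecision =
  | { capture: true }
  | { capture: false; reason: string };

// Capture only when consent, the agent setting, and the checkbox all agree.
function decideCapture(ctx: CaptureContext): CaptureDecision {
  if (!ctx.userConsented) {
    return { capture: false, reason: "user has not consented" };
  }
  if (!ctx.agentAllowsScreenshots) {
    return { capture: false, reason: "agent has screenshots disabled" };
  }
  if (!ctx.checkboxChecked) {
    return { capture: false, reason: "checkbox not checked" };
  }
  return { capture: true };
}
```

Returning a reason alongside the refusal gives the UI something concrete to show when a screenshot is skipped.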

Acceptance Criteria

  • Users can capture screenshots via checkbox in input UI
  • Agents can be configured to use screenshot context
  • Screenshot data is properly formatted for multimodal models
  • Feature works with major multimodal model providers
  • Performance impact is minimal

Priority

Medium - This feature would significantly enhance the multimodal capabilities of SpeakMCP and improve user experience for visual context.
