feat: Enhance console output handling, screenshot capture reliabilit…#288

Open
jaeko44 wants to merge 4 commits into microsoft:main from jaeko44:main

jaeko44 commented Feb 6, 2026

…y and enable responses API support

  • Introduced _safe_console_text function to handle UnicodeEncodeError on legacy Windows consoles by stripping emojis in various logging outputs across app and host agent processors.
  • Improved error handling in AppScreenshotCaptureStrategy and DesktopDataCollectionStrategy so that only valid image data is captured and a warning is logged for invalid or tiny images.
  • Updated ControlPhotographer to implement a fallback mechanism for screenshot capturing using win32 APIs when PIL ImageGrab fails.
  • Enhanced UI MCP server to validate screenshots and fall back to desktop captures when necessary.
  • Added support for using the Responses API in BaseOpenAIService, including error handling for unsupported response formats.
  • Updated utility functions to handle directory paths and invalid image strings gracefully.
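To illustrate the failure mode behind the first bullet: a legacy Windows console using a code page such as cp1252 raises UnicodeEncodeError when asked to print an emoji. The following is a minimal sketch of one way to sanitize text before printing. The helper name `safe_console_text` and its `encoding` parameter are hypothetical and are not taken from the PR's `_safe_console_text`, which may work differently.

```python
import sys
from typing import Optional


def safe_console_text(text: str, encoding: Optional[str] = None) -> str:
    """Return `text` with characters the target console cannot encode removed.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    # Fall back to UTF-8 when stdout has no usable encoding attribute.
    enc = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    try:
        text.encode(enc)
        return text  # The console can print this string unchanged.
    except UnicodeEncodeError:
        # Drop whatever the code page cannot represent (for example emojis).
        return text.encode(enc, errors="ignore").decode(enc)
```

Passing the encoding explicitly, as in `safe_console_text("done ✅", "cp1252")`, makes the behaviour easy to check without a real legacy console.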


jaeko44 commented Feb 6, 2026

@microsoft-github-policy-service agree company="DET-IO PTY LTD"

…in evaluation, fix directory-as-path bug

- screenshot.py: Add _win32_print_window() using PrintWindow API (works on
  disconnected RDP sessions). Add PrintWindow as intermediate fallback in
  ControlPhotographer.capture() and _win32_grab_screen(). Add foreground
  window fallback for desktop capture when all methods fail.
- eva_prompter.py: Add MAX_EVAL_IMAGES=40 cap and _is_valid_screenshot()
  filtering to prevent 50-image API limit errors from placeholder images.
  Add text-only evaluation fallback when no valid screenshots exist.
- parser.py: Fix _load_single_screenshot() to handle empty string paths
  that resolved to directories, use os.path.isfile() check.
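The evaluation changes in this commit message can be sketched roughly as follows. This is an assumption-laden illustration, not the code in eva_prompter.py or parser.py: only the value `MAX_EVAL_IMAGES = 40` and the use of `os.path.isfile()` come from the commit message. The size threshold `MIN_BYTES`, the helper names, and the choice to keep the most recent images are hypothetical.

```python
import os
from typing import List

MAX_EVAL_IMAGES = 40  # Cap stated in the commit message.
MIN_BYTES = 1024      # Assumed cutoff for placeholder images; not from the PR.


def is_valid_screenshot(path: str, min_bytes: int = MIN_BYTES) -> bool:
    """Accept only real files of a plausible size."""
    # os.path.isfile() is False for "" and for directories, which covers the
    # empty-path-resolves-to-directory case described above.
    return bool(path) and os.path.isfile(path) and os.path.getsize(path) >= min_bytes


def select_eval_images(paths: List[str], limit: int = MAX_EVAL_IMAGES) -> List[str]:
    """Filter out invalid screenshots, then keep at most `limit` of them."""
    if limit <= 0:
        return []
    valid = [p for p in paths if is_valid_screenshot(p)]
    # Assumption: when over the cap, the most recent screenshots matter most.
    return valid[-limit:]
```

An empty result from `select_eval_images` is the signal a caller could use to switch to a text-only evaluation prompt.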

vyokky commented Feb 9, 2026

Hi @jaeko44, thanks so much for your contribution, this is awesome work! Before merging, could you please roll back the default config YAML to the previous version, to avoid breaking existing usage? Also, could you test the change using an older setting (e.g., GPT-4o) to make sure everything still works as expected? Thanks a lot!


Copilot AI left a comment


Pull request overview

This PR improves runtime robustness across UFO’s Windows console logging and screenshot capture pipeline, and adds an opt-in path to use OpenAI/Azure OpenAI “Responses API” in the LLM service layer.

Changes:

  • Adds console-output sanitization helpers to avoid UnicodeEncodeError on non-UTF consoles.
  • Hardens screenshot capture/encoding flows with validation, retries, and fallbacks (including Win32 PrintWindow/BitBlt paths).
  • Adds an opt-in Responses API execution path in the OpenAI service plus related config support (env-var expansion and USE_RESPONSES flags).
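The "validation, retries, and fallbacks" pattern in the second bullet can be sketched generically as an ordered list of capture backends, each tried in turn until one returns plausible image data. This is a hedged sketch only: the function name `capture_with_fallbacks`, the `min_bytes` threshold, and the backend labels in the usage note are hypothetical, and no real PIL or Win32 calls are made here.

```python
from typing import Callable, Iterable, Optional, Tuple

# A capture backend returns encoded image bytes, or None when it has nothing.
Capture = Callable[[], Optional[bytes]]


def capture_with_fallbacks(
    strategies: Iterable[Tuple[str, Capture]],
    min_bytes: int = 1024,  # Assumed cutoff for "tiny" captures; not from the PR.
) -> Tuple[Optional[str], Optional[bytes]]:
    """Try each backend in order and return the first plausible result."""
    for name, grab in strategies:
        try:
            data = grab()
        except Exception:
            continue  # This backend failed outright; move on to the next one.
        if data and len(data) >= min_bytes:
            return name, data  # Large enough to be a real screenshot.
    return None, None  # Every backend failed or produced a tiny image.
```

A caller might pass something like `[("imagegrab", grab_a), ("printwindow", grab_b), ("bitblt", grab_c)]` and substitute a placeholder image when the result is `(None, None)`.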

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| ufo/utils/__init__.py | Makes image loading/saving more defensive (directory path + invalid data URL handling). |
| ufo/trajectory/parser.py | Avoids treating empty screenshot paths as directories; requires files for load. |
| ufo/prompter/eva_prompter.py | Filters placeholder/invalid screenshots and caps screenshot count in eval prompts. |
| ufo/module/basic.py | Adds _safe_console_text() and applies it to Rich console prints. |
| ufo/llm/openai.py | Adds Responses API request path and Azure preview header plumbing. |
| ufo/client/mcp/local_servers/ui_mcp_server.py | Adds screenshot validation + desktop fallback and safer failure behavior. |
| ufo/automator/ui_control/screenshot.py | Adds Win32 screenshot fallbacks (PrintWindow/BitBlt) and ImageGrab retry logic. |
| ufo/automator/ui_control/controller.py | Adds layered text-entry fallbacks and additional result/error checks. |
| ufo/agents/processors/strategies/host_agent_processing_strategy.py | Validates screenshot strings and retries/cascades to placeholder instead of hard failing. |
| ufo/agents/processors/strategies/app_agent_processing_strategy.py | Adds fallback-to-desktop behavior on invalid/tiny window screenshots and placeholder returns on failure. |
| ufo/agents/processors/host_agent_processor.py | Adds _safe_console_text() and applies it to panel output. |
| ufo/agents/processors/app_agent_processor.py | Adds _safe_console_text() and applies it to panel output. |
| ufo/agents/presenters/rich_presenter.py | Adds encoding-aware text sanitization and ASCII fallbacks for separators/icons. |
| config/ufo/system.yaml | Changes default INPUT_TEXT_API and disables markdown logging by default. |
| config/ufo/agents.yaml.template | Documents USE_RESPONSES flag for agents. |
| config/config_schemas.py | Adds AgentConfig.to_dict() to preserve uppercase keys + dynamic extras. |
| config/config_loader.py | Adds env-var expansion in YAML load + adjusts AOAI API_BASE transform when USE_RESPONSES is enabled. |
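As a rough illustration of what "env-var expansion in YAML load" could look like, the sketch below walks an already-parsed config structure and expands `$VAR` / `${VAR}` references in string values using the standard library's `os.path.expandvars`. The function name `expand_env` and the example keys are hypothetical; the actual logic in config/config_loader.py is not shown in this conversation and may differ.

```python
import os
from typing import Any


def expand_env(value: Any) -> Any:
    """Recursively expand environment variables in a parsed config structure."""
    if isinstance(value, str):
        # expandvars leaves references to unset variables unchanged.
        return os.path.expandvars(value)
    if isinstance(value, dict):
        return {key: expand_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item) for item in value]
    return value  # Numbers, booleans and None pass through untouched.
```

With `UFO_DEMO_KEY` set in the environment, `expand_env({"API_KEY": "${UFO_DEMO_KEY}"})` would yield the resolved value, which keeps secrets out of the YAML file itself.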



jaeko44 commented Feb 9, 2026

> Hi @jaeko44, thanks so much for your contribution, this is awesome work! Before merging, could you please roll back the default config YAML to the previous version, to avoid breaking existing usage? Also, could you test the change using an older setting (e.g., GPT-4o) to make sure everything still works as expected? Thanks a lot!

I will test with the existing Chat Completions API on older models, and with the Responses API as well, to make sure there are no breaking changes. I will also revert the default config YAML.

I will ping you once it's pushed here.


jaeko44 commented Feb 13, 2026

@vyokky

I've tested it with model-router (an Azure model that routes requests across many different models) using the Chat Completions API, and I've reverted the default config settings. Everything works well. I also added an extra exception handler, because I ran into an error when I misconfigured the URL to point at the wrong deployment.

a387eb6 -> Exception Handler
7291eed -> Revert Settings

I have confirmed it works with:

Chat Completion API Schema: Yes
Responses API Schema: Yes
Thinking Enabled/Disabled Models: Yes
Vision Enabled/Disabled: Yes


jaeko44 commented Feb 28, 2026

@vyokky please let me know if we can get this PR sorted, because we are looking to add UFO support to the bosun AI Supervisor: https://github.com/virtengine/bosun

If the UFO project is no longer actively maintained, or if another project is being developed or would be more suitable for us to integrate, please let me know your thoughts.

Thanks!

Jonathan
@virtengine
