feat: Enhance console output handling, screenshot capture reliabilit…#288
jaeko44 wants to merge 4 commits into microsoft:main
… and enable responses API support
- Introduced _safe_console_text function to handle UnicodeEncodeError on legacy Windows consoles by stripping emojis in various logging outputs across app and host agent processors.
- Improved error handling in AppScreenshotCaptureStrategy and DesktopDataCollectionStrategy to ensure valid image data is captured and to log warnings for invalid or tiny images.
- Updated ControlPhotographer to implement a fallback mechanism for screenshot capturing using win32 APIs when PIL ImageGrab fails.
- Enhanced UI MCP server to validate screenshots and fall back to desktop captures when necessary.
- Added support for using the Responses API in BaseOpenAIService, including error handling for unsupported response formats.
- Updated utility functions to handle directory paths and invalid image strings gracefully.
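As a rough illustration of the console sanitization described in the first bullet, here is a minimal sketch. It is not the PR's `_safe_console_text`; the name `safe_console_text_sketch`, its `encoding` parameter, and the choice to drop every unencodable character (not only emojis) are assumptions made for the example.

```python
import sys
from typing import Optional


def safe_console_text_sketch(text: str, encoding: Optional[str] = None) -> str:
    """Hypothetical helper: drop characters the console encoding cannot represent."""
    # Assumption: fall back to UTF-8 when stdout does not report an encoding.
    enc = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    try:
        # If the text encodes cleanly, return it untouched.
        text.encode(enc)
        return text
    except UnicodeEncodeError:
        # Otherwise drop the offending characters (e.g. emojis on cp1252).
        return text.encode(enc, errors="ignore").decode(enc)


if __name__ == "__main__":
    print(safe_console_text_sketch("Task finished \u2705", "cp1252"))
```

Passing the result to `print` or a Rich console call would then avoid the `UnicodeEncodeError`, at the cost of silently losing the stripped characters.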
@microsoft-github-policy-service agree company="DET-IO PTY LTD" |
…in evaluation, fix directory-as-path bug
- screenshot.py: Add _win32_print_window() using PrintWindow API (works on disconnected RDP sessions). Add PrintWindow as intermediate fallback in ControlPhotographer.capture() and _win32_grab_screen(). Add foreground window fallback for desktop capture when all methods fail.
- eva_prompter.py: Add MAX_EVAL_IMAGES=40 cap and _is_valid_screenshot() filtering to prevent 50-image API limit errors from placeholder images. Add text-only evaluation fallback when no valid screenshots exist.
- parser.py: Fix _load_single_screenshot() to handle empty string paths that resolved to directories, use os.path.isfile() check.
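The filtering and cap described for eva_prompter.py might look roughly like the sketch below. Only `MAX_EVAL_IMAGES = 40` comes from the commit message; the data-URL prefix, the `MIN_IMAGE_BYTES` threshold, the function names, and the choice to keep the first images when over the cap are all assumptions, and the real `_is_valid_screenshot()` may use different criteria.

```python
import base64
import binascii
from typing import List

MAX_EVAL_IMAGES = 40   # cap named in the commit message
MIN_IMAGE_BYTES = 100  # assumed threshold for "placeholder or tiny" images
PREFIX = "data:image/png;base64,"  # assumed encoding of screenshot strings


def is_valid_screenshot_sketch(data_url: str) -> bool:
    """Hypothetical check: a base64 PNG data URL with a plausible payload size."""
    if not isinstance(data_url, str) or not data_url.startswith(PREFIX):
        return False
    try:
        raw = base64.b64decode(data_url[len(PREFIX):], validate=True)
    except (binascii.Error, ValueError):
        return False
    return len(raw) >= MIN_IMAGE_BYTES


def select_eval_images(images: List[str], limit: int = MAX_EVAL_IMAGES) -> List[str]:
    """Drop invalid screenshots, then keep at most `limit` of the rest."""
    valid = [img for img in images if is_valid_screenshot_sketch(img)]
    # Assumption: keep the earliest images; an empty result signals that the
    # caller should fall back to a text-only evaluation prompt.
    return valid[:limit]
```

An empty return value is what would trigger the text-only fallback in this sketch.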
Hi @jaeko44, thanks so much for your contribution, this is awesome work! Before merging, could you please roll back the default config YAML to the previous version, to avoid breaking existing usage? Also, could you test the change using an older setting (e.g., GPT-4o) to make sure everything still works as expected? Thanks a lot!
Pull request overview
This PR improves runtime robustness across UFO’s Windows console logging and screenshot capture pipeline, and adds an opt-in path to use OpenAI/Azure OpenAI “Responses API” in the LLM service layer.
Changes:
- Adds console-output sanitization helpers to avoid UnicodeEncodeError on non-UTF consoles.
- Hardens screenshot capture/encoding flows with validation, retries, and fallbacks (including Win32 PrintWindow/BitBlt paths).
- Adds an opt-in Responses API execution path in the OpenAI service plus related config support (env-var expansion and USE_RESPONSES flags).
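The "validation, retries, and fallbacks" pattern from the second bullet can be sketched generically as an ordered chain of capture strategies. This is an illustration only: `first_valid_capture` and its parameters are hypothetical names, and the actual Win32 PrintWindow/BitBlt calls are deliberately left out so the example runs on any platform.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def first_valid_capture(
    strategies: Sequence[Callable[[], Optional[T]]],
    is_valid: Callable[[T], bool],
) -> Optional[T]:
    """Try each capture strategy in order; return the first valid result."""
    for capture in strategies:
        try:
            result = capture()
        except Exception:
            # A failing strategy should not abort the chain.
            continue
        if result is not None and is_valid(result):
            return result
    # Every strategy failed; the caller decides on a placeholder.
    return None


def failing_capture() -> bytes:
    raise OSError("screen grab failed")  # stands in for a failed primary grab


if __name__ == "__main__":
    # Stand-ins for the primary grab, a tiny invalid image, and a working fallback.
    chain = [failing_capture, lambda: b"", lambda: b"\x89PNG" + b"\0" * 64]
    image = first_valid_capture(chain, is_valid=lambda data: len(data) > 16)
    print(len(image) if image else "no capture")
```

Keeping validation separate from capture means the same size or format check applies to every fallback path.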
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| ufo/utils/__init__.py | Makes image loading/saving more defensive (directory path + invalid data URL handling). |
| ufo/trajectory/parser.py | Avoids treating empty screenshot paths as directories; requires files for load. |
| ufo/prompter/eva_prompter.py | Filters placeholder/invalid screenshots and caps screenshot count in eval prompts. |
| ufo/module/basic.py | Adds _safe_console_text() and applies it to Rich console prints. |
| ufo/llm/openai.py | Adds Responses API request path and Azure preview header plumbing. |
| ufo/client/mcp/local_servers/ui_mcp_server.py | Adds screenshot validation + desktop fallback and safer failure behavior. |
| ufo/automator/ui_control/screenshot.py | Adds Win32 screenshot fallbacks (PrintWindow/BitBlt) and ImageGrab retry logic. |
| ufo/automator/ui_control/controller.py | Adds layered text-entry fallbacks and additional result/error checks. |
| ufo/agents/processors/strategies/host_agent_processing_strategy.py | Validates screenshot strings and retries/cascades to placeholder instead of hard failing. |
| ufo/agents/processors/strategies/app_agent_processing_strategy.py | Adds fallback-to-desktop behavior on invalid/tiny window screenshots and placeholder returns on failure. |
| ufo/agents/processors/host_agent_processor.py | Adds _safe_console_text() and applies it to panel output. |
| ufo/agents/processors/app_agent_processor.py | Adds _safe_console_text() and applies it to panel output. |
| ufo/agents/presenters/rich_presenter.py | Adds encoding-aware text sanitization and ASCII fallbacks for separators/icons. |
| config/ufo/system.yaml | Changes default INPUT_TEXT_API and disables markdown logging by default. |
| config/ufo/agents.yaml.template | Documents USE_RESPONSES flag for agents. |
| config/config_schemas.py | Adds AgentConfig.to_dict() to preserve uppercase keys + dynamic extras. |
| config/config_loader.py | Adds env-var expansion in YAML load + adjusts AOAI API_BASE transform when USE_RESPONSES is enabled. |
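To illustrate the env-var expansion mentioned for config/config_loader.py, the sketch below expands variables in an already-parsed config structure. It is an assumption about the general approach, not the PR's code: `expand_env_vars` is a hypothetical name, and the exact placeholder syntax the loader accepts is not stated in the summary. Here `os.path.expandvars` handles `$VAR` and `${VAR}` and leaves unset variables unchanged.

```python
import os
from typing import Any


def expand_env_vars(value: Any) -> Any:
    """Recursively expand environment variables in strings inside a parsed config."""
    if isinstance(value, str):
        return os.path.expandvars(value)
    if isinstance(value, dict):
        return {key: expand_env_vars(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env_vars(item) for item in value]
    # Numbers, booleans, and None pass through unchanged.
    return value


if __name__ == "__main__":
    os.environ["UFO_SKETCH_KEY"] = "secret"  # hypothetical variable name
    config = {"API_KEY": "${UFO_SKETCH_KEY}", "USE_RESPONSES": True}
    print(expand_env_vars(config))
```

Applying the expansion after parsing, as here, keeps the YAML itself free of secrets while leaving non-string values untouched.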
I will test with the existing Chat Completions API on older models, and with the Responses API as well, to make sure there are no breaking changes. I will also revert the default config YAML and ping you here once it's pushed.
I've tested it with model-router (an Azure model deployment that routes requests across many different models) using the Chat Completions API, with the default config settings reverted, and everything works well. I also added an extra exception handler, because I ran into an error when I misconfigured the URL to point at the wrong deployment (a387eb6 -> Exception Handler). I have confirmed it works with: Chat Completion API Schema: Yes
@vyokky please let me know if we can get this PR sorted, because we are looking to add UFO support to the Bosun AI Supervisor: https://github.com/virtengine/bosun. If the UFO project is no longer actively maintained, or if another project is being developed or would be more suitable for us to integrate, please let me know your opinion. Thanks! Jonathan