MigoXLab
diff --git a/README.md b/README.md
Lines changed: 20 additions & 8 deletions
diff --git a/README_zh-CN.md b/README_zh-CN.md
Lines changed: 20 additions & 8 deletions
diff --git a/config/config.yaml.example b/config/config.yaml.example
Lines changed: 3 additions & 3 deletions
@@ -54,9 +54,9 @@ vibecoding, vibe coding, web evaluation, autonomous exploration, web testing aut
 
 ### 📋 Feature Highlights
 
-- **🤖 AI-Powered Testing**: Performs autonomous website testing—explores pages, plans actions, and executes end-to-end flows without manual scripting.
-- **📊 Multi-Dimensional Observation**: Covers functionality, performance, user experience, and basic security; evaluates load speed, design details, and links to surface issues.
-- **🎯 Actionable Recommendations**: Runs in real browsers and provides concrete suggestions for improvement.
+- **🤖 AI-Powered Testing**: Performs autonomous website testing with intelligent planning and reflection: it explores pages, plans actions, and executes end-to-end flows without manual scripting. Features a 2-stage architecture (lightweight filtering followed by comprehensive planning) and dynamic test generation for newly appeared UI elements.
+- **📊 Multi-Dimensional Observation**: Covers functionality, performance, user experience, and basic security; evaluates load speed, design details, and links to surface issues. Uses multi-modal analysis (screenshots + DOM structure + text content) and DOM diff detection to discover new test opportunities.
+- **🎯 Actionable Recommendations**: Runs in real browsers with smart element prioritization and automatic viewport management. Provides concrete suggestions for improvement with adaptive recovery mechanisms for robust test execution.
 - **📈 Visual Reports**: Generates detailed HTML test reports with clear, multi-dimensional views for analysis and tracking.
 
 ## 📹 Examples
@@ -137,6 +137,7 @@ python webqa-agent.py
 target:
   url: https://example.com/                       # Website URL to test
   description: example description
+  # max_concurrent_tests: 2                       # Optional, number of tests run in parallel (default: 2)
 
 test_config:                                      # Test configuration
   function_test:                                  # Functional testing
@@ -145,8 +146,8 @@ test_config:                                      # Test configuration
     business_objectives: example business objectives  # Recommended to include test scope, e.g., test search functionality
     dynamic_step_generation:                      # Optional, configuration for dynamic steps generation
       enabled: True                               # Optional, default False, recommended to set True to enable dynamic step generation
-      max_dynamic_steps: 5                        # Optional, default 5 test steps generated per trigger
-      min_elements_threshold: 2                   # Optional, default trigger threshold is 2 DOM element differences
+      max_dynamic_steps: 10                       # Optional, default 5, this example uses 10
+      min_elements_threshold: 1                   # Optional, default 2, this example uses 1 for higher sensitivity
   ux_test:                                        # User experience testing
     enabled: True
   performance_test:                               # Performance analysis
@@ -155,28 +156,39 @@ test_config:                                      # Test configuration
     enabled: False
 
 llm_config:                                       # Vision model configuration, currently supports OpenAI SDK compatible format only
-  model: gpt-4.1-2025-04-14                       # Recommended
+  model: gpt-4.1-2025-04-14                       # Primary model for Stage 2 test planning (Recommended)
+  filter_model: gpt-4o-mini                       # Lightweight model for Stage 1 element filtering (cost-effective)
   api_key: your_api_key
   base_url: https://api.example.com/v1
+  temperature: 0.1                                # Optional, default 0.1
+  # top_p: 0.9                                    # Optional, if not set, this parameter will not be passed
+  # max_tokens: 8192                              # Optional, maximum output tokens (supports generating more test cases)
 
 browser_config:
   viewport: {"width": 1280, "height": 720}
   headless: False                                 # Automatically overridden to True in Docker environment
   language: zh-CN
   cookies: []
+  save_screenshots: False                         # Whether to save screenshots to local disk (default: False)
+
+report:
+  language: en-US                                 # zh-CN, en-US
+
+log:
+  level: info
 ```
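
The optional `llm_config` keys above can be read as follows: `temperature` falls back to 0.1, and `top_p` is only forwarded when it is set. The sketch below illustrates that behaviour. It is not taken from WebQA-Agent's code: `build_llm_kwargs` is a hypothetical helper, and treating `max_tokens` the same way as `top_p` (omitted when unset) is an assumption.

```python
from typing import Any


def build_llm_kwargs(llm_config: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: map a parsed llm_config to request keyword arguments."""
    kwargs: dict[str, Any] = {
        "model": llm_config["model"],
        # temperature is optional; the README documents a default of 0.1.
        "temperature": llm_config.get("temperature", 0.1),
    }
    # top_p is documented as "not passed" when unset. Handling max_tokens
    # the same way is an assumption of this sketch.
    for optional_key in ("top_p", "max_tokens"):
        if llm_config.get(optional_key) is not None:
            kwargs[optional_key] = llm_config[optional_key]
    return kwargs


example = build_llm_kwargs({"model": "gpt-4.1-2025-04-14", "max_tokens": 8192})
print(example)
```

Building the dictionary conditionally, rather than passing `None`, lets the OpenAI-compatible endpoint apply its own default for any parameter that is left out.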
 
 Please note the following important considerations when configuring and running tests:
 
 #### 1. Functional Testing Notes
 
-- **AI Mode**: When specifying the number of test cases to generate in the configuration file, the system may re-plan based on actual page conditions. This may result in the final number of executed test cases differing from the initial configuration to ensure coverage and effectiveness.
+- **AI Mode**: Uses a 2-stage planning architecture: Stage 1 (`filter_model`) prioritizes elements for efficient analysis, and Stage 2 (the primary `model`) generates comprehensive test cases. The system may reflect and re-plan based on actual page conditions and test coverage, so the number of test cases finally executed may differ from the number requested in the configuration; this keeps the tests effective. When `dynamic_step_generation` is enabled, the system automatically generates additional test steps for newly appeared UI elements (e.g., dropdowns, modals) detected through DOM diff analysis.
 
 - **Default Mode**: The `default` mode focuses on whether UI interactions (e.g., clicks and navigations) complete successfully.
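
The two `dynamic_step_generation` settings can be pictured as a threshold check followed by a cap. The sketch below is illustrative only: `new_element_ids`, `should_generate_steps`, and `cap_steps` are hypothetical names, and comparing sets of element identifiers stands in for whatever DOM diff WebQA-Agent actually performs. Treating the threshold as inclusive ("at least N new elements") is also an assumption.

```python
def new_element_ids(before: set[str], after: set[str]) -> set[str]:
    """Return identifiers present after an action but not before it."""
    return after - before


def should_generate_steps(
    before: set[str], after: set[str], min_elements_threshold: int = 2
) -> bool:
    """Assumed rule: trigger when at least `min_elements_threshold` new elements appear."""
    return len(new_element_ids(before, after)) >= min_elements_threshold


def cap_steps(candidate_steps: list[str], max_dynamic_steps: int = 5) -> list[str]:
    """Keep at most `max_dynamic_steps` generated steps per trigger."""
    return candidate_steps[:max_dynamic_steps]


# A single new dropdown appears after clicking the search box.
before = {"btn-search", "input-q"}
after = before | {"dropdown-suggestions"}

print(should_generate_steps(before, after))                            # default threshold of 2
print(should_generate_steps(before, after, min_elements_threshold=1))  # example config value
```

Under these assumptions, the documented default of 2 would ignore a single new element, while the example value of 1 would react to it, which is the "higher sensitivity" the config comment refers to.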
 
 #### 2. User Experience Testing Notes
 
-UX (User Experience) testing focuses on usability, and user-friendliness. The model output in the results provides suggestions based on best practices to guide optimization.
+UX (User Experience) testing focuses on usability and user-friendliness. It uses multi-modal analysis that combines screenshots, DOM structure, and text content to evaluate visual quality, detect typos and grammar issues, and validate layout rendering. The model output in the results provides suggestions based on best practices to guide optimization.
 
 ### 🧠 Recommended Models
 
 
@@ -53,9 +53,9 @@ Vibecoding, Vibe coding, 网页测试自动化, 浏览器测试工具, AI驱动
 
 ### 📋 特性概览
 
-- **🤖 AI 自主测试**：WebQA-Agent能够自主进行网站测试，无需手写脚本，自动探索页面、规划动作并执行端到端流程
-- **📊 多维度观测**：覆盖功能、性能、用户体验、安全等核心场景，评估页面加载速度、设计细节和链接，全面保障系统质量
-- **🎯 可执行建议**：基于真实浏览器运行，输出具体的优化与改进建议
+- **🤖 AI 自主测试**：WebQA-Agent具备智能规划与反思能力，能够自主进行网站测试，无需手写脚本，自动探索页面、规划动作并执行端到端流程。采用两阶段架构（轻量级过滤+全面规划），并支持动态生成针对新出现UI元素的测试步骤
+- **📊 多维度观测**：覆盖功能、性能、用户体验、安全等核心场景，评估页面加载速度、设计细节和链接，全面保障系统质量。采用多模态分析（截图+DOM结构+文本内容）和DOM差异检测，发现新的测试机会
+- **🎯 可执行建议**：基于真实浏览器运行，具备智能元素优先级排序和自动视口管理能力，输出具体的优化与改进建议，并提供自适应恢复机制确保测试稳健执行
 - **📈 可视化报告**：生成详细的HTML测试报告，多维度、可视化展示执行结果，便于分析与追踪
 
 ## 📹 示例演示
@@ -140,6 +140,7 @@ python webqa-agent.py
 target:
   url: https://example.com/                       # 需要测试的网站URL
   description: example description
+  # max_concurrent_tests: 2                       # 可选，默认并行2个
 
 test_config:                                      # 测试项配置
   function_test:                                  # 功能测试
@@ -148,8 +149,8 @@ test_config:                                      # 测试项配置
     business_objectives: example business objectives  # 建议加入测试范围，如：测试搜索功能
     dynamic_step_generation:                      # 可选，动态生成步骤配置
       enabled: True                               # 可选, 默认False，建议设置为True使能动态步骤生成
-      max_dynamic_steps: 5                        # 可选，默认每次触发生成5步测试步骤
-      min_elements_threshold: 2                   # 可选，默认触发阈值为2个dom元素差异
+      max_dynamic_steps: 10                       # 可选，默认为5，此示例使用10
+      min_elements_threshold: 1                   # 可选，默认为2，此示例使用1以提高灵敏度
   ux_test:                                        # 用户体验测试
     enabled: True
   performance_test:                               # 性能分析
@@ -158,28 +159,39 @@ test_config:                                      # 测试项配置
     enabled: False
 
 llm_config:                                       # 视觉模型配置，当前仅支持 OpenAI SDK 兼容格式
-  model: gpt-4.1-2025-04-14                       # 推荐使用
+  model: gpt-4.1-2025-04-14                       # 第二阶段测试规划的主模型（推荐）
+  filter_model: gpt-4o-mini                       # 第一阶段元素过滤的轻量级模型（经济实用）
   api_key: your_api_key
   base_url: https://api.example.com/v1
+  temperature: 0.1                                # 可选，默认0.1
+  # top_p: 0.9                                    # 可选，如未设置将不传递该参数
+  # max_tokens: 8192                              # 可选，最大输出token数（支持生成更多测试用例）
 
 browser_config:
   viewport: {"width": 1280, "height": 720}
   headless: False                                 # Docker环境会自动覆盖为True
   language: zh-CN
   cookies: []
+  save_screenshots: False                         # 是否将截图保存到本地磁盘（默认：False）
+
+report:
+  language: en-US                                 # zh-CN, en-US
+
+log:
+  level: info
 ```
 
 在配置和运行测试时，请注意以下重要事项：
 
 #### 1. 功能测试说明
 
-- **AI模式**：当在配置文件中指定生成测试用例的数量时，系统可能会根据实际测试情况进行代理重新规划和调整。这可能导致最终执行的测试用例数量与初始设定存在一定出入，以确保测试的准确性和有效性。
+- **AI模式**：采用两阶段规划架构，第一阶段（filter_model）优先排序元素以实现高效分析，第二阶段（主模型）生成全面的测试用例。系统会根据实际页面情况和测试覆盖率进行反思和重新规划，这可能导致最终执行的测试用例数量与初始设定存在一定出入，以确保测试的有效性。当启用 `dynamic_step_generation` 时，系统会通过DOM差异分析自动为新出现的UI元素（如下拉菜单、模态框）生成额外的测试步骤。
 
 - **Default模式**：功能测试的 `default` 模式主要验证UI元素的点击行为是否成功执行，包括按钮点击、链接跳转等基本交互功能。
 
 #### 2. 用户体验测试说明
 
-UX（用户体验）评估关注网页可用性与友好性。结果中的模型输出基于最佳实践给出改进建议，便于设计与开发参考。
+UX（用户体验）评估关注网页可用性与友好性。采用多模态分析，结合截图、DOM结构和文本内容来评估视觉质量、检测拼写/语法问题以及验证布局渲染。结果中的模型输出基于最佳实践给出改进建议，便于设计与开发参考。
 
 ### 🧠 推荐模型
 
 
@@ -9,9 +9,9 @@ test_config: # Test configuration
     type: ai  # default or ai
     business_objectives: Test Baidu search functionality, generate 3 test cases
     dynamic_step_generation:
-      enabled: True
-      max_dynamic_steps: 10
-      min_elements_threshold: 1
+      enabled: True  # Default: False (this example enables it as recommended)
+      max_dynamic_steps: 10  # Default: 5 (this example increases the limit)
+      min_elements_threshold: 1  # Default: 2 (this example uses 1 for higher sensitivity)
   ux_test:
     enabled: True
   performance_test: