feat: checkpoint3

seancoding-day · seancoding-day · commit 539cc60e3944 · 2025-09-24T19:36:40.000+08:00
diff --git a/webqa_agent/llm/prompt.py b/webqa_agent/llm/prompt.py
@@ -83,13 +83,44 @@ class LLMPrompt:
     - All plans must reflect actual context in screenshot.
     - Always output strict **valid JSON**. No comments or markdown.
 
+    ## Navigation Strategy Guidelines (CRITICAL)
+
+    ### Action Selection Priority for Navigation
+    When planning navigation actions, follow this STRICT priority order:
+
+    1. **GoToPage (HIGHEST RELIABILITY - PREFERRED)**
+       - Use when: Target URL is known or can be determined
+       - Best for: Returning to original tabs/pages, switching between known pages, going to homepage
+       - Reliability: 100% - Direct URL manipulation, no UI dependency
+       - Example: Returning to original tab with known URL
+
+    2. **GoBack (HIGH RELIABILITY)**
+       - Use when: Browser history navigation is appropriate
+       - Best for: Sequential backward navigation
+       - Reliability: 95% - Browser-native functionality
+       - Example: Returning to previous form after submission
+
+    3. **Tap/Click (LOWER RELIABILITY - USE WITH CAUTION)**
+       - Use when: Target URL is unknown AND element interaction is required
+       - Best for: Discovering new pages, triggering dynamic content
+       - Reliability: 60-80% - Depends on element state, page load, icon behavior
+       - Example: Clicking unexplored menu items
+
+    ### Critical Decision Rule
+    **IF you know the target URL → ALWAYS use GoToPage over Tap**
+    - This includes: returning to original tab, going to homepage, switching between tabs
+    - Rationale: URL navigation is deterministic, UI element clicks are probabilistic
+
     ## Actions
 
     Each action includes `type` and `param`, optionally with `locate`.
 
         Each action has a
-        - type: 'Tap', tap the located element
+        - type: 'Tap', tap the located element [USE ONLY WHEN URL UNKNOWN]
         * {{ locate: {{ id: string }}, param: null }}
+        * WARNING: Less reliable for navigation - UI elements may fail or behave inconsistently
+        * Use ONLY when: target URL is unknown AND you need to discover new pages
+        * Do NOT use for: returning to known pages, switching tabs when URLs are available
         - type: 'Hover', move mouse over to the located element
         * {{ locate: {{ id: string }}, param: null }}
         - type: 'Input', replace the value in the input field
@@ -116,9 +147,12 @@ class LLMPrompt:
         - type: 'GetNewPage', get the new page
         * {{ param: null }}
         * use this action when the instruction is a "get new page" statement or "open in new tab" or "open in new window".
-        - type: 'GoToPage', navigate directly to a specific URL
+        - type: 'GoToPage', navigate directly to a specific URL [PREFERRED FOR RELIABLE NAVIGATION]
         * {{ param: {{ url: string }} }}
-        * use this action when you need to navigate to a specific web page URL, useful for returning to homepage or navigating to known pages.
+        * CRITICAL: This is the MOST RELIABLE navigation method - use whenever target URL is known
+        * PREFERRED for: returning to original tab/page, switching between known pages, going to homepage
+        * AVOID clicking UI elements (logos, icons) for navigation when URL is available
+        * Example: To return to original tab, use GoToPage with the original URL instead of clicking browser tabs or page icons
         - type: 'GoBack', navigate back to the previous page
         * {{ param: null }}
         * use this action when you need to go back to the previous page in the browser history, similar to clicking the browser's back button.
@@ -412,6 +446,41 @@ class LLMPrompt:
         }
         ```
 
+        #### Example 8: Return to Original Tab/Page (CRITICAL PATTERN)
+        "Return to the original tab/page where we started"
+        ```json
+        {
+          "actions": [
+            {
+              "type": "GoToPage",
+              "thought": "Using GoToPage for guaranteed navigation back to original URL. This is more reliable than clicking UI elements which may fail or behave unpredictably.",
+              "param": {"url": "https://original-site.com/original-page"},
+              "locate": null
+            }
+          ],
+          "taskWillBeAccomplished": true,
+          "furtherPlan": null,
+          "error": null
+        }
+        ```
+
+        #### Counter-Example: What NOT to do for Navigation
+        "Return to the original tab"
+        ❌ WRONG - Unreliable approach:
+        ```json
+        {
+          "actions": [
+            {
+              "type": "Tap",
+              "thought": "Click the site logo to return",
+              "param": null,
+              "locate": {"id": "1"}
+            }
+          ],
+          "error": "This approach is unreliable - UI elements may not achieve intended navigation"
+        }
+        ```
+
         #### Example of what NOT to do
         - If the action's `locate` is null and element is **not in the screenshot**, don't continue planning. Instead:
         ```json
diff --git a/webqa_agent/testers/case_gen/agents/execute_agent.py b/webqa_agent/testers/case_gen/agents/execute_agent.py
@@ -21,7 +21,51 @@
 from webqa_agent.testers.case_gen.utils.message_converter import convert_intermediate_steps_to_messages
 from webqa_agent.utils.log_icon import icon
 
-LONG_STEPS = 10
+LONG_STEPS = 25
+
+# ============================================================================
+# Critical Failure Detection Patterns
+# ============================================================================
+
+# Literal patterns for exact substring matching (backward compatible)
+CRITICAL_LITERAL_PATTERNS = [
+    "element not found",
+    "cannot find",
+    "page crashed",
+    "permission denied",
+    "access denied",
+    "network timeout",
+    "browser error",
+    "navigation failed",
+    "session expired",
+    "server error",
+    "connection timeout",
+    "unable to load",
+    "page not accessible",
+    "critical error",
+    "missing locator",
+    "not found in the buffer",
+    "could not be retrieved",
+    "failed due to a missing",
+    "dropdown options could not be retrieved",
+]
+
+# Regex patterns for flexible matching
+CRITICAL_REGEX_PATTERNS = [
+    r"not found in\s+.*buffer",
+    r"failed due to\s+.*missing",
+    r"locator.*not.*found",
+    r"element.*not.*available",
+    r"missing.*for.*action",
+    r"missing.*parameter",
+    r"element with id.*not found",
+]
+
+# Pre-compile regex for performance
+CRITICAL_REGEX = re.compile(
+    '|'.join(CRITICAL_REGEX_PATTERNS),
+    re.IGNORECASE
+)
 
 # ============================================================================
 # Dynamic Step Generation Helper Functions
@@ -106,10 +150,22 @@ def format_elements_for_llm(dom_diff: dict) -> list[dict]:
         # Add important attribute information
         important_attrs = {}
         if attributes:
-            # Extract important attributes
-            for key in ['class', 'id', 'role', 'type', 'placeholder', 'aria-label']:
-                if key in attributes:
-                    important_attrs[key] = attributes[key]
+            # Define comprehensive attribute whitelist
+            navigation_attrs = ['href', 'target', 'rel', 'download']
+            form_attrs = ['type', 'placeholder', 'value', 'name', 'required', 'disabled']
+            semantic_attrs = ['role', 'aria-label', 'aria-describedby', 'aria-expanded']
+            whitelist = {'class', 'id', *navigation_attrs, *form_attrs, *semantic_attrs}
+            for key, value in attributes.items():
+                # Include whitelisted attributes
+                if key in whitelist:
+                    important_attrs[key] = value
+                # Include data-* attributes (often contain behavior info)
+                elif key.startswith('data-'):
+                    # Limit length to prevent token explosion
+                    important_attrs[key] = value[:200] if isinstance(value, str) and len(value) > 200 else value
+                # Include style if it indicates visibility/interactivity
+                elif key == 'style' and isinstance(value, str) and ('display' in value or 'visibility' in value):
+                    important_attrs[key] = value[:200] + "..." if len(value) > 200 else value
         
         if important_attrs:
             formatted_elem["attributes"] = important_attrs
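Review note: the attribute filtering added in this hunk can be read as a standalone function. The sketch below is a hypothetical extraction, not code from the repository: `filter_attributes`, `_truncate`, `WHITELIST`, and `MAX_LEN` are illustrative names. It deliberately deviates from the hunk in one respect, which is labeled in the code: both truncated branches append `"..."`, whereas the hunk appends it only for `style`.

```python
# Hypothetical standalone version of the attribute filter; names are illustrative.
WHITELIST = frozenset([
    "class", "id",
    "href", "target", "rel", "download",                          # navigation
    "type", "placeholder", "value", "name", "required", "disabled",  # form
    "role", "aria-label", "aria-describedby", "aria-expanded",    # semantic
])
MAX_LEN = 200  # same cap as the hunk uses


def _truncate(value):
    """Cap long strings at MAX_LEN characters and mark the cut with '...'.

    Assumption: appending '...' in every truncated case is a choice made for
    this sketch; the hunk only does so for the style attribute.
    """
    if isinstance(value, str) and len(value) > MAX_LEN:
        return value[:MAX_LEN] + "..."
    return value


def filter_attributes(attributes: dict) -> dict:
    """Keep whitelisted, data-*, and visibility-related style attributes."""
    kept = {}
    for key, value in attributes.items():
        if key in WHITELIST:
            kept[key] = value
        elif key.startswith("data-"):
            kept[key] = _truncate(value)
        elif (
            key == "style"
            and isinstance(value, str)
            and ("display" in value or "visibility" in value)
        ):
            kept[key] = _truncate(value)
    return kept
```

For example, `filter_attributes({"href": "/home", "onclick": "go()"})` keeps `href` and drops `onclick`, since event-handler attributes are not on the whitelist.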
@@ -1018,6 +1074,10 @@ def _is_objective_achieved(tool_output: str) -> tuple[bool, str]:
 def _is_critical_failure_step(tool_output: str, step_instruction: str = "") -> bool:
     """Check if a single step output indicates a critical failure that should stop execution.
     
+    Uses a hybrid detection approach:
+    1. Primary: Structured error tags [CRITICAL_ERROR:category] (preferred)
+    2. Fallback: Pattern matching for backward compatibility and enhanced coverage
+    
     Args:
         tool_output: The output from the step execution
         step_instruction: The instruction that was executed (for context)
@@ -1030,30 +1090,22 @@ def _is_critical_failure_step(tool_output: str, step_instruction: str = "") -> b
     
     output_lower = tool_output.lower()
     
-    # Critical failure patterns for immediate exit
-    critical_step_patterns = [
-        "element not found",
-        "cannot find",
-        "page crashed", 
-        "permission denied",
-        "access denied",
-        "network timeout",
-        "browser error",
-        "navigation failed",
-        "session expired",
-        "server error", 
-        "connection timeout",
-        "unable to load",
-        "page not accessible",
-        "critical error"
-    ]
+    # Phase 1: Check for structured critical error tags (preferred method)
+    if "[critical_error:" in output_lower:
+        logging.debug("Critical failure detected via structured error tag")
+        return True
     
-    # Check for critical patterns
-    for pattern in critical_step_patterns:
+    # Phase 2a: Check literal patterns (backward compatibility)
+    for pattern in CRITICAL_LITERAL_PATTERNS:
         if pattern in output_lower:
-            logging.debug(f"Critical failure detected in step: pattern '{pattern}' found")
+            logging.debug(f"Critical failure detected via literal pattern: '{pattern}'")
             return True
     
+    # Phase 2b: Check regex patterns (enhanced matching)
+    if CRITICAL_REGEX.search(output_lower):
+        logging.debug("Critical failure detected via regex pattern")
+        return True
+    
     return False
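Review note: the detection order in `_is_critical_failure_step` (structured tag first, then literal substrings, then the compiled regex) can be exercised in isolation. The sketch below is a simplified illustration rather than the repository's function. `classify_failure` is a hypothetical helper that returns the phase that fired instead of a bool, the pattern lists are shortened copies of the ones in the diff, and the empty-output guard is an assumption because that part of the function is not visible in the hunk.

```python
import re
from typing import Optional

# Shortened copies of the lists in the diff; the real lists are longer.
LITERAL_PATTERNS = ["element not found", "navigation failed", "could not be retrieved"]
REGEX = re.compile(
    "|".join([r"not found in\s+.*buffer", r"locator.*not.*found", r"missing.*parameter"]),
    re.IGNORECASE,
)


def classify_failure(tool_output: str) -> Optional[str]:
    """Return 'tag', 'literal', or 'regex' for a critical failure, else None."""
    if not tool_output:  # assumption: empty output is treated as non-critical
        return None
    output_lower = tool_output.lower()
    # Phase 1: structured tag, matched case-insensitively via the lowered text
    if "[critical_error:" in output_lower:
        return "tag"
    # Phase 2a: literal substring patterns
    for pattern in LITERAL_PATTERNS:
        if pattern in output_lower:
            return "literal"
    # Phase 2b: regex patterns for looser phrasing
    if REGEX.search(output_lower):
        return "regex"
    return None
```

Because the literal and regex phases match anywhere in the output, a sentence that merely quotes an earlier error can still be classified as critical. The structured tag is the only phase where the agent opts in explicitly.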
 
 
diff --git a/webqa_agent/testers/case_gen/prompts/agent_prompts.py b/webqa_agent/testers/case_gen/prompts/agent_prompts.py
@@ -89,6 +89,53 @@ def get_execute_system_prompt(case: dict) -> str:
 - **Single Tool Call**: Execute only ONE `execute_ui_action` or `execute_ui_assertion` per instruction
 - **Error Handling**: If any action in the sequence fails, stop and report the error - do not attempt subsequent actions
 
+## Navigation Reliability Guidelines (CRITICAL)
+
+### Navigation Action Selection Priority
+When executing navigation-related actions, follow these reliability guidelines:
+
+**1. Prefer URL-based Navigation (HIGHEST RELIABILITY - 100%)**
+- When returning to known pages, switching tabs, or navigating to specific URLs
+- Use direct URL navigation instead of clicking UI elements when URL is available
+- Example actions: "Return to homepage", "Go back to original page", "Switch to previous tab"
+- Implementation: Always request URL-based navigation when target URL is known
+
+**2. Browser History Navigation (HIGH RELIABILITY - 95%)**
+- For sequential backward navigation through browser history
+- Use browser back functionality for natural user flow
+- Example actions: "Go back to previous page", "Navigate to previous form"
+
+**3. UI Element Navigation (LOWER RELIABILITY - 60-80%)**
+- Use ONLY when target URL is unknown AND discovery is required
+- Warning: UI elements (logos, icons, menu items) may fail or behave inconsistently
+- Example actions: "Click unknown menu item", "Explore new section"
+
+### Critical Navigation Decision Rules
+- **Known URL Available**: ALWAYS prefer URL-based over UI element clicking
+- **Returning to Original Tab**: Use URL navigation instead of clicking tab or logo
+- **Homepage Navigation**: Use direct URL instead of clicking logo
+- **Error Recovery**: If UI navigation fails, attempt URL-based fallback
+
+### Navigation Error Handling
+**Navigation Failure Patterns**:
+- UI elements may not respond (disabled, hidden, non-functional)
+- Logo clicks may not navigate to expected pages
+- Tab switching via UI may fail in complex applications
+- Menu items may lead to unexpected destinations
+
+**Navigation Recovery Strategy**:
+1. Detect navigation failure through page URL verification
+2. Identify target URL from context or previous navigation
+3. Attempt direct URL-based navigation as fallback
+4. Report navigation method and success/failure for analysis
+
+### Navigation Success Validation
+After any navigation action, verify:
+- Current URL matches expected destination
+- Page content confirms successful navigation
+- No error messages or unexpected redirects occurred
+- Navigation state is stable for subsequent actions
+
 ## Test Execution Hierarchy (Priority Order)
 
 ### 1. Single Action Imperative (HIGHEST PRIORITY)
@@ -221,6 +268,37 @@ def get_execute_system_prompt(case: dict) -> str:
 - Include recovery steps taken for future test improvement
 - Maintain clear audit trail of all actions performed
 
+## Structured Error Reporting Protocol
+
+**Critical Rule**: For failures that should immediately stop test execution, you MUST use structured error tags to ensure reliable detection.
+
+### Critical Error Format
+When encountering critical failures, include a structured tag, **[CRITICAL_ERROR:category]**, followed by a detailed description.
+
+### Critical Error Categories
+- **ELEMENT_NOT_FOUND**: Target element cannot be located, accessed, or interacted with
+- **NAVIGATION_FAILED**: Page navigation, loading, or routing failures
+- **PERMISSION_DENIED**: Access, authorization, or security restriction issues
+- **PAGE_CRASHED**: Browser crashes, page errors, or unrecoverable page states
+- **NETWORK_ERROR**: Network connectivity, timeout, or server communication issues
+- **SESSION_EXPIRED**: Authentication session, login, or credential issues
+
+### Critical Error Examples
+**Element Access Failure**:
+`[CRITICAL_ERROR:ELEMENT_NOT_FOUND] The language selector dropdown could not be located in the navigation bar. The element was not found in the page buffer and cannot be interacted with.`
+
+**Navigation Issue**:
+`[CRITICAL_ERROR:NAVIGATION_FAILED] Page navigation to the target URL failed due to network timeout. The page is not accessible and the test cannot continue.`
+
+**Permission Issue**:
+`[CRITICAL_ERROR:PERMISSION_DENIED] Access to the admin panel was denied. User lacks sufficient privileges to proceed with the test.`
+
+### Non-Critical Failures
+Standard failures that allow test continuation should use the regular `[FAILURE]` format without structured tags. These include:
+- Validation errors that can be corrected
+- Dropdown option mismatches with alternatives available
+- Minor UI state changes that don't block core functionality
+
 ## Advanced Error Recovery Patterns
 
 ### Pattern 1: Form Validation Errors
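Review note on the Structured Error Reporting Protocol added to `agent_prompts.py` above: the executor in this commit only checks whether `[critical_error:` appears in the output, so the category is not used. A consumer that wanted the category could parse it as sketched below. This is a hypothetical helper, not part of the commit: `parse_critical_error`, `CriticalError`, and `KNOWN_CATEGORIES` are illustrative names, and tolerating unknown categories (flagged with `known=False`) is an assumption about how lenient such a parser should be.

```python
import re
from typing import NamedTuple, Optional

# Categories listed in the prompt text of this commit.
KNOWN_CATEGORIES = frozenset([
    "ELEMENT_NOT_FOUND", "NAVIGATION_FAILED", "PERMISSION_DENIED",
    "PAGE_CRASHED", "NETWORK_ERROR", "SESSION_EXPIRED",
])

# Tag, then everything after it as the free-text description.
TAG_RE = re.compile(
    r"\[CRITICAL_ERROR:\s*([A-Za-z_]+)\s*\]\s*(.*)",
    re.IGNORECASE | re.DOTALL,
)


class CriticalError(NamedTuple):
    category: str      # upper-cased category from the tag
    description: str   # text following the tag
    known: bool        # whether the category is one the prompt defines


def parse_critical_error(tool_output: str) -> Optional[CriticalError]:
    """Extract the first structured critical-error tag, or None if absent."""
    match = TAG_RE.search(tool_output or "")
    if match is None:
        return None
    category = match.group(1).upper()
    return CriticalError(category, match.group(2).strip(), category in KNOWN_CATEGORIES)
```

Output in the plain `[FAILURE]` format yields `None`, which matches the prompt's distinction between critical and non-critical failures.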
diff --git a/webqa_agent/testers/case_gen/prompts/planning_prompts.py b/webqa_agent/testers/case_gen/prompts/planning_prompts.py