Commit 539cc60

feat: checkpoint3
1 parent fc83354 commit 539cc60

4 files changed: +308 -29 lines changed

webqa_agent/llm/prompt.py

Lines changed: 72 additions & 3 deletions
````diff
@@ -83,13 +83,44 @@ class LLMPrompt:
 - All plans must reflect actual context in screenshot.
 - Always output strict **valid JSON**. No comments or markdown.

+## Navigation Strategy Guidelines (CRITICAL)
+
+### Action Selection Priority for Navigation
+When planning navigation actions, follow this STRICT priority order:
+
+1. **GoToPage (HIGHEST RELIABILITY - PREFERRED)**
+- Use when: Target URL is known or can be determined
+- Best for: Returning to original tabs/pages, switching between known pages, going to homepage
+- Reliability: 100% - Direct URL manipulation, no UI dependency
+- Example: Returning to original tab with known URL
+
+2. **GoBack (HIGH RELIABILITY)**
+- Use when: Browser history navigation is appropriate
+- Best for: Sequential backward navigation
+- Reliability: 95% - Browser-native functionality
+- Example: Returning to previous form after submission
+
+3. **Tap/Click (LOWER RELIABILITY - USE WITH CAUTION)**
+- Use when: Target URL is unknown AND element interaction is required
+- Best for: Discovering new pages, triggering dynamic content
+- Reliability: 60-80% - Depends on element state, page load, icon behavior
+- Example: Clicking unexplored menu items
+
+### Critical Decision Rule
+**IF you know the target URL → ALWAYS use GoToPage over Tap**
+- This includes: returning to original tab, going to homepage, switching between tabs
+- Rationale: URL navigation is deterministic, UI element clicks are probabilistic
+
 ## Actions

 Each action includes `type` and `param`, optionally with `locate`.

 Each action has a
-- type: 'Tap', tap the located element
+- type: 'Tap', tap the located element [USE ONLY WHEN URL UNKNOWN]
 * {{ locate: {{ id: string }}, param: null }}
+* WARNING: Less reliable for navigation - UI elements may fail or behave inconsistently
+* Use ONLY when: target URL is unknown AND you need to discover new pages
+* Do NOT use for: returning to known pages, switching tabs when URLs are available
 - type: 'Hover', move mouse over to the located element
 * {{ locate: {{ id: string }}, param: null }}
 - type: 'Input', replace the value in the input field
````
````diff
@@ -116,9 +147,12 @@ class LLMPrompt:
 - type: 'GetNewPage', get the new page
 * {{ param: null }}
 * use this action when the instruction is a "get new page" statement or "open in new tab" or "open in new window".
-- type: 'GoToPage', navigate directly to a specific URL
+- type: 'GoToPage', navigate directly to a specific URL [PREFERRED FOR RELIABLE NAVIGATION]
 * {{ param: {{ url: string }} }}
-* use this action when you need to navigate to a specific web page URL, useful for returning to homepage or navigating to known pages.
+* CRITICAL: This is the MOST RELIABLE navigation method - use whenever target URL is known
+* PREFERRED for: returning to original tab/page, switching between known pages, going to homepage
+* AVOID clicking UI elements (logos, icons) for navigation when URL is available
+* Example: To return to original tab, use GoToPage with the original URL instead of clicking browser tabs or page icons
 - type: 'GoBack', navigate back to the previous page
 * {{ param: null }}
 * use this action when you need to go back to the previous page in the browser history, similar to clicking the browser's back button.
````
````diff
@@ -412,6 +446,41 @@ class LLMPrompt:
 }
 ```

+#### Example 8: Return to Original Tab/Page (CRITICAL PATTERN)
+"Return to the original tab/page where we started"
+```json
+{
+  "actions": [
+    {
+      "type": "GoToPage",
+      "thought": "Using GoToPage for guaranteed navigation back to original URL. This is more reliable than clicking UI elements which may fail or behave unpredictably.",
+      "param": {"url": "https://original-site.com/original-page"},
+      "locate": null
+    }
+  ],
+  "taskWillBeAccomplished": true,
+  "furtherPlan": null,
+  "error": null
+}
+```
+
+#### Counter-Example: What NOT to do for Navigation
+"Return to the original tab"
+```json
+// ❌ WRONG - Unreliable approach
+{
+  "actions": [
+    {
+      "type": "Tap",
+      "thought": "Click the site logo to return",
+      "param": null,
+      "locate": {"id": "1"}
+    }
+  ],
+  "error": "This approach is unreliable - UI elements may not achieve intended navigation"
+}
+```
+
 #### Example of what NOT to do
 - If the action's `locate` is null and element is **not in the screenshot**, don't continue planning. Instead:
 ```json
````
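
The priority order that this prompt change asks the planner to follow (GoToPage, then GoBack, then Tap) can be read as a small decision function. The sketch below is illustrative only: `choose_navigation_action` and its parameters are hypothetical names that do not appear in the commit. Only the action shapes (`type`, `param`, `locate`) are taken from the prompt text.

```python
from typing import Optional


def choose_navigation_action(
    target_url: Optional[str],
    can_go_back: bool,
    element_id: Optional[str],
) -> dict:
    """Pick a navigation action using the prompt's priority order.

    Hypothetical helper: the commit expresses this rule in prompt text,
    not in code.
    """
    # 1. A known URL always wins: direct navigation has no UI dependency.
    if target_url:
        return {"type": "GoToPage", "param": {"url": target_url}, "locate": None}
    # 2. Otherwise prefer browser history when stepping back is appropriate.
    if can_go_back:
        return {"type": "GoBack", "param": None}
    # 3. Fall back to tapping an element only when nothing else applies.
    if element_id is not None:
        return {"type": "Tap", "param": None, "locate": {"id": element_id}}
    raise ValueError("no navigation option available")


action = choose_navigation_action("https://original-site.com/original-page", True, "1")
print(action["type"])  # GoToPage, because the URL is known
```

Keeping the rule in one function like this would make the ordering testable, whereas the prompt-only version relies on the model following the instruction.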

webqa_agent/testers/case_gen/agents/execute_agent.py

Lines changed: 77 additions & 25 deletions
````diff
@@ -21,7 +21,51 @@
 from webqa_agent.testers.case_gen.utils.message_converter import convert_intermediate_steps_to_messages
 from webqa_agent.utils.log_icon import icon

-LONG_STEPS = 10
+LONG_STEPS = 25
+
+# ============================================================================
+# Critical Failure Detection Patterns
+# ============================================================================
+
+# Literal patterns for exact substring matching (backward compatible)
+CRITICAL_LITERAL_PATTERNS = [
+    "element not found",
+    "cannot find",
+    "page crashed",
+    "permission denied",
+    "access denied",
+    "network timeout",
+    "browser error",
+    "navigation failed",
+    "session expired",
+    "server error",
+    "connection timeout",
+    "unable to load",
+    "page not accessible",
+    "critical error",
+    "missing locator",
+    "not found in the buffer",
+    "could not be retrieved",
+    "failed due to a missing",
+    "dropdown options could not be retrieved",
+]
+
+# Regex patterns for flexible matching
+CRITICAL_REGEX_PATTERNS = [
+    r"not found in\s+.*buffer",
+    r"failed due to\s+.*missing",
+    r"locator.*not.*found",
+    r"element.*not.*available",
+    r"missing.*for.*action",
+    r"missing.*parameter",
+    r"element with id.*not found",
+]
+
+# Pre-compile regex for performance
+CRITICAL_REGEX = re.compile(
+    '|'.join(CRITICAL_REGEX_PATTERNS),
+    re.IGNORECASE
+)

 # ============================================================================
 # Dynamic Step Generation Helper Functions
````
````diff
@@ -106,10 +150,22 @@ def format_elements_for_llm(dom_diff: dict) -> list[dict]:
         # Add important attribute information
         important_attrs = {}
         if attributes:
-            # Extract important attributes
-            for key in ['class', 'id', 'role', 'type', 'placeholder', 'aria-label']:
-                if key in attributes:
-                    important_attrs[key] = attributes[key]
+            # Define comprehensive attribute whitelist
+            navigation_attrs = ['href', 'target', 'rel', 'download']
+            form_attrs = ['type', 'placeholder', 'value', 'name', 'required', 'disabled']
+            semantic_attrs = ['role', 'aria-label', 'aria-describedby', 'aria-expanded']
+
+            for key, value in attributes.items():
+                # Include whitelisted attributes
+                if key in ['class', 'id'] + navigation_attrs + form_attrs + semantic_attrs:
+                    important_attrs[key] = value
+                # Include data-* attributes (often contain behavior info)
+                elif key.startswith('data-'):
+                    # Limit length to prevent token explosion
+                    important_attrs[key] = value[:200] if isinstance(value, str) and len(value) > 200 else value
+                # Include style if it indicates visibility/interactivity
+                elif key == 'style' and isinstance(value, str) and ('display' in value or 'visibility' in value):
+                    important_attrs[key] = value[:200] + "..." if len(value) > 200 else value

         if important_attrs:
             formatted_elem["attributes"] = important_attrs
````
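
To check the new attribute filtering in isolation, the logic in this hunk can be lifted into a standalone function. This is a sketch under assumptions: `filter_important_attributes`, `WHITELIST`, and `MAX_ATTR_LEN` are hypothetical names introduced here, while in the commit the code runs inline inside `format_elements_for_llm`.

```python
# Hypothetical constants mirroring the lists defined in the hunk above.
WHITELIST = {
    "class", "id",
    "href", "target", "rel", "download",                               # navigation
    "type", "placeholder", "value", "name", "required", "disabled",    # form
    "role", "aria-label", "aria-describedby", "aria-expanded",         # semantic
}
MAX_ATTR_LEN = 200


def filter_important_attributes(attributes: dict) -> dict:
    """Keep whitelisted, data-*, and visibility-related style attributes."""
    important = {}
    for key, value in attributes.items():
        if key in WHITELIST:
            important[key] = value
        elif key.startswith("data-"):
            # Long data-* strings are cut to MAX_ATTR_LEN without a marker.
            if isinstance(value, str) and len(value) > MAX_ATTR_LEN:
                value = value[:MAX_ATTR_LEN]
            important[key] = value
        elif key == "style" and isinstance(value, str) and (
            "display" in value or "visibility" in value
        ):
            # Long style strings are cut and marked with an ellipsis.
            if len(value) > MAX_ATTR_LEN:
                value = value[:MAX_ATTR_LEN] + "..."
            important[key] = value
    return important


attrs = {"href": "/docs", "onclick": "track()", "data-payload": "x" * 300, "style": "color: red"}
result = filter_important_attributes(attrs)
print(sorted(result))  # ['data-payload', 'href']
```

As in the diff, a truncated `data-*` value gets no ellipsis while a truncated `style` value does, so a consumer cannot tell from the value alone whether a `data-*` attribute was shortened.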
````diff
@@ -1018,6 +1074,10 @@ def _is_objective_achieved(tool_output: str) -> tuple[bool, str]:
 def _is_critical_failure_step(tool_output: str, step_instruction: str = "") -> bool:
     """Check if a single step output indicates a critical failure that should stop execution.

+    Uses hybrid detection approach:
+    1. Primary: Structured error tags [CRITICAL_ERROR:category] (preferred)
+    2. Fallback: Pattern matching for backward compatibility and enhanced coverage
+
     Args:
         tool_output: The output from the step execution
         step_instruction: The instruction that was executed (for context)
````
````diff
@@ -1030,30 +1090,22 @@ def _is_critical_failure_step(tool_output: str, step_instruction: str = "") -> b

     output_lower = tool_output.lower()

-    # Critical failure patterns for immediate exit
-    critical_step_patterns = [
-        "element not found",
-        "cannot find",
-        "page crashed",
-        "permission denied",
-        "access denied",
-        "network timeout",
-        "browser error",
-        "navigation failed",
-        "session expired",
-        "server error",
-        "connection timeout",
-        "unable to load",
-        "page not accessible",
-        "critical error"
-    ]
+    # Phase 1: Check for structured critical error tags (preferred method)
+    if "[critical_error:" in output_lower:
+        logging.debug("Critical failure detected via structured error tag")
+        return True

-    # Check for critical patterns
-    for pattern in critical_step_patterns:
+    # Phase 2a: Check literal patterns (backward compatibility)
+    for pattern in CRITICAL_LITERAL_PATTERNS:
         if pattern in output_lower:
-            logging.debug(f"Critical failure detected in step: pattern '{pattern}' found")
+            logging.debug(f"Critical failure detected via literal pattern: '{pattern}'")
             return True

+    # Phase 2b: Check regex patterns (enhanced matching)
+    if CRITICAL_REGEX.search(output_lower):
+        logging.debug("Critical failure detected via regex pattern")
+        return True
+
     return False


````
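
The three detection phases in `_is_critical_failure_step` can be exercised with a reduced, self-contained version. This is a minimal sketch, assuming only a subset of the pattern lists from the diff. `is_critical_failure`, `LITERAL_PATTERNS`, and `REGEX` are hypothetical names used for illustration, and logging is omitted.

```python
import re

# Subset of the literal and regex patterns added in this commit.
LITERAL_PATTERNS = ["element not found", "navigation failed", "session expired"]
REGEX = re.compile(
    "|".join([r"not found in\s+.*buffer", r"locator.*not.*found", r"missing.*parameter"]),
    re.IGNORECASE,
)


def is_critical_failure(tool_output: str) -> bool:
    """Return True if the output matches any of the three detection phases."""
    output_lower = tool_output.lower()
    # Phase 1: structured tag, matched case-insensitively via lowercasing.
    if "[critical_error:" in output_lower:
        return True
    # Phase 2a: plain substring match against the literal patterns.
    if any(pattern in output_lower for pattern in LITERAL_PATTERNS):
        return True
    # Phase 2b: regex match for looser phrasings.
    return REGEX.search(output_lower) is not None


print(is_critical_failure("[CRITICAL_ERROR:NAVIGATION_FAILED] page did not load"))  # True
print(is_critical_failure("The locator for the submit button was not found"))       # True
print(is_critical_failure("Clicked the button successfully"))                       # False
```

The second call matches no literal pattern and is caught only by the `locator.*not.*found` regex, which is the extra coverage the regex phase is meant to provide.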

webqa_agent/testers/case_gen/prompts/agent_prompts.py

Lines changed: 78 additions & 0 deletions
````diff
@@ -89,6 +89,53 @@ def get_execute_system_prompt(case: dict) -> str:
 - **Single Tool Call**: Execute only ONE `execute_ui_action` or `execute_ui_assertion` per instruction
 - **Error Handling**: If any action in the sequence fails, stop and report the error - do not attempt subsequent actions

+## Navigation Reliability Guidelines (CRITICAL)
+
+### Navigation Action Selection Priority
+When executing navigation-related actions, follow these reliability guidelines:
+
+**1. Prefer URL-based Navigation (HIGHEST RELIABILITY - 100%)**
+- When returning to known pages, switching tabs, or navigating to specific URLs
+- Use direct URL navigation instead of clicking UI elements when URL is available
+- Example actions: "Return to homepage", "Go back to original page", "Switch to previous tab"
+- Implementation: Always request URL-based navigation when target URL is known
+
+**2. Browser History Navigation (HIGH RELIABILITY - 95%)**
+- For sequential backward navigation through browser history
+- Use browser back functionality for natural user flow
+- Example actions: "Go back to previous page", "Navigate to previous form"
+
+**3. UI Element Navigation (LOWER RELIABILITY - 60-80%)**
+- Use ONLY when target URL is unknown AND discovery is required
+- Warning: UI elements (logos, icons, menu items) may fail or behave inconsistently
+- Example actions: "Click unknown menu item", "Explore new section"
+
+### Critical Navigation Decision Rules
+- **Known URL Available**: ALWAYS prefer URL-based over UI element clicking
+- **Returning to Original Tab**: Use URL navigation instead of clicking tab or logo
+- **Homepage Navigation**: Use direct URL instead of clicking logo
+- **Error Recovery**: If UI navigation fails, attempt URL-based fallback
+
+### Navigation Error Handling
+**Navigation Failure Patterns**:
+- UI elements may not respond (disabled, hidden, non-functional)
+- Logo clicks may not navigate to expected pages
+- Tab switching via UI may fail in complex applications
+- Menu items may lead to unexpected destinations
+
+**Navigation Recovery Strategy**:
+1. Detect navigation failure through page URL verification
+2. Identify target URL from context or previous navigation
+3. Attempt direct URL-based navigation as fallback
+4. Report navigation method and success/failure for analysis
+
+### Navigation Success Validation
+After any navigation action, verify:
+- Current URL matches expected destination
+- Page content confirms successful navigation
+- No error messages or unexpected redirects occurred
+- Navigation state is stable for subsequent actions
+
 ## Test Execution Hierarchy (Priority Order)

 ### 1. Single Action Imperative (HIGHEST PRIORITY)
````
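
The four-step Navigation Recovery Strategy in this prompt (verify the URL, identify the target, fall back to direct navigation, report the method) could look roughly like the following in code. This is a sketch, not the agent's implementation: `navigate_with_fallback` and `FakeBrowser` are hypothetical, and the browser operations are passed in as plain callables so that no real browser API is assumed.

```python
from typing import Callable


def navigate_with_fallback(
    ui_navigate: Callable[[], None],
    goto: Callable[[str], None],
    current_url: Callable[[], str],
    target_url: str,
) -> str:
    """Try UI navigation first, verify by URL, then fall back to direct navigation.

    Returns the method that succeeded: "ui", "url_fallback", or "failed".
    """
    ui_navigate()
    # Step 1: detect failure by comparing the current URL with the target.
    if current_url() == target_url:
        return "ui"
    # Steps 2-3: the target URL is known here, so navigate to it directly.
    goto(target_url)
    # Step 4: report which method worked so the result can be analysed.
    return "url_fallback" if current_url() == target_url else "failed"


class FakeBrowser:
    """Stand-in browser whose logo click does nothing (simulated UI failure)."""

    def __init__(self) -> None:
        self.url = "https://example.com/settings"

    def click_logo(self) -> None:
        pass  # the click silently fails to navigate

    def goto(self, url: str) -> None:
        self.url = url


browser = FakeBrowser()
method = navigate_with_fallback(
    browser.click_logo, browser.goto, lambda: browser.url, "https://example.com/"
)
print(method)  # url_fallback
```

An exact string comparison of URLs is a simplification; redirects, trailing slashes, or query parameters would likely need normalisation in practice.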
````diff
@@ -221,6 +268,37 @@ def get_execute_system_prompt(case: dict) -> str:
 - Include recovery steps taken for future test improvement
 - Maintain clear audit trail of all actions performed

+## Structured Error Reporting Protocol
+
+**Critical Rule**: For failures that should immediately stop test execution, you MUST use structured error tags to ensure reliable detection.
+
+### Critical Error Format
+When encountering critical failures, include structured tags: **[CRITICAL_ERROR:category]** followed by detailed description.
+
+### Critical Error Categories
+- **ELEMENT_NOT_FOUND**: Target element cannot be located, accessed, or interacted with
+- **NAVIGATION_FAILED**: Page navigation, loading, or routing failures
+- **PERMISSION_DENIED**: Access, authorization, or security restriction issues
+- **PAGE_CRASHED**: Browser crashes, page errors, or unrecoverable page states
+- **NETWORK_ERROR**: Network connectivity, timeout, or server communication issues
+- **SESSION_EXPIRED**: Authentication session, login, or credential issues
+
+### Critical Error Examples
+**Element Access Failure**:
+`[CRITICAL_ERROR:ELEMENT_NOT_FOUND] The language selector dropdown could not be located in the navigation bar. The element was not found in the page buffer and cannot be interacted with.`
+
+**Navigation Issue**:
+`[CRITICAL_ERROR:NAVIGATION_FAILED] Page navigation to the target URL failed due to network timeout. The page is not accessible and the test cannot continue.`
+
+**Permission Issue**:
+`[CRITICAL_ERROR:PERMISSION_DENIED] Access to the admin panel was denied. User lacks sufficient privileges to proceed with the test.`
+
+### Non-Critical Failures
+Standard failures that allow test continuation should use the regular `[FAILURE]` format without structured tags. These include:
+- Validation errors that can be corrected
+- Dropdown option mismatches with alternatives available
+- Minor UI state changes that don't block core functionality
+
 ## Advanced Error Recovery Patterns

 ### Pattern 1: Form Validation Errors
````
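
The commit only checks whether a `[CRITICAL_ERROR:` tag is present, but the tag format defined in this prompt also carries a category that a consumer could extract. The sketch below shows one way to parse it. `parse_critical_error`, `TAG_RE`, and the `"UNKNOWN"` fallback are hypothetical and not part of the commit, and the parser assumes the tag is emitted in uppercase as in the prompt's examples.

```python
import re
from typing import Optional

# The six categories listed in the prompt.
CRITICAL_CATEGORIES = {
    "ELEMENT_NOT_FOUND",
    "NAVIGATION_FAILED",
    "PERMISSION_DENIED",
    "PAGE_CRASHED",
    "NETWORK_ERROR",
    "SESSION_EXPIRED",
}

# Captures the category and everything after the closing bracket.
TAG_RE = re.compile(r"\[CRITICAL_ERROR:([A-Z_]+)\]\s*(.*)", re.DOTALL)


def parse_critical_error(output: str) -> Optional[tuple[str, str]]:
    """Return (category, description) for a tagged output, or None if untagged."""
    match = TAG_RE.search(output)
    if match is None:
        return None
    category, description = match.group(1), match.group(2).strip()
    if category not in CRITICAL_CATEGORIES:
        category = "UNKNOWN"  # hypothetical fallback for unlisted categories
    return category, description


print(parse_critical_error(
    "[CRITICAL_ERROR:PERMISSION_DENIED] Access to the admin panel was denied."
))
# ('PERMISSION_DENIED', 'Access to the admin panel was denied.')
print(parse_critical_error("[FAILURE] Validation error on the email field"))  # None
```

Extracting the category would let a report group failures by type, which the plain presence check in `_is_critical_failure_step` does not do.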
