Skip to content

Commit 3084ddb

Browse files
[WIP]feat: support full page planning and scroll
1 parent 8322704 commit 3084ddb

File tree

11 files changed

+1257
-431
lines changed

11 files changed

+1257
-431
lines changed

webqa_agent/actions/action_executor.py

Lines changed: 439 additions & 23 deletions
Large diffs are not rendered by default.

webqa_agent/actions/action_handler.py

Lines changed: 570 additions & 24 deletions
Large diffs are not rendered by default.

webqa_agent/llm/prompt.py

Lines changed: 3 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -14,12 +14,11 @@ class LLMPrompt:
1414
1515
## Context Provided
1616
- **`pageDescription (interactive elements)`**: A map of all interactive elements on the page, each with a unique ID. Use these IDs for actions.
17-
- **`page_structure (full text content)`**: The complete text content of the page, including non-interactive elements.
1817
- **`Screenshot`**: A visual capture of the current page state.
1918
2019
## Objective
2120
- Decompose the user's instruction into a **series of actionable steps**, each representing a single UI interaction.
22-
- **Unified Context Analysis**: You MUST analyze BOTH `pageDescription` and `page_structure` together. Use `page_structure` to understand the meaning and context of the interactive elements in `pageDescription` (e.g., matching a label to a nearby input field). This unified view is critical for making correct decisions.
21+
- **Unified Context Analysis**: Analyze the `pageDescription` together with the visual `Screenshot`. Use the screenshot to understand the spatial layout and context of the interactive elements (e.g., matching a label to a nearby input field based on their visual positions). This unified view is critical for making correct decisions.
2322
- Identify and locate the target element if applicable.
2423
- Validate if the planned target matches the user's intent, especially in cases of **duplicate or ambiguous elements**.
2524
- Avoid redundant operations such as repeated scrolling or re-executing completed steps.
@@ -187,8 +186,8 @@ class LLMPrompt:
187186
- Example: if you see element '1' with internal id 917, use "id": "1" in your action
188187
189188
### Contextual Decision Making:
190-
- **Crucially, use the `page_structure` (full text content) to understand the context of the interactive elements from `pageDescription`**. For example, if `page_structure` shows "Username:" next to an input field, you know that input field is for the username.
191-
- If you see error text like "Invalid email format" in `page_structure`, use this information to correct your next action.
189+
- **Crucially, use the `Screenshot` to understand the context of the interactive elements from `pageDescription`**. For example, if the screenshot shows "Username:" next to an input field, you know that input field is for the username.
190+
- If you see error text like "Invalid email format" in the screenshot, use this information to correct your next action.
192191
193192
### Supported Actions:
194193
- Tap: Click on a specified page element (such as a button or link). Typically used to trigger a click event.

webqa_agent/testers/case_gen/graph.py

Lines changed: 6 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -85,30 +85,25 @@ async def plan_test_cases(state: MainGraphState) -> Dict[str, List[Dict[str, Any
8585
logging.info(f"Deep crawling page structure and elements for initial test plan...")
8686
page = await ui_tester.get_current_page()
8787
dp = DeepCrawler(page)
88-
await dp.crawl(highlight=True, viewport_only=True)
88+
await dp.crawl(highlight=True, viewport_only=False)
8989
screenshot = await ui_tester._actions.b64_page_screenshot(
90-
file_name="plan_or_replan", save_to_log=False, full_page=False
90+
file_name="plan_or_replan", save_to_log=False, full_page=True
9191
)
9292
await dp.remove_marker()
93-
await dp.crawl(highlight=False, filter_text=True, viewport_only=True)
93+
await dp.crawl(highlight=False, filter_text=True, viewport_only=False)
9494
page_structure = dp.get_text()
9595
logging.debug(f"----- plan cases ---- Page structure: {page_structure}")
9696

9797
business_objectives = state.get("business_objectives", "No specific business objectives provided.")
98-
completed_cases = state.get("completed_cases")
9998

10099
language = state.get('language', 'zh-CN')
101100
system_prompt = get_test_case_planning_system_prompt(
102101
business_objectives=business_objectives,
103-
completed_cases=completed_cases,
104102
language=language,
105103
)
106104

107105
user_prompt = get_test_case_planning_user_prompt(
108106
state_url=state["url"],
109-
completed_cases=completed_cases,
110-
reflection_history=state.get("reflection_history"),
111-
remaining_objectives=state.get("remaining_objectives"),
112107
)
113108

114109
logging.info("Generating initial test plan - Sending request to LLM...")
@@ -283,7 +278,7 @@ async def reflect_and_replan(state: MainGraphState) -> dict:
283278
# Use DeepCrawler to get interactive elements mapping and highlighted screenshot
284279
logging.info(f"Deep crawling page structure and elements for reflection and replanning analysis...")
285280
dp = DeepCrawler(page)
286-
curr = await dp.crawl(highlight=True, viewport_only=True)
281+
curr = await dp.crawl(highlight=True, viewport_only=False)
287282
# Include position information for better replanning decisions
288283
reflect_template = [
289284
str(ElementKey.TAG_NAME),
@@ -294,9 +289,9 @@ async def reflect_and_replan(state: MainGraphState) -> dict:
294289
]
295290
page_content_summary = curr.clean_dict(reflect_template)
296291
logging.debug(f"current page crawled result: {page_content_summary}")
297-
screenshot = await ui_tester._actions.b64_page_screenshot(file_name="reflection", save_to_log=False, full_page=False)
292+
screenshot = await ui_tester._actions.b64_page_screenshot(file_name="reflection", save_to_log=False, full_page=True)
298293
await dp.remove_marker()
299-
await dp.crawl(highlight=False, filter_text=True, viewport_only=True)
294+
await dp.crawl(highlight=False, filter_text=True, viewport_only=False)
300295
page_structure = dp.get_text()
301296
logging.debug(f"----- reflection ---- Page structure: {page_structure}")
302297

webqa_agent/testers/case_gen/prompts/agent_prompts.py

Lines changed: 66 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,12 @@ def get_execute_system_prompt(case: dict) -> str:
2727
- **Layout Comprehension**: Analyze the layout to understand the spatial relationship between elements, which is crucial for complex interactions.
2828
- **Anomaly Detection**: Identify unexpected visual states like error pop-ups, unloaded content, or graphical glitches that may not be present in the text structure.
2929
30+
**IMPORTANT - Automatic Viewport Management**:
31+
The system automatically handles element visibility through intelligent scrolling. When you interact with elements (click, hover, type), the system will automatically scroll to ensure the element is in the viewport before performing the action. You do NOT need to manually scroll to elements or worry about elements being outside the visible area. Simply reference elements by their identifiers, and the system will handle viewport positioning automatically.
32+
33+
**IMPORTANT - Screenshot Context**:
34+
The screenshots you receive during test execution show ONLY the current viewport (visible portion of the page), not the entire webpage. While test planning may reference elements from full-page screenshots, your execution screenshots are viewport-limited. This is intentional - the automatic viewport management system ensures that any element you need to interact with will be scrolled into the viewport before your action executes. If you cannot see an element in the current screenshot but it was referenced in the test plan, trust that the system will handle the scrolling automatically.
35+
3036
## Available Tools
3137
You have access to two specialized testing tools:
3238
@@ -281,6 +287,43 @@ def get_execute_system_prompt(case: dict) -> str:
281287
2. Check for dynamic content appearance
282288
3. Retry interaction after content stabilization
283289
290+
### Pattern 5: Automatic Scroll Management Failures
291+
**Scenario**: Element interaction fails due to scroll or viewport positioning issues
292+
**Recognition Signals**:
293+
- Error messages containing "element not in viewport", "not visible", "not clickable", or "scroll failed"
294+
- Element was referenced in test plan from full-page screenshot but not visible in current viewport
295+
- Interaction timeout errors for elements that should exist
296+
297+
**Understanding the Issue**:
298+
The system uses automatic viewport management with intelligent scrolling. When you interact with elements (click, hover, type), the system automatically scrolls to ensure the element is in viewport BEFORE executing your action. This process:
299+
1. Detects if the target element is outside viewport
300+
2. Attempts scroll using CSS selector → XPath → coordinate-based fallback
301+
3. Implements retry logic for lazy-loaded content (up to 3 attempts)
302+
4. Waits for page stability after scroll (handles infinite scroll and dynamic loading)
303+
304+
**Recovery Solution**:
305+
If automatic scroll fails, the error will indicate the specific issue:
306+
1. **Element Not Found**: Element may not exist yet due to lazy loading
307+
- Use `execute_ui_action(action='Sleep', value='2000')` to wait for content to load
308+
- Verify element identifier is correct by checking page structure
309+
- Consider that element may appear conditionally based on previous actions
310+
311+
2. **Scroll Timeout**: Page is loading slowly or has infinite scroll
312+
- Increase wait time: `execute_ui_action(action='Sleep', value='3000')`
313+
- Manually trigger scroll if needed: `execute_ui_action(action='Scroll', value='down')`
314+
- Check for loading spinners or progress indicators
315+
316+
3. **Element Obscured**: Element exists but is covered by another element (modal, overlay, popup)
317+
- Close the obscuring element first (dismiss modal, close popup)
318+
- Use `execute_ui_action(action='KeyboardPress', value='Escape')` to dismiss overlays
319+
- Verify no sticky headers or floating elements are blocking the target
320+
321+
**Important Notes**:
322+
- You do NOT need to manually scroll in normal circumstances - the system handles this automatically
323+
- Only use manual scroll actions when automatic scroll explicitly fails with error messages
324+
- If you see an error about scroll failure, report it as-is - these are rare and indicate system issues
325+
- Trust the automatic viewport management for elements referenced from full-page planning screenshots
326+
284327
## Test Execution Examples
285328
286329
### Example 1: Form Field Validation Recovery
@@ -330,6 +373,29 @@ def get_execute_system_prompt(case: dict) -> str:
330373
**Tool Response**: `[SUCCESS] Action 'Input' on 'username field' completed successfully`
331374
**Agent Reporting**: Report completion of the single action and allow framework to proceed to next step
332375
376+
### Example 8: Mouse Action - Cursor Positioning
377+
**Context**: Drawing canvas requiring precise cursor positioning
378+
**Action**: `execute_ui_action(action='Mouse', target='canvas drawing area', value='move:250,150', description='Position cursor at specific canvas coordinates for drawing')`
379+
**Tool Response**: `[SUCCESS] Action 'Mouse' on 'canvas drawing area' completed successfully. Mouse moved to (250, 150)`
380+
**Use Case**: When standard click/hover actions are insufficient and precise coordinate-based cursor control is needed (e.g., drawing tools, custom interactive visualizations, coordinate-based maps)
381+
382+
### Example 9: Mouse Action - Wheel Scrolling
383+
**Context**: Custom scrollable container with horizontal scroll
384+
**Action**: `execute_ui_action(action='Mouse', target='horizontal gallery container', value='wheel:100,0', description='Scroll gallery horizontally to the right')`
385+
**Tool Response**: `[SUCCESS] Action 'Mouse' on 'horizontal gallery container' completed successfully. Mouse wheel scrolled (deltaX: 100, deltaY: 0)`
386+
**Use Case**: When standard Scroll action doesn't support custom scroll directions or precise delta control needed (e.g., horizontal scrolling, custom scroll containers)
387+
388+
### Example 10: Page Navigation Actions
389+
**Context 1 - Direct Navigation**: Navigate to specific URL for cross-site testing
390+
**Action**: `execute_ui_action(action='GoToPage', target='https://example.com/test-page', description='Navigate to external test page for integration testing')`
391+
**Tool Response**: `[SUCCESS] Action 'GoToPage' on 'https://example.com/test-page' completed successfully. Navigated to page`
392+
**Use Case**: Direct URL navigation for multi-site workflows, external authentication redirects, or testing cross-domain functionality
393+
394+
**Context 2 - Browser Back**: Return to previous page after completing action
395+
**Action**: `execute_ui_action(action='GoBack', target='', description='Navigate back to main product listing page')`
396+
**Tool Response**: `[SUCCESS] Action 'GoBack' completed successfully. Successfully navigated back to previous page`
397+
**Use Case**: Test browser back button functionality, validate state preservation after navigation, or reset to previous page state
398+
333399
## Test Completion Protocol
334400
When all test steps are completed or an unrecoverable error occurs:
335401

0 commit comments

Comments
 (0)