File: webqa_agent/llm/prompt.py (3 additions, 4 deletions)
@@ -14,12 +14,11 @@ class LLMPrompt:
## Context Provided
- **`pageDescription (interactive elements)`**: A map of all interactive elements on the page, each with a unique ID. Use these IDs for actions.
-
- **`page_structure (full text content)`**: The complete text content of the page, including non-interactive elements.
- **`Screenshot`**: A visual capture of the current page state.
## Objective
- Decompose the user's instruction into a **series of actionable steps**, each representing a single UI interaction.
-
- **Unified Context Analysis**: You MUST analyze BOTH `pageDescription` and `page_structure` together. Use `page_structure` to understand the meaning and context of the interactive elements in `pageDescription` (e.g., matching a label to a nearby input field). This unified view is critical for making correct decisions.
+
- **Unified Context Analysis**: Analyze the `pageDescription` together with the visual `Screenshot`. Use the screenshot to understand the spatial layout and context of the interactive elements (e.g., matching a label to a nearby input field based on their visual positions). This unified view is critical for making correct decisions.
- Identify and locate the target element if applicable.
- Validate if the planned target matches the user's intent, especially in cases of **duplicate or ambiguous elements**.
- Avoid redundant operations such as repeated scrolling or re-executing completed steps.
@@ -187,8 +186,8 @@ class LLMPrompt:
- Example: if you see element '1' with internal id 917, use "id": "1" in your action
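
The ID rule in the example above can be sketched as a small helper. This is a hypothetical illustration only: `build_action`, the `elements` map, and the shape of the returned payload are assumptions and are not taken from webqa_agent.

```python
def build_action(action: str, display_id: str, elements: dict) -> dict:
    """Build an action payload that references an element by its displayed ID.

    `elements` is an assumed map from displayed ID (e.g. "1") to element
    metadata, which may carry an internal id (e.g. 917) that must NOT be
    used in the action.
    """
    if display_id not in elements:
        raise KeyError(f"No element labeled {display_id!r} in pageDescription")
    # Only the displayed ID goes into the action, never the internal id.
    return {"action": action, "id": display_id}


# Element shown as '1' on the page, with internal id 917.
page_elements = {"1": {"internal_id": 917, "tag": "button"}}
tap = build_action("Tap", "1", page_elements)
```

Rejecting unknown IDs early keeps a mistaken internal id such as `"917"` from reaching the executor.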
### Contextual Decision Making:
-
- **Crucially, use the `page_structure` (full text content) to understand the context of the interactive elements from `pageDescription`**. For example, if `page_structure` shows "Username:" next to an input field, you know that input field is for the username.
-
- If you see error text like "Invalid email format" in `page_structure`, use this information to correct your next action.
+
- **Crucially, use the `Screenshot` to understand the context of the interactive elements from `pageDescription`**. For example, if the screenshot shows "Username:" next to an input field, you know that input field is for the username.
+
- If you see error text like "Invalid email format" in the screenshot, use this information to correct your next action.
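
The label-to-field matching described above is something the model does visually, but the underlying idea can be illustrated geometrically. The following is a minimal sketch, assuming bounding boxes are available for labels and inputs; `Box` and `nearest_label` are hypothetical names and not part of the prompt or of webqa_agent.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Box:
    """Hypothetical bounding box for a piece of visible text or a field."""
    text: str
    x: float
    y: float
    width: float
    height: float

    @property
    def center(self) -> Tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)


def nearest_label(field: Box, labels: List[Box]) -> Optional[Box]:
    """Return the label whose center is closest to the field's center."""
    if not labels:
        return None
    fx, fy = field.center
    return min(
        labels,
        key=lambda lb: math.hypot(lb.center[0] - fx, lb.center[1] - fy),
    )


labels = [Box("Username:", 10, 10, 80, 20), Box("Password:", 10, 60, 80, 20)]
username_input = Box("", 100, 10, 200, 20)
match = nearest_label(username_input, labels)
```

Center-to-center distance is a deliberately simple heuristic; real layouts may need to prefer labels to the left of or above a field.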
### Supported Actions:
- Tap: Click on a specified page element (such as a button or link). Typically used to trigger a click event.
- **Layout Comprehension**: Analyze the layout to understand the spatial relationship between elements, which is crucial for complex interactions.
- **Anomaly Detection**: Identify unexpected visual states like error pop-ups, unloaded content, or graphical glitches that may not be present in the text structure.
+
**IMPORTANT - Automatic Viewport Management**:
+
The system automatically handles element visibility through intelligent scrolling. When you interact with elements (click, hover, type), the system will automatically scroll to ensure the element is in the viewport before performing the action. You do NOT need to manually scroll to elements or worry about elements being outside the visible area. Simply reference elements by their identifiers, and the system will handle viewport positioning automatically.
+
+
**IMPORTANT - Screenshot Context**:
+
The screenshots you receive during test execution show ONLY the current viewport (visible portion of the page), not the entire webpage. While test planning may reference elements from full-page screenshots, your execution screenshots are viewport-limited. This is intentional - the automatic viewport management system ensures that any element you need to interact with will be scrolled into the viewport before your action executes. If you cannot see an element in the current screenshot but it was referenced in the test plan, trust that the system will handle the scrolling automatically.
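
The scroll-before-action behavior described in the two notes above can be sketched with plain arithmetic. This is an illustrative model under stated assumptions, not the system's actual implementation: `scroll_offset_to_reveal` is a hypothetical helper, and only vertical scrolling in pixel units is considered.

```python
def scroll_offset_to_reveal(
    element_top: float,
    element_height: float,
    scroll_y: float,
    viewport_height: float,
) -> float:
    """Return the vertical scroll offset that brings an element into view.

    If the element is already fully inside the viewport, the current
    offset is returned unchanged (no scrolling needed).
    """
    element_bottom = element_top + element_height
    if element_top < scroll_y:
        # Element starts above the viewport: align its top with the viewport top.
        return element_top
    if element_bottom > scroll_y + viewport_height:
        # Element ends below the viewport: align its bottom with the viewport bottom.
        return element_bottom - viewport_height
    return scroll_y


# Element at y=1500 (40px tall), page not scrolled, 800px viewport.
new_offset = scroll_offset_to_reveal(1500, 40, scroll_y=0, viewport_height=800)
```

Under this model the agent only names the target element; the offset calculation happens before the click, hover, or type action runs.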
**Scenario**: Element interaction fails due to scroll or viewport positioning issues
+
**Recognition Signals**:
+
- Error messages containing "element not in viewport", "not visible", "not clickable", or "scroll failed"
+
- Element was referenced in the test plan from a full-page screenshot but is not visible in the current viewport
+
- Interaction timeout errors for elements that should exist
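
The message-based recognition signals above can be checked with a simple substring match. This is a minimal sketch: `looks_like_viewport_error` is a hypothetical helper, and the phrases are the ones quoted in the list. The screenshot-based and timeout signals need other context and are not covered here.

```python
# Phrases quoted in the recognition signals; matching is case-insensitive.
VIEWPORT_ERROR_SIGNALS = (
    "element not in viewport",
    "not visible",
    "not clickable",
    "scroll failed",
)


def looks_like_viewport_error(message: str) -> bool:
    """Return True if an error message contains any viewport-related phrase."""
    lowered = message.lower()
    return any(signal in lowered for signal in VIEWPORT_ERROR_SIGNALS)


is_viewport_issue = looks_like_viewport_error("Element not in viewport after retry")
is_other_issue = looks_like_viewport_error("Invalid email format")
```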
+
+
**Understanding the Issue**:
+
The system uses automatic viewport management with intelligent scrolling. When you interact with elements (click, hover, type), the system automatically scrolls to ensure the element is in the viewport BEFORE executing your action. This process:
+
1. Detects if the target element is outside the viewport
**Action**: `execute_ui_action(action='Mouse', target='canvas drawing area', value='move:250,150', description='Position cursor at specific canvas coordinates for drawing')`
+
**Tool Response**: `[SUCCESS] Action 'Mouse' on 'canvas drawing area' completed successfully. Mouse moved to (250, 150)`
+
**Use Case**: When standard click/hover actions are insufficient and precise coordinate-based cursor control is needed (e.g., drawing tools, custom interactive visualizations, coordinate-based maps)
+
+
### Example 9: Mouse Action - Wheel Scrolling
+
**Context**: Custom scrollable container with horizontal scroll
+
**Action**: `execute_ui_action(action='Mouse', target='horizontal gallery container', value='wheel:100,0', description='Scroll gallery horizontally to the right')`
**Use Case**: When the standard Scroll action doesn't support custom scroll directions, or when precise delta control is needed (e.g., horizontal scrolling, custom scroll containers)
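
The Mouse examples above pass `value` strings such as `move:250,150` and `wheel:100,0`. A parser for that format might look like the sketch below. `parse_mouse_value` is a hypothetical helper, and the assumption that only `move` and `wheel` take two integer arguments comes from these two examples alone, not from the executor's actual code.

```python
from typing import Tuple


def parse_mouse_value(value: str) -> Tuple[str, int, int]:
    """Split a Mouse action value like 'move:250,150' into (operation, a, b).

    For 'move' the numbers are read as x,y coordinates; for 'wheel' they
    are read as horizontal and vertical scroll deltas.
    """
    operation, _, numbers = value.partition(":")
    if operation not in ("move", "wheel"):
        raise ValueError(f"Unsupported mouse operation: {operation!r}")
    parts = numbers.split(",")
    if len(parts) != 2:
        raise ValueError(f"Expected two comma-separated numbers, got {numbers!r}")
    first, second = (int(part.strip()) for part in parts)
    return operation, first, second


move = parse_mouse_value("move:250,150")
wheel = parse_mouse_value("wheel:100,0")
```

Validating the string before dispatch gives the agent a clear error for malformed values instead of an opaque interaction failure.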
+
+
### Example 10: Page Navigation Actions
+
**Context 1 - Direct Navigation**: Navigate to specific URL for cross-site testing
+
**Action**: `execute_ui_action(action='GoToPage', target='https://example.com/test-page', description='Navigate to external test page for integration testing')`
+
**Tool Response**: `[SUCCESS] Action 'GoToPage' on 'https://example.com/test-page' completed successfully. Navigated to page`
+
**Use Case**: Direct URL navigation for multi-site workflows, external authentication redirects, or testing cross-domain functionality
+
+
**Context 2 - Browser Back**: Return to previous page after completing action
+
**Action**: `execute_ui_action(action='GoBack', target='', description='Navigate back to main product listing page')`
+
**Tool Response**: `[SUCCESS] Action 'GoBack' completed successfully. Successfully navigated back to previous page`
+
**Use Case**: Test browser back button functionality, validate state preservation after navigation, or reset to previous page state
+
## Test Completion Protocol
When all test steps are completed or an unrecoverable error occurs: