webqa_agent/llm/prompt.py
Lines changed: 27 additions & 8 deletions
@@ -51,7 +51,7 @@ class LLMPrompt:
 
 ## Anchor Usage Rule
 Anchors are strictly used for reference during disambiguation.
-**NEVER** interact (Tap/Hover/Check) with anchor elements directly.
+**NEVER** interact (Tap/Hover) with anchor elements directly.
 
 ## Scroll Behavior Constraints
 - Avoid planning `Scroll` if the page is already at the bottom.
@@ -97,6 +97,8 @@ class LLMPrompt:
 * `value` is the final required input value based on the existing input. No matter what modifications are required, just provide the final value to replace the existing input value.
 * For Input actions, if the page or validation message requires a minimum length, the value you generate MUST strictly meet or exceed this length. For Chinese, count each character as 1.
 * `clear_before_type`: Set to `true` if the instruction explicitly says to 'clear' the field before typing, or if you are correcting a previous failed input. Defaults to `false`.
+- type: 'Clear', clear the content of an input field
+* {{ locate: {{ id: string }}, param: null }}
 - type: 'KeyboardPress', press a key
 * {{ param: {{ value: string }} }}
 - type: 'Upload', upload a file (or click the upload button)
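The `Clear` entry added in this hunk and the minimum-length rule for `Input` can be illustrated with a minimal Python sketch. The helper names `clear_action` and `meets_min_length` are hypothetical and not part of `prompt.py`. Placing a `type` key next to `locate` and `param` in one object is also an assumption, inferred from the way the prompt lists each action.

```python
import json


def meets_min_length(value: str, min_length: int) -> bool:
    """Check the prompt's minimum-length rule for Input values.

    len() counts Unicode code points, so each Chinese character counts as 1,
    which matches the counting rule stated in the prompt.
    """
    return len(value) >= min_length


def clear_action(element_id: str) -> dict:
    """Build a Clear action (hypothetical helper).

    Mirrors the documented shape: { locate: { id: string }, param: null }.
    """
    return {"type": "Clear", "locate": {"id": element_id}, "param": None}


if __name__ == "__main__":
    print(meets_min_length("你好世界", 4))
    print(json.dumps(clear_action("12")))
```

Serializing with `json.dumps` turns Python's `None` into JSON `null`, as the documented shape requires.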
@@ -124,9 +126,6 @@ class LLMPrompt:
 * use this action when you need to go back to the previous page in the browser history, similar to clicking the browser's back button.
 - type: 'Sleep'
 * {{ param: {{ timeMs: number }} }}
-- type: 'Check'
-* {{ param: null }}
-* use this action when the instruction is a "check" or "verify" or "validate" statement.
 - type: 'Drag', drag an slider or element from source to target position
 For Drag action, use the following format:
 {
@@ -152,6 +151,21 @@ class LLMPrompt:
 * selection_path is the text of the option to be selected.
 * if the selection_path is a string, it means the option is the first level of the dropdown.
 * if the selection_path is a list, it means the option is the nth level of the dropdown.
+- type: 'Mouse', unified mouse action for move and wheel
+{
+"param": {
+"op": 'move' | 'wheel',
+// move operation
+"x"?: number,
+"y"?: number,
+// wheel operation
+"deltaX"?: number,
+"deltaY"?: number
+},
+"locate": null
+}
+* When op is omitted, auto-detect by provided fields: x+y => move; deltaX/deltaY => wheel.
+
 
 ## Further Plan Format
 If the task isn't completed:
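The auto-detection rule for the `Mouse` action in this hunk (x+y => move; deltaX/deltaY => wheel when `op` is omitted) could be resolved roughly as in the sketch below. `resolve_mouse_op` is a hypothetical helper, not a function from this repository. The diff does not say how unknown `op` values, or parameters that carry both coordinate and delta fields, are handled, so returning `None` for the former and preferring `move` for the latter are assumptions.

```python
from typing import Optional


def resolve_mouse_op(param: dict) -> Optional[str]:
    """Return 'move', 'wheel', or None for a Mouse action's param (sketch)."""
    op = param.get("op")
    if op in ("move", "wheel"):
        return op  # an explicit, recognized op takes precedence
    if op is not None:
        return None  # assumption: an unrecognized op is rejected
    # op omitted: auto-detect from the fields that are present
    if "x" in param and "y" in param:
        return "move"
    if "deltaX" in param or "deltaY" in param:
        return "wheel"
    return None  # not enough information to decide


if __name__ == "__main__":
    print(resolve_mouse_op({"x": 10, "y": 20}))
    print(resolve_mouse_op({"deltaY": 120}))
```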
@@ -178,13 +192,19 @@ class LLMPrompt:
 
 ### Supported Actions:
 - Tap: Click on a specified page element (such as a button or link). Typically used to trigger a click event.
+- Hover: Move the mouse over a specified page element (such as a button or link). Typically used to show tooltip or hover effect.
 - Scroll: Scroll the page or a specific region. You can specify the direction (down, up), the scroll distance, or scroll to the edge of the page/region.
 - Input: Enter text into an input field or textarea. This action will replace the current value with the specified final value.
+- Clear: Clear the content of an input field. Requires the input's external id in locate.
 - Sleep: Wait for a specified amount of time (in milliseconds). Useful for waiting for page loads or asynchronous content to render.
 - Upload: Upload a file
 - KeyboardPress: Simulate a keyboard key press, such as Enter, Tab, or arrow keys.
 - Drag: Perform a drag-and-drop operation. Moves the mouse from a starting coordinate to a target coordinate, often used for sliders, sorting, or drag-and-drop interfaces. Requires both source and target coordinates.
 - SelectDropdown: Select an option from a dropdown menu which is user's expected option. The dropdown element is the first level of the dropdown menu. IF You can see the dropdown element, you cannot click the dropdown element, you should directly select the option.
+- GoToPage: Navigate directly to a specific URL. Useful for returning to the homepage, navigating to known pages, or entering a new web address. Requires a URL parameter.
+- GoBack: Navigate back to the previous page in the browser history, similar to clicking the browser's back button. Does not require any parameters.
+- GetNewPage: Get the new page or open in new tab or open in new window. Use this action when the previous action (e.g., clicking a link that opens in a new tab) creates a new browser context that needs to be accessed.
+- Mouse: Unified mouse action for move and wheel.
 
 Please ensure the output is a valid **JSON** object. Do **not** include any markdown, backticks, or code block indicators.
 
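A caller consuming the model's reply might enforce the two constraints in this hunk (bare JSON output, and only the listed action types) with a check along these lines. This is a sketch under stated assumptions: `parse_plan` and `SUPPORTED_ACTIONS` are hypothetical names, the set is copied from the list above as it stands after this change, and the `actions` and `type` keys are inferred from the prompt text rather than from the project's actual parser.

```python
import json

# Action names copied from the "Supported Actions" list after this change.
SUPPORTED_ACTIONS = {
    "Tap", "Hover", "Scroll", "Input", "Clear", "Sleep", "Upload",
    "KeyboardPress", "Drag", "SelectDropdown", "GoToPage", "GoBack",
    "GetNewPage", "Mouse",
}

FENCE = "`" * 3  # a markdown code fence, which the prompt forbids


def parse_plan(raw: str) -> dict:
    """Parse a model reply and reject fenced or unsupported output (sketch)."""
    text = raw.strip()
    if text.startswith(FENCE):
        raise ValueError("expected bare JSON, got a fenced code block")
    plan = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(plan, dict):
        raise ValueError("top-level value must be a JSON object")
    for action in plan.get("actions", []):
        action_type = action.get("type") if isinstance(action, dict) else None
        if action_type not in SUPPORTED_ACTIONS:
            raise ValueError(f"unsupported action type: {action_type!r}")
    return plan
```

With this set, a reply that still plans the removed `Check` action is rejected, while `Clear`, `Hover`, and `Mouse` are accepted.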
@@ -193,7 +213,7 @@ class LLMPrompt:
 "actions": [
 {
 "thought": "Reasoning for this action and why it's feasible on the current page.",