Commit a67d3d8

Update readme
1 parent ff152a3 commit a67d3d8

3 files changed: 47 additions and 16 deletions

README.md

Lines changed: 29 additions & 4 deletions
@@ -52,14 +52,39 @@ This includes instructions for model deployment using huggingface endpoint, and

 ### ✅ Step 2: Post Processing

-👉 <a href="codes/action_parser.py">Prediction Post-Processing</a>.
-This includes parsing model predictions to executable pyautogui codes.
-#### Coordinates processing
+#### Installation
+```bash
+pip install ui-tars
+# or
+uv pip install ui-tars
+```
+#### Usage
+```python
+from ui_tars.action_parser import parse_action_to_structure_output, parsing_response_to_pyautogui_code
+
+response = "Thought: Click the button\nAction: click(start_box='(100,200)')"
+original_image_width, original_image_height = 1920, 1080
+parsed_dict = parse_action_to_structure_output(
+    response,
+    factor=1000,
+    origin_resized_height=original_image_height,
+    origin_resized_width=original_image_width,
+    model_type="qwen25vl"
+)
+print(parsed_dict)
+parsed_pyautogui_code = parsing_response_to_pyautogui_code(
+    responses=parsed_dict,
+    image_height=original_image_height,
+    image_width=original_image_width
+)
+print(parsed_pyautogui_code)
+```
+##### FYI: Coordinates visualization
 To help you better understand the coordinate processing, we also provide a <a href="README_coordinates.md">guide</a> for coordinates processing visualization.

 ## Prompt Usage Guide

-To accommodate different device environments and task complexities, the following three prompt templates in <a href="codes/prompts.py">codes/prompts.py</a>. are designed to guide GUI agents in generating appropriate actions. Choose the template that best fits your use case:
+To accommodate different device environments and task complexities, the following three prompt templates in <a href="codes/ui_tars/prompt.py">codes/ui_tars/prompt.py</a>. are designed to guide GUI agents in generating appropriate actions. Choose the template that best fits your use case:

 ### 🖥️ `COMPUTER_USE`
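
Note on the Usage snippet added to README.md in this commit: it passes `factor=1000` together with a 1920x1080 screenshot. A minimal sketch of what such a scaling step could look like follows. It assumes the model emits coordinates on a 0 to `factor` grid; `scale_to_pixels` is a hypothetical helper written for illustration, not a function from `ui_tars`, and the package's real handling (which also depends on `model_type`) may differ.

```python
def scale_to_pixels(x, y, factor, image_width, image_height):
    """Map a point on an assumed 0..factor grid to pixel coordinates.

    Hypothetical helper: not part of ui_tars.
    """
    # Multiply before dividing so round numbers stay exact in floating point.
    pixel_x = round(x * image_width / factor)
    pixel_y = round(y * image_height / factor)
    return pixel_x, pixel_y


# The point (100, 200) from the README example, on a 1920x1080 screenshot.
print(scale_to_pixels(100, 200, factor=1000, image_width=1920, image_height=1080))
# prints (192, 216)
```

Under this assumption the same model output stays valid across screen sizes, because only the final multiplication depends on the screenshot dimensions.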

codes/README.md

Lines changed: 14 additions & 8 deletions
@@ -1,14 +1,14 @@
 # ui-tars

-A python package for parsing LLM-generated GUI action instructions, automatically generating pyautogui scripts, and supporting coordinate conversion and smart image resizing.
+A python package for parsing VLM-generated GUI action instructions into executable pyautogui codes.

 ---

 ## Introduction

-`ui-tars` is a Python package for parsing LLM-generated GUI action instructions, automatically generating pyautogui scripts, and supporting coordinate conversion and smart image resizing.
+`ui-tars` is a Python package for parsing VLM-generated GUI action instructions, automatically generating pyautogui scripts, and supporting coordinate conversion and smart image resizing.

-- Supports multiple LLM output formats (e.g., Qwen, Doubao)
+- Supports multiple VLM output formats (e.g., Qwen-VL, Seed-VL)
 - Automatically handles coordinate scaling and format conversion
 - One-click generation of pyautogui automation scripts

@@ -24,12 +24,12 @@ pip install ui-tars
 uv pip install ui-tars
 ```

-### Parse LLM output into structured actions
+### Parse output into structured actions

 ```python
-from ui_tars.action_parser import parse_action_to_structure_output
+from ui_tars.action_parser import parse_action_to_structure_output, parsing_response_to_pyautogui_code

-response = "Thought: Click the button\nAction: click(start_box='(0.1,0.2,0.1,0.2)')"
+response = "Thought: Click the button\nAction: click(point='<point>200 300</point>')"
 original_image_width, original_image_height = 1920, 1080
 parsed_dict = parse_action_to_structure_output(
     response,
@@ -39,6 +39,12 @@ parsed_dict = parse_action_to_structure_output(
     model_type="doubao"
 )
 print(parsed_dict)
+parsed_pyautogui_code = parsing_response_to_pyautogui_code(
+    responses=parsed_dict,
+    image_height=original_image_height,
+    image_width=original_image_width
+)
+print(parsed_pyautogui_code)
 ```

 ### Generate pyautogui automation script
@@ -90,10 +96,10 @@ def parse_action_to_structure_output(
 ```

 **Description:**
-Parses LLM output action instructions into structured dictionaries, automatically handling coordinate scaling and box/point format conversion.
+Parses output action instructions into structured dictionaries, automatically handling coordinate scaling and box/point format conversion.

 **Parameters:**
-- `text`: The LLM output string
+- `text`: The output string
 - `factor`: Scaling factor
 - `origin_resized_height`/`origin_resized_width`: Original image height/width
 - `model_type`: Model type (e.g., "qwen25vl", "doubao")
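
Note on the "box/point format conversion" named in the description above: the updated examples switch from `start_box='(...)'` to `point='<point>200 300</point>'`. The sketch below shows one way a point string could be turned into a box tuple. It assumes the `<point>X Y</point>` form seen in this commit and represents a point as a degenerate box whose two corners coincide; `point_to_box` is a hypothetical helper for illustration and is not the package's actual implementation.

```python
import re

# Matches "<point>X Y</point>" with integer or decimal coordinates.
POINT_RE = re.compile(
    r"<point>\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*</point>"
)


def point_to_box(point_str):
    """Convert '<point>X Y</point>' into an (x1, y1, x2, y2) tuple.

    Hypothetical helper: the box is degenerate, so both corners equal the point.
    """
    match = POINT_RE.search(point_str)
    if match is None:
        raise ValueError(f"no <point> tag found in {point_str!r}")
    x, y = float(match.group(1)), float(match.group(2))
    return (x, y, x, y)


print(point_to_box("<point>200 300</point>"))
# prints (200.0, 300.0, 200.0, 300.0)
```

Keeping a single box representation downstream would explain why the updated test still expects a `start_box` key after parsing a `point=` argument, though that reading is an inference from the diff rather than something the commit states.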

codes/tests/action_parser_test.py

Lines changed: 4 additions & 4 deletions
@@ -14,15 +14,15 @@

 class TestActionParser(unittest.TestCase):
     def test_parse_action(self):
-        action_str = "click(start_box='(10,20,30,40)')"
+        action_str = "click(point='<point>200 300</point>')"
         result = parse_action(action_str)
         self.assertEqual(result['function'], 'click')
-        self.assertEqual(result['args']['start_box'], '(10,20,30,40)')
+        self.assertEqual(result['args']['point'], '<point>200 300</point>')

     def test_parse_action_to_structure_output(self):
-        text = "Thought: test\nAction: click(start_box='(10,20,30,40)')"
+        text = "Thought: test\nAction: click(point='<point>200 300</point>')"
         actions = parse_action_to_structure_output(
-            text, factor=28, origin_resized_height=224, origin_resized_width=224
+            text, factor=1000, origin_resized_height=224, origin_resized_width=224
         )
         self.assertEqual(actions[0]['action_type'], 'click')
         self.assertIn('start_box', actions[0]['action_inputs'])
