Commit fed0d58

update
1 parent 39ce85a commit fed0d58

1 file changed

+14
-11
lines changed

articles/ai-services/openai/how-to/responses.md

Lines changed: 14 additions & 11 deletions
@@ -810,18 +810,21 @@ async def handle_action(page, action):
810810
print(f"\tUnrecognized action: {action_type}")
811811
```
812812

813-
This function attempts to handle various types of actions such as:
814-
815-
- Clicking and dragging the mouse.
816-
- Clicking (left, right, middle buttons).
817-
- Double-clicking.
818-
- Scrolling.
819-
- Key presses (including combinations).
820-
- Typing text.
813+
This function handles the action types listed below. It translates the commands that the `computer-use-preview` model generates into calls to the Playwright library, which executes them. For more information, refer to the reference documentation for `ComputerAction`.
814+
815+
- [Click](/azure/ai-services/openai/reference-preview#click)
816+
- [DoubleClick](/azure/ai-services/openai/reference-preview#doubleclick)
817+
- [Drag](/azure/ai-services/openai/reference-preview#drag)
818+
- [KeyPress](/azure/ai-services/openai/reference-preview#keypress)
819+
- [Move](/azure/ai-services/openai/reference-preview#move)
820+
- [Screenshot](/azure/ai-services/openai/reference-preview#screenshot)
821+
- [Scroll](/azure/ai-services/openai/reference-preview#scroll)
822+
- [Type](/azure/ai-services/openai/reference-preview#type)
823+
- [Wait](/azure/ai-services/openai/reference-preview#wait)
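
The translation from model actions to Playwright calls can be sketched as a simple dispatcher. This is a minimal illustration, not the article's `handle_action`: it assumes each action arrives as a plain dict, the field names (`x`, `y`, `button`, `path`, `scroll_x`, `scroll_y`, `keys`, `text`) and the `wait_seconds` parameter are assumptions, and key names are assumed to already be valid Playwright key names.

```python
import asyncio


async def dispatch_action(page, action: dict, wait_seconds: float = 1.0) -> None:
    """Translate one model action (a plain dict in this sketch) into Playwright calls."""
    kind = action["type"]
    if kind == "click":
        await page.mouse.click(action["x"], action["y"], button=action.get("button", "left"))
    elif kind == "double_click":
        await page.mouse.dblclick(action["x"], action["y"])
    elif kind == "move":
        await page.mouse.move(action["x"], action["y"])
    elif kind == "drag":
        # Press at the first point, move through the rest, then release.
        path = action["path"]
        await page.mouse.move(path[0]["x"], path[0]["y"])
        await page.mouse.down()
        for point in path[1:]:
            await page.mouse.move(point["x"], point["y"])
        await page.mouse.up()
    elif kind == "scroll":
        # Position the pointer first so the wheel event targets the right element.
        await page.mouse.move(action["x"], action["y"])
        await page.mouse.wheel(action["scroll_x"], action["scroll_y"])
    elif kind == "keypress":
        # Playwright accepts combinations such as "Control+A" as a single string.
        await page.keyboard.press("+".join(action["keys"]))
    elif kind == "type":
        await page.keyboard.type(action["text"])
    elif kind == "wait":
        await asyncio.sleep(wait_seconds)
    elif kind == "screenshot":
        pass  # Nothing to do here: the caller captures a screenshot after every action.
    else:
        print(f"\tUnrecognized action: {kind}")
```

Keeping the mapping in one function means an action type added to `ComputerAction` later only needs one new branch.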
821824

822825
### Screenshot capture
823826

824-
In order for the model to be able to see what it's interacting with the model needs a way to capture screenshots. For this code we're using Playwright to capture the screenshots and we're limiting the view to just the content in the browser window. The screenshot won't include the url bar or other aspects of the browser GUI. If you need the model to see outside the main browser window you could augment the model by creating your own screenshot function.
827+
To see what it's interacting with, the model needs a way to capture screenshots. This code uses Playwright to capture them and limits the view to the content in the browser window. The screenshot won't include the URL bar or other parts of the browser GUI. If you need the model to see outside the main browser window, you can write your own screenshot function.
825828

826829
```python
827830
async def take_screenshot(page):
@@ -839,7 +842,7 @@ async def take_screenshot(page):
839842
return last_successful_screenshot
840843
```
841844

842-
This function captures the current browser state as an image and returns it as a base64-encoded string, ready to be sent to the model. We'll constantly do this in a loop after each step allowing the model to see if the command it tried to execute was successful or not, which then allows it to adjust based on the contents of the screenshot.
845+
This function captures the current browser state as an image and returns it as a base64-encoded string, ready to be sent to the model. The loop calls it after each step so the model can see whether the command it tried to execute succeeded and adjust based on the contents of the screenshot. We could let the model decide when it needs a screenshot, but for simplicity we force one on every iteration.
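
The base64 step described above can be sketched on its own. With Playwright's async API, `await page.screenshot()` returns the viewport image as raw bytes; the helper names `encode_screenshot` and `to_data_url` below are hypothetical, and the data URL form is only one way the encoded image might be passed along.

```python
import base64


def encode_screenshot(png_bytes: bytes) -> str:
    """Return the screenshot bytes as a base64-encoded ASCII string."""
    return base64.b64encode(png_bytes).decode("utf-8")


def to_data_url(encoded_png: str) -> str:
    """Wrap a base64-encoded PNG in a data URL (an assumed transport format)."""
    return f"data:image/png;base64,{encoded_png}"
```

In the article's setting, the bytes passed to `encode_screenshot` would come from `await page.screenshot()`.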
843846

844847
### Model response processing
845848

@@ -1005,7 +1008,7 @@ In this section we have added code that:
10051008
- Handles potential safety checks requiring user confirmation.
10061009
- Executes the requested action.
10071010
- Captures a new screenshot.
1008-
- Sends the updated state back to the model.
1011+
- Sends the updated state back to the model and defines the [`ComputerTool`](/azure/ai-services/openai/reference-preview#computertool).
10091012
- Repeats this process for multiple iterations.
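
The shape of that loop can be sketched with the model call, action execution, and screenshot capture passed in as callables. This is a simplified, synchronous outline rather than the article's code: `run_action_loop` and its parameters are hypothetical names, and safety-check confirmation is left out.

```python
from typing import Callable, Optional


def run_action_loop(
    next_action: Callable[[Optional[str]], Optional[dict]],
    execute: Callable[[dict], None],
    capture: Callable[[], str],
    max_iterations: int = 10,
) -> int:
    """Execute model-requested actions until none remain; return how many ran."""
    screenshot: Optional[str] = None
    executed = 0
    for _ in range(max_iterations):
        # Ask the model for its next action, passing the latest screenshot (None at first).
        action = next_action(screenshot)
        if action is None:  # The model has no further computer action to request.
            break
        execute(action)
        executed += 1
        # Always capture a new screenshot, whether or not the model asked for one.
        screenshot = capture()
    return executed
```

Capping the loop with `max_iterations` keeps a confused model from running indefinitely.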
10101013

10111014
### Main function
