This simple utility attempts to prevent out-of-bounds errors by clamping coordinates to the window dimensions.
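As a minimal sketch of the clamping idea described above (the function name, parameter names, and the assumption that valid pixels run from `0` to `width - 1` are illustrative, not taken from the original script):

```python
def clamp_coordinates(x: int, y: int, width: int, height: int) -> tuple[int, int]:
    """Clamp (x, y) so the point falls inside a width x height window.

    Assumption: valid pixel positions are 0..width-1 and 0..height-1.
    """
    clamped_x = max(0, min(x, width - 1))
    clamped_y = max(0, min(y, height - 1))
    return clamped_x, clamped_y
```

Calling a helper like this before every mouse action keeps a slightly off-target coordinate from the model from raising an error, at the cost of silently moving the click to the nearest edge.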
-### Action Handling
+### Action handling
The core of our browser automation is the action handler, which processes various types of user interactions and converts them into actions within the browser.
@@ -811,7 +811,7 @@ This function attempts to handle various types of actions such as:
- Key presses (including combinations).
- Typing text.
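The two action types listed above could be dispatched roughly as follows. This is a hedged sketch, not the handler from the original script: the `action` dictionary shape (`type`, `keys`, `text`) is an assumption, and the key names are assumed to already be in Playwright's format (for example `Control`, `Shift`). `page.keyboard.press` and `page.keyboard.type` are Playwright's synchronous keyboard calls.

```python
def handle_action(page, action: dict) -> None:
    """Translate one model action into a Playwright keyboard call.

    Assumed (hypothetical) action shapes:
      {"type": "keypress", "keys": ["Control", "A"]}
      {"type": "type", "text": "hello"}
    """
    kind = action.get("type")
    if kind == "keypress":
        # Playwright accepts combinations joined with "+", e.g. "Control+A".
        page.keyboard.press("+".join(action["keys"]))
    elif kind == "type":
        page.keyboard.type(action["text"])
    else:
        # Fail loudly rather than silently ignoring an unknown action.
        raise ValueError(f"Unsupported action type: {kind!r}")
```

Raising on unknown types is a design choice for the sketch; a production handler might log and skip instead.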
-### Screenshot Capture
+### Screenshot capture
For the model to see what it's interacting with, it needs a way to capture screenshots. This code uses Playwright to capture them and limits the view to the content in the browser window, so a screenshot won't include the URL bar or other parts of the browser GUI. If you need the model to see outside the main browser window, you could augment it by creating your own screenshot function.
This function captures the current browser state as an image and returns it as a base64-encoded string, ready to be sent to the model. We call it in a loop after each step so the model can see whether the command it tried to execute succeeded and adjust based on the contents of the screenshot.
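A minimal sketch of such a function, assuming `page` is a Playwright sync-API `Page` (the function name here is illustrative). `page.screenshot(full_page=False)` returns the PNG bytes of the visible viewport only, which matches the "content in the browser window" behavior described above:

```python
import base64


def take_screenshot(page) -> str:
    """Capture the visible viewport and return it as a base64 string."""
    # full_page=False limits the capture to the current viewport.
    png_bytes = page.screenshot(full_page=False)
    return base64.b64encode(png_bytes).decode("utf-8")
```

The decoded string can then be embedded in whatever image payload the model request expects.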
-### Model Response Processing
+### Model response processing
This function processes the model's responses and executes the requested actions:
@@ -966,7 +966,7 @@ In this section we have added code that:
- Sends the updated state back to the model.
- Repeats this process for multiple iterations.
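The overall shape of that loop can be sketched without any model or browser dependency. Everything here is hypothetical scaffolding: `get_action`, `execute`, and `capture` stand in for the model call, the action handler, and the screenshot function, and returning `None` from `get_action` is an assumed convention for "the model has no further action".

```python
from typing import Callable, Optional


def run_steps(
    get_action: Callable[[str], Optional[dict]],
    execute: Callable[[dict], None],
    capture: Callable[[], str],
    max_iterations: int = 10,
) -> int:
    """Run the act-and-observe loop; return how many actions were executed."""
    screenshot = capture()  # initial state shown to the model
    for step in range(max_iterations):
        action = get_action(screenshot)
        if action is None:  # assumed signal that the model is finished
            return step
        execute(action)
        screenshot = capture()  # updated state for the next request
    return max_iterations
```

Capping the loop with `max_iterations` keeps a confused model from running indefinitely.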
-## Main Function
+### Main function
The main function coordinates the entire process:
@@ -1067,7 +1067,7 @@ The main function:
- Repeats until the user exits.
- Ensures the browser is properly closed.
-### Complete Script
+### Complete script
> [!CAUTION]
> This code is experimental and intended for demonstration and basic testing purposes only. It illustrates the basic flow of the Responses API and the `computer-use-preview` model. While you can execute this code on your local computer, we strongly recommend running it on a low-privilege virtual machine with no access to sensitive data.