Commit fab9277

Merge pull request #3737 from MicrosoftDocs/main
Publish to live, Wednesday 4AM PST, 3/26
2 parents 85ad620 + bb8b4b2 commit fab9277

4 files changed, +22 -13 lines changed

articles/ai-services/openai/how-to/responses.md

Lines changed: 16 additions & 11 deletions
@@ -654,6 +654,8 @@ print(response)

 In this section, we provide a simple example script that integrates Azure OpenAI's `computer-use-preview` model with [Playwright](https://playwright.dev/) to automate basic browser interactions. Combining the model with [Playwright](https://playwright.dev/) allows the model to see the browser screen, make decisions, and perform actions like clicking, typing, and navigating websites. You should exercise caution when running this example code. This code is designed to be run locally but should only be executed in a test environment. Use a human to confirm decisions and don't give the model access to sensitive data.

+:::image type="content" source="../media/computer-use-preview.gif" alt-text="Animated gif of computer-use-preview model integrated with playwright." lightbox="../media/computer-use-preview.gif":::
+
 First you'll need to install the Python library for [Playwright](https://playwright.dev/).

 ```cmd
@@ -808,18 +810,21 @@ async def handle_action(page, action):
 print(f"\tUnrecognized action: {action_type}")
 ```

-This function attempts to handle various types of actions such as:
+This function attempts to handle various types of actions. We need to translate between the commands that the `computer-use-preview` will generate and the Playwright library which will execute the actions. For more information refer to the reference documentation for `ComputerAction`.

-- Clicking and dragging the mouse.
-- Clicking (left, right, middle buttons).
-- Double-clicking.
-- Scrolling.
-- Key presses (including combinations).
-- Typing text.
+- [Click](/azure/ai-services/openai/reference-preview#click)
+- [DoubleClick](/azure/ai-services/openai/reference-preview#doubleclick)
+- [Drag](/azure/ai-services/openai/reference-preview#drag)
+- [KeyPress](/azure/ai-services/openai/reference-preview#keypress)
+- [Move](/azure/ai-services/openai/reference-preview#move)
+- [Screenshot](/azure/ai-services/openai/reference-preview#screenshot)
+- [Scroll](/azure/ai-services/openai/reference-preview#scroll)
+- [Type](/azure/ai-services/openai/reference-preview#type)
+- [Wait](/azure/ai-services/openai/reference-preview#wait)
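
As a rough illustration of the translation step described in this hunk, the sketch below maps each action type in the list to a Playwright call. It is not the `handle_action` function from the article, whose body is elided in this diff. The function name `dispatch_action`, the plain-dict action shape (`type`, `x`, `y`, `button`, `keys`, `text`, `scroll_x`, `scroll_y`, `path`), and the `wait_seconds` parameter are assumptions made for illustration. Key names are passed to Playwright unchanged, so names the model emits may still need mapping.

```python
import asyncio


async def dispatch_action(page, action: dict, wait_seconds: float = 1.0) -> str:
    """Translate one model action (assumed to be a plain dict) into Playwright calls.

    `page` is expected to behave like playwright.async_api.Page.
    Returns the handled action type, or "unrecognized".
    """
    kind = action.get("type")
    if kind == "click":
        # Fall back to the left button for any button name Playwright rejects.
        button = action.get("button", "left")
        if button not in ("left", "right", "middle"):
            button = "left"
        await page.mouse.click(action["x"], action["y"], button=button)
    elif kind == "double_click":
        await page.mouse.dblclick(action["x"], action["y"])
    elif kind == "drag":
        # Assumed shape: "path" is a list of {"x": ..., "y": ...} points.
        path = action["path"]
        await page.mouse.move(path[0]["x"], path[0]["y"])
        await page.mouse.down()
        for point in path[1:]:
            await page.mouse.move(point["x"], point["y"])
        await page.mouse.up()
    elif kind == "keypress":
        # Playwright accepts combinations written as "Control+A".
        await page.keyboard.press("+".join(action["keys"]))
    elif kind == "move":
        await page.mouse.move(action["x"], action["y"])
    elif kind == "screenshot":
        pass  # The calling loop captures a screenshot after every step.
    elif kind == "scroll":
        await page.mouse.move(action["x"], action["y"])
        await page.mouse.wheel(action.get("scroll_x", 0), action.get("scroll_y", 0))
    elif kind == "type":
        await page.keyboard.type(action["text"])
    elif kind == "wait":
        await asyncio.sleep(wait_seconds)
    else:
        return "unrecognized"
    return kind
```

Returning the action type rather than printing keeps the sketch easy to exercise with a stand-in page object before pointing it at a real browser.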

 ### Screenshot capture

-In order for the model to be able to see what it's interacting with the model needs a way to capture screenshots. For this code we're using Playwright to capture the screenshots and we're limiting the view to just the content in the browser window. The screenshot won't include the url bar or other aspects of the browser GUI. If you need the model to see outside the main browser window you could augment the model by creating your own screenshot function.
+In order for the model to be able to see what it's interacting with the model needs a way to capture screenshots. For this code we're using Playwright to capture the screenshots and we're limiting the view to just the content in the browser window. The screenshot won't include the url bar or other aspects of the browser GUI. If you need the model to see outside the main browser window you could augment the model by creating your own screenshot function.

 ```python
 async def take_screenshot(page):
@@ -837,7 +842,7 @@ async def take_screenshot(page):
 return last_successful_screenshot
 ```

-This function captures the current browser state as an image and returns it as a base64-encoded string, ready to be sent to the model. We'll constantly do this in a loop after each step allowing the model to see if the command it tried to execute was successful or not, which then allows it to adjust based on the contents of the screenshot.
+This function captures the current browser state as an image and returns it as a base64-encoded string, ready to be sent to the model. We'll constantly do this in a loop after each step allowing the model to see if the command it tried to execute was successful or not, which then allows it to adjust based on the contents of the screenshot. We could let the model decide if it needs to take a screenshot, but for simplicity we will force a screenshot to be taken for each iteration.
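
The encoding step mentioned here can be sketched on its own, without a browser. The helper names `encode_screenshot` and `to_data_url` are hypothetical, and the `data:image/png;base64,` prefix assumes the screenshot is a PNG, which is Playwright's default format for `page.screenshot()`.

```python
import base64


def encode_screenshot(png_bytes: bytes) -> str:
    """Return raw image bytes as a base64 text string."""
    return base64.b64encode(png_bytes).decode("utf-8")


def to_data_url(screenshot_base64: str) -> str:
    """Wrap a base64 string in a PNG data URL (assumes PNG content)."""
    return f"data:image/png;base64,{screenshot_base64}"


# With Playwright, the bytes would come from something like:
#     png_bytes = await page.screenshot(full_page=False)
# which captures only the visible page content, not the browser chrome.
```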

 ### Model response processing

@@ -1003,7 +1008,7 @@ In this section we have added code that:
 - Handles potential safety checks requiring user confirmation.
 - Executes the requested action.
 - Captures a new screenshot.
-- Sends the updated state back to the model.
+- Sends the updated state back to the model and defines the [`ComputerTool`](/azure/ai-services/openai/reference-preview#computertool).
 - Repeats this process for multiple iterations.
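
The "sends the updated state back to the model" step in the list above amounts to building one input item per executed action. The sketch below shows a plausible shape for that item. The function name `build_computer_call_output` is hypothetical, and the field names (`computer_call_output`, `call_id`, `input_image`, `acknowledged_safety_checks`) are assumptions based on the public Responses API computer-use format, so check them against the reference documentation before relying on them.

```python
from typing import Optional


def build_computer_call_output(
    call_id: str,
    image_url: str,
    acknowledged_safety_checks: Optional[list] = None,
) -> dict:
    """Build the input item that reports a new screenshot for one computer call.

    `image_url` is assumed to be a data URL holding the base64 screenshot.
    """
    item = {
        "type": "computer_call_output",
        "call_id": call_id,
        "output": {"type": "input_image", "image_url": image_url},
    }
    # Only include safety checks the user has explicitly confirmed.
    if acknowledged_safety_checks:
        item["acknowledged_safety_checks"] = acknowledged_safety_checks
    return item
```

Keeping the payload construction in a pure function makes it possible to inspect exactly what would be sent before any request reaches the model.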

 ### Main function
@@ -1110,7 +1115,7 @@ The main function:
 ### Complete script

 > [!CAUTION]
-> This code is experimental and for demonstration purposes only. It's only intended to illustrate the basic flow of the responses API and the `computer-use-preview` model. While you can execute this code on your local computer, we strongly recommend running this code on a low privilege virtual machine with no access to sensitive data. This code is for basic testing purposes only.
+> This code is experimental and for demonstration purposes only. It's only intended to illustrate the basic flow of the responses API and the `computer-use-preview` model. While you can execute this code on your local computer, we strongly recommend running this code on a low privilege virtual machine with no access to sensitive data. This code is for basic testing purposes only.

 ```python
 import os
10.5 MB

articles/ai-services/openai/whats-new.md

Lines changed: 4 additions & 0 deletions
@@ -31,6 +31,10 @@ Request access: [`computer-use-preview` limited access model application](https:

 For more information on model capabilities, and region availability see the [models documentation](./concepts/models.md#computer-use-preview).

+:::image type="content" source="./media/computer-use-preview.gif" alt-text="Animated gif of computer-use-preview model integrated with playwright." lightbox="./media/computer-use-preview.gif":::
+
+[Playwright integration demo code](./how-to/responses.md#computer-use).
+
 ### Provisioned spillover (preview)

 Spillover manages traffic fluctuations on provisioned deployments by routing overages to a designated standard deployment. To learn more about how to maximize utilization for your provisioned deployments with spillover, see [Manage traffic with spillover for provisioned deployments (preview)](./how-to/spillover-traffic-management.md).

articles/ai-services/speech-service/how-to-get-speech-session-id.md

Lines changed: 2 additions & 2 deletions
@@ -41,10 +41,10 @@ Enable logging for your application as described in [this article](how-to-use-lo

 ### Get Session ID from the log

-Open the log file your application produced and look for `SessionId:`. The number that would follow is the Session ID you need. In the following log excerpt example, `0b734c41faf8430380d493127bd44631` is the Session ID.
+Open the log file your application produced and look for `SessionId:`. The number that would follow is the Session ID you need. In the following log excerpt example, `0b734c41faf8430380d493127bd44632` is the Session ID.

 ```
-[874193]: 218ms SPX_DBG_TRACE_VERBOSE: audio_stream_session.cpp:1238 [0000023981752A40]CSpxAudioStreamSession::FireSessionStartedEvent: Firing SessionStarted event: SessionId: 0b734c41faf8430380d493127bd44632
+[874193]: 218ms SPX_DBG_TRACE_VERBOSE: audio_stream_session.cpp:1238 [0000023981752A40]CSpxAudioStreamSession::FireSessionStartedEvent: Firing SessionStarted event: SessionId: 0b734c41faf8430380d493127bd44632
 ```
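
The manual search for `SessionId:` described in this hunk could also be scripted. The following is a minimal sketch, assuming the Session ID is always a 32-character hexadecimal string as in the sample log line. The names `SESSION_ID_PATTERN` and `find_session_id` are hypothetical and not part of the Speech SDK.

```python
import re
from typing import Optional

# Assumption: the ID is 32 hex characters, matching the sample log excerpt.
SESSION_ID_PATTERN = re.compile(r"SessionId:\s*([0-9a-fA-F]{32})")


def find_session_id(log_text: str) -> Optional[str]:
    """Return the first Session ID found in the log text, or None."""
    match = SESSION_ID_PATTERN.search(log_text)
    return match.group(1) if match else None
```

To use it, read the log file your application produced into a string and pass that string to `find_session_id`.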
 ### Get Session ID using JavaScript
