Commit bd56c0b

computer use feedback
1 parent c387712 commit bd56c0b

2 files changed: +166 / -63 lines


articles/ai-foundry/agents/how-to/tools/computer-use-samples.md

Lines changed: 160 additions & 60 deletions
````diff
@@ -31,6 +31,8 @@ Use this article to learn how to use the Computer Use tool with the Azure AI Pro
 
 Save the name of your model's deployment name as an environment variable named `COMPUTER_USE_MODEL_DEPLOYMENT_NAME`.
 
+* Before using the tool, you need to set up an environment that can capture screenshots and execute the recommended actions by the agent. We recommend using a sandboxed environment, such as Playwright for safety reasons.
+
 The Computer Use tool requires the latest prerelease versions of the `azure-ai-projects` library. First we recommend creating a [virtual environment](https://docs.python.org/3/library/venv.html) to work in:
 
 ```console
````
````diff
@@ -50,69 +52,167 @@ pip install --pre azure-ai-projects, azure-identity, azure-ai-agents
 The following code sample shows a basic API request. Once the initial API request is sent, you would perform a loop where the specified action is performed in your application code, sending a screenshot with each turn so the model can evaluate the updated state of the environment. You can see an example integration for a similar API in the [Azure OpenAI documentation](../../../openai/how-to/computer-use.md#playwright-integration).
 
 ```python
+import os, time, base64
+from typing import List
+from azure.ai.agents.models._models import ComputerScreenshot, TypeAction
 from azure.ai.projects import AIProjectClient
-from azure.ai.agents.models import ComputerUseTool
-from azure.identity import DefaultAzureCredential
-import os
-import time
-
-# Initialize client
-project_client = AIProjectClient(
-    endpoint=os.environ["PROJECT_ENDPOINT"],
-    credential=DefaultAzureCredential(),
-)
-
-# Create agent with computer use capability (similar to OpenAI but enhanced)
-computer_use_tool = ComputerUseTool(
-    display_width=1024,
-    display_height=768,
-    environment="browser"  # "browser", "mac", "windows", "ubuntu"
-)
-
-agent = project_client.agents.create_agent(
-    model=os.environ["COMPUTER_USE_MODEL_DEPLOYMENT_NAME"],  # computer-capable model
-    name="computer-assistant",
-    instructions="You are an assistant that can interact with computer interfaces to help users automate tasks. Always take a screenshot first to understand the current state.",
-    tools=[computer_use_tool]
+from azure.ai.agents.models import (
+    MessageRole,
+    RunStepToolCallDetails,
+    RunStepComputerUseToolCall,
+    ComputerUseTool,
+    ComputerToolOutput,
+    MessageInputContentBlock,
+    MessageImageUrlParam,
+    MessageInputTextBlock,
+    MessageInputImageUrlBlock,
+    RequiredComputerUseToolCall,
+    SubmitToolOutputsAction,
 )
+from azure.identity import DefaultAzureCredential
 
-# Create a thread for persistent computer automation
-thread = project_client.agents.create_thread()
-
-# Ask agent to automate a task
-message = project_client.agents.create_message(
-    thread_id=thread.id,
-    role="user",
-    content="Check the latest Azure AI news on bing.com and summarize the top 3 articles"
-)
-
-# Run the agent - it will use computer use tool automatically
-run = project_client.agents.create_run(
-    thread_id=thread.id,
-    agent_id=agent.id
-)
-
-# Poll for completion and get results
-while run.status in ["queued", "in_progress", "requires_action"]:
-    if run.status == "requires_action":
-        # Handle safety checks or user confirmations
-        required_action = run.required_action
-        if required_action.type == "computer_call":
-            # User must acknowledge safety checks
-            project_client.agents.submit_tool_outputs(
-                thread_id=thread.id,
-                run_id=run.id,
-                tool_outputs=[
-                    # TODO
-                ]
-            )
-
-    time.sleep(1)
-    run = project_client.agents.retrieve_run(thread_id=thread.id, run_id=run.id)
-
-# Get the final response
-messages = project_client.agents.list_messages(thread_id=thread.id)
-print(messages.data[0].content[0].text.value)
+def image_to_base64(image_path: str) -> str:
+    """
+    Convert an image file to a Base64-encoded string.
+
+    :param image_path: The path to the image file (e.g. 'image_file.png')
+    :return: A Base64-encoded string representing the image.
+    :raises FileNotFoundError: If the provided file path does not exist.
+    :raises OSError: If there's an error reading the file.
+    """
+    if not os.path.isfile(image_path):
+        raise FileNotFoundError(f"File not found at: {image_path}")
+
+    try:
+        with open(image_path, "rb") as image_file:
+            file_data = image_file.read()
+        return base64.b64encode(file_data).decode("utf-8")
+    except Exception as exc:
+        raise OSError(f"Error reading file '{image_path}'") from exc
+
+asset_file_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "../assets/cua_screenshot.jpg"))
+action_result_file_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "../assets/cua_screenshot_next.jpg"))
+project_client = AIProjectClient(endpoint=os.environ["PROJECT_ENDPOINT"], credential=DefaultAzureCredential())
+
+# Initialize Computer Use tool with a browser-sized viewport
+environment = os.environ.get("COMPUTER_USE_ENVIRONMENT", "windows")
+computer_use = ComputerUseTool(display_width=1026, display_height=769, environment=environment)
+
+with project_client:
+
+    agents_client = project_client.agents
+
+    # Create a new Agent that has the Computer Use tool attached.
+    agent = agents_client.create_agent(
+        model=os.environ["MODEL_DEPLOYMENT_NAME"],
+        name="my-agent-computer-use",
+        instructions="""
+            You are an computer automation assistant.
+            Use the computer_use_preview tool to interact with the screen when needed.
+            """,
+        tools=computer_use.definitions,
+    )
+
+    print(f"Created agent, ID: {agent.id}")
+
+    # Create thread for communication
+    thread = agents_client.threads.create()
+    print(f"Created thread, ID: {thread.id}")
+
+    input_message = (
+        "I can see a web browser with bing.com open and the cursor in the search box."
+        "Type 'movies near me' without pressing Enter or any other key. Only type 'movies near me'."
+    )
+    image_base64 = image_to_base64(asset_file_path)
+    img_url = f"data:image/jpeg;base64,{image_base64}"
+    url_param = MessageImageUrlParam(url=img_url, detail="high")
+    content_blocks: List[MessageInputContentBlock] = [
+        MessageInputTextBlock(text=input_message),
+        MessageInputImageUrlBlock(image_url=url_param),
+    ]
+    # Create message to thread
+    message = agents_client.messages.create(thread_id=thread.id, role=MessageRole.USER, content=content_blocks)
+    print(f"Created message, ID: {message.id}")
+
+    run = agents_client.runs.create(thread_id=thread.id, agent_id=agent.id)
+    print(f"Created run, ID: {run.id}")
+
+    # create a fake screenshot showing the text typed in
+    result_image_base64 = image_to_base64(action_result_file_path)
+    result_img_url = f"data:image/jpeg;base64,{result_image_base64}"
+    computer_screenshot = ComputerScreenshot(image_url=result_img_url)
+
+    while run.status in ["queued", "in_progress", "requires_action"]:
+        time.sleep(1)
+        run = agents_client.runs.get(thread_id=thread.id, run_id=run.id)
+
+        if run.status == "requires_action" and isinstance(run.required_action, SubmitToolOutputsAction):
+            print("Run requires action:")
+            tool_calls = run.required_action.submit_tool_outputs.tool_calls
+            if not tool_calls:
+                print("No tool calls provided - cancelling run")
+                agents_client.runs.cancel(thread_id=thread.id, run_id=run.id)
+                break
+
+            tool_outputs = []
+            for tool_call in tool_calls:
+                if isinstance(tool_call, RequiredComputerUseToolCall):
+                    print(tool_call)
+                    try:
+                        action = tool_call.computer_use_preview.action
+                        print(f"Executing computer use action: {action.type}")
+                        if isinstance(action, TypeAction):
+                            print(f" Text to type: {action.text}")
+                            #(add hook to input text in managed environment API here)
+
+                            tool_outputs.append(
+                                ComputerToolOutput(tool_call_id=tool_call.id, output=computer_screenshot)
+                            )
+                        if isinstance(action, ComputerScreenshot):
+                            print(f" Screenshot requested")
+                            # (add hook to take screenshot in managed environment API here)
+
+                            tool_outputs.append(
+                                ComputerToolOutput(tool_call_id=tool_call.id, output=computer_screenshot)
+                            )
+                    except Exception as e:
+                        print(f"Error executing tool_call {tool_call.id}: {e}")
+
+            print(f"Tool outputs: {tool_outputs}")
+            if tool_outputs:
+                agents_client.runs.submit_tool_outputs(thread_id=thread.id, run_id=run.id, tool_outputs=tool_outputs)
+
+        print(f"Current run status: {run.status}")
+
+    print(f"Run completed with status: {run.status}")
+    if run.status == "failed":
+        print(f"Run failed: {run.last_error}")
+
+    # Fetch run steps to get the details of the agent run
+    run_steps = agents_client.run_steps.list(thread_id=thread.id, run_id=run.id)
+    for step in run_steps:
+        print(f"Step {step.id} status: {step.status}")
+        print(step)
+
+        if isinstance(step.step_details, RunStepToolCallDetails):
+            print(" Tool calls:")
+            run_step_tool_calls = step.step_details.tool_calls
+
+            for call in run_step_tool_calls:
+                print(f" Tool call ID: {call.id}")
+                print(f" Tool call type: {call.type}")
+
+                if isinstance(call, RunStepComputerUseToolCall):
+                    details = call.computer_use_preview
+                    print(f" Computer use action type: {details.action.type}")
+
+                print()  # extra newline between tool calls
+
+        print()  # extra newline between run steps
+
+    # Optional: Delete the agent once the run is finished.
+    agents_client.delete_agent(agent.id)
+    print("Deleted agent")
 ```
 
 ## Next steps
````
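The sample in this hunk builds `data:image/jpeg;base64,...` URLs and answers each `RequiredComputerUseToolCall` with a `ComputerToolOutput` that carries a screenshot of the updated screen. The stdlib-only sketch below illustrates that per-action contract without the Azure SDK. `FakeAction`, `FakeSandbox`, `to_data_url`, and `handle_action` are hypothetical names, not part of `azure-ai-agents`, and the dispatch assumes the action's `type` string is `"type"` or `"screenshot"` for the two actions the sample handles. In a real integration the handlers would drive a sandboxed browser (for example through Playwright) instead of recording text.

```python
import base64
import mimetypes
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def to_data_url(image_bytes: bytes, filename: str) -> str:
    """Encode image bytes as a data URL, inferring the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type from {filename!r}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


@dataclass
class FakeAction:
    """Hypothetical stand-in for a computer-use action; only `type` and `text` are modeled."""
    type: str
    text: str = ""


@dataclass
class FakeSandbox:
    """Hypothetical sandbox that records typed text and returns a fixed screenshot."""
    screenshot_bytes: bytes
    screenshot_filename: str = "screen.jpg"
    typed: List[str] = field(default_factory=list)

    def type_text(self, text: str) -> None:
        self.typed.append(text)

    def screenshot(self) -> bytes:
        return self.screenshot_bytes


def handle_action(sandbox: FakeSandbox, action: FakeAction) -> str:
    """Perform one action in the sandbox, then return the new screen state as a data URL."""
    # Assumed action type strings; unknown types raise instead of being skipped.
    handlers: Dict[str, Callable[[FakeAction], None]] = {
        "type": lambda a: sandbox.type_text(a.text),
        "screenshot": lambda a: None,  # nothing to perform; the screenshot is taken below
    }
    handler = handlers.get(action.type)
    if handler is None:
        raise NotImplementedError(f"Unsupported action type: {action.type}")
    handler(action)
    # Every action is answered with a screenshot of the updated screen.
    return to_data_url(sandbox.screenshot(), sandbox.screenshot_filename)


if __name__ == "__main__":
    box = FakeSandbox(screenshot_bytes=b"\xff\xd8\xff")
    print(handle_action(box, FakeAction(type="type", text="movies near me")))
    print(box.typed)
```

Inferring the MIME type from the file name, rather than hardcoding `image/jpeg`, keeps the data URL accurate if the sandbox produces PNG screenshots (Playwright's `page.screenshot()` defaults to PNG). Keeping the supported actions in one table also means an unhandled action type fails loudly instead of silently contributing no tool output.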

articles/ai-foundry/agents/how-to/tools/computer-use.md

Lines changed: 6 additions & 3 deletions
````diff
@@ -6,7 +6,7 @@ services: cognitive-services
 manager: nitinme
 ms.service: azure-ai-agent-service
 ms.topic: how-to
-ms.date: 08/22/2025
+ms.date: 09/09/2025
 author: aahill
 ms.author: aahi
 ms.custom: references_regions
````
````diff
@@ -47,11 +47,11 @@ The following table lists some of the differences between the Computer Use Tool
 | How it acts | A list of actions provided by the model | Virtual keyboard and mouse |
 | Is it multi-step? | Yes | Yes |
 | Interfaces | Browser | Computer and browser |
-| Do I need to bring my own resource? | Your own Playwright resource with the keys stored as a connection. | No additional resource needed |
+| Do I need to bring my own resource? | Your own Playwright resource with the keys stored as a connection. | No additional resource required but we highly recommend running this tool in a sandboxed environment. |
 
 ## Regional support
 
-In order to use the Computer Use Tool, you need to have a Computer Use model deployment. The Computer Use model is available in the following regions:
+In order to use the Computer Use Tool, you need to have a [Computer Use model](../../../foundry-models/concepts/models-sold-directly-by-azure.md#computer-use-preview) deployment. The Computer Use model is available in the following regions:
 * `eastus2`
 * `swedencentral`
 * `southindia`
````
````diff
@@ -70,6 +70,9 @@ When working with the Computer Use tool, you typically would perform the followi
 
 1. Send a new request with the updated state as a `tool_call_output`, and repeat this loop until the model stops requesting actions or you decide to stop.
 
+> [!NOTE]
+> Before using the tool, you need to set up an environment that can capture screenshots and execute the recommended actions by the agent. We recommend using a sandboxed environment, such as Playwright for safety reasons.
+
 ## Handling conversation history
 
 You can use the `tool_call_id` parameter to link the current request to the previous response. Using this parameter is recommended if you don't want to manage the conversation history.
````
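The loop described in this hunk (send the updated state as a `tool_call_output` and repeat until the model stops requesting actions or you decide to stop) can be sketched independently of any SDK. `run_computer_use_loop` is a hypothetical helper: `next_action` stands in for the model call and is assumed to return `None` once it has no further actions, `perform` stands in for the sandboxed environment and returns the resulting screenshot, and the `max_turns` cap models the "you decide to stop" case.

```python
from typing import Callable, List, Optional


def run_computer_use_loop(
    next_action: Callable[[Optional[str]], Optional[str]],
    perform: Callable[[str], str],
    max_turns: int = 10,
) -> List[str]:
    """Drive the request/act/screenshot cycle until the model stops or max_turns is reached.

    next_action: hypothetical model call; receives the latest screenshot (None on the
        first turn) and returns the next action, or None when it has nothing left to do.
    perform: hypothetical sandbox hook; executes one action and returns a screenshot.
    """
    performed: List[str] = []
    screenshot: Optional[str] = None  # no screenshot has been sent before the first request
    for _ in range(max_turns):
        action = next_action(screenshot)
        if action is None:  # the model stopped requesting actions
            break
        # The screenshot returned here is what the next turn sends back as tool output.
        screenshot = perform(action)
        performed.append(action)
    return performed


if __name__ == "__main__":
    scripted = iter(["click the search box", "type 'movies near me'", None])
    done = run_computer_use_loop(lambda shot: next(scripted), lambda a: f"screenshot after {a}")
    print(done)  # ['click the search box', "type 'movies near me'"]
```

Capping the number of turns is a cheap safeguard against a model that keeps requesting actions, and it complements the sandboxing recommendation in the note above, since every action the model proposes is executed by your own code.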
