---
title: 'How to use the Computer Use Tool'
titleSuffix: Azure AI Foundry
description: Find code samples and instructions for using the Computer Use model in the Azure AI Foundry Agent Service.
services: cognitive-services
manager: nitinme
ms.service: azure-ai-agent-service
ms.topic: how-to
ms.date: 08/22/2025
author: aahill
ms.author: aahi
---

# How to use the Computer Use Tool

Use this article to learn how to use the Computer Use tool with the Azure AI Projects SDK.

## Prerequisites

* The requirements in the [Computer Use Tool overview](./computer-use.md).
* Your Azure AI Foundry project endpoint.

    [!INCLUDE [endpoint-string-portal](../../includes/endpoint-string-portal.md)]

    Save this endpoint to an environment variable named `PROJECT_ENDPOINT`.

* The deployment name of your Computer Use model. You can find it in **Models + Endpoints** in the left navigation menu.

    :::image type="content" source="../../media/tools/computer-use-model-deployment.png" alt-text="A screenshot showing the model deployment screen in the Azure AI Foundry portal." lightbox="../../media/tools/computer-use-model-deployment.png":::

    Save your model deployment name as an environment variable named `COMPUTER_USE_MODEL_DEPLOYMENT_NAME`. An example of setting both environment variables is shown after this list.

* An environment that can capture screenshots and execute the actions that the agent recommends. For safety reasons, we recommend using a sandboxed environment, such as a browser automated with Playwright.

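For example, you can set both environment variables in your shell before running the samples. The values below are placeholders; use the endpoint and deployment name from your own project:

```console
# bash / zsh
export PROJECT_ENDPOINT="<your-project-endpoint>"
export COMPUTER_USE_MODEL_DEPLOYMENT_NAME="<your-deployment-name>"

# PowerShell
$env:PROJECT_ENDPOINT = "<your-project-endpoint>"
$env:COMPUTER_USE_MODEL_DEPLOYMENT_NAME = "<your-deployment-name>"
```
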
The Computer Use tool requires the latest prerelease versions of the `azure-ai-projects` and `azure-ai-agents` libraries. First, we recommend creating a [virtual environment](https://docs.python.org/3/library/venv.html) to work in:

```console
python -m venv env
# after creating the virtual environment, activate it (on Windows) with:
.\env\Scripts\activate
# on macOS or Linux, use: source env/bin/activate
```

You can install the packages with the following command:

```console
pip install --pre azure-ai-projects azure-identity azure-ai-agents
```

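The sample below authenticates with `DefaultAzureCredential`. For local development, one straightforward way to provide a credential is to sign in with the Azure CLI before running the code:

```console
az login
```
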
## Code example

The following code sample shows a basic API request. After the initial request is sent, your application code performs a loop: it executes the specified action in your environment, then sends back a screenshot with each turn so the model can evaluate the updated state of the environment. You can see an example integration for a similar API in the [Azure OpenAI documentation](../../../openai/how-to/computer-use.md#playwright-integration).

```python
import os, time, base64
from typing import List

from azure.ai.projects import AIProjectClient
from azure.ai.agents.models import (
    ComputerScreenshot,
    ComputerToolOutput,
    ComputerUseTool,
    MessageImageUrlParam,
    MessageInputContentBlock,
    MessageInputImageUrlBlock,
    MessageInputTextBlock,
    MessageRole,
    RequiredComputerUseToolCall,
    RunStepComputerUseToolCall,
    RunStepToolCallDetails,
    SubmitToolOutputsAction,
    TypeAction,
)
from azure.identity import DefaultAzureCredential

def image_to_base64(image_path: str) -> str:
    """
    Convert an image file to a Base64-encoded string.

    :param image_path: The path to the image file (e.g. 'image_file.png')
    :return: A Base64-encoded string representing the image.
    :raises FileNotFoundError: If the provided file path does not exist.
    :raises OSError: If there's an error reading the file.
    """
    if not os.path.isfile(image_path):
        raise FileNotFoundError(f"File not found at: {image_path}")

    try:
        with open(image_path, "rb") as image_file:
            file_data = image_file.read()
            return base64.b64encode(file_data).decode("utf-8")
    except Exception as exc:
        raise OSError(f"Error reading file '{image_path}'") from exc

# Paths to sample screenshots used in place of a live environment
asset_file_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "../assets/cua_screenshot.jpg"))
action_result_file_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "../assets/cua_screenshot_next.jpg"))

project_client = AIProjectClient(endpoint=os.environ["PROJECT_ENDPOINT"], credential=DefaultAzureCredential())

# Initialize the Computer Use tool with a browser-sized viewport
environment = os.environ.get("COMPUTER_USE_ENVIRONMENT", "windows")
computer_use = ComputerUseTool(display_width=1026, display_height=769, environment=environment)

with project_client:

    agents_client = project_client.agents

    # Create a new agent that has the Computer Use tool attached.
    agent = agents_client.create_agent(
        model=os.environ["COMPUTER_USE_MODEL_DEPLOYMENT_NAME"],
        name="my-agent-computer-use",
        instructions="""
            You are a computer automation assistant.
            Use the computer_use_preview tool to interact with the screen when needed.
        """,
        tools=computer_use.definitions,
    )

    print(f"Created agent, ID: {agent.id}")

    # Create a thread for communication
    thread = agents_client.threads.create()
    print(f"Created thread, ID: {thread.id}")

    input_message = (
        "I can see a web browser with bing.com open and the cursor in the search box. "
        "Type 'movies near me' without pressing Enter or any other key. Only type 'movies near me'."
    )
    image_base64 = image_to_base64(asset_file_path)
    img_url = f"data:image/jpeg;base64,{image_base64}"
    url_param = MessageImageUrlParam(url=img_url, detail="high")
    content_blocks: List[MessageInputContentBlock] = [
        MessageInputTextBlock(text=input_message),
        MessageInputImageUrlBlock(image_url=url_param),
    ]
    # Create a message on the thread
    message = agents_client.messages.create(thread_id=thread.id, role=MessageRole.USER, content=content_blocks)
    print(f"Created message, ID: {message.id}")

    run = agents_client.runs.create(thread_id=thread.id, agent_id=agent.id)
    print(f"Created run, ID: {run.id}")

    # Create a fake screenshot showing the text typed in. In a real integration,
    # you would capture a new screenshot from your environment after each action.
    result_image_base64 = image_to_base64(action_result_file_path)
    result_img_url = f"data:image/jpeg;base64,{result_image_base64}"
    computer_screenshot = ComputerScreenshot(image_url=result_img_url)

    # Poll the run, executing each requested action and returning a screenshot per turn
    while run.status in ["queued", "in_progress", "requires_action"]:
        time.sleep(1)
        run = agents_client.runs.get(thread_id=thread.id, run_id=run.id)

        if run.status == "requires_action" and isinstance(run.required_action, SubmitToolOutputsAction):
            print("Run requires action:")
            tool_calls = run.required_action.submit_tool_outputs.tool_calls
            if not tool_calls:
                print("No tool calls provided - cancelling run")
                agents_client.runs.cancel(thread_id=thread.id, run_id=run.id)
                break

            tool_outputs = []
            for tool_call in tool_calls:
                if isinstance(tool_call, RequiredComputerUseToolCall):
                    print(tool_call)
                    try:
                        action = tool_call.computer_use_preview.action
                        print(f"Executing computer use action: {action.type}")
                        if isinstance(action, TypeAction):
                            print(f"  Text to type: {action.text}")
                            # (add hook to input text in managed environment API here)

                            tool_outputs.append(
                                ComputerToolOutput(tool_call_id=tool_call.id, output=computer_screenshot)
                            )
                        if isinstance(action, ComputerScreenshot):
                            print("  Screenshot requested")
                            # (add hook to take screenshot in managed environment API here)

                            tool_outputs.append(
                                ComputerToolOutput(tool_call_id=tool_call.id, output=computer_screenshot)
                            )
                    except Exception as e:
                        print(f"Error executing tool_call {tool_call.id}: {e}")

            print(f"Tool outputs: {tool_outputs}")
            if tool_outputs:
                agents_client.runs.submit_tool_outputs(thread_id=thread.id, run_id=run.id, tool_outputs=tool_outputs)

        print(f"Current run status: {run.status}")

    print(f"Run completed with status: {run.status}")
    if run.status == "failed":
        print(f"Run failed: {run.last_error}")

    # Fetch run steps to get the details of the agent run
    run_steps = agents_client.run_steps.list(thread_id=thread.id, run_id=run.id)
    for step in run_steps:
        print(f"Step {step.id} status: {step.status}")
        print(step)

        if isinstance(step.step_details, RunStepToolCallDetails):
            print("  Tool calls:")
            run_step_tool_calls = step.step_details.tool_calls

            for call in run_step_tool_calls:
                print(f"    Tool call ID: {call.id}")
                print(f"    Tool call type: {call.type}")

                if isinstance(call, RunStepComputerUseToolCall):
                    details = call.computer_use_preview
                    print(f"    Computer use action type: {details.action.type}")

                print()  # extra newline between tool calls

        print()  # extra newline between run steps

    # Optional: Delete the agent once the run is finished.
    agents_client.delete_agent(agent.id)
    print("Deleted agent")
```
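
The sample above returns a canned screenshot (`cua_screenshot_next.jpg`) instead of executing the action. As a rough sketch of what a real integration might look like, the following hypothetical `execute_action` helper uses [Playwright](https://playwright.dev/python/) to type the requested text into a live browser page and capture a fresh screenshot to send back to the model. The helper name and the mapping of action types to Playwright calls are illustrative assumptions, not part of the Agent Service API:

```python
# A sketch only: assumes `pip install playwright` and `playwright install chromium`
import base64

from playwright.sync_api import sync_playwright
from azure.ai.agents.models import ComputerScreenshot, TypeAction


def execute_action(page, action) -> ComputerScreenshot:
    """Hypothetical helper: perform an action in the browser, return a new screenshot."""
    if isinstance(action, TypeAction):
        page.keyboard.type(action.text)
    # ...map the other action types (clicks, scrolls, key presses) to Playwright calls...

    screenshot_bytes = page.screenshot()  # capture the updated page state (PNG bytes)
    image_base64 = base64.b64encode(screenshot_bytes).decode("utf-8")
    return ComputerScreenshot(image_url=f"data:image/png;base64,{image_base64}")


with sync_playwright() as playwright:
    browser = playwright.chromium.launch(headless=False)
    page = browser.new_page(viewport={"width": 1026, "height": 769})  # match the tool's display size
    page.goto("https://www.bing.com")
    # In the polling loop above, you would replace the canned screenshot with:
    # ComputerToolOutput(tool_call_id=tool_call.id, output=execute_action(page, action))
    browser.close()
```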

## Next steps

* [Python agent samples](https://github.com/azure-ai-foundry/foundry-samples/tree/main/samples/microsoft/python/getting-started-agents)
* [Azure OpenAI Computer Use example Playwright integration](../../../openai/how-to/computer-use.md#playwright-integration)
  * The Azure OpenAI API has implementation differences compared to the Agent Service, and these examples may need to be adapted to work with agents.