Commit 9532750

Merge pull request #6720 from aahill/sept-tools

computer use tool

2 parents 6a02e84 + 98ad2a8 commit 9532750

File tree: 4 files changed, +398 −0 lines changed

Lines changed: 222 additions & 0 deletions
---
title: 'How to use the Computer Use Tool'
titleSuffix: Azure AI Foundry
description: Find code samples and instructions for using the Computer Use model in the Azure AI Foundry Agent Service.
services: cognitive-services
manager: nitinme
ms.service: azure-ai-agent-service
ms.topic: how-to
ms.date: 08/22/2025
author: aahill
ms.author: aahi
---
# How to use the Computer Use Tool

Use this article to learn how to use the Computer Use tool with the Azure AI Projects SDK.
## Prerequisites

* The requirements in the [Computer Use Tool overview](./computer-use.md).
* Your Azure AI Foundry Project endpoint.

    [!INCLUDE [endpoint-string-portal](../../includes/endpoint-string-portal.md)]

    Save this endpoint to an environment variable named `PROJECT_ENDPOINT`.
* The deployment name of your Computer Use model. You can find it in **Models + Endpoints** in the left navigation menu.

    :::image type="content" source="../../media/tools/computer-use-model-deployment.png" alt-text="A screenshot showing the model deployment screen in the AI Foundry portal." lightbox="../../media/tools/computer-use-model-deployment.png":::

    Save your model's deployment name as an environment variable named `COMPUTER_USE_MODEL_DEPLOYMENT_NAME`.
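    For example, on Windows you could set both variables from a command prompt. This is a sketch with placeholder values; substitute your own endpoint and deployment name. Note that values set with `setx` are only picked up by newly opened terminal windows:

    ```console
    setx PROJECT_ENDPOINT "https://<your-resource>.services.ai.azure.com/api/projects/<your-project>"
    setx COMPUTER_USE_MODEL_DEPLOYMENT_NAME "<your-computer-use-deployment-name>"
    ```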
* Before using the tool, you need to set up an environment that can capture screenshots and execute the actions recommended by the agent. For safety reasons, we recommend using a sandboxed environment, such as Playwright.

The Computer Use tool requires the latest prerelease versions of the `azure-ai-projects` and `azure-ai-agents` libraries. First, we recommend creating a [virtual environment](https://docs.python.org/3/library/venv.html) to work in:

```console
python -m venv env
# after creating the virtual environment, activate it with:
.\env\Scripts\activate
```
You can install the packages with the following command:

```console
pip install --pre azure-ai-projects azure-identity azure-ai-agents
```
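If you want to confirm that prerelease versions were installed, you can check the reported versions with `pip show`:

```console
pip show azure-ai-projects azure-ai-agents
```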
## Code example

The following code sample shows a basic API request. After the initial request is sent, you perform a loop in which your application code executes the specified action and sends a screenshot with each turn, so the model can evaluate the updated state of the environment. You can see an example integration for a similar API in the [Azure OpenAI documentation](../../../openai/how-to/computer-use.md#playwright-integration).
```python
import os, time, base64
from typing import List
from azure.ai.agents.models._models import ComputerScreenshot, TypeAction
from azure.ai.projects import AIProjectClient
from azure.ai.agents.models import (
    MessageRole,
    RunStepToolCallDetails,
    RunStepComputerUseToolCall,
    ComputerUseTool,
    ComputerToolOutput,
    MessageInputContentBlock,
    MessageImageUrlParam,
    MessageInputTextBlock,
    MessageInputImageUrlBlock,
    RequiredComputerUseToolCall,
    SubmitToolOutputsAction,
)
from azure.identity import DefaultAzureCredential

def image_to_base64(image_path: str) -> str:
    """
    Convert an image file to a Base64-encoded string.

    :param image_path: The path to the image file (e.g. 'image_file.png')
    :return: A Base64-encoded string representing the image.
    :raises FileNotFoundError: If the provided file path does not exist.
    :raises OSError: If there's an error reading the file.
    """
    if not os.path.isfile(image_path):
        raise FileNotFoundError(f"File not found at: {image_path}")

    try:
        with open(image_path, "rb") as image_file:
            file_data = image_file.read()
        return base64.b64encode(file_data).decode("utf-8")
    except Exception as exc:
        raise OSError(f"Error reading file '{image_path}'") from exc

asset_file_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "../assets/cua_screenshot.jpg"))
action_result_file_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "../assets/cua_screenshot_next.jpg"))
project_client = AIProjectClient(endpoint=os.environ["PROJECT_ENDPOINT"], credential=DefaultAzureCredential())

# Initialize the Computer Use tool with a browser-sized viewport
environment = os.environ.get("COMPUTER_USE_ENVIRONMENT", "windows")
computer_use = ComputerUseTool(display_width=1026, display_height=769, environment=environment)

with project_client:

    agents_client = project_client.agents

    # Create a new agent that has the Computer Use tool attached.
    agent = agents_client.create_agent(
        model=os.environ["COMPUTER_USE_MODEL_DEPLOYMENT_NAME"],
        name="my-agent-computer-use",
        instructions="""
        You are a computer automation assistant.
        Use the computer_use_preview tool to interact with the screen when needed.
        """,
        tools=computer_use.definitions,
    )
    print(f"Created agent, ID: {agent.id}")

    # Create a thread for communication
    thread = agents_client.threads.create()
    print(f"Created thread, ID: {thread.id}")

    input_message = (
        "I can see a web browser with bing.com open and the cursor in the search box. "
        "Type 'movies near me' without pressing Enter or any other key. Only type 'movies near me'."
    )
    image_base64 = image_to_base64(asset_file_path)
    img_url = f"data:image/jpeg;base64,{image_base64}"
    url_param = MessageImageUrlParam(url=img_url, detail="high")
    content_blocks: List[MessageInputContentBlock] = [
        MessageInputTextBlock(text=input_message),
        MessageInputImageUrlBlock(image_url=url_param),
    ]
    # Create a message on the thread
    message = agents_client.messages.create(thread_id=thread.id, role=MessageRole.USER, content=content_blocks)
    print(f"Created message, ID: {message.id}")

    run = agents_client.runs.create(thread_id=thread.id, agent_id=agent.id)
    print(f"Created run, ID: {run.id}")

    # Create a fake screenshot showing the text typed in, to send back as the action result
    result_image_base64 = image_to_base64(action_result_file_path)
    result_img_url = f"data:image/jpeg;base64,{result_image_base64}"
    computer_screenshot = ComputerScreenshot(image_url=result_img_url)

    while run.status in ["queued", "in_progress", "requires_action"]:
        time.sleep(1)
        run = agents_client.runs.get(thread_id=thread.id, run_id=run.id)

        if run.status == "requires_action" and isinstance(run.required_action, SubmitToolOutputsAction):
            print("Run requires action:")
            tool_calls = run.required_action.submit_tool_outputs.tool_calls
            if not tool_calls:
                print("No tool calls provided - cancelling run")
                agents_client.runs.cancel(thread_id=thread.id, run_id=run.id)
                break

            tool_outputs = []
            for tool_call in tool_calls:
                if isinstance(tool_call, RequiredComputerUseToolCall):
                    print(tool_call)
                    try:
                        action = tool_call.computer_use_preview.action
                        print(f"Executing computer use action: {action.type}")
                        if isinstance(action, TypeAction):
                            print(f"  Text to type: {action.text}")
                            # (add hook to input text in managed environment API here)

                            tool_outputs.append(
                                ComputerToolOutput(tool_call_id=tool_call.id, output=computer_screenshot)
                            )
                        if isinstance(action, ComputerScreenshot):
                            print("  Screenshot requested")
                            # (add hook to take screenshot in managed environment API here)

                            tool_outputs.append(
                                ComputerToolOutput(tool_call_id=tool_call.id, output=computer_screenshot)
                            )
                    except Exception as e:
                        print(f"Error executing tool_call {tool_call.id}: {e}")

            print(f"Tool outputs: {tool_outputs}")
            if tool_outputs:
                agents_client.runs.submit_tool_outputs(thread_id=thread.id, run_id=run.id, tool_outputs=tool_outputs)

        print(f"Current run status: {run.status}")

    print(f"Run completed with status: {run.status}")
    if run.status == "failed":
        print(f"Run failed: {run.last_error}")

    # Fetch run steps to get the details of the agent run
    run_steps = agents_client.run_steps.list(thread_id=thread.id, run_id=run.id)
    for step in run_steps:
        print(f"Step {step.id} status: {step.status}")
        print(step)

        if isinstance(step.step_details, RunStepToolCallDetails):
            print("  Tool calls:")
            run_step_tool_calls = step.step_details.tool_calls

            for call in run_step_tool_calls:
                print(f"    Tool call ID: {call.id}")
                print(f"    Tool call type: {call.type}")

                if isinstance(call, RunStepComputerUseToolCall):
                    details = call.computer_use_preview
                    print(f"    Computer use action type: {details.action.type}")

                print()  # extra newline between tool calls

        print()  # extra newline between run steps

    # Optional: Delete the agent once the run is finished.
    agents_client.delete_agent(agent.id)
    print("Deleted agent")
```
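The sample above leaves two hooks where your own environment performs the action and captures the result. As a hedged illustration only, if you use Playwright (the sandboxed environment recommended earlier) as your managed environment, those hooks might look like the following sketch; the URL, viewport, and typed text mirror this article's example scenario and are assumptions, not part of the sample:

```python
# A minimal sketch of the two environment hooks, assuming Playwright.
import base64
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page(viewport={"width": 1026, "height": 769})
    page.goto("https://www.bing.com")

    # Hook for a TypeAction: type the text the model requested.
    page.keyboard.type("movies near me")

    # Hook for a screenshot request: capture the updated state to send back.
    screenshot_bytes = page.screenshot(type="jpeg")
    result_img_url = "data:image/jpeg;base64," + base64.b64encode(screenshot_bytes).decode("utf-8")

    browser.close()
```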
## Next steps

* [Python agent samples](https://github.com/azure-ai-foundry/foundry-samples/tree/main/samples/microsoft/python/getting-started-agents)
* [Azure OpenAI Computer Use example Playwright integration](../../../openai/how-to/computer-use.md#playwright-integration)
* The Azure OpenAI API has implementation differences compared to the Agent Service, and these examples may need to be adapted to work with agents.
Lines changed: 169 additions & 0 deletions
---
title: 'How to use Azure AI Foundry Agent Service Computer Use Tool'
titleSuffix: Azure AI Foundry
description: Learn how to use the Azure AI Foundry Agent Service Computer Use Tool.
services: cognitive-services
manager: nitinme
ms.service: azure-ai-agent-service
ms.topic: how-to
ms.date: 09/09/2025
author: aahill
ms.author: aahi
ms.custom: references_regions
---
# Azure AI Foundry Agent Service Computer Use Tool

> [!WARNING]
> The Computer Use tool comes with significant additional security and privacy risks, including prompt injection attacks. Learn more about intended uses, capabilities, limitations, risks, and considerations when choosing a use case in the [Azure OpenAI transparency note](../../../responsible-ai/openai/transparency-note.md).

Use this article to learn how to work with the Computer Use Tool in Azure AI Foundry Agent Service. Computer Use is an AI tool that uses a specialized model to perform tasks by interacting with computer systems and applications through their user interfaces. With Computer Use, you can create an agent that can handle complex tasks and make decisions by interpreting visual elements and taking action based on on-screen content.
## Features

* Autonomous navigation: for example, opening applications, clicking buttons, filling out forms, and navigating multi-page workflows.
* Dynamic adaptation: interpreting UI changes and adjusting actions accordingly.
* Cross-application task execution: operating across web-based and desktop applications.
* Natural language interface: users can describe a task in plain language, and the Computer Use model determines the correct UI interactions to execute.
## Request access

For access to the `computer-use-preview` model, registration is required, and access will be granted based on Microsoft's eligibility criteria. Customers who have access to other limited access models still need to request access for this model.

To request access, see the [application form](https://aka.ms/oai/cuaaccess).

Once access has been granted, you will need to create a deployment for the model.
## Differences between Browser Automation and Computer Use

The following table lists some of the differences between the Computer Use Tool and the [Browser Automation](./browser-automation.md) Tool.

| Feature | Browser Automation | Computer Use Tool |
|--------------------------------|-----------------------------|----------------------------|
| Model support | All GPT models | `computer-use-preview` model only |
| Can I visualize what's happening? | No | Yes |
| How it understands the screen | Parses the HTML or XML pages into DOM documents | Raw pixel data from screenshots |
| How it acts | A list of actions provided by the model | Virtual keyboard and mouse |
| Is it multi-step? | Yes | Yes |
| Interfaces | Browser | Computer and browser |
| Do I need to bring my own resource? | Your own Playwright resource with the keys stored as a connection. | No additional resource is required, but we highly recommend running this tool in a sandboxed environment. |
## Regional support

To use the Computer Use Tool, you need a [Computer Use model](../../../foundry-models/concepts/models-sold-directly-by-azure.md#computer-use-preview) deployment. The Computer Use model is available in the following regions:

* `eastus2`
* `swedencentral`
* `southindia`
## Understanding the Computer Use integration

When working with the Computer Use tool, you typically perform the following steps to integrate it into your application.

1. Send a request to the model that includes a call to the Computer Use tool, along with the display size and environment. You can also include a screenshot of the initial state of the environment in the first API request.
1. Receive a response from the model. If the response has action items, those items contain suggested actions to make progress toward the specified goal. For example, an action might be `screenshot`, so the model can assess the current state with an updated screenshot, or `click` with X/Y coordinates indicating where the mouse should be moved.
1. Execute the action using your application code on your computer or browser environment.
1. After executing the action, capture the updated state of the environment as a screenshot.
1. Send a new request with the updated state as a `tool_call_output`, and repeat this loop until the model stops requesting actions or you decide to stop. A sketch of this loop appears after the following note.

> [!NOTE]
> Before using the tool, you need to set up an environment that can capture screenshots and execute the actions recommended by the agent. For safety reasons, we recommend using a sandboxed environment, such as Playwright.
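The following is a minimal, hypothetical skeleton of that loop. Every helper function here is an illustrative placeholder for your own client and sandbox code, not part of the SDK:

```python
# Hypothetical skeleton of the Computer Use loop described above. Each
# helper is a stub you replace with your own client and sandbox code
# (for example, Playwright for screenshots and input).

def take_screenshot() -> bytes:
    """Capture the current state of the sandboxed environment."""
    raise NotImplementedError

def send_request(goal: str, screenshot: bytes):
    """Step 1: send the task and initial screenshot to the model."""
    raise NotImplementedError

def get_suggested_actions(response) -> list:
    """Step 2: pull suggested actions (screenshot, click, type, ...) off the response."""
    raise NotImplementedError

def execute_action(action) -> None:
    """Step 3: perform the action in your computer or browser environment."""
    raise NotImplementedError

def send_tool_call_output(screenshot: bytes):
    """Step 5: return the updated state as a tool_call_output."""
    raise NotImplementedError

def run_loop(goal: str) -> None:
    response = send_request(goal, take_screenshot())
    while actions := get_suggested_actions(response):          # loop until no more actions
        for action in actions:
            execute_action(action)                             # step 3
        response = send_tool_call_output(take_screenshot())    # steps 4-5
```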
## Handling conversation history

You can use the `tool_call_id` parameter to link the current request to the previous response. Using this parameter is recommended if you don't want to manage the conversation history yourself.

If you don't use this parameter, make sure to include all the items returned in the response output of the previous request in your inputs array, including reasoning items if present.
## Safety checks

> [!WARNING]
> Computer Use carries substantial security and privacy risks, and you are responsible for its use. Both errors in judgment by the AI and the presence of malicious or confusing instructions on web pages, desktops, or other operating environments the AI encounters may cause it to execute commands you or others do not intend, which could compromise the security of your or other users' browsers, computers, and any accounts to which the AI has access, including personal, financial, or enterprise systems.
>
> We strongly recommend using the Computer Use tool on virtual machines with no access to sensitive data or critical resources. Learn more about intended uses, capabilities, limitations, risks, and considerations when choosing a use case in the [Azure OpenAI transparency note](../../../responsible-ai/openai/transparency-note.md).

The API has safety checks to help protect against prompt injection and model mistakes. These checks include:

* **Malicious instruction detection**: The system evaluates the screenshot image and checks if it contains adversarial content that might change the model's behavior.
* **Irrelevant domain detection**: The system evaluates the `current_url` parameter (if provided) and checks if the current domain is considered relevant given the conversation history.
* **Sensitive domain detection**: The system checks the `current_url` parameter (if provided) and raises a warning when it detects the user is on a sensitive domain.

If one or more of these checks is triggered, a safety check is raised when the model returns the next `computer_call`, with the `pending_safety_checks` parameter.
```json
"output": [
    {
        "type": "reasoning",
        "id": "rs_67cb...",
        "summary": [
            {
                "type": "summary_text",
                "text": "Exploring 'File' menu option."
            }
        ]
    },
    {
        "type": "computer_call",
        "id": "cu_67cb...",
        "call_id": "call_nEJ...",
        "action": {
            "type": "click",
            "button": "left",
            "x": 135,
            "y": 193
        },
        "pending_safety_checks": [
            {
                "id": "cu_sc_67cb...",
                "code": "malicious_instructions",
                "message": "We've detected instructions that may cause your application to perform malicious or unauthorized actions. Please acknowledge this warning if you'd like to proceed."
            }
        ],
        "status": "completed"
    }
]
```
You need to pass the safety checks back as `acknowledged_safety_checks` in the next request to proceed.

```json
"input": [
    {
        "type": "computer_call_output",
        "call_id": "<call_id>",
        "acknowledged_safety_checks": [
            {
                "id": "<safety_check_id>",
                "code": "malicious_instructions",
                "message": "We've detected instructions that may cause your application to perform malicious or unauthorized actions. Please acknowledge this warning if you'd like to proceed."
            }
        ],
        "output": {
            "type": "computer_screenshot",
            "image_url": "<image_url>"
        }
    }
]
```
## Safety check handling

In all cases where `pending_safety_checks` are returned, actions should be handed over to the end user to confirm proper model behavior and accuracy.

* `malicious_instructions` and `irrelevant_domain`: end users should review model actions and confirm that the model is behaving as intended.
* `sensitive_domain`: ensure an end user is actively monitoring the model's actions on these sites. Exact implementation of this "watch mode" can vary by application, but a potential example could be collecting user impression data on the site to make sure there is active end user engagement with the application.
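As a hedged illustration only, a client might surface a pending safety check to the end user before acknowledging it. The check shape below mirrors the JSON example earlier in this article; how you read it from your client objects depends on the SDK surface you use:

```python
# Illustrative sketch: pause for explicit user confirmation before
# acknowledging a pending safety check. The check dictionaries mirror the
# JSON example above; adapt the field access to your client library.

def confirm_safety_checks(pending_safety_checks: list) -> bool:
    """Show each pending check to the user and ask whether to proceed."""
    for check in pending_safety_checks:
        print(f"Safety check [{check['code']}]: {check['message']}")
    answer = input("Acknowledge these checks and proceed? (y/n) ")
    return answer.strip().lower() == "y"

# Example usage against a parsed `computer_call` output item (sample data):
checks = [{"code": "malicious_instructions", "message": "We've detected instructions..."}]
if checks and not confirm_safety_checks(checks):
    raise SystemExit("User declined the safety checks; stopping the run.")
```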
## Next steps

* [Computer Use code samples](./computer-use-samples.md)
(Binary file added: 159 KB)
