Agent Modules

Agent modules are specialized AI task execution units that run in AgentBay environments and execute tasks described in natural language.

💡 Async API Support: This guide uses the synchronous API. For async patterns, see Async Agent API.

🤖 Agent Module Overview

Agent modules are specialized AI task execution units that run in an AgentBay Windows or Linux environment and execute tasks described in natural language. A task can be as simple as "Create a Word document, input some words and save the document.", which involves a single application, or as complex as "Find out the current weather in New York City using Google or Baidu, write the weather report to a Word document, and send the document to a specific email address", which involves multiple applications.

Currently, there are three types of agents: ComputerUseAgent, BrowserUseAgent, and MobileUseAgent.

🖥️ The ComputerUseAgent module is designed for tasks that involve multiple applications.
🌐 The BrowserUseAgent module is designed for tasks that specifically involve web browsers.
📱 The MobileUseAgent module is designed for tasks that involve mobile device automation.

The agents are capable of understanding user instructions, planning task execution steps, operating various applications, and managing files and folders on the computer or mobile devices.

⚠️ Note: Agent services (ComputerUseAgent, BrowserUseAgent, and MobileUseAgent) are currently not available to overseas users registered with alibabacloud.com.

System Image Support

Agent Module functionality is currently only available on specific system images:

| System Image | Agent Module Support | Available APIs | Supported Agent |
|---|---|---|---|
| `windows_latest` | ✅ Supported | `execute_task`, `get_task_status`, `terminate_task` | ComputerUseAgent |
| `linux_latest` | ✅ Supported | `execute_task`, `get_task_status`, `terminate_task` | BrowserUseAgent (⚠️ BETA) |
| `mobile_latest` | ✅ Supported | `execute_task`, `get_task_status`, `terminate_task` | MobileUseAgent |
| `browser_latest` | ✅ Supported | `execute_task`, `get_task_status`, `terminate_task` | BrowserUseAgent (⚠️ BETA) |
| `code_latest` | ❌ Not Supported | - | - |

Important: Create the session with the image that matches the agent you intend to use, so that the required MCP tools are available: image_id="windows_latest" for ComputerUseAgent, image_id="linux_latest" for BrowserUseAgent, and image_id="mobile_latest" for MobileUseAgent.
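Because a mismatched image only shows up once a task is submitted, it can help to check the pairing before creating a session. The following is a minimal sketch that encodes the support table above as a plain dictionary; `AGENT_IMAGE_IDS` and `image_supports_agent` are hypothetical names for illustration and are not part of the AgentBay SDK.

```python
# Agent-to-image pairs copied from the System Image Support table.
# Hypothetical helper for illustration; not part of the AgentBay SDK.
AGENT_IMAGE_IDS = {
    "ComputerUseAgent": ("windows_latest",),
    "BrowserUseAgent": ("linux_latest", "browser_latest"),
    "MobileUseAgent": ("mobile_latest",),
}


def image_supports_agent(image_id: str, agent_name: str) -> bool:
    """Return True if the table lists agent_name as supported on image_id."""
    return image_id in AGENT_IMAGE_IDS.get(agent_name, ())


if __name__ == "__main__":
    for agent_name, image_ids in AGENT_IMAGE_IDS.items():
        print(f"{agent_name}: {', '.join(image_ids)}")
```

A check such as `image_supports_agent("code_latest", "ComputerUseAgent")` returns `False`, which matches the unsupported row in the table.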

NOTE: ⚠️ BrowserUseAgent is still in beta. Please use with caution.

Creating Agent Sessions

from agentbay import AgentBay, CreateSessionParams
from pydantic import BaseModel

agent_bay = AgentBay()

# Create a session for Agent module usage
agent_params = CreateSessionParams(
    image_id="windows_latest",
    labels={"project": "ai-agent"}
)

result = agent_bay.create(agent_params)
if result.success:
    agent_session = result.session
    print(f"Session created with ID: {agent_session.session_id}")
else:
    print(f"Session creation failed: {result.error_message}")
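Sessions hold remote resources, so it is usually worth releasing them even when a task raises. The sketch below wraps the `create` call shown above in a context manager. It assumes the client exposes a `delete(session)` method for releasing a session; that method does not appear in this guide, so verify it against the SDK reference. `agent_session_scope` is a hypothetical helper name.

```python
from contextlib import contextmanager


@contextmanager
def agent_session_scope(agent_bay, params):
    """Create a session, yield it, and release it when the block exits.

    Hypothetical helper. `agent_bay` is an AgentBay client and `params`
    a CreateSessionParams instance, as in the example above.
    """
    result = agent_bay.create(params)
    if not result.success:
        raise RuntimeError(f"Session creation failed: {result.error_message}")
    try:
        yield result.session
    finally:
        # Assumption: the client provides delete(session) to release it.
        agent_bay.delete(result.session)
```

With this helper, `with agent_session_scope(agent_bay, agent_params) as agent_session:` gives the body a ready session and releases it on exit, including when the body raises.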

Agent Capabilities

Office Automation: Automate Word, Excel, and PowerPoint
File Operations: Create, delete, move, and copy files and folders
Information Processing:
- Gather information from web pages
- Extract information from a web page
- Fill in forms on a web page
Text Editing: Use Notepad to read, write, and edit text files

Task Execution

ComputerUseAgent

# Execute a task using natural language
task_description = "Create a text file named hello.txt in C:\\Temp"
execution_result = agent_session.agent.computer.execute_task_and_wait(task_description, timeout=180)

if execution_result.success:
    print("Task completed successfully!")
    print(f"Task ID: {execution_result.task_id}")
    print(f"Task status: {execution_result.task_status}")
else:
    print(f"Task failed: {execution_result.error_message}")

BrowserUseAgent

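# Execute a task using natural language and request structured output.
# Note: BrowserUseAgent requires a session created with image_id="linux_latest".
class OutputSchema(BaseModel):
    """Schema for task result."""
    city: str
    temperature: str
    weather: str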

task_description = "Navigate to baidu.com and query the weather in Beijing"
execution_result = agent_session.agent.browser.execute_task_and_wait(
    task=task_description,
    timeout=180,
    use_vision=False,
    output_schema=OutputSchema
)

if execution_result.success:
    print("Task completed successfully!")
    print(f"Task ID: {execution_result.task_id}")
    print(f"Task status: {execution_result.task_status}")
else:
    print(f"Task failed: {execution_result.error_message}")

MobileUseAgent

# Execute a task using natural language.
# Note: MobileUseAgent requires a session created with image_id="mobile_latest".
task_description = "Open WeChat app and send a message"
execution_result = agent_session.agent.mobile.execute_task_and_wait(
    task_description,
    timeout=180,
    max_steps=100
)

if execution_result.success:
    print("Task completed successfully!")
    print(f"Task ID: {execution_result.task_id}")
    print(f"Task status: {execution_result.task_status}")
    print(f"Task result: {execution_result.task_result}")
else:
    print(f"Task failed: {execution_result.error_message}")
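The examples above block in `execute_task_and_wait`. The support table also lists `execute_task`, `get_task_status`, and `terminate_task`, which suggests a start, poll, and cancel pattern. The following sketch shows that pattern under several assumptions that this guide does not confirm: that `execute_task(task)` returns a result with `success`, `task_id`, and `error_message`, that `get_task_status(task_id)` returns an object with `task_status`, and that the terminal status strings are `"finished"` and `"failed"`. `run_with_deadline` is a hypothetical helper name; check the Agent API Definition for the real signatures and status values.

```python
import time


def run_with_deadline(agent, task, deadline_s=180.0, poll_s=3.0,
                      done_states=("finished",), failed_states=("failed",)):
    """Start a task, poll its status, and terminate it after deadline_s.

    Hypothetical helper. The method signatures and the status strings
    are assumptions; adjust them to match the SDK you are using.
    """
    started = agent.execute_task(task)
    if not started.success:
        raise RuntimeError(f"Task failed to start: {started.error_message}")

    stop_at = time.monotonic() + deadline_s
    while time.monotonic() < stop_at:
        status = agent.get_task_status(started.task_id)
        if status.task_status in done_states:
            return status
        if status.task_status in failed_states:
            raise RuntimeError(f"Task {started.task_id} failed")
        time.sleep(poll_s)

    # Deadline reached: ask the agent to stop the task before giving up.
    agent.terminate_task(started.task_id)
    raise TimeoutError(f"Task {started.task_id} exceeded {deadline_s} seconds")
```

Passing `agent_session.agent.computer`, `agent_session.agent.browser`, or `agent_session.agent.mobile` as `agent` would apply the same loop to each agent type, provided the assumed signatures hold.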

📚 Related Resources

🆘 Getting Help

If you encounter issues with Agent modules:

Check the Documentation for detailed information
Search GitHub Issues for similar problems
Contact support with detailed error information and reproduction steps
Please refer to the Agent Task Execution Example to see how to use the Agent.
Please refer to the Agent API Definition for more details.
