💡 Async API Support: This guide uses the synchronous API. For async patterns, see Async Agent API.
Agent modules are specialized AI task execution units that run in AgentBay Windows/Linux environments and execute tasks described in natural language. A task can be as simple as "Create a word document, input some words and save the document.", which involves a single application, or as complex as "Find out the current weather in New York City by Google/Baidu, and write the weather report to a word document, send the word document to a specific email address", which involves multiple applications.
Currently, there are three types of agents: ComputerUseAgent, BrowserUseAgent, and MobileUseAgent.
- 🖥️ ComputerUseAgent module is designed for tasks that involve multiple applications.
- 🌐 BrowserUseAgent module is designed for tasks that specifically involve web browsers.
- 📱 MobileUseAgent module is designed for tasks that involve mobile device automation.
The agents are capable of understanding user instructions, planning task execution steps, operating various applications, and managing files and folders on the computer or mobile devices.
⚠️ Note: Currently, for agent services (including ComputerUseAgent, BrowserUseAgent, and MobileUseAgent), we do not provide services for overseas users registered with alibabacloud.com.
Agent Module functionality is currently only available on specific system images:
| System Image | Agent Module Support | Available APIs | Supported Agent |
|---|---|---|---|
| windows_latest | ✅ Supported | execute_task, get_task_status, terminate_task | ComputerUseAgent |
| linux_latest | ✅ Supported | execute_task, get_task_status, terminate_task | BrowserUseAgent |
| mobile_latest | ✅ Supported | execute_task, get_task_status, terminate_task | MobileUseAgent |
| browser_latest | ✅ Supported | execute_task, get_task_status, terminate_task | BrowserUseAgent |
| code_latest | ❌ Not Supported | - | - |
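
The table lists execute_task, get_task_status, and terminate_task as separate APIs, which suggests a start, poll, and cancel workflow. The sketch below shows one way such a polling loop might look. It is deliberately generic: `get_status` and `terminate` are hypothetical callables standing in for whatever wraps get_task_status and terminate_task, and the status strings are assumptions. Check the Agent API Definition for the real signatures and status values.

```python
import time
from typing import Callable, Tuple


def wait_for_task(
    get_status: Callable[[], str],   # hypothetical wrapper around get_task_status
    terminate: Callable[[], None],   # hypothetical wrapper around terminate_task
    timeout: float = 180.0,
    poll_interval: float = 3.0,
    done_states: Tuple[str, ...] = ("finished", "failed"),  # assumed status names
) -> str:
    """Poll get_status() until a terminal state is seen or the timeout expires.

    On timeout, terminate() is called once and "timeout" is returned.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in done_states:
            return status
        time.sleep(poll_interval)
    terminate()
    return "timeout"
```

Passing callables rather than the agent object keeps the loop independent of the SDK's exact return types, so only the two small wrappers need to change if the API differs from these assumptions.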
Important: When using ComputerUseAgent Module features, you must create sessions with image_id="windows_latest" to ensure the required MCP tools are available. When using BrowserUseAgent Module features, you must create sessions with image_id="linux_latest". When using MobileUseAgent Module features, you must create sessions with image_id="mobile_latest".
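
To avoid creating a session with the wrong image, the requirement above can be captured in a small lookup. This is only an illustrative sketch: the image IDs come from the note above, while `REQUIRED_IMAGE` and `image_id_for` are hypothetical names that are not part of the AgentBay SDK.

```python
# Image IDs as stated in the note above; the helper itself is hypothetical.
REQUIRED_IMAGE = {
    "ComputerUseAgent": "windows_latest",
    "BrowserUseAgent": "linux_latest",
    "MobileUseAgent": "mobile_latest",
}


def image_id_for(agent_type: str) -> str:
    """Return the image_id a session needs for the given agent type."""
    try:
        return REQUIRED_IMAGE[agent_type]
    except KeyError:
        # Fail early instead of creating a session without the required tools.
        raise ValueError(f"Unknown agent type: {agent_type!r}") from None
```

The returned value could then be passed as the image_id argument of CreateSessionParams when creating the session.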

    from agentbay import AgentBay, CreateSessionParams
    from pydantic import BaseModel  # used later to define a structured output schema

    agent_bay = AgentBay()

    # Create a session for Agent module usage
    agent_params = CreateSessionParams(
        image_id="windows_latest",
        labels={"project": "ai-agent"}
    )
    result = agent_bay.create(agent_params)
    if result.success:
        agent_session = result.session
        print(f"Session created with ID: {agent_session.session_id}")
    else:
        print(f"Session creation failed: {result.error_message}")

Typical tasks include:

- Office Automation: Word/Excel/PowerPoint automation
- File Operations: Create/Delete/Move/Copy files and folders
- Information Processing:
  - Gather information from webpages
  - Extract information from a web page
  - Fill forms in a web page
- Text Editing: Use Notepad to read, write, and edit text files

    # Execute a task with ComputerUseAgent using natural language
    # (the session must be created with image_id="windows_latest")
    task_description = "Create a text file named hello.txt in C:\\Temp"
    execution_result = agent_session.agent.computer.execute_task_and_wait(task_description, timeout=180)
    if execution_result.success:
        print("Task completed successfully!")
        print(f"Task ID: {execution_result.task_id}")
        print(f"Task status: {execution_result.task_status}")
    else:
        print(f"Task failed: {execution_result.error_message}")

    # Execute a task with BrowserUseAgent using natural language
    # (the session must be created with image_id="linux_latest")
    class OutputSchema(BaseModel):
        """Schema for task result."""
        city: str
        temperature: str
        weather: str

    task_description = "Navigate to baidu.com and query the weather in Beijing"
    execution_result = agent_session.agent.browser.execute_task_and_wait(
        task=task_description,
        timeout=180,
        use_vision=False,
        output_schema=OutputSchema
    )
    if execution_result.success:
        print("Task completed successfully!")
        print(f"Task ID: {execution_result.task_id}")
        print(f"Task status: {execution_result.task_status}")
    else:
        print(f"Task failed: {execution_result.error_message}")

    # Execute a task with MobileUseAgent using natural language
    # (the session must be created with image_id="mobile_latest")
    task_description = "Open WeChat app and send a message"
    execution_result = agent_session.agent.mobile.execute_task_and_wait(
        task_description,
        timeout=180,
        max_steps=100
    )
    if execution_result.success:
        print("Task completed successfully!")
        print(f"Task ID: {execution_result.task_id}")
        print(f"Task status: {execution_result.task_status}")
        print(f"Task result: {execution_result.task_result}")
    else:
        print(f"Task failed: {execution_result.error_message}")

If you encounter issues with Agent modules:
- Check the Documentation for detailed information
- Search GitHub Issues for similar problems
- Contact support with detailed error information and reproduction steps
- Please refer to the Agent Task Execution Example to see how to use the Agent.
- Please refer to the Agent API Definition for more details.
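
When contacting support with detailed error information, it can help to collect the fields of a failed result in one place. The following is a minimal sketch, assuming the result object exposes the success, task_id, task_status, and error_message attributes used in the examples in this guide. `format_support_report` is a hypothetical helper, and the sample values are made up for illustration.

```python
from types import SimpleNamespace


def format_support_report(task_description: str, result) -> str:
    """Collect the fields of an execution result into a plain-text report."""
    # getattr with a default keeps the helper usable if a field is missing.
    lines = [
        f"Task description: {task_description}",
        f"Success: {getattr(result, 'success', None)}",
        f"Task ID: {getattr(result, 'task_id', None)}",
        f"Task status: {getattr(result, 'task_status', None)}",
        f"Error message: {getattr(result, 'error_message', None)}",
    ]
    return "\n".join(lines)


# Stand-in object with made-up values, used only to demonstrate the helper.
failed = SimpleNamespace(
    success=False,
    task_id="task-123",
    task_status="failed",
    error_message="timed out",
)
print(format_support_report("Create a text file named hello.txt", failed))
```

In real use, the second argument would be the object returned by execute_task_and_wait, and the report would be attached along with the reproduction steps.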