InfiAgent, also called MLA (Multi-Level Agent), is an agent framework designed to run indefinitely without the tool-calling chaos or system crashes caused by accumulating task resources and conversation history. With MLA, you can build powerful general-purpose and semi-specialized agents simply by writing configuration files.
- ✅ Days-Long Complex Tasks: Supports continuous execution over days without context accumulation or compression degradation. Any interruption (crash, network error, manual stop) can be fully recovered via Resume, giving true breakpoint continuation.
- ✅ Agent Skills Standard: Compatible with the Agent Skills open standard. Drop skill folders into the skills library and agents will discover, load, and execute them on demand.
- ✅ Flexible Agent Architecture: Supports both a multi-level hierarchy (tree-structured orchestration for complex domain tasks; e.g., the `Researcher` config enables long-running scientific research with paper generation) and a flat architecture (a single agent with one sub-agent plus Skills for broad general-purpose tasks; e.g., the OpenCowork config).
- ✅ Persistent Memory: File-directory-based memory system. Launch agents in the same workspace directory and they remember all historical tasks across sessions, with no external database required.
If you pulled the image or code before the latest update date, please refer to the issues that have been fixed and, based on your needs, pull the image and code again.
- [2026/03/08] Desktop branch sync update: The current desktop branch now includes packaged Python backend build scripts, the bundled `infiagent` Python SDK, configurable runtime cadence (`action_window_steps`, `thinking_interval`, scheduled/manual `fresh`), MCP runtime integration, per-task logs, desktop environment settings, and marketplace integration. The legacy standalone tool-server workflow has been replaced by in-process `direct-tools`, and the built-in research system is now named `Researcher`.
- [2026/02/09] Mac desktop version released! Click here to download! Skills can now be downloaded from the official market. It supports any API that tools are allowed to call, and runs fully locally when backed by a local model.

- [2026/02/07] Agent Skills Support! InfiAgent now supports the Agent Skills open standard. Skills are folders of instructions, scripts, and resources that agents can dynamically load to improve performance on specialized tasks. Docker users: place skill folders in `~/.mla_v3/skills_library/` (mounted to `/root/mla_v3/skills_library/` inside the container). Local developers: place them in `~/mla_v3/skills_library/`. Windows users: `%USERPROFILE%\mla_v3\skills_library\`. The agent will automatically discover available skills and deploy them to the workspace on demand via the `load_skill` tool.
- [2026/02/07] Multi-Provider Model Support! You can now use models from different providers in the same configuration. Each model can optionally override `api_key` and `base_url` to use a different provider. Different sub-agents can use different models. See `llm_config.example.yaml` for configuration details.
- [2026/02/07] Web UI Enhancements: Added a Resume button for recovering interrupted tasks (same as CLI `/resume`). Added an Agent System selector to freely switch between the `Researcher` (academic research) and Open Cowork systems. User inputs now automatically include timestamps (consistent with CLI behavior).
- [2026/02/07] Multimodal Message Architecture: Separated multimodal and text-only message logic. For multimodal models, images from `image_read` are embedded directly in the conversation context for native understanding. Text-only models retain the external vision tool approach. Configure via `multimodal` and `compressor_multimodal` in `llm_config.yaml`.
- [2026/01/17] We introduce a new configuration profile, Open Cowork, which delivers a computer-work assistant similar to Anthropic's Cowork. After entering a user-specified working directory, the assistant can perform a wide range of tasks, including but not limited to: organizing folders, creating PowerPoint presentations, processing and categorizing bills and invoices in multiple formats, conducting in-depth research, and writing project code. The system is built on the InfiAgent architecture, preserving its long-horizon execution capabilities and unbounded, file-system-level memory within the same workspace. Open Cowork supports CLI, Docker-based CLI, and Web UI modes. In Web UI, use the Agent System selector to switch between `Researcher` and OpenCowork. A demonstration video is available for more details.
Open Cowork Demo Videos:
- [2026/01/13] Added breakpoint recovery for program errors (the original Ctrl+C resume function is retained). To use it, type `/resume` in the CLI.
- [2026/01/08] Our paper "InfiAgent: An Infinite-Horizon Framework for General-Purpose Autonomous Agents" has been released.
- [2026/01/07] Web UI: Temporary fix for the error "处理事件异常 (event handling exception): 'int' object has no attribute 'get'". The error does not affect subsequent agent output or operation, but it is still displayed. A full fix is pending.
- [2026/01/06] Web UI: Added an entry-agent selector next to Task ID so you can choose the root agent for the conversation, with an agent list and a visual agent tree for the selected root.
- [2026/01/05] Resolved a global freeze caused by prolonged unresponsiveness of the primary token. Please update the code or pull the latest Docker image!
- [2026/01/04] Agent output now follows the language of the user input.
- [2026/01/03] Optimized LiteLLM's native retry mechanism with error-aware retry prompts to improve small-model call success rates; added connection timeout detection to reduce the risk of task interruption.
- [2026/01/02] For the installation and usage video, please click infiagent:全自动写作工具 (InfiAgent: fully automatic writing tool).
- [2026/01/02] Fixed some bugs in reference management. Please clone the latest repo or pull the latest Docker image: chenglinhku/mlav3.
- [2026/01/01] Added support for web_ui and the Qwen API, and fixed some problems with third-party OpenAI-format APIs. Please use the latest chenglinhku/mlav3 Docker image and see the example configs.
- [2025/12/31] Gemini API keys from Google AI Studio are now supported. Please see the Gemini config in the config directory.
Attention: Coding tasks currently support only Python projects; other languages may be supported later. In older versions, execute_command allowed only safe commands such as cd or grep; it now allows any command, including rm. If your task may modify system files, please run it in Docker mode.
Complete academic papers generated by MLA:
Demo 1:
Demo 2:
Demo 3:
MLA handles the entire research workflow - from literature search and experiment design to code execution, figure generation, and LaTeX paper writing. All automatically orchestrated through multi-level agents.
- See It In Action
- Quick Start
- How It Works
- Interface Screenshots
- Configuration Guide
- CLI Interface
- SDK Integration
- Example Outputs
infiagent:全自动写作工具 (InfiAgent: fully automatic writing tool)
1. Install Docker
- Mac/Windows: Docker Desktop
- Linux:
curl -fsSL https://get.docker.com | sh
2. Pull Image
docker pull chenglinhku/mlav3:latest
3. Choose Your Mode
Web UI supports both bundled agent systems. Use the Agent System selector to switch between Researcher and OpenCowork.
Open http://localhost:9641 to set the API keys and base URL.
cd /your/workspace
# The 5002 mapping below is an optional port for agent web development (replace 5002 with your own port)
docker run -d --name mla \
-e HOST_PWD=$(pwd) \
-v $(pwd):/workspace$(pwd) \
-v ~/.mla_v3:/root/mla_v3 \
-v mla-config:/mla_config \
-p 8002:8002 \
-p 9641:9641 \
-p 4242:4242 \
-p 5002:5002 \
chenglinhku/mlav3:latest webui && docker logs -f mla
Then open browser: http://localhost:4242
Default username: user; default password: password
Web UI usage & UI details: see web_ui/README.md.
cd /your/workspace
# The 5002 mapping below is an optional port for agent web development (replace 5002 with your own port)
docker run -it --rm \
-e HOST_PWD=$(pwd) \
-v $(pwd):/workspace$(pwd) \
-v ~/.mla_v3:/root/mla_v3 \
-v mla-config:/mla_config \
-p 8002:8002 \
-p 9641:9641 \
-p 5002:5002 \
chenglinhku/mlav3:latest cli
Windows Users:
Windows users need to manage conversation IDs manually. Different task IDs maintain different memories.
# CLI Mode (PowerShell)
docker run -it --rm `
-e HOST_PWD="/{your_conversation_id}" `
-v "${PWD}:/workspace/{your_conversation_id}" `
-v "${HOME}\.mla_v3:/root/mla_v3" `
-v mla-config:/mla_config `
-p 8002:8002 `
-p 9641:9641 `
-p 5002:5002 `
chenglinhku/mlav3:latest cli
# Web UI Mode (PowerShell)
docker run -d --name mla-webui `
-e HOST_PWD="/{your_conversation_id}" `
-v "${PWD}:/workspace/{your_conversation_id}" `
-v "${HOME}\.mla_v3:/root/mla_v3" `
-v mla-config:/mla_config `
-p 8002:8002 `
-p 9641:9641 `
-p 4242:4242 `
-p 5002:5002 `
chenglinhku/mlav3:latest webui
# Then open browser: http://localhost:4242
# View logs: docker logs -f mla-webui
4. Configure API Key
Open browser: http://localhost:9641
Edit llm_config.yaml, fill in your API key, and save.
Done! Start using MLA CLI.
1. Install the package
# Python 3.9+ is supported. Use Python 3.10+ if you need MCP support through the packaged dependency set.
cd install_path
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
git clone https://github.com/ChenglinPoly/infiAgent.git
cd infiAgent
pip install -e .
2. Install Playwright
playwright install chromium
3. Configure API Key
mla-agent --config-set api_key "your-api-key"
4. Start CLI
cd /your/workspace
mla-agent --cli
Complete CLI Guide
MLA's design philosophy is "Provide short but high-value context for the next step." To achieve this, the framework implements multiple innovations:
MLA deploys agents in a tree-structured hierarchy (e.g., Grandparent → Parent → Child). This ensures:
- ✅ Single-purpose agents: Each agent has a focused role
- ✅ Minimal tool sets: Agents only access necessary tools
- ✅ Task alignment: Serial execution prevents parallel conflicts
- ✅ Clear delegation: Parent agents orchestrate child agents
Example Hierarchy:
alpha_agent (Level 3)
├── data_collection_agent (Level 2)
│   └── web_search_agent (Level 1)
├── coder_agent (Level 2)
└── material_to_document_agent (Level 2)
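The example hierarchy can be modeled as a small tree. The sketch below is illustrative only: the `AgentNode` class, `render`, and the rule that a parent may call only its direct children are assumptions made for this example, not MLA's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AgentNode:
    """One agent in the tree (hypothetical class, not part of MLA)."""
    name: str
    level: int
    children: list[AgentNode] = field(default_factory=list)

    def can_delegate_to(self, target: str) -> bool:
        # Assumed rule: a parent may call only its direct children.
        return any(child.name == target for child in self.children)


def render(node: AgentNode, indent: int = 0) -> list[str]:
    """Return the tree as indented lines, one agent per line."""
    lines = [f"{'  ' * indent}{node.name} (Level {node.level})"]
    for child in node.children:
        lines.extend(render(child, indent + 1))
    return lines


tree = AgentNode("alpha_agent", 3, [
    AgentNode("data_collection_agent", 2, [AgentNode("web_search_agent", 1)]),
    AgentNode("coder_agent", 2),
    AgentNode("material_to_document_agent", 2),
])

print("\n".join(render(tree)))
print(tree.can_delegate_to("coder_agent"))       # direct child
print(tree.can_delegate_to("web_search_agent"))  # grandchild, not callable directly
```

Keeping delegation limited to direct children is one simple way to get the single-purpose, minimal-tool-set behavior listed above.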
Long documents (PDFs, novels, papers) are never directly loaded into context. Instead:
- ✅ Use `answer_from_pdf`, `answer_from_document` tools
- ✅ Query-driven content extraction
- ✅ Only relevant excerpts or summaries enter context
- ✅ Application-layer attention allocation through tools
Traditional Approach:
Load entire 50-page PDF → Agent processes everything → Token overflow
MLA Approach:
Agent asks: "What is the methodology?"
→ Tool extracts relevant sections (2 pages)
→ Returns concise answer → Minimal token usage
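To make the query-driven idea concrete, here is a minimal sketch. It is not how `answer_from_pdf` is implemented; `answer_from_text` is a hypothetical helper, and simple word overlap stands in for whatever retrieval or LLM step the real tool uses.

```python
import re


def answer_from_text(document: str, query: str, max_paragraphs: int = 2) -> str:
    """Return only the paragraphs that share the most words with the query."""
    query_words = set(re.findall(r"\w+", query.lower()))
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]

    def overlap(paragraph: str) -> int:
        return len(query_words & set(re.findall(r"\w+", paragraph.lower())))

    ranked = sorted(paragraphs, key=overlap, reverse=True)
    # Drop paragraphs with no overlap so irrelevant text never enters context.
    relevant = [p for p in ranked[:max_paragraphs] if overlap(p) > 0]
    return "\n\n".join(relevant)


paper = (
    "Introduction. Agents struggle with long contexts.\n\n"
    "Methodology. We use a tree of agents and a file-based state.\n\n"
    "Results. Our system ran for several days without failure."
)
excerpt = answer_from_text(paper, "What is the methodology?", max_paragraphs=1)
print(excerpt)
```

The caller receives one short excerpt instead of the whole document, which is the property the comparison above relies on.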
"Files are everything." All outputs and interactions are saved to the file system:
- ✅ Web scraping → Saves as Markdown files
- ✅ PDF parsing → Extracts to structured documents
- ✅ Sub-agent results → Stored as files
- ✅ No immediate returns cluttering context
Benefits:
- Clear audit trail
- Reusable artifacts
- Context-free state representation
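A rough sketch of this pattern: a tool writes its full output to the workspace and hands back only a short receipt. The function name `save_tool_output` and the receipt fields are assumptions for illustration, not MLA's actual return format.

```python
import tempfile
from pathlib import Path


def save_tool_output(workspace: Path, name: str, content: str) -> dict:
    """Write the full tool output to disk and return only a short pointer."""
    target = workspace / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return {
        "status": "success",
        "path": target.relative_to(workspace).as_posix(),
        "chars": len(content),
    }


workspace = Path(tempfile.mkdtemp())
scraped_page = "# Transformers\n\n" + "Attention is all you need. " * 200
receipt = save_tool_output(workspace, "web/transformers.md", scraped_page)
print(receipt)  # a few dozen characters instead of the full page
```

Only the receipt enters the conversation; the page itself stays on disk, where later steps can query it on demand.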
A key insight: The current file system state represents the effect of all historical actions.
- ✅ A separate thinking module updates file space state every 10 steps
- ✅ Agents only retain the last 10 actions (since last state update)
- ✅ No need for context compression
- ✅ Historical actions are reflected in file system, not conversation history
Traditional LLM Agents:
Step 1: Create file A
Step 2: Edit file B
...
Step 100: Context overflow → Compression needed → Information loss
MLA Approach:
Steps 1-10: Actions recorded
Step 10: Thinking module updates "Current State: Files A, B, C exist with..."
Steps 11-20: Only these + Current State kept
→ No compression, no information loss
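The windowing behavior can be sketched as follows. This is a simplified model under stated assumptions: `ActionWindow` and `summarize` are hypothetical names, and the real thinking module inspects the file system rather than counting actions.

```python
from dataclasses import dataclass, field


@dataclass
class ActionWindow:
    """Keep a state summary plus only the actions taken since it was written."""
    window_size: int = 10
    state_summary: str = "Workspace is empty."
    recent_actions: list = field(default_factory=list)

    def record(self, action: str, summarize) -> None:
        self.recent_actions.append(action)
        if len(self.recent_actions) >= self.window_size:
            # Fold the window into the state summary, then drop the raw actions.
            self.state_summary = summarize(self.state_summary, self.recent_actions)
            self.recent_actions = []

    def context(self) -> str:
        return "\n".join([f"Current State: {self.state_summary}", *self.recent_actions])


def summarize(previous: str, actions: list) -> str:
    # Stand-in for the thinking module, which would describe the file system.
    return f"{previous} (+{len(actions)} actions applied)"


window = ActionWindow()
for step in range(1, 26):
    window.record(f"Step {step}: edit file_{step}.txt", summarize)

print(window.context())
```

After 25 steps the context holds one state line and the five actions since the last update, so its size stays bounded no matter how long the task runs.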
Inspired by Claude Code, MLA uses list-based tool parameters to save tokens:
- ✅ Read multiple files in one call
- ✅ Batch operations reduce cumulative overhead
- ✅ Significant token savings on repeated actions
Example:
# Traditional: 3 separate calls
file_read(path="file1.txt")
file_read(path="file2.txt")
file_read(path="file3.txt")
# MLA: 1 batch call
file_read(paths=["file1.txt", "file2.txt", "file3.txt"])
- ✅ Task ID = Workspace absolute path (not user-configurable)
- ✅ Same task ID allows unlimited conversation sessions
- ✅ Agents remember all historical tasks in the workspace
- ✅ Persistent memory across interruptions and restarts
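One way to picture the path-to-task-ID mapping is the sketch below. The normalization step follows from the "absolute path" rule above; the hash-plus-folder-name prefix is an assumed naming scheme for illustration and may differ from what MLA actually writes to disk.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def task_id_for(workspace: str) -> str:
    """Expand and normalize a workspace path into an absolute task ID."""
    return str(Path(workspace).expanduser().resolve())


def conversation_prefix(task_id: str) -> str:
    """Build a stable storage prefix: short hash + folder name (assumed scheme)."""
    digest = hashlib.sha256(task_id.encode("utf-8")).hexdigest()[:8]
    return f"{digest}_{Path(task_id).name}"


base = tempfile.mkdtemp()
first = task_id_for(os.path.join(base, "research"))
second = task_id_for(os.path.join(base, "research", "..", "research"))

print(first == second)             # both spellings map to the same task ID
print(conversation_prefix(first))  # same prefix on every later session
```

Because the ID is derived from the directory rather than chosen by the user, reopening the same workspace always lands on the same stored history.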
Usage:
# First session
mla-agent --task_id ~/research --user_input "Collect papers on Transformers"
# → Stores conversation in ~/mla_v3/conversations/{hash}_research_*
# Second session (days later)
mla-agent --task_id ~/research --user_input "Summarize the collected papers"
# → Agent remembers previous session and accesses collected files
The hierarchy_manager maintains a dynamic call relationship graph:
- ✅ Tracks parent-child agent relationships
- ✅ Injects call graph into shared context
- ✅ Prevents agents from overstepping boundaries
- ✅ Maintains task alignment across multi-agent system
Call Graph Example:
{
  "current_agent": "coder_agent",
  "parent": "alpha_agent",
  "siblings": ["data_collection_agent", "material_to_document_agent"],
  "allowed_tools": ["python_run", "file_write", "file_read"]
}
This ensures coder_agent won't accidentally call web_search (not in its scope) or interfere with sibling agents.
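A scope check against such a call graph might look like the sketch below. The `check_call` function and its messages are hypothetical; only the dictionary fields come from the example above.

```python
call_graph = {
    "current_agent": "coder_agent",
    "parent": "alpha_agent",
    "siblings": ["data_collection_agent", "material_to_document_agent"],
    "allowed_tools": ["python_run", "file_write", "file_read"],
}


def check_call(graph: dict, target: str) -> tuple:
    """Return (allowed, reason) for a tool or agent call by the current agent."""
    if target in graph["allowed_tools"]:
        return True, "tool is in scope"
    if target in graph["siblings"]:
        return False, f"{target} is a sibling; ask {graph['parent']} to delegate"
    return False, f"{target} is not in {graph['current_agent']}'s tool set"


for name in ("python_run", "web_search", "data_collection_agent"):
    print(name, check_call(call_graph, name))
```

Returning a reason along with the verdict gives the agent something actionable to put back into its context when a call is refused.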
MLA provides a rich interactive CLI with real-time task monitoring, HIL handling, and agent switching:
System Selection:
Tool Mode Configuration:
Starting Tasks:
Interactive CLI with prompt_toolkit and rich terminal UI - featuring multi-turn conversations, automatic HIL detection, and tool execution confirmation.
Build powerful IDE extensions using MLA's JSONL mode:
VS Code extension powered by MLA - seamless integration with workspace context and real-time streaming output.
MLA uses YAML files for agent and tool configuration. Configuration files are located in:
config/
├── agent_library/
│   ├── Researcher/                     # Research-oriented multi-level system
│   │   ├── general_prompts.yaml        # Shared prompts
│   │   ├── level_-1_judge_agent.yaml   # Judge agent
│   │   ├── level_0_tools.yaml          # Tool definitions
│   │   ├── level_1_agents.yaml         # Low-level agents
│   │   ├── level_2_agents.yaml         # Mid-level agents
│   │   └── level_3_agents.yaml         # Top-level agents
│   └── OpenCowork/                     # General computer-work assistant
└── run_env_config/
    ├── llm_config.yaml                 # LLM settings
    └── llm_config.example.yaml         # Example template
# Global defaults
api_key: "your-api-key"
base_url: "https://openrouter.ai/api/v1"
temperature: 0
max_tokens: 0
models:
  - openai/google/gemini-3-flash-preview  # uses global api_key + base_url
  - name: openai/qwen-plus                # override with different provider
    api_key: "your-dashscope-key"
    base_url: "https://dashscope.aliyuncs.com/compatible-mode/v1"
figure_models:
  - openai/google/gemini-3-flash-preview
compressor_models:
  - openai/google/gemini-3-flash-preview
thinking_models:
  - openai/google/gemini-3-flash-preview
read_figure_models:
  - openai/google/gemini-3-flash-preview
# Multimodal configuration
multimodal: true # Enable image embedding in messages for main model
compressor_multimodal: true # Enable image embedding for compressor model
Note: Copy llm_config.example.yaml to llm_config.yaml to get started. Each model can optionally override api_key and base_url to use a different provider.
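The override rule can be read as "per-model values win, global values fill the gaps". The sketch below shows that resolution on a plain dictionary mirroring the YAML; `resolve_models` is a hypothetical helper, not MLA's config loader.

```python
config = {
    "api_key": "your-api-key",
    "base_url": "https://openrouter.ai/api/v1",
    "models": [
        "openai/google/gemini-3-flash-preview",
        {
            "name": "openai/qwen-plus",
            "api_key": "your-dashscope-key",
            "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
        },
    ],
}


def resolve_models(cfg: dict, key: str = "models") -> list:
    """Expand each entry to a full dict, falling back to the global defaults."""
    resolved = []
    for entry in cfg.get(key, []):
        if isinstance(entry, str):
            entry = {"name": entry}  # bare string: model name only
        resolved.append({
            "name": entry["name"],
            "api_key": entry.get("api_key", cfg["api_key"]),
            "base_url": entry.get("base_url", cfg["base_url"]),
        })
    return resolved


for model in resolve_models(config):
    print(model["name"], "->", model["base_url"])
```

The same function could be pointed at the other model lists by passing a different `key`, for example `"thinking_models"`.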
MLA organizes agents into levels:
- Level 3: Top-level orchestrators (e.g., `alpha_agent`)
- Level 2: Functional specialists (e.g., `data_collection_agent`, `coder_agent`)
- Level 1: Basic executors (e.g., `web_search_agent`)
- Level 0: Tool definitions
- Level -1: Quality control (e.g., `judge_agent`)
Edit YAML files to customize agent behavior:
news_agent:
  type: llm_call_agent
  level: 1
  model_type: "advanced"
  available_tools:
    - data_collection_agent
    - coder_agent
    ...
  system_prompt: |
    You are a newspaper agent.
Start the CLI for a conversational experience:
mla-agent --cli
Key Features:
- Multi-turn conversations with persistent context
- Agent switching with `@agent_name` syntax
- Automatic HIL detection with audio alerts
- Tool execution confirmation in manual mode
- Interrupt and resume support (Ctrl+C to pause)
- Rich terminal UI powered by `prompt_toolkit` and `rich`
Usage Examples:
# Direct task input (uses default agent)
[alpha_agent] > Collect papers on Transformers
# Switch agent and execute task
[alpha_agent] > @data_collection_agent Search for recent NLP papers
# Switch default agent only
[alpha_agent] > @coder_agent
✅ Switched to: coder_agent
[coder_agent] >
CLI Commands:
| Command | Description |
|---|---|
| `/help` | Show help and available commands |
| `/agents` | List all available agents |
| `/resume` | Resume interrupted tasks |
| `/quit` or `/exit` | Exit CLI mode |
| `Ctrl+C` | Interrupt current task (stays in CLI) |
| `Ctrl+D` | Exit CLI immediately |
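The `@agent_name` syntax from the usage examples can be parsed with a few lines. This is a sketch of the parsing idea only; `parse_cli_line` is a hypothetical function, and the real CLI also handles slash commands and other input.

```python
def parse_cli_line(line: str, current_agent: str) -> tuple:
    """Return (agent, task); task is None when the line only switches agents."""
    text = line.strip()
    if text.startswith("@"):
        name, _, rest = text[1:].partition(" ")
        return name, (rest.strip() or None)
    return current_agent, text


print(parse_cli_line("Collect papers on Transformers", "alpha_agent"))
print(parse_cli_line("@data_collection_agent Search for recent NLP papers", "alpha_agent"))
print(parse_cli_line("@coder_agent", "alpha_agent"))
```

A `None` task signals a pure switch, which matches the "Switch default agent only" case above.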
Human-in-Loop (HIL) Handling:
When an agent requests human input, the CLI automatically detects it:
Detected HIL task! Press Enter to handle...
================================================================================
Human Interaction Task (HIL)
================================================================================
Task ID: upload_file_20250124
Instruction: Please upload the required dataset files...
================================================================================
Enter your response (any text)
Type /skip to skip this task
================================================================================
[alpha_agent] HIL Response > Files uploaded successfully
✅ HIL task responded
Tool Confirmation (Manual Mode):
When --auto-mode false is set, each tool execution requires confirmation:
⚠️⚠️⚠️ Detected tool execution request! Press Enter to confirm... ⚠️⚠️⚠️
================================================================================
⚠️ Tool Execution Confirmation Request
================================================================================
Tool Name: python_run
Confirmation ID: confirm_12345
Parameters:
  code: import numpy as np...
  timeout: 300
================================================================================
Choose action:
  yes / y - Approve execution
  no / n - Reject execution
================================================================================
[alpha_agent] Confirm [yes/no] > yes
✅ Approved tool execution: python_run
For scripting and automation:
mla-agent \
--task_id /path/to/workspace \
--user_input "Your task description" \
--agent_name alpha_agent
Common Parameters:
| Parameter | Description | Default |
|---|---|---|
| `--task_id` | Workspace path (absolute) | Required |
| `--user_input` | Task description | Required |
| `--agent_name` | Agent to invoke | `alpha_agent` |
| `--agent_system` | Agent library name | `Researcher` |
| `--cli` | Interactive CLI mode | `false` |
| `--jsonl` | JSONL output mode | `false` |
| `--force-new` | Clear all state and start fresh | `false` |
| `--auto-mode` | Tool execution mode (true/false) | Auto-detect |
Auto-Mode Examples:
# Automatic tool execution (no confirmation needed)
mla-agent --task_id ~/project --user_input "Task" --auto-mode true
# Manual confirmation for each tool
mla-agent --task_id ~/project --user_input "Task" --auto-mode false
# Tools are executed in-process through direct-tools.
# No standalone mla-tool-server process is required.
MLA provides two SDK options: Python SDK for direct integration and JSONL mode for IDE plugins.
Import and use MLA components directly in your Python code:
from pathlib import Path
from utils.config_loader import ConfigLoader
from core.hierarchy_manager import get_hierarchy_manager
from core.agent_executor import AgentExecutor
# Initialize components
task_id = str(Path.home() / "my_project")
agent_system = "Researcher"
config_loader = ConfigLoader(agent_system)
hierarchy_manager = get_hierarchy_manager(task_id)
# Get agent configuration
agent_config = config_loader.get_tool_config("alpha_agent")
# Create and run agent
agent = AgentExecutor(
    agent_name="alpha_agent",
    agent_config=agent_config,
    config_loader=config_loader,
    hierarchy_manager=hierarchy_manager
)
# Execute task
result = agent.run(
    task_id=task_id,
    user_input="Write a survey paper on Transformers"
)
print(f"Status: {result['status']}")
print(f"Output: {result['output']}")
Advanced: Custom Agent with Tool Permissions
# Set tool execution mode
agent.tool_executor.set_task_permission(task_id, auto_mode=True)
# Run with custom configuration
user_input = "Write a survey paper on Transformers"  # any task description
result = agent.run(task_id, user_input)
if result['status'] == 'success':
    print("Task completed successfully!")
else:
    print(f"Error: {result.get('error_information')}")
Use Cases for Python SDK:
- Building custom workflows
- Embedding agents in existing applications
- Batch processing multiple tasks
- Research experiments with programmatic control
MLA provides a JSONL streaming mode for real-time integration with IDEs and editors:
mla-agent \
--task_id $(pwd) \
--user_input "Optimize code performance" \
--jsonl 2>/dev/null
Output Format:
{"type":"start","call_id":"c-1760936557-474c43","project":"~/project","agent":"alpha_agent","task":"Optimize..."}
{"type":"token","text":"[alpha_agent] Analyzing code..."}
{"type":"progress","phase":"execution","pct":30}
{"type":"token","text":"Calling tool: code_analyzer"}
{"type":"result","ok":true,"summary":"Optimization complete"}
{"type":"end","status":"ok","duration_ms":5432}
Event Types:
| Event Type | Description | Key Fields |
|---|---|---|
| `start` | Task begins | `call_id`, `agent`, `task` |
| `token` | Streaming text output | `text` |
| `progress` | Progress update | `phase`, `pct` |
| `result` | Task result | `ok`, `summary` |
| `end` | Task completed | `status`, `duration_ms` |
| `error` | Error occurred | `message` |
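For Python callers, the event stream can be consumed line by line. The sketch below assumes only the event shapes listed in the table; `consume_events` is a hypothetical helper, and the sample lines are copied from the output format above.

```python
import json


def consume_events(lines):
    """Parse JSONL lines, collect streamed text, and return the final outcome."""
    text_parts = []
    outcome = {"status": None, "summary": None}
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON event
        kind = event.get("type")
        if kind == "token":
            text_parts.append(event.get("text", ""))
        elif kind == "result":
            outcome["summary"] = event.get("summary")
        elif kind == "error":
            raise RuntimeError(event.get("message", "unknown error"))
        elif kind == "end":
            outcome["status"] = event.get("status")
            break
    outcome["text"] = "\n".join(text_parts)
    return outcome


sample = [
    '{"type":"token","text":"[alpha_agent] Analyzing code..."}',
    '{"type":"progress","phase":"execution","pct":30}',
    '{"type":"result","ok":true,"summary":"Optimization complete"}',
    '{"type":"end","status":"ok","duration_ms":5432}',
]
print(consume_events(sample))
```

In practice the `lines` argument could be the `stdout` of a `subprocess.Popen` running `mla-agent` with `--jsonl`, opened in text mode so that iteration yields one event per line.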
import { spawn } from 'child_process';
interface AgentEvent {
  type: 'start' | 'token' | 'progress' | 'result' | 'end' | 'error';
  [key: string]: any;
}
// Spawn mla-agent in JSONL mode and forward each parsed event to onEvent.
export function runAgent(
  workspacePath: string,
  userInput: string,
  onEvent: (event: AgentEvent) => void
): Promise<AgentEvent> {
  return new Promise((resolve, reject) => {
    const child = spawn('mla-agent', [
      '--task_id', workspacePath,
      '--user_input', userInput,
      '--jsonl'
    ]);
    let buffer = '';
    child.stdout.on('data', (data) => {
      // Keep any trailing partial line in the buffer until the rest arrives
      buffer += data.toString();
      const lines = buffer.split('\n');
      buffer = lines.pop() || '';
      lines.forEach(line => {
        if (!line.trim()) return;
        try {
          const event: AgentEvent = JSON.parse(line);
          onEvent(event);
          if (event.type === 'end') {
            resolve(event);
          } else if (event.type === 'error') {
            reject(new Error(event.message));
          }
        } catch (e) {
          console.error('Failed to parse event:', line);
        }
      });
    });
    child.stderr.on('data', (data) => {
      // Log errors to stderr
      console.error(data.toString());
    });
    child.on('error', reject);
    // Reject if the process exits without an end event (no-op once settled)
    child.on('close', (code) => {
      reject(new Error(`mla-agent exited with code ${code} before an end event`));
    });
  });
}
// Usage
await runAgent('/path/to/workspace', 'Write unit tests', (event) => {
  switch (event.type) {
    case 'start':
      console.log(`Task started: ${event.task}`);
      break;
    case 'token':
      process.stdout.write(event.text);
      break;
    case 'progress':
      updateProgressBar(event.pct); // updateProgressBar is your own UI helper
      break;
    case 'result':
      console.log(`\nResult: ${event.summary}`);
      break;
  }
});
Build your own Cursor/VS Code extension using MLA:
Extension Features:
- Agent commands in command palette
- Inline chat with workspace context
- Automatic code generation and refactoring
- Literature search within editor
- HIL task handling with UI prompts
Basic Extension Structure:
// extension.ts
import * as vscode from 'vscode';
import { runAgent } from './mla-client';
export function activate(context: vscode.ExtensionContext) {
  let disposable = vscode.commands.registerCommand(
    'mla.executeTask',
    async () => {
      const workspace = vscode.workspace.workspaceFolders?.[0].uri.fsPath;
      const input = await vscode.window.showInputBox({
        prompt: 'Enter task description'
      });
      if (!workspace || !input) return;
      // Show progress
      await vscode.window.withProgress({
        location: vscode.ProgressLocation.Notification,
        title: 'MLA Agent',
        cancellable: true
      }, async (progress, token) => {
        let lastPct = 0;
        await runAgent(workspace, input, (event) => {
          if (event.type === 'token') {
            vscode.window.showInformationMessage(event.text);
          } else if (event.type === 'progress') {
            // increment is additive, so report only the change since the last event
            progress.report({ increment: event.pct - lastPct });
            lastPct = event.pct;
          }
        });
      });
    }
  );
  context.subscriptions.push(disposable);
}
MLA can generate complete research papers with the following structure:
upload/
├── paper.tex                 # Main LaTeX document
├── references.bib            # Bibliography
├── figures/
│   ├── architecture.png
│   ├── results_comparison.png
│   └── ablation_study.png
└── supplementary/
    └── detailed_results.pdf
Quality Metrics:
- ✅ Passes peer review at EI/IEEE conferences
- ✅ Proper citation formatting
- ✅ High-quality figures (300 DPI)
- ✅ Coherent structure and flow
1. Scientific Computing
- ECM protein composition simulation
- Logistics company shift scheduling
- Student assignment grading with feedback
2. General Tasks
- Web scraping and data extraction
- Code generation and debugging
- Document conversion and processing
- Runtime tools are executed in-process via direct-tools; no standalone tool server is required.
- Human-in-the-Loop API - User interaction integration
- Configuration Examples - Agent YAML templates
Contributions are welcome! Please feel free to submit issues or pull requests.
See LICENSE for details.
If you use InfiAgent in your research, please cite our paper:
@article{yu2026infiagent,
  title={InfiAgent: An Infinite-Horizon Framework for General-Purpose Autonomous Agents},
  author={Yu, Chenglin and Wang, Yuchen and Wang, Songmiao and Yang, Hongxia and Li, Ming},
  journal={arXiv preprint arXiv:2601.03204},
  year={2026}
}
Author: @yuchenglin
Thanks to Contributors: @wangyuchen @wangsongmiao @yuyang @lijinjia
Email: yuchenglin96@qq.com / cl0415@connect.hku.hk / chenglin.yu@poly.edu.h
GitHub: MLA V3 Repository











