<a href="https://www.xiaohongshu.com/user/profile/663098830000000003033edc"><img src="https://img.shields.io/badge/-grey?style=social&logo=red&label=RedNote" alt="小红书" style="height: 20px;"></a>
<a href="https://discord.gg/GPqEnkzQZd"><img src="https://img.shields.io/badge/-grey?style=social&logo=discord&label=Discord" alt="Discord" style="height: 20px;"></a>
<a href="./docs/figs/wechat-group-qr-code.jpg"><img src="https://img.shields.io/badge/-grey?style=social&logo=wechat&label=WeChat" alt="WeChat" style="height: 20px;"></a>
<a href="https://deepwiki.com/MiroMindAI/MiroFlow"><img src="https://img.shields.io/badge/-grey?style=social&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACwAAAAyCAYAAAAnWDnqAAAAAXNSR0IArs4c6QAAA05JREFUaEPtmUtyEzEQhtWTQyQLHNak2AB7ZnyXZMEjXMGeK/AIi+QuHrMnbChYY7MIh8g01fJoopFb0uhhEqqcbWTp06/uv1saEDv4O3n3dV60RfP947Mm9/SQc0ICFQgzfc4CYZoTPAswgSJCCUJUnAAoRHOAUOcATwbmVLWdGoH//PB8mnKqScAhsD0kYP3j/Yt5LPQe2KvcXmGvRHcDnpxfL2zOYJ1mFwrryWTz0advv1Ut4CJgf5uhDuDj5eUcAUoahrdY/56ebRWeraTjMt/00Sh3UDtjgHtQNHwcRGOC98BJEAEymycmYcWwOprTgcB6VZ5JK5TAJ+fXGLBm3FDAmn6oPPjR4rKCAoJCal2eAiQp2x0vxTPB3ALO2CRkwmDy5WohzBDwSEFKRwPbknEggCPB/imwrycgxX2NzoMCHhPkDwqYMr9tRcP5qNrMZHkVnOjRMWwLCcr8ohBVb1OMjxLwGCvjTikrsBOiA6fNyCrm8V1rP93iVPpwaE+gO0SsWmPiXB+jikdf6SizrT5qKasx5j8ABbHpFTx+vFXp9EnYQmLx02h1QTTrl6eDqxLnGjporxl3NL3agEvXdT0WmEost648sQOYAeJS9Q7bfUVoMGnjo4AZdUMQku50McDcMWcBPvr0SzbTAFDfvJqwLzgxwATnCgnp4wDl6Aa+Ax283gghmj+vj7feE2KBBRMW3FzOpLOADl0Isb5587h/U4gGvkt5v60Z1VLG8BhYjbzRwyQZemwAd6cCR5/XFWLYZRIMpX39AR0tjaGGiGzLVyhse5C9RKC6ai42ppWPKiBagOvaYk8lO7DajerabOZP46Lby5wKjw1HCRx7p9sVMOWGzb/vA1hwiWc6jm3MvQDTogQkiqIhJV0nBQBTU+3okKCFDy9WwferkHjtxib7t3xIUQtHxnIwtx4mpg26/HfwVNVDb4oI9RHmx5WGelRVlrtiw43zboCLaxv46AZeB3IlTkwouebTr1y2NjSpHz68WNFjHvupy3q8TFn3Hos2IAk4Ju5dCo8B3wP7VPr/FGaKiG+T+v+TQqIrOqMTL1VdWV1DdmcbO8KXBz6esmYWYKPwDL5b5FA1a0hwapHiom0r/cKaoqr+27/XcrS5UwSMbQAAAABJRU5ErkJggg==&label=Deepwiki" alt="DeepWiki"></a>
<!-- DeepWiki badge generated by https://deepwiki.ryoppippi.com/ -->
<a href="https://miromind.ai"><img src="https://img.shields.io/badge/-grey?style=social&logo=google-chrome&label=miromind.ai" alt="miromind.ai" style="height: 20px;"></a>
</p>


<p align="center">
| <a href="https://deepwiki.com/miromind/miroflow"><b>Ask DeepWiki</b></a> | <a href="#-overview"><b>🎯 Overview</b></a> |
<a href="#-miroflow-sota-performance" target="_blank"><b>✨ Performance</b></a> |
<a href="#-miroflow-modular-ai-agent-framework" target="_blank"><b>🤖 Framework</b></a> |
<a href="#-getting-started" target="_blank"><b>🚀 Getting Started</b></a> |
<a href="https://github.com/MiroMindAI/MiroThinker" target="_blank"><b>🌟 MiroThinker</b></a>
</p>



<p align="center">
<a href="https://dr.miromind.ai/" style="color:rgb(30, 203, 255); text-decoration: underline; text-decoration-thickness: 2px;"><b><u>Try our demo with MiroThinker here!</u></b></a>
</p>

## 📚 Table of Contents

- [🎯 Overview](#-overview)
- [✨ MiroFlow SOTA Performance](#-miroflow-sota-performance)
- [🤖 MiroFlow: Modular AI Agent Framework](#-miroflow-modular-ai-agent-framework)
- [Workflow Overview](#workflow-overview)
- [Architecture Components](#architecture-components)
- [Core System 💻](#core-system-)
- [Tool Integration 🔧](#tool-integration-)
- [Agent System 👷](#agent-system-)
- [Support Systems ⚙️](#support-systems-️)
- [🚀 Getting Started](#-getting-started)
- [Prerequisites](#prerequisites)
  - [Running a single task](#runing-a-single-task)
- [Evaluate on Benchmark](#evaluate-on-benchmark)
- [[Optional] Customized Configuration](#optional-customized-configuration)
- [🌟 MiroThinker](#-mirothinker)
- [❓ FAQ](#-faq)
- [🎉 Join Our Communities!](#-join-our-communities)

# 🎯 Overview

<img src="./docs/figs/logo.png" alt="MiroFlow Logo" width="200" align="right">

**MiroFlow** is a **battle-tested** agent framework that reliably completes complex tool-use tasks. We have extensively used it to generate high-quality, post-training agent trace data for **[MiroThinker](https://huggingface.co/collections/miromind-ai/mirothinker-v01-689301b6d0563321862d44a1)**, our suite of open-source agentic models. Some key features are:

- 🌟 **Reproducible SOTA**: **MiroFlow** consistently achieves 72.2% (pass@1, average@3) on the GAIA validation set. Follow our [getting-started guide](#get-start) below, or view our published GAIA traces on Hugging Face. If you can't reproduce our results, please open a GitHub issue; we take reproducibility seriously.
- 🌟 **High Concurrency and Fault Tolerance**: **MiroFlow** scales data collection efficiently and handles rate-limited APIs and unstable network connections with ease.
# 🤖 MiroFlow: Modular AI Agent Framework

MiroFlow is a sophisticated, modular framework for building intelligent AI agents.

## Workflow Overview
MiroFlow handles user queries through a multi-stage, agentic process designed for flexibility and depth. The workflow is organized as follows:

1. **Intent Recognition & Query Augmentation**
LLMs analyze user input to detect intent and refine the query.

2. **Planning & Task Orchestration**
The main agent drafts an execution plan, invokes tools, and coordinates sub-agents.

3. **Delegation to Sub-Agents**
Specialized agents (e.g., `agent-browsing`) handle complex or domain-specific tasks. Sub-agents independently plan, act, and execute tool calls as needed.

4. **Tool Access via MCP Servers**
When external capabilities are required, agents leverage specialized tools by connecting to MCP (Model Context Protocol) servers.
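Conceptually, the steps above reduce to a turn loop: call the model, execute any tool call it emits, and stop once a final answer appears. A toy shell sketch of that loop follows; `call_llm` and `execute_tool` are stand-ins, not functions from the MiroFlow codebase:

```shell
# Toy sketch of the agentic turn loop; the real logic in orchestrator.py
# is considerably richer (context management, sub-agent delegation, retries).
call_llm()     { [ "$1" -lt 2 ] && echo "TOOL:google_search" || echo "FINAL:42"; }
execute_tool() { echo "result-for:$1"; }

turn=0
answer=""
while [ -z "$answer" ] && [ "$turn" -lt 10 ]; do   # cap turns, as real agents do
  response=$(call_llm "$turn")
  case "$response" in
    TOOL:*)  execute_tool "${response#TOOL:}" >/dev/null ;;  # tool or sub-agent call
    FINAL:*) answer="${response#FINAL:}" ;;                  # model produced an answer
  esac
  turn=$((turn + 1))
done
echo "answer=$answer turns=$turn"
```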
## Architecture Components

All core components are located in the `libs/` directory.

### Core System 💻

- **Pipeline** (`./miroflow/src/miroflow/prebuilt/pipeline.py`): Main entry point that creates and manages all components, handles error recovery, and returns final results

- **Orchestrator** (`./miroflow/src/miroflow/prebuilt/orchestrator.py`): Manages multi-turn conversations, parses tool calls, executes tools, and delegates to sub-agents

- **LLM Client** (`./miroflow/src/miroflow/llm/client.py`): Unified interface supporting Anthropic, OpenAI, Google, Qwen, DeepSeek, and local deployments

### Tool Integration 🔧

- **Tool Manager** (`./miroflow-tool/src/miroflow/tool/manager.py`): Comprehensive MCP server connection manager with tool discovery, persistent connections, and error handling

- **MCP Servers** (`./miroflow-tool/src/miroflow/tool/mcp_servers/`): Individual tool implementations built on FastMCP. Provides extensive capabilities including:
- Code execution and analysis (`./python_server.py`)
- Visual perception (`./vision_mcp_server.py`)
- Web search and content retrieval (`./searching_mcp_server.py`)
- Audio transcription (`./audio_mcp_server.py`)
- Enhanced reasoning capabilities (`./reasoning_mcp_server.py`)
- Document processing and analysis (`./reading_mcp_server.py`)

### Agent System 👷

Specialized agents designed for specific domains (e.g., `agent-browsing` for web browsing tasks).

### Support Systems ⚙️

- **Configuration System** (`./miroflow/src/miroflow/prebuilt/config/`): Hydra-powered YAML configuration for agents, LLMs, benchmarks, and pricing

- **Output Formatter** (`./miroflow/src/miroflow/utils/io_utils.py`): Intelligent response formatting that adapts to various benchmark requirements

- **Task Logger** (`./miroflow/src/miroflow/logging/`): Comprehensive logging for agent interactions, tool executions, and performance metrics

<a id="get-start"></a>
# 🚀 Getting Started
## Prerequisites

```bash
## clone the repo
git clone https://github.com/MiroMindAI/MiroFlow
cd MiroFlow/apps/run-agent

## prepare python environment
uv sync
```
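Before running `uv sync`, it can save a round trip to confirm the toolchain is present. A tiny optional check (a helper sketch, not part of the repo; see the uv docs for installation):

```shell
# Optional sanity check for the tools the steps above rely on.
missing=0
for tool in git uv; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
    missing=$((missing + 1))
  fi
done
echo "checked 2 tools, $missing missing"
```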
a. Set up `MiroFlow/apps/prepare-benchmark/.env` by:
```bash
cd MiroFlow/apps/prepare-benchmark

## copy the environment variable template, then fill in your API keys
cp .env.template .env
vim .env
```
Edit `.env` to configure environment variables:
```
# For downloading datasets from Hugging Face
HF_TOKEN="<your-huggingface-token>"

# [Optional] Data loading directory, by default `../../data`
DATA_DIR="../../data" # relative to this file
```

b. Set up `MiroFlow/apps/run-agent/.env` by:
```bash
cd MiroFlow/apps/run-agent

## copy the environment variable template, then fill in your API keys
cp .env.template .env
vim .env
```
Edit `.env` to configure environment variables:
```
# Using OpenRouter to provide primary agent model
OPENROUTER_API_KEY=""
OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"

# Anthropic, for vision tools
ANTHROPIC_API_KEY=""
ANTHROPIC_BASE_URL="https://api.anthropic.com"

# OpenAI, for audio tools, intent recognition, and answer extraction
OPENAI_API_KEY=""
OPENAI_BASE_URL="https://api.openai.com/v1"

# Gemini, for YouTube tasks
GEMINI_API_KEY=""

# Third party API keys
# For Google search and website scraping
SERPER_API_KEY=""
# For website scraping
JINA_API_KEY=""
# For the Linux sandbox
E2B_API_KEY=""

# [Optional] NewAPI, alternative to OpenRouter
NEWAPI_API_KEY=""
NEWAPI_BASE_URL=""

# [Optional] for network proxy, null by default
HTTPS_PROXY=""
# [Optional] Data loading directory, by default `../../data`
DATA_DIR="../../data"
```
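A quick way to spot keys you have not filled in yet is to scan the file for empty values. A small sketch, assuming the flat `KEY="value"` format shown above (the demo file here is illustrative; point `ENV_FILE` at your real `.env`):

```shell
# Report which keys in a .env-style file are still empty.
ENV_FILE="${ENV_FILE:-demo.env}"
cat > "$ENV_FILE" <<'EOF'
OPENROUTER_API_KEY="sk-or-example"
SERPER_API_KEY=""
E2B_API_KEY=""
EOF

empty=$(sed -n 's/^\([A-Z_]*\)=""$/\1/p' "$ENV_FILE")
echo "empty keys:"
echo "$empty"
```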

If you wish to use a different LLM as the primary agent model, you will need to provide the corresponding API keys.

## Evaluate on Benchmark

```bash
cd MiroFlow/apps/run-agent
bash scripts/claude-sonnet-3.7/run_evaluate_multiple_runs_gaia-validation.sh
```

## [Optional] Customized Configuration

MiroFlow uses [Hydra](https://hydra.cc/) for flexible configuration management, supporting different setups for LLMs, agents, benchmarks, and pricing models.

### Structure

```
MiroFlow/libs/miroflow/src/miroflow/prebuilt/config
├── config.yaml # Main configuration with defaults
├── agent/ # Agent configurations (tools, limits)
├── benchmark/ # Benchmark configurations (datasets, execution)
└── llm/ # Language model configurations (providers, models)
```

### Usage

Run with default configuration:
```bash
cd MiroFlow/apps/run-agent
uv run main.py common-benchmark
```
**Default Components**:
- LLM: `claude_openrouter`
- Agent: `miroflow`
- Benchmark: `gaia-validation`
- Pricing: `_default`


### Override Configurations

#### Component Override
Switch between existing configurations using the filename (without `.yaml`):
```bash
uv run main.py common-benchmark llm=<filename> agent=<filename> benchmark=<filename>
```

For example, if you have `conf/llm/claude_openrouter.yaml`, use `llm=claude_openrouter`.


#### Parameter Override
Override specific parameters:
```bash
cd MiroFlow/apps/run-agent
uv run main.py common-benchmark llm.temperature=0.1 agent.main_agent.max_turns=30
```

### Create Custom Configurations

1. **Create new config file** in the appropriate subdirectory (e.g., `conf/llm/my_config.yaml`)
2. **Inherit from defaults** using Hydra's composition:
```yaml
defaults:
- _default # Inherit base configuration
- _self_ # Allow self-overrides

# Your custom parameters
parameter: value
```
3. **Use your config**: `uv run main.py common-benchmark component=my_config`
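Steps 1 and 2 amount to dropping a small YAML file into the right subdirectory. A sketch (the filename `my_config.yaml` and the `temperature` parameter are illustrative):

```shell
# Create conf/llm/my_config.yaml inheriting from the base config.
mkdir -p conf/llm
cat > conf/llm/my_config.yaml <<'EOF'
defaults:
  - _default   # Inherit base configuration
  - _self_     # Allow self-overrides

temperature: 0.1
EOF

cat conf/llm/my_config.yaml
```

Hydra then picks it up by filename: `uv run main.py common-benchmark llm=my_config`.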


# 🌟 MiroThinker


[MiroThinker](https://github.com/MiroMindAI/MiroThinker) (7B/14B/32B) is our suite of open-source agentic models, designed to work seamlessly with the MiroFlow framework. Our models are specifically built to handle **complex, multi-tool tasks**, leveraging the reproducible and robust foundation that MiroFlow provides.

By combining MiroFlow’s reliable orchestration with MiroThinker’s advanced reasoning capabilities, we offer a powerful, end-to-end solution for building high-performing, reproducible AI agents.
These models are a direct result of our extensive data collection efforts, utilizing MiroFlow to generate high-quality, post-training agent trace data. This unique approach enables MiroThinker to excel in planning, executing, and reasoning through complex multi-step tasks.
We invite the community to explore and build upon these models. For more details on the architecture and implementation, please take a look at our codebase.

# ❓ FAQ

**Q: What is the estimated cost of running the GAIA validation set for a single run?** <br>
**A**: The cost is approximately **$450 USD** for a run without a cache. Enabling the cache can significantly reduce this cost by 50-67%, bringing it down to the **$150 - $225** range.
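For budgeting, those totals can be put on a per-task basis (GAIA validation has 165 tasks):

```shell
# Rough per-task cost from the totals above (165 tasks in GAIA validation).
awk 'BEGIN {
  tasks = 165
  printf "no cache: $%.2f/task\n", 450 / tasks
  printf "cached:   $%.2f-$%.2f/task\n", 150 / tasks, 225 / tasks
}'
```

That is roughly $2.73 per task uncached, and about $0.91 to $1.36 per task with the cache enabled.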


**Q: How long does it take to run the GAIA validation set for a single run?** <br>
**A**: With the `max_concurrent` parameter set to 20, a full run takes about **5 hours** to complete.

**Q: Are all the specified APIs required?** <br>
**A**: **Yes.** To fully reproduce our published results, access to all the listed APIs is necessary.


**Q: What is the difference between MiroFlow and MiroThinker?** <br>
**A**: **MiroFlow** is primarily focused on interacting with proprietary models; **MiroThinker** is designed for our own open-source models.

We plan to merge these two projects in the future to create a single, unified platform.

# 🎉 Join Our Communities!
