<a href="https://www.xiaohongshu.com/user/profile/663098830000000003033edc"><img src="https://img.shields.io/badge/-grey?style=social&logo=red&label=RedNote" alt="小红书" style="height: 20px;"></a>
<a href="https://discord.gg/GPqEnkzQZd"><img src="https://img.shields.io/badge/-grey?style=social&logo=discord&label=Discord" alt="Discord" style="height: 20px;"></a>
<a href="./docs/figs/wechat-group-qr-code.jpg"><img src="https://img.shields.io/badge/-grey?style=social&logo=wechat&label=WeChat" alt="WeChat" style="height: 20px;"></a>
<a href="https://deepwiki.com/MiroMindAI/MiroFlow"><img src="https://img.shields.io/badge/-grey?style=social&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACwAAAAyCAYAAAAnWDnqAAAAAXNSR0IArs4c6QAAA05JREFUaEPtmUtyEzEQhtWTQyQLHNak2AB7ZnyXZMEjXMGeK/AIi+QuHrMnbChYY7MIh8g01fJoopFb0uhhEqqcbWTp06/uv1saEDv4O3n3dV60RfP947Mm9/SQc0ICFQgzfc4CYZoTPAswgSJCCUJUnAAoRHOAUOcATwbmVLWdGoH//PB8mnKqScAhsD0kYP3j/Yt5LPQe2KvcXmGvRHcDnpxfL2zOYJ1mFwrryWTz0advv1Ut4CJgf5uhDuDj5eUcAUoahrdY/56ebRWeraTjMt/00Sh3UDtjgHtQNHwcRGOC98BJEAEymycmYcWwOprTgcB6VZ5JK5TAJ+fXGLBm3FDAmn6oPPjR4rKCAoJCal2eAiQp2x0vxTPB3ALO2CRkwmDy5WohzBDwSEFKRwPbknEggCPB/imwrycgxX2NzoMCHhPkDwqYMr9tRcP5qNrMZHkVnOjRMWwLCcr8ohBVb1OMjxLwGCvjTikrsBOiA6fNyCrm8V1rP93iVPpwaE+gO0SsWmPiXB+jikdf6SizrT5qKasx5j8ABbHpFTx+vFXp9EnYQmLx02h1QTTrl6eDqxLnGjporxl3NL3agEvXdT0WmEost648sQOYAeJS9Q7bfUVoMGnjo4AZdUMQku50McDcMWcBPvr0SzbTAFDfvJqwLzgxwATnCgnp4wDl6Aa+Ax283gghmj+vj7feE2KBBRMW3FzOpLOADl0Isb5587h/U4gGvkt5v60Z1VLG8BhYjbzRwyQZemwAd6cCR5/XFWLYZRIMpX39AR0tjaGGiGzLVyhse5C9RKC6ai42ppWPKiBagOvaYk8lO7DajerabOZP46Lby5wKjw1HCRx7p9sVMOWGzb/vA1hwiWc6jm3MvQDTogQkiqIhJV0nBQBTU+3okKCFDy9WwferkHjtxib7t3xIUQtHxnIwtx4mpg26/HfwVNVDb4oI9RHmx5WGelRVlrtiw43zboCLaxv46AZeB3IlTkwouebTr1y2NjSpHz68WNFjHvupy3q8TFn3Hos2IAk4Ju5dCo8B3wP7VPr/FGaKiG+T+v+TQqIrOqMTL1VdWV1DdmcbO8KXBz6esmYWYKPwDL5b5FA1a0hwapHiom0r/cKaoqr+27/XcrS5UwSMbQAAAABJRU5ErkJggg==&label=Deepwiki" alt="DeepWiki"></a>
<!-- DeepWiki badge generated by https://deepwiki.ryoppippi.com/ -->
<a href="https://miromind.ai"><img src="https://img.shields.io/badge/-grey?style=social&logo=google-chrome&label=miromind.ai" alt="miromind.ai" style="height: 20px;"></a>
</p>


<p align="center">
| <a href="https://deepwiki.com/miromind/miroflow"><b>Ask DeepWiki</b></a> | <a href="#-overview"><b>🎯 Overview</b></a> |
<a href="#-miroflow-sota-performance" target="_blank"><b>✨ Performance</b></a> |
<a href="#-miroflow-modular-ai-agent-framework" target="_blank"><b>🤖 Framework</b></a> |
<a href="#-getting-started" target="_blank"><b>🚀 Getting Started</b></a> |
<a href="https://github.com/MiroMindAI/MiroThinker" target="_blank"><b>🌟 MiroThinker</b></a>
</p>



<p align="center">
<a href="https://dr.miromind.ai/" style="color:rgb(30, 203, 255); text-decoration: underline; text-decoration-thickness: 2px;"><b><u>Try our demo with MiroThinker here!</u></b></a>
</p>

## 📚 Table of Contents

- [🎯 Overview](#-overview)
- [✨ MiroFlow SOTA Performance](#-miroflow-sota-performance)
- [🤖 MiroFlow: Modular AI Agent Framework](#-miroflow-modular-ai-agent-framework)
- [Workflow Overview](#workflow-overview)
- [Architecture Components](#architecture-components)
- [Core System 💻](#core-system-)
- [Tool Integration 🔧](#tool-integration-)
- [Agent System 👷](#agent-system-)
- [Support Systems ⚙️](#support-systems-️)
- [🚀 Getting Started](#-getting-started)
- [Prerequisites](#prerequisites)
  - [Running a single task](#runing-a-single-task)
- [Evaluate on Benchmark](#evaluate-on-benchmark)
- [[Optional] Customized Configuration](#optional-customized-configuration)
- [🌟 MiroThinker](#-mirothinker)
- [❓ FAQ](#-faq)
- [🎉 Join Our Communities!](#-join-our-communities)

# 🎯 Overview

<img src="./docs/figs/logo.png" alt="MiroFlow Logo" width="200" align="right">

**MiroFlow** is a **battle-tested** agent framework that reliably completes complex tool-use tasks. We have extensively used it to generate high-quality, post-training agent trace data for **[MiroThinker](https://huggingface.co/collections/miromind-ai/mirothinker-v01-689301b6d0563321862d44a1)**, our suite of open-source agentic models. Some key features are:

- 🌟 **Reproducible SOTA**: **MiroFlow** consistently achieves 72.2% (pass@1, average@3) on the GAIA validation set. Follow our [getting-started guide](#get-start) below, or view our published GAIA traces on Hugging Face. If you can't reproduce our results, please open a GitHub issue; we take reproducibility seriously.
- 🌟 **High Concurrency and Fault Tolerance**: **MiroFlow** scales data collection efficiently and handles rate-limited APIs and unstable network connections with ease.
# 🤖 MiroFlow: Modular AI Agent Framework

MiroFlow is a sophisticated, modular framework for building intelligent AI agents.

## Workflow Overview
MiroFlow handles user queries through a multi-stage, agentic process designed for flexibility and depth. The workflow is organized as follows:

1. **Intent Recognition & Query Augmentation**
LLMs analyze user input to detect intent and refine the query.

2. **Planning & Task Orchestration**
The main agent drafts an execution plan, invokes tools, and coordinates sub-agents.

3. **Delegation to Sub-Agents**
Specialized agents (e.g., `agent-browsing`) handle complex or domain-specific tasks. Sub-agents independently plan, act, and execute tool calls as needed.

4. **Tool Access via MCP Servers**
When external capabilities are required, agents leverage specialized tools by connecting to MCP (Model Context Protocol) servers.
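Conceptually, the steps above reduce to a turn loop: call the model, execute any tool call it emits, and stop once a final answer appears. A toy shell sketch of that loop follows; `call_llm` and `execute_tool` are stand-ins, not functions from the MiroFlow codebase:

```shell
# Toy sketch of the agentic turn loop; the real logic in orchestrator.py
# is considerably richer (context management, sub-agent delegation, retries).
call_llm()     { [ "$1" -lt 2 ] && echo "TOOL:google_search" || echo "FINAL:42"; }
execute_tool() { echo "result-for:$1"; }

turn=0
answer=""
while [ -z "$answer" ] && [ "$turn" -lt 10 ]; do   # cap turns, as real agents do
  response=$(call_llm "$turn")
  case "$response" in
    TOOL:*)  execute_tool "${response#TOOL:}" >/dev/null ;;  # tool or sub-agent call
    FINAL:*) answer="${response#FINAL:}" ;;                  # model produced an answer
  esac
  turn=$((turn + 1))
done
echo "answer=$answer turns=$turn"
```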
## Architecture Components

All core components are located in the `libs/` directory.

### Core System 💻

- **Pipeline** (`./miroflow/src/miroflow/prebuilt/pipeline.py`): Main entry point that creates and manages all components, handles error recovery, and returns final results

- **Orchestrator** (`./miroflow/src/miroflow/prebuilt/orchestrator.py`): Manages multi-turn conversations, parses tool calls, executes tools, and delegates to sub-agents

- **LLM Client** (`./miroflow/src/miroflow/llm/client.py`): Unified interface supporting Anthropic, OpenAI, Google, Qwen, DeepSeek, and local deployments

### Tool Integration 🔧

- **Tool Manager** (`./miroflow-tool/src/miroflow/tool/manager.py`): Comprehensive MCP server connection manager with tool discovery, persistent connections, and error handling

- **MCP Servers** (`./miroflow-tool/src/miroflow/tool/mcp_servers/`): Individual tool implementations built on FastMCP. Provides extensive capabilities including:
- Code execution and analysis (`./python_server.py`)
- Visual perception (`./vision_mcp_server.py`)
- Web search and content retrieval (`./searching_mcp_server.py`)
- Audio transcription (`./audio_mcp_server.py`)
- Enhanced reasoning capabilities (`./reasoning_mcp_server.py`)
- Document processing and analysis (`./reading_mcp_server.py`)

### Agent System 👷

Specialized agents designed for specific domains (e.g., `agent-browsing` for web browsing tasks).

### Support Systems ⚙️

- **Configuration System** (`./miroflow/src/miroflow/prebuilt/config/`): Hydra-powered YAML configuration for agents, LLMs, benchmarks, and pricing

- **Output Formatter** (`./miroflow/src/miroflow/utils/io_utils.py`): Intelligent response formatting that adapts to various benchmark requirements

- **Task Logger** (`./miroflow/src/miroflow/logging/`): Comprehensive logging for agent interactions, tool executions, and performance metrics

<a id="get-start"></a>
# 🚀 Getting Started
## Prerequisites

```bash
## clone the repo
git clone https://github.com/MiroMindAI/MiroFlow
cd MiroFlow/apps/run-agent

## prepare python environment
uv sync
```
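Before running `uv sync`, it can save a round trip to confirm the toolchain is present. A tiny optional check (a helper sketch, not part of the repo; see the uv docs for installation):

```shell
# Optional sanity check for the tools the steps above rely on.
missing=0
for tool in git uv; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
    missing=$((missing + 1))
  fi
done
echo "checked 2 tools, $missing missing"
```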
a. Set up `MiroFlow/apps/prepare-benchmark/.env` by:
```bash
cd MiroFlow/apps/prepare-benchmark

## copy the environment variable template, then fill in your API keys
cp .env.template .env
vim .env
```
Edit `.env` to configure environment variables:
```
# For downloading datasets from Hugging Face
HF_TOKEN="<your-huggingface-token>"

# [Optional] Data loading directory, by default `../../data`
DATA_DIR="../../data" # relative to this file
```

b. Set up `MiroFlow/apps/run-agent/.env` by:
```bash
cd MiroFlow/apps/run-agent

## copy the environment variable template, then fill in your API keys
cp .env.template .env
vim .env
```
Edit `.env` to configure environment variables:
```
# Using OpenRouter to provide primary agent model
OPENROUTER_API_KEY=""
OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"

# Anthropic, for vision tools
ANTHROPIC_API_KEY=""
ANTHROPIC_BASE_URL="https://api.anthropic.com"

# OpenAI, for audio tools, intent recognition, and answer extraction
OPENAI_API_KEY=""
OPENAI_BASE_URL="https://api.openai.com/v1"

# Gemini, for YouTube tasks
GEMINI_API_KEY=""

# Third party API keys
# For Google search and website scraping
SERPER_API_KEY=""
# For website scraping
JINA_API_KEY=""
# For the Linux sandbox
E2B_API_KEY=""

# [Optional] NewAPI, alternative to OpenRouter
NEWAPI_API_KEY=""
NEWAPI_BASE_URL=""

# [Optional] for network proxy, null by default
HTTPS_PROXY=""
# [Optional] Data loading directory, by default `../../data`
DATA_DIR="../../data"
```
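A quick way to spot keys you have not filled in yet is to scan the file for empty values. A small sketch, assuming the flat `KEY="value"` format shown above (the demo file here is illustrative; point `ENV_FILE` at your real `.env`):

```shell
# Report which keys in a .env-style file are still empty.
ENV_FILE="${ENV_FILE:-demo.env}"
cat > "$ENV_FILE" <<'EOF'
OPENROUTER_API_KEY="sk-or-example"
SERPER_API_KEY=""
E2B_API_KEY=""
EOF

empty=$(sed -n 's/^\([A-Z_]*\)=""$/\1/p' "$ENV_FILE")
echo "empty keys:"
echo "$empty"
```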

If you wish to use a different LLM as the primary agent model, you will need to provide the corresponding API keys.

## Evaluate on Benchmark

```bash
cd MiroFlow/apps/run-agent
bash scripts/claude-sonnet-3.7/run_evaluate_multiple_runs_gaia-validation.sh
```

## [Optional] Customized Configuration

MiroFlow uses [Hydra](https://hydra.cc/) for flexible configuration management, supporting different setups for LLMs, agents, benchmarks, and pricing models.

### Structure

```
MiroFlow/libs/miroflow/src/miroflow/prebuilt/config
├── config.yaml # Main configuration with defaults
├── agent/ # Agent configurations (tools, limits)
├── benchmark/ # Benchmark configurations (datasets, execution)
└── llm/ # Language model configurations (providers, models)
```

### Usage

Run with default configuration:
```bash
cd MiroFlow/apps/run-agent
uv run main.py common-benchmark
```
**Default Components**:
- LLM: `claude_openrouter`
- Agent: `miroflow`
- Benchmark: `gaia-validation`
- Pricing: `_default`


### Override Configurations

#### Component Override
Switch between existing configurations using the filename (without `.yaml`):
```bash
uv run main.py common-benchmark llm=<filename> agent=<filename> benchmark=<filename>
```

For example, if you have `conf/llm/claude_openrouter.yaml`, use `llm=claude_openrouter`.


#### Parameter Override
Override specific parameters:
```bash
cd MiroFlow/apps/run-agent
uv run main.py common-benchmark llm.temperature=0.1 agent.main_agent.max_turns=30
```

### Create Custom Configurations

1. **Create new config file** in the appropriate subdirectory (e.g., `conf/llm/my_config.yaml`)
2. **Inherit from defaults** using Hydra's composition:
```yaml
defaults:
- _default # Inherit base configuration
- _self_ # Allow self-overrides

# Your custom parameters
parameter: value
```
3. **Use your config**: `uv run main.py common-benchmark component=my_config`
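Steps 1 and 2 amount to dropping a small YAML file into the right subdirectory. A sketch (the filename `my_config.yaml` and the `temperature` parameter are illustrative):

```shell
# Create conf/llm/my_config.yaml inheriting from the base config.
mkdir -p conf/llm
cat > conf/llm/my_config.yaml <<'EOF'
defaults:
  - _default   # Inherit base configuration
  - _self_     # Allow self-overrides

temperature: 0.1
EOF

cat conf/llm/my_config.yaml
```

Hydra then picks it up by filename: `uv run main.py common-benchmark llm=my_config`.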


# 🌟 MiroThinker


[MiroThinker](https://github.com/MiroMindAI/MiroThinker) (7B/14B/32B) is our suite of open-source agentic models, designed to work seamlessly with the MiroFlow framework. Our models are specifically built to handle **complex, multi-tool tasks**, leveraging the reproducible and robust foundation that MiroFlow provides.

By combining MiroFlow’s reliable orchestration with MiroThinker’s advanced reasoning capabilities, we offer a powerful, end-to-end solution for building high-performing, reproducible AI agents.
These models are a direct result of our extensive data collection efforts, utilizing MiroFlow to generate high-quality, post-training agent trace data. This unique approach enables MiroThinker to excel in planning, executing, and reasoning through complex multi-step tasks.
We invite the community to explore and build upon these models. For more details on the architecture and implementation, please take a look at our codebase.

# ❓ FAQ

**Q: What is the estimated cost of running the GAIA validation set for a single run?** <br>
**A**: The cost is approximately **$450 USD** for a run without a cache. Enabling the cache can significantly reduce this cost by 50-67%, bringing it down to the **$150 - $225** range.
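For budgeting, those totals can be put on a per-task basis (GAIA validation has 165 tasks):

```shell
# Rough per-task cost from the totals above (165 tasks in GAIA validation).
awk 'BEGIN {
  tasks = 165
  printf "no cache: $%.2f/task\n", 450 / tasks
  printf "cached:   $%.2f-$%.2f/task\n", 150 / tasks, 225 / tasks
}'
```

That is roughly $2.73 per task uncached, and about $0.91 to $1.36 per task with the cache enabled.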


**Q: How long does it take to run the GAIA validation set for a single run?** <br>
**A**: With the `max_concurrent` parameter set to 20, a full run takes about **5 hours** to complete.

**Q: Are all the specified APIs required?** <br>
**A**: **Yes.** To fully reproduce our published results, access to all the listed APIs is necessary.


**Q: What is the difference between MiroFlow and MiroThinker?** <br>
**A**: **MiroFlow** is primarily focused on interacting with proprietary models; **MiroThinker** is designed for our own open-source models.

We plan to merge these two projects in the future to create a single, unified platform.

# 🎉 Join Our Communities!
