- News & Updates
- Introduction
- Performance on Benchmarks
- Getting Started
- MiroThinker
- License & Support
- 2025-08-27: MiroFlow v0.2 - Achieves SOTA performance across multiple agentic benchmarks. Highlights include HLE 27.2%, HLE-Text-Only 29.5%, BrowseComp-EN 33.2%, BrowseComp-ZH 47.1%, and xBench-DeepSearch 72.0%.
- 2025-08-26: GAIA Validation trace released (73.94% pass@1) and Gradio demo released for local deployment.
- 2025-08-08: MiroFlow v0.1 - Framework, model, and data are now fully open-sourced!
MiroFlow is a fully open-source agent framework designed to reliably complete complex tool-use tasks. Our ecosystem includes the following key components:
- Reproducible SOTA Performance: MiroFlow consistently achieves 72.2% pass@1 (avg@3) on the GAIA benchmark. Follow our detailed guide to reproduce our released GAIA traces and verify the results.
- Advanced Data Collection: Our framework features sophisticated data collection capabilities that generate high-quality, post-training agent trace data. We've open-sourced extensive datasets through MiroVerse.
- Open Source Models: We provide fully open-sourced models that you can deploy locally and fine-tune for your specific needs. Explore our model collection at MiroThinker.
- Comprehensive Training Framework: We've open-sourced our complete SFT and DPO training recipes, available at MiroTrain.
- Reinforcement Learning Framework: Our RL training exploration and methodologies are fully available through MiroRL.
We evaluate MiroFlow on a range of benchmarks, including GAIA, HLE, BrowseComp, and xBench-DeepSearch. Meanwhile, we are working on adding more benchmarks.
| Model/Framework | GAIA Val | HLE | HLE-Text | BrowseComp-EN | BrowseComp-ZH | xBench-DeepSearch |
|---|---|---|---|---|---|---|
| MiroFlow | 82.4% | 27.2% | 29.5% | 33.2% | 47.1% | 72.0% |
| OpenAI Deep Research | 67.4% | 26.6% | - | 51.5% | 42.9% | - |
| Gemini Deep Research | - | 26.9% | - | - | - | 50+% |
| Kimi Researcher | - | - | 26.9% | - | - | 69.0% |
| WebSailor-72B | 55.4% | - | - | - | 30.1% | 55.0% |
| Manus | 73.3% | - | - | - | - | - |
| DeepSeek v3.1 | - | - | 29.8% | - | - | 71.2% |
MiroFlow achieved 81.8% pass@3, 82.4% maj. vote, 74.5% pass@1 (best@3), and 72.2% pass@1 (avg@3) on the GAIA validation set. This represents state-of-the-art (SOTA) performance among open-source agent frameworks.
Note
Our pass@1 scores are reported as both the average across three runs (avg@3) and the best score among those runs (best@3). For most other reported pass@1 results, it is unclear whether they represent an average or a best score across multiple trials (indicated with *).
To prevent agents from retrieving answers directly from Hugging Face, we disabled access to it during inference and trace collection.
We have evaluated multiple agent frameworks on GAIA. Please note that some reported results may be overstated, lack clear definitions, or not be reproducible. In contrast, reproducing MiroFlow's results is straightforward and requires only a few API keys.
MiroFlow is a high-performance, modular framework for building intelligent AI agents that achieve state-of-the-art results on complex benchmarks. It features multi-turn conversation capabilities, comprehensive tool integration, and hierarchical sub-agent support for superior task completion.
More information on our agent workflow is available in the documentation.
Tip
We recommend using uv with Python >= 3.12.
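If uv is not installed yet, one common way to get it is the standalone installer from the uv documentation (Linux/macOS; see the uv docs for Windows or pip-based installs):

```bash
# Install uv via its standalone installer
curl -LsSf https://astral.sh/uv/install.sh | sh
```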
## clone the repo
git clone https://github.com/MiroMindAI/MiroFlow
cd MiroFlow/apps/run-agent
## prepare python environment
uv sync
## copy the environment variable template and prepare your own .env file
cd MiroFlow/apps/prepare-benchmark
cp .env.template .env
# Edit .env with your actual API keys
Edit `.env` to configure environment variables:
# For downloading datasets from Hugging Face
HF_TOKEN="<your-huggingface-token>"
# [Optional] Data loading directory, by default `../../data`
DATA_DIR="../../data" # relative to this file
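Optionally, you can sanity-check the token before downloading anything. A minimal sketch, assuming `curl` is available (the `whoami-v2` endpoint is part of the Hugging Face Hub API):

```bash
# Export the variables from .env into the current shell, then query the Hub
set -a; source .env; set +a
# A valid token prints your account info; an invalid one returns an error
curl -s -H "Authorization: Bearer $HF_TOKEN" https://huggingface.co/api/whoami-v2
```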
## copy the environment variable template and prepare your own .env file
cd MiroFlow/apps/run-agent
cp .env.template .env
# Edit .env with your actual API keys
Edit `.env` to configure environment variables:
# OpenRouter, for the primary agent model
OPENROUTER_API_KEY=""
OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"
# Anthropic, for vision tools
ANTHROPIC_API_KEY=""
ANTHROPIC_BASE_URL="https://api.anthropic.com"
# OpenAI, for audio tools, intent recognition, and answer extraction
OPENAI_API_KEY=""
OPENAI_BASE_URL="https://api.openai.com/v1"
# Gemini, for YouTube tasks
GEMINI_API_KEY=""
# Third party API keys
# For Google search and website scraping
SERPER_API_KEY=""
# For website scraping
JINA_API_KEY=""
# For the Linux sandbox
E2B_API_KEY=""
# [Optional] NewAPI, alternative to OpenRouter
NEWAPI_API_KEY=""
NEWAPI_BASE_URL=""
# [Optional] for network proxy, null by default
HTTPS_PROXY=""
# [Optional] Data loading directory, by default `../../data`
DATA_DIR="../../data"
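Before launching any runs, it can be worth confirming that no required key was left empty. A minimal sketch; the variable list simply mirrors the template above, so adjust it to your setup:

```bash
# Warn about any required key that is still empty in .env
for key in OPENROUTER_API_KEY ANTHROPIC_API_KEY OPENAI_API_KEY GEMINI_API_KEY SERPER_API_KEY JINA_API_KEY E2B_API_KEY; do
  grep -q "^${key}=\"\"" .env && echo "WARNING: ${key} is not set"
done
```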
Note
If you wish to use a different LLM as the primary agent model, you will need to provide the corresponding API keys.
To achieve our best benchmark results, we recommend using a pre-defined sandbox template that includes the most commonly used Python and apt packages. Please see our installation guide for detailed instructions.
If you prefer not to use a sandbox template, you can disable it by commenting out the line `template=DEFAULT_TEMPLATE_ID,` in `libs/miroflow-tool/src/miroflow/tool/mcp_servers/python_server.py` (line 145).
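As a rough sketch, the same edit can be made with `sed` (verify that the pattern still matches the current file before running; a `.bak` backup is kept):

```bash
# Comment out the sandbox template line in place
sed -i.bak 's/^\([[:space:]]*\)template=DEFAULT_TEMPLATE_ID,/\1# template=DEFAULT_TEMPLATE_ID,/' \
  libs/miroflow-tool/src/miroflow/tool/mcp_servers/python_server.py
```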
## run a task with instruction
cd MiroFlow/apps/run-agent
uv run main.py trace --task="your task description" --task_file_name="path to related task file"
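For example, a hypothetical invocation that asks the agent about a local file (the task text and file path below are placeholders, not files shipped with the repository):

```bash
# Hypothetical example; replace the task description and file path with your own
uv run main.py trace \
  --task="Summarize the main findings of the attached report in three bullet points" \
  --task_file_name="../../data/example/report.pdf"
```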
Prepare datasets according to your requirements. Some datasets may need to be downloaded manually into the `/data/<benchmark>` folder, and you should also create a corresponding `standardized_data.jsonl` metafile. We will add support for more datasets over time.
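The exact schema of `standardized_data.jsonl` is defined by the prepare-benchmark code, so the sketch below only illustrates the general shape (one JSON object per task, one object per line) with hypothetical field names; check the files generated for the supported benchmarks for the authoritative fields:

```bash
# Hypothetical entry; field names are illustrative, not the authoritative schema
mkdir -p ../../data/my-benchmark
cat > ../../data/my-benchmark/standardized_data.jsonl << 'EOF'
{"task_id": "task-0001", "task_question": "What year was the attached paper published?", "ground_truth": "2019", "file_path": "files/task-0001.pdf"}
EOF
```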
## supported benchmarks
cd MiroFlow/apps/prepare-benchmark
uv run main.py get gaia-val
uv run main.py get browsecomp-test
uv run main.py get browsecomp-zh-test
uv run main.py get hle
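After a download finishes, the dataset should appear under the data directory. The path below assumes the default `DATA_DIR` and that the folder is named after the benchmark:

```bash
# Expect the task files plus a standardized_data.jsonl metafile
ls ../../data/gaia-val/
```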
Run evaluation using the default settings. (Not parallelized; not recommended.)
## run the code
cd MiroFlow/apps/run-agent
uv run main.py common-benchmark benchmark=gaia-validation
uv run main.py common-benchmark benchmark=browsecomp
uv run main.py common-benchmark benchmark=browsecomp-zh
uv run main.py common-benchmark benchmark=hle
For parallel and multi-run evaluations, and to gain better control over environment settings using Hydra, we recommend using the provided script:
cd MiroFlow/apps/run-agent
bash ./scripts/main-worker-dual/run_evaluate_multiple_runs_gaia-validation.sh
bash ./scripts/main-worker-dual/run_evaluate_multiple_runs_browsecomp.sh
bash ./scripts/main-worker-dual/run_evaluate_multiple_runs_browsecomp-zh.sh
bash ./scripts/main-worker-dual/run_evaluate_multiple_runs_hle.sh
You can easily modify and customize these scripts to suit your needs. See Customized Configuration for more details.
MiroFlow leverages Hydra for powerful configuration management, allowing you to easily switch between different LLMs, agents, benchmarks, and pricing models using YAML configuration files. For detailed instructions on configuration management, see our configuration guide.
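With Hydra, individual settings can also be overridden directly on the command line instead of editing the YAML files. The overrides below are purely illustrative; the actual config groups and key names are defined in the repository's configuration files:

```bash
# Illustrative Hydra-style overrides; check the repo's config files for the real group and key names
uv run main.py common-benchmark benchmark=gaia-validation output_dir=logs/my-run max_turns=20
```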
This project is licensed under the Apache License 2.0 - see the LICENSE file for details. Some components may have different licenses as specified in their respective file headers.
- Benchmark Contributors for the comprehensive evaluation datasets
- Open Source Community for the tools and libraries that make this possible
- Issues: For questions or bug reports, please use GitHub Issues.
- FAQ Documentation: See faq.md for additional guidelines
@misc{2025mirothinker,
  title={MiroFlow: An Open-Source Agentic Framework for Deep Research},
  author={MiroMind AI Team},
  howpublished={\url{https://github.com/MiroMindAI/MiroFlow}},
  year={2025}
}