📖 Overview | 🛠️ Installation | 🚀 Quick Start | 📚 Usage Guide | 📄 License
- [2025/08/15] Released SFT and DPO recipes for training MiroThinker using the MiroVerse-v0.1 dataset. Check them in `recipes/configs/mirothinker_v0_1`. These configs cover three different sizes of MiroThinker models. The training data is the `MiroVerse-v0.1-all` subset from MiroVerse-v0.1, with the split set to `train`.

  SFT Configurations:

  | Hyperparameter | MiroThinker-8B-SFT-v0.1 | MiroThinker-14B-SFT-v0.1 | MiroThinker-32B-SFT-v0.1 |
  |---|---|---|---|
  | Epochs | 4 | 3 | 3 |
  | Learning Rate | 4e-5 | 4e-5 | 4e-5 |
  | Weight Decay | 0.1 | 0.1 | 0.1 |
  | Packed Data | Enabled | Enabled | Enabled |
  | Context Length | 40k | 40k | 40k |
  | Batch Size | 128 | 128 | 128 |
  | Clip Grad Norm | 1.0 | 1.0 | 1.0 |
  | Warmup Ratio | 0.1 | 0.1 | 0.1 |

  DPO Configurations:

  A unified hyperparameter setting is used for the 8B, 14B, and 32B models.

  | Hyperparameter | MiroThinker-DPO-v0.1 |
  |---|---|
  | Learning Rate | 1e-5 |
  | Weight Decay | 0.05 |
  | Context Length | 40k |
  | Batch Size | 32 |
  | Warmup Ratio | 0.1 |
  | Beta | 0.1 |
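  As a quick reference, the sketch below launches one of these recipes with the SFT trainer used in the Quick Start section; the exact config filename under recipes/configs/mirothinker_v0_1 is an assumption and may differ in the repository.

  # Sketch: launch the 8B SFT recipe (config filename is illustrative)
  cd recipes
  torchrun \
  --nproc_per_node 8 \
  --nnodes 1 \
  sft_trainer.py \
  --config ./configs/mirothinker_v0_1/8B_full_sft.yaml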
- [2025/08/08] Released MiroTrain-v0.1, supporting post-training for MiroThinker using the MiroVerse-v0.1 dataset.
MiroTrain is an efficient, algorithm-first framework for post-training large agentic models. Built on top of the open-source project TorchTune, it delivers enhanced training recipes for SFT and DPO, supports post-training of 32B-scale LLMs on agentic datasets on a single GPU node with 8×80GB GPUs, and enables seamless scaling of post-training workloads to hundreds of GPUs.
- High-Performance Post-Training: MiroTrain automatically leverages optimized operators such as FlashAttention and Triton kernels to maximize training throughput. It supports streaming_pack, which packs training samples on the fly without requiring dataset preprocessing.
- Best-in-Class Memory Efficiency: MiroTrain incorporates Sequence Parallelism and CPU offloading, enabling efficient post-training of models with large vocabulary sizes and long context lengths.
- FSDPv2 Compatible: Fully compatible with FSDPv2, which adopts DTensor-based per-parameter sharding.
- Customizable Post-Training Recipes: Provides easily hackable recipes for SFT and DPO workflows. The modular design makes it simple to adapt or extend recipes for new post-training methods.
- Simple PyTorch-Based LLM Implementations: Clean and extensible model definitions allow for quick experimentation. Model architectures can be easily modified to integrate new features, such as support for YaRN-style RoPE scaling.
- HuggingFace Friendly: Fully compatible with HuggingFace datasets and model weights. Fine-tuned checkpoints are saved in HuggingFace-compatible format and can be seamlessly loaded by Transformers, vLLM, or SGLang for model serving (see the serving sketch after this list).
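As an illustration of this compatibility, here is a minimal sketch of serving a fine-tuned checkpoint with vLLM. The checkpoint path is a placeholder, and the flags shown are standard vLLM options rather than anything MiroTrain-specific.
# Serve a saved checkpoint with vLLM (path below is a placeholder)
pip install vllm
vllm serve /path/to/finetuned-checkpoint --tensor-parallel-size 8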
For GRPO (Group Relative Policy Optimization) training, please refer to MiroRL: An MCP-first Reinforcement Learning Framework for Deep Research Agent
MiroTrain is tested with the latest stable PyTorch releases (2.5, 2.6, and 2.7). We recommend using Python 3.10+ and CUDA 12.1+ for optimal performance.
For the fastest setup, we provide a pre-built Docker image with all dependencies pre-installed:
# Pull the Docker image
docker pull miromind/mirotrain:0.1.0-cuda12.6-pytorch2.6.0
# Run the container with GPU support
docker run --shm-size=8g --gpus all -it --rm \
-v $(pwd):/workspace \
-w /workspace \
miromind/mirotrain:0.1.0-cuda12.6-pytorch2.6.0
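Once inside the container, you can confirm that all GPUs are visible before launching a run:
# Inside the container: confirm GPU visibility
nvidia-smi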
Create a Python environment and install PyTorch based on your CUDA version. We recommend using conda to create a clean Python 3.10 environment. For other PyTorch and CUDA versions, please refer to the PyTorch installation guide.
conda create --name mirotrain-env python=3.10 -y
conda activate mirotrain-env
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.1 -c pytorch -c nvidia
Install MiroTrain. Clone the repository, install the TorchTune package included in the repo, and then install MiroTrain itself:
git clone https://github.com/MiroMindAI/mirotrain
cd mirotrain
pip install ./torchtune
pip install .
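To verify that everything installed correctly, try importing the packages. The module name mirotrain is an assumption based on the repository name and may differ:
# Optional sanity check (module name assumed from the repository name)
python -c "import torchtune, mirotrain; print('installation OK')"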
This guide demonstrates how to run MiroTrain on a single node with 8×80GB GPUs using Qwen3-32B as an example.
First, download the Qwen3-32B model weights from HuggingFace:
# Download Qwen3-32B model
tune download Qwen/Qwen3-32B \
--output-dir /path/to/qwen3-32b \
--hf-token <YOUR_HF_TOKEN>
Run supervised fine-tuning using the torchrun command:
cd recipes
torchrun \
--nproc_per_node 8 \
--nnodes 1 \
sft_trainer.py \
--config ./configs/qwen3/32B_full_sft.yaml
Run direct preference optimization using the torchrun command:
cd recipes
torchrun \
--nproc_per_node 8 \
--nnodes 1 \
dpo_trainer.py \
--config ./configs/qwen3/32B_full_dpo.yaml
- TorchTune for the excellent training framework and modular design
- Liger-Kernel for memory-efficient loss functions and training optimizations
- Grouped GEMM for efficient grouped matrix operations in MoE model training
- Flash Attention for high-performance attention implementations
@misc{2025mirotrain,
title={MiroTrain: An Efficient and Algorithm-First Framework for Post-Training Large Agentic Models},
author={MiroMind AI Infra Team},
howpublished = {\url{https://github.com/MiroMindAI/MiroTrain}},
year={2025}
}
This project is released under the Apache License 2.0. Please also adhere to the licenses of the models and datasets you use.