📖 Overview | 🛠️ Installation | 🚀 Quick Start | 📚 Usage Guide | 📄 License
- [2025/08/15] Released SFT and DPO recipes for training MiroThinker using the MiroVerse-v0.1 dataset. Check them in `recipes/configs/mirothinker_v0_1`. These configs cover three different sizes of MiroThinker models. The training data is the `MiroVerse-v0.1-all` subset from MiroVerse-v0.1, with the split set to `train`.

  SFT Configurations:

  | Hyperparameter | MiroThinker-8B-SFT-v0.1 | MiroThinker-14B-SFT-v0.1 | MiroThinker-32B-SFT-v0.1 |
  |---|---|---|---|
  | Epochs | 4 | 3 | 3 |
  | Learning Rate | 4e-5 | 4e-5 | 4e-5 |
  | Weight Decay | 0.1 | 0.1 | 0.1 |
  | Packed Data | Enabled | Enabled | Enabled |
  | Context Length | 40k | 40k | 40k |
  | Batch Size | 128 | 128 | 128 |
  | Clip Grad Norm | 1.0 | 1.0 | 1.0 |
  | Warmup Ratio | 0.1 | 0.1 | 0.1 |

  DPO Configurations:

  A unified hyperparameter setting is used for the 8B, 14B, and 32B models.

  | Hyperparameter | MiroThinker-DPO-v0.1 |
  |---|---|
  | Learning Rate | 1e-5 |
  | Weight Decay | 0.05 |
  | Context Length | 40k |
  | Batch Size | 32 |
  | Warmup Ratio | 0.1 |
  | Beta | 0.1 |
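  As a quick reference, the sketch below launches one of these recipes with the SFT trainer used in the Quick Start section; the exact config filename under recipes/configs/mirothinker_v0_1 is an assumption and may differ in the repository.

  # Sketch: launch the 8B SFT recipe (config filename is illustrative)
  cd recipes
  torchrun \
  --nproc_per_node 8 \
  --nnodes 1 \
  sft_trainer.py \
  --config ./configs/mirothinker_v0_1/8B_full_sft.yaml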
- [2025/08/08] Released MiroTrain-v0.1, supporting post-training for MiroThinker using the MiroVerse-v0.1 dataset.
MiroTrain is an efficient, algorithm-first framework for post-training large agentic models. Built on top of the open-source project TorchTune, it delivers enhanced training recipes for SFT and DPO, supports post-training of 32B-scale LLMs on agentic datasets on a single GPU node with 8×80GB GPUs, and enables seamless scaling of post-training workloads to hundreds of GPUs.
- High-Performance Post-Training: MiroTrain automatically leverages optimized operators such as FlashAttention and Triton kernels to maximize training throughput. It supports streaming_pack, which packs training samples on the fly without requiring dataset preprocessing.
- Best-in-Class Memory Efficiency: MiroTrain incorporates Sequence Parallelism and CPU offloading, enabling efficient post-training of models with large vocabulary sizes and long context lengths.
- FSDPv2 Compatible: Fully compatible with FSDPv2, which adopts DTensor-based per-parameter sharding.
- Customizable Post-Training Recipes: Provides easily hackable recipes for SFT and DPO workflows. The modular design makes it simple to adapt or extend recipes for new post-training methods.
- Simple PyTorch-Based LLM Implementations: Clean and extensible model definitions allow for quick experimentation. Model architectures can be easily modified to integrate new features, such as support for YaRN-style RoPE scaling.
- HuggingFace Friendly: Fully compatible with HuggingFace datasets and model weights. Fine-tuned checkpoints are saved in HuggingFace-compatible format and can be seamlessly loaded by Transformers, vLLM, or SGLang for model serving (see the serving sketch after this list).
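As an illustration of this compatibility, here is a minimal sketch of serving a fine-tuned checkpoint with vLLM. The checkpoint path is a placeholder, and the flags shown are standard vLLM options rather than anything MiroTrain-specific.
# Serve a saved checkpoint with vLLM (path below is a placeholder)
pip install vllm
vllm serve /path/to/finetuned-checkpoint --tensor-parallel-size 8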
For GRPO (Group Relative Policy Optimization) training, please refer to MiroRL: An MCP-first Reinforcement Learning Framework for Deep Research Agent
MiroTrain is tested with the latest stable PyTorch releases (2.5, 2.6, and 2.7). We recommend using Python 3.10+ and CUDA 12.1+ for optimal performance.
For the fastest setup, we provide a pre-built Docker image with all dependencies pre-installed:
# Pull the Docker image
docker pull miromind/mirotrain:0.1.0-cuda12.6-pytorch2.6.0
# Run the container with GPU support
docker run --shm-size=8g --gpus all -it --rm \
-v $(pwd):/workspace \
-w /workspace \
miromind/mirotrain:0.1.0-cuda12.6-pytorch2.6.0
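Once inside the container, you can confirm that all GPUs are visible before launching a run:
# Inside the container: confirm GPU visibility
nvidia-smi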
Create a Python environment and install PyTorch based on your CUDA version. We recommend using conda to create a clean Python 3.10 environment. For other PyTorch and CUDA versions, please refer to the PyTorch installation guide.
conda create --name mirotrain-env python=3.10 -y
conda activate mirotrain-env
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.1 -c pytorch -c nvidia
Install MiroTrain. Clone the repository, install the TorchTune package included in the repo, and then install MiroTrain itself:
git clone https://github.com/MiroMindAI/mirotrain
cd mirotrain
pip install ./torchtune
pip install .
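To verify that everything installed correctly, try importing the packages. The module name mirotrain is an assumption based on the repository name and may differ:
# Optional sanity check (module name assumed from the repository name)
python -c "import torchtune, mirotrain; print('installation OK')"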
This guide demonstrates how to run MiroTrain on a single node with 8×80GB GPUs using Qwen3-32B as an example.
First, download the Qwen3-32B model weights from HuggingFace:
# Download Qwen3-32B model
tune download Qwen/Qwen3-32B \
--output-dir /path/to/qwen3-32b \
--hf-token <YOUR_HF_TOKEN>
Run supervised fine-tuning using the torchrun command:
cd recipes
torchrun \
--nproc_per_node 8 \
--nnodes 1 \
sft_trainer.py \
--config ./configs/qwen3/32B_full_sft.yaml
Run direct preference optimization using the torchrun command:
cd recipes
torchrun \
--nproc_per_node 8 \
--nnodes 1 \
dpo_trainer.py \
--config ./configs/qwen3/32B_full_dpo.yaml
- TorchTune for the excellent training framework and modular design
- Liger-Kernel for memory-efficient loss functions and training optimizations
- Grouped GEMM for efficient grouped matrix operations in MoE model training
- Flash Attention for high-performance attention implementations
@misc{2025mirotrain,
title={MiroTrain: An Efficient and Algorithm-First Framework for Post-Training Large Agentic Models},
author={MiroMind AI Infra Team},
howpublished = {\url{https://github.com/MiroMindAI/MiroTrain}},
year={2025}
}
This project is released under the Apache License 2.0. Please also adhere to the licenses of the models and datasets you use.