MiroTrain: An Efficient and Algorithm-First Framework for Post-Training Large Agentic Models

HuggingFace X 小红书 Discord WeChat miromind.ai

📖 Overview | 🛠️ Installation | 🚀 Quick Start | 📚 Usage Guide | 📄 License

News

  • [2025/08/15] Released SFT and DPO recipes for training MiroThinker on the MiroVerse-v0.1 dataset. They can be found in recipes/configs/mirothinker_v0_1 and cover three sizes of MiroThinker models. The training data is the MiroVerse-v0.1-all subset of MiroVerse-v0.1 with the split set to train (see the dataset-loading sketch after this list).

    SFT Configurations:

    | Hyperparameter  | MiroThinker-8B-SFT-v0.1 | MiroThinker-14B-SFT-v0.1 | MiroThinker-32B-SFT-v0.1 |
    | --------------- | ----------------------- | ------------------------ | ------------------------ |
    | Epochs          | 4                       | 3                        | 3                        |
    | Learning Rate   | 4e-5                    | 4e-5                     | 4e-5                     |
    | Weight Decay    | 0.1                     | 0.1                      | 0.1                      |
    | Packed Data     | Enabled                 | Enabled                  | Enabled                  |
    | Context Length  | 40k                     | 40k                      | 40k                      |
    | Batch Size      | 128                     | 128                      | 128                      |
    | Clip Grad Norm  | 1.0                     | 1.0                      | 1.0                      |
    | Warmup Ratio    | 0.1                     | 0.1                      | 0.1                      |

    DPO Configurations:

    A unified hyperparameter setting is used for the 8B / 14B / 32B models.

    | Hyperparameter  | MiroThinker-DPO-v0.1 |
    | --------------- | -------------------- |
    | Learning Rate   | 1e-5                 |
    | Weight Decay    | 0.05                 |
    | Context Length  | 40k                  |
    | Batch Size      | 32                   |
    | Warmup Ratio    | 0.1                  |
    | Beta            | 0.1                  |
  • [2025/08/08] Released MiroTrain-v0.1, supporting post-training for MiroThinker using the MiroVerse-v0.1 dataset.
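
The recipes above train on the MiroVerse-v0.1 dataset, which is distributed as a standard HuggingFace dataset. The snippet below is a minimal sketch of loading the MiroVerse-v0.1-all subset with the train split; the repository id under the miromind-ai organization and the use of name= for the subset are assumptions, so check the dataset card for the exact values.

import datasets  # pip install datasets

# Hypothetical repository id; verify on the MiroVerse-v0.1 page of the HuggingFace Hub.
ds = datasets.load_dataset(
    "miromind-ai/MiroVerse-v0.1",
    name="MiroVerse-v0.1-all",  # subset used by the released SFT/DPO recipes
    split="train",
)
print(ds)  # inspect the number of rows and the column names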

Overview

MiroTrain is an efficient, algorithm-first framework for post-training large agentic models. Built on top of the open-source project TorchTune, it delivers enhanced training recipes for SFT and DPO, supports post-training of 32B-scale LLMs on agentic datasets using a single node with 8×80GB GPUs, and enables seamless scaling of post-training workloads to hundreds of GPUs.

🚀 Efficient

  • High-Performance Post-Training: MiroTrain automatically leverages optimized operators such as FlashAttention and Triton kernels to maximize training throughput. It supports streaming_pack, which packs training samples on the fly without requiring a dataset preprocessing pass (a conceptual sketch of this packing step follows this list).

  • Best-in-Class Memory Efficiency: MiroTrain incorporates Sequence Parallelism and CPU offloading, enabling efficient post-training of models with large vocabulary sizes and long context lengths.

  • FSDPv2 Compatible: Fully compatible with FSDPv2, which adopts DTensor-based per-parameter sharding.
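
To illustrate the idea behind streaming_pack, here is a minimal, self-contained sketch of on-the-fly sequence packing. The function name, signature, and greedy strategy are illustrative only and are not MiroTrain's actual API.

from typing import Iterable, Iterator, List

def stream_pack(token_streams: Iterable[List[int]], max_seq_len: int) -> Iterator[List[int]]:
    """Greedily concatenate tokenized samples into fixed-length packs as they arrive,
    so no offline preprocessing pass over the dataset is required."""
    pack: List[int] = []
    for tokens in token_streams:
        if pack and len(pack) + len(tokens) > max_seq_len:
            yield pack                      # emit the current pack once it would overflow
            pack = []
        pack.extend(tokens[:max_seq_len])   # truncate samples longer than the window
    if pack:
        yield pack                          # flush the final, possibly partial pack

# Example: pack three short "samples" into windows of 8 tokens.
samples = [[1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(list(stream_pack(samples, max_seq_len=8)))
# -> [[1, 2, 3, 4, 5, 6, 7], [8, 9]]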

⚡ Algorithm-First

  • Customizable Post-Training Recipes: Provides easily hackable recipes for SFT and DPO workflows. The modular design makes it simple to adapt or extend recipes for new post-training methods.

  • Simple PyTorch-Based LLM Implementations: Clean and extensible model definitions allow for quick experimentation. Model architectures can be easily modified to integrate new features, such as support for YaRN-style RoPE scaling.

  • HuggingFace Friendly: Fully compatible with HuggingFace datasets and model weights. Fine-tuned checkpoints are saved in HuggingFace-compatible format and can be seamlessly loaded by Transformers, vLLM, or SGLang for model serving (see the loading example below).
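
Because checkpoints are saved in HuggingFace-compatible format, a fine-tuned model can be loaded directly with Transformers for a quick smoke test. The checkpoint path below is a placeholder for whatever output directory your recipe writes to, and device_map="auto" additionally requires the accelerate package.

from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt_dir = "/path/to/qwen3-32b-sft-checkpoint"  # placeholder: your recipe's output directory

tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)
model = AutoModelForCausalLM.from_pretrained(ckpt_dir, torch_dtype="auto", device_map="auto")

prompt = "Explain what post-training means for an agentic model."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))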

For GRPO (Group Relative Policy Optimization) training, please refer to MiroRL: An MCP-first Reinforcement Learning Framework for Deep Research Agent.

Installation

MiroTrain is tested with the latest stable PyTorch releases (2.5, 2.6, and 2.7). We recommend using Python 3.10+ and CUDA 12.1+ for optimal performance.

🐳 Docker Installation

For the fastest setup, we provide a pre-built Docker image with all dependencies pre-installed:

# Pull the Docker image
docker pull miromind/mirotrain:0.1.0-cuda12.6-pytorch2.6.0

# Run the container with GPU support
docker run --shm-size=8g --gpus all -it --rm \
  -v $(pwd):/workspace \
  -w /workspace \
  miromind/mirotrain:0.1.0-cuda12.6-pytorch2.6.0

🔧 Manual Installation

Create a Python environment and install PyTorch based on your CUDA version. We recommend using conda to create a clean Python 3.10 environment. For other PyTorch and CUDA versions, please refer to the PyTorch installation guide.

conda create --name mirotrain-env python=3.10 -y
conda activate mirotrain-env
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.1 -c pytorch -c nvidia

Install MiroTrain. Clone the repository, then install the bundled TorchTune followed by MiroTrain itself:

git clone https://github.com/MiroMindAI/mirotrain
cd mirotrain
pip install ./torchtune
pip install .
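
As an optional sanity check, the snippet below verifies that PyTorch can see your GPUs and that the installed packages import correctly. The mirotrain module name is assumed from the repository name and may differ.

import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

import torchtune   # the bundled TorchTune installed above
import mirotrain   # assumed import name for this package; adjust if it differs

print("torchtune and mirotrain imported successfully")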

Quick Start on Single-Node

This guide demonstrates how to run MiroTrain on a single node with 8×80GB GPUs using Qwen3-32B as an example.

Download Model Weights

First, download the Qwen3-32B model weights from HuggingFace:

# Download Qwen3-32B model
tune download Qwen/Qwen3-32B \
  --output-dir /path/to/qwen3-32b \
  --hf-token <YOUR_HF_TOKEN>

SFT (Supervised Fine-Tuning)

Run supervised fine-tuning using the torchrun command:

cd recipes
torchrun \
  --nproc_per_node 8 \
  --nnodes 1 \
  sft_trainer.py \
  --config ./configs/qwen3/32B_full_sft.yaml

DPO (Direct Preference Optimization)

Run direct preference optimization using the torchrun command:

cd recipes
torchrun \
  --nproc_per_node 8 \
  --nnodes 1 \
  dpo_trainer.py \
  --config ./configs/qwen3/32B_full_dpo.yaml

Usage Guide

See the Usage Guide for detailed documentation.

Acknowledgements

  • TorchTune for the excellent training framework and modular design
  • Liger-Kernel for memory-efficient loss functions and training optimizations
  • Grouped GEMM for efficient grouped matrix operations in MoE model training
  • Flash Attention for high-performance attention implementations

Citation

@misc{2025mirotrain,
    title={MiroTrain: An Efficient and Algorithm-First Framework for Post-Training Large Agentic Models},
    author={MiroMind AI Infra Team},
    howpublished = {\url{https://github.com/MiroMindAI/MiroTrain}},
    year={2025}
}

License

This project is released under the Apache License 2.0. Please also adhere to the licenses of any models and datasets you use.
