
Mini Trainer

Python 3.11+ · Apache 2.0 licensed

A lightweight, high-performance training library for efficient fine-tuning of large language models up to 70B parameters.


Built for speed, simplicity, and scalability 🚀


✨ Features

  • 🔥 Liger Kernels - Minimized memory footprint through chunked loss computation
  • ⚡ Smart Batch Packing - Automatic minibatching with a numba-optimized LPT algorithm for near-optimal GPU load balancing
  • 🎯 FSDP2 Support - Native PyTorch distributed training with FullyShardedDataParallel
  • 🚫 Padding-Free - Leverages Flash Attention for efficient computation without padding overhead
  • ♾️ Infinite Sampling - Continuous data streaming without manual epoch configuration
  • 🔬 Orthogonal Subspace Fine-Tuning (OSFT) - Advanced continual learning technique for parameter-efficient training
  • 📚 Pretraining Mode - Document-style pretraining with configurable block sizes on pre-tokenized input_ids
  • 📊 Flexible Logging - JSONL metrics logging with optional Weights & Biases integration

🔬 Orthogonal Subspace Fine-Tuning (OSFT)


Mini Trainer implements Orthogonal Subspace Fine-Tuning (OSFT), a breakthrough continual learning technique that enables models to learn new tasks without catastrophic forgetting. OSFT uses adaptive SVD-based decomposition to intelligently update models in unused parameter subspaces while preserving crucial prior knowledge.
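
A minimal sketch of the idea in plain PyTorch (illustrative only, not Mini Trainer's actual implementation): decompose a weight matrix with SVD, freeze the top singular directions that carry existing knowledge, and leave only the least-significant fraction of directions trainable, analogous to what --osft-unfreeze-rank-ratio controls.

import torch

def split_weight_for_osft(weight: torch.Tensor, unfreeze_rank_ratio: float):
    """Toy OSFT-style split: freeze the most significant singular directions,
    expose only the least significant ones to gradient updates."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    rank = S.numel()
    n_trainable = int(rank * unfreeze_rank_ratio)   # e.g. 0.25 -> lowest 25% of directions
    k = rank - n_trainable
    # High-significance subspace: detached, preserves prior knowledge.
    frozen = ((U[:, :k] * S[:k]) @ Vh[:k, :]).detach()
    # Low-significance subspace: the only part the optimizer will touch.
    trainable = torch.nn.Parameter((U[:, k:] * S[k:]) @ Vh[k:, :])
    return frozen, trainable

# Toy usage: a 512x512 layer with 25% of its singular directions trainable.
W = torch.randn(512, 512)
frozen_part, trainable_part = split_weight_for_osft(W, unfreeze_rank_ratio=0.25)
effective_weight = frozen_part + trainable_part    # what a forward pass would use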

🎥 Learn More

Watch our technical deep-dive on Orthogonal Subspace Learning

🚀 Using OSFT

Enable OSFT in your training runs with the --osft flag:

torchrun --nnodes=1 --nproc-per-node=8 -m mini_trainer.train \
    --model-name-or-path meta-llama/Llama-3.1-8B-Instruct \
    --data-path ./data.jsonl \
    --output-dir ./checkpoints \
    --osft \
    --osft-unfreeze-rank-ratio 0.25  # train the 25% least important parameters

The --osft-unfreeze-rank-ratio parameter controls how much of the model to update (0.0 = everything frozen, 1.0 = full training).
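
As a rough intuition: with --osft-unfreeze-rank-ratio 0.25 on a weight matrix whose smaller dimension is 4096, on the order of a quarter of its singular directions (about 1,024 of 4,096) remain trainable while the rest stay frozen.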


📦 Installation

From PyPI

# Install base package
pip install rhai-innovation-mini-trainer

# Install CUDA dependencies (required for GPU training)
pip install rhai-innovation-mini-trainer[cuda] --no-build-isolation

From Source (Editable)

# Clone the repository
git clone https://github.com/Red-Hat-AI-Innovation-Team/mini_trainer.git
cd mini_trainer

# Install in editable mode
pip install -e .

# Install CUDA dependencies
pip install -e .[cuda] --no-build-isolation

🎯 Usage

Training is orchestrated through the api_train.py module, which provides a programmatic interface for launching training jobs. You can run training using torchrun for distributed setups:

torchrun --nnodes=1 --nproc-per-node=8 -m mini_trainer.train \
    --output-dir ./checkpoints \
    --data-path ./data.jsonl \
    --model-name-or-path meta-llama/Llama-3.1-8B-Instruct \
    --batch-size 128 \
    --max-tokens-per-gpu 128000 \
    --learning-rate 5e-6 \
    --use-liger-kernels

Key Parameters

  • --model-name-or-path - HuggingFace model identifier or local path
  • --data-path - Path to tokenized training data (JSONL format)
  • --batch-size - Target batch size for training
  • --max-tokens-per-gpu - Maximum tokens per GPU (auto-balances minibatches)
  • --output-dir - Directory for checkpoints and logs
  • --use-liger-kernels - Enable memory-efficient Liger kernels
  • --osft - Enable Orthogonal Subspace Fine-Tuning mode
  • --osft-unfreeze-rank-ratio - Ratio of model parameters to train with OSFT (0.0-1.0)
  • --block-size - Enables pretraining mode with the given block length

For the complete list of arguments and advanced configuration options, see src/mini_trainer/api_train.py.
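
To make --max-tokens-per-gpu and the LPT packing behaviour concrete, here is an illustrative simplification (not Mini Trainer's numba-optimized implementation): sequences are taken longest first, and each is placed on the currently least-loaded rank that still has room under the token budget.

import heapq

def lpt_pack(seq_lens, num_gpus, max_tokens_per_gpu):
    """Toy LPT packing: longest sequences first, each assigned to the
    least-loaded GPU that stays under the token budget."""
    # Each heap entry is (current token load, gpu index, assigned lengths).
    heap = [(0, gpu, []) for gpu in range(num_gpus)]
    heapq.heapify(heap)
    leftover = []
    for length in sorted(seq_lens, reverse=True):
        load, gpu, assigned = heapq.heappop(heap)
        if load + length <= max_tokens_per_gpu:
            assigned.append(length)
            load += length
        else:
            leftover.append(length)   # would roll over to the next minibatch
        heapq.heappush(heap, (load, gpu, assigned))
    return heap, leftover

buckets, overflow = lpt_pack([900, 750, 620, 400, 380, 200, 120],
                             num_gpus=2, max_tokens_per_gpu=2000)
for load, gpu, assigned in sorted(buckets, key=lambda b: b[1]):
    print(f"GPU {gpu}: {load} tokens -> {assigned}")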


🛠️ Contributors – Looking for the lazy-init + FSDP2 loading flow?
See docs/distributed_initialization.md for diagrams and a detailed walkthrough of the SFT and OSFT pipelines.

📊 Data Format

Mini Trainer expects pre-tokenized data in JSONL format with the following structure:

{"input_ids": [1, 2, 3, ...], "labels": [1, 2, 3, ...], "len": 128}
{"input_ids": [4, 5, 6, ...], "labels": [-100, -100, 6, ...], "len": 256}

Each line should contain:

  • input_ids - Tokenized input sequence
  • labels - Target labels (use -100 for tokens to ignore in loss computation)
  • len - Sequence length (optional, computed automatically if missing)
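
For a quick smoke test you can produce a file in this format yourself; the snippet below is only a sketch (the chat-template handling is simplified) and is not a substitute for the instructlab-training pipelines mentioned in the next section.

import json
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

samples = [("What is 2 + 2?", "2 + 2 equals 4.")]

with open("data.jsonl", "w") as f:
    for prompt, answer in samples:
        # Tokenize the prompt using the model's chat template.
        prompt_ids = tokenizer.apply_chat_template(
            [{"role": "user", "content": prompt}], add_generation_prompt=True
        )
        answer_ids = tokenizer(answer, add_special_tokens=False)["input_ids"]
        input_ids = prompt_ids + answer_ids
        # Mask prompt tokens with -100 so the loss is computed only on the answer.
        labels = [-100] * len(prompt_ids) + answer_ids
        f.write(json.dumps(
            {"input_ids": input_ids, "labels": labels, "len": len(input_ids)}
        ) + "\n")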

🔄 Data Processing

Mini Trainer does not include data processing utilities. For tokenization and data preparation, please use the instructlab-training APIs, which provide robust data processing pipelines compatible with Mini Trainer's input format.

🧱 Pretraining Mode

Mini Trainer supports pretraining on tokenized document corpora. Pass a --block-size to enable the document pipeline (the input JSONL is expected to have an input_ids column):

torchrun --nnodes=1 --nproc-per-node=4 -m mini_trainer.train \
    --model-name-or-path qwen/Qwen2.5-1.5B-Instruct \
    --data-path ./documents.jsonl \
    --output-dir ./checkpoints \
    --batch-size 16 \
    --max-tokens-per-gpu 8192 \
    --block-size 512

  • --block-size (required for pretraining mode) enables the document pipeline and defines the token length of each block.

Programmatic usage mirrors the CLI via PretrainingConfig:

from mini_trainer import TrainingArgs, PretrainingConfig

args = TrainingArgs(
    model_name_or_path="mistralai/Mistral-7B-v0.1",
    data_path="documents.jsonl",
    output_dir="./checkpoints",
    batch_size=128,
    max_tokens_per_gpu=40000,
    pretraining_config=PretrainingConfig(
        block_size=4096,
    ),
)
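
Conceptually, pretraining mode slices each document's input_ids into fixed-length blocks of block_size tokens. The toy sketch below illustrates that block formation only; Mini Trainer's actual pipeline handles packing, batching, and labels internally and may treat document boundaries differently.

import json

def iter_blocks(path: str, block_size: int):
    """Illustrative block formation: yield fixed-length chunks of token ids
    from each document in a pre-tokenized JSONL file."""
    with open(path) as f:
        for line in f:
            ids = json.loads(line)["input_ids"]
            for start in range(0, len(ids) - block_size + 1, block_size):
                yield ids[start:start + block_size]

# e.g. count how many 512-token blocks a corpus would yield
print(sum(1 for _ in iter_blocks("documents.jsonl", block_size=512)))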

🐛 Bug Reports & Issues

Found a bug or have a feature request? We'd love to hear from you! Please open an issue on GitHub with:

  • A clear description of the problem
  • Steps to reproduce
  • Expected vs. actual behavior
  • Environment details (Python version, GPU type, etc.)

📝 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


🙏 Acknowledgments

Built with ❤️ by the Red Hat AI Innovation Team.

Mini Trainer is part of a broader ecosystem of LLM tools developed by the AI Innovation Team. Check out our other projects:

  • training_hub - Post-training algorithms for LLMs
  • its_hub - Inference-time scaling for LLMs
  • sdg_hub - Synthetic data generation pipelines
  • reward_hub - State-of-the-art reward models

Visit ai-innovation.team to explore all our open-source tools and research.

Special thanks to the open-source community for contributions and feedback!
