
Mini Trainer

Python 3.11+ · Apache 2.0 licensed

A lightweight, high-performance training library for efficient fine-tuning of large language models up to 70B parameters.


Built for speed, simplicity, and scalability 🚀


✨ Features

  • 🔥 Liger Kernels - Minimized memory footprint through chunked loss computation
  • ⚡ Smart Batch Packing - Automatic minibatching with a numba-optimized LPT algorithm for near-optimal GPU load balancing
  • 🎯 FSDP2 Support - Native PyTorch distributed training with FullyShardedDataParallel
  • 🚫 Padding-Free - Leverages Flash Attention for efficient computation without padding overhead
  • ♾️ Infinite Sampling - Continuous data streaming without manual epoch configuration
  • 🔬 Orthogonal Subspace Fine-Tuning (OSFT) - Advanced continual learning technique for parameter-efficient training
  • 📚 Pretraining Mode - Document-style pretraining with configurable block sizes on pre-tokenized input_ids
  • 📊 Flexible Logging - JSONL metrics logging with optional Weights & Biases integration

🔬 Orthogonal Subspace Fine-Tuning (OSFT)


Mini Trainer implements Orthogonal Subspace Fine-Tuning (OSFT), a breakthrough continual learning technique that enables models to learn new tasks without catastrophic forgetting. OSFT uses adaptive SVD-based decomposition to intelligently update models in unused parameter subspaces while preserving crucial prior knowledge.
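
A minimal sketch of the idea in plain PyTorch (illustrative only, not Mini Trainer's actual implementation): decompose a weight matrix with SVD, freeze the top singular directions that carry existing knowledge, and leave only the least-significant fraction of directions trainable, analogous to what --osft-unfreeze-rank-ratio controls.

import torch

def split_weight_for_osft(weight: torch.Tensor, unfreeze_rank_ratio: float):
    """Toy OSFT-style split: freeze the most significant singular directions,
    expose only the least significant ones to gradient updates."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    rank = S.numel()
    n_trainable = int(rank * unfreeze_rank_ratio)   # e.g. 0.25 -> lowest 25% of directions
    k = rank - n_trainable
    # High-significance subspace: detached, preserves prior knowledge.
    frozen = ((U[:, :k] * S[:k]) @ Vh[:k, :]).detach()
    # Low-significance subspace: the only part the optimizer will touch.
    trainable = torch.nn.Parameter((U[:, k:] * S[k:]) @ Vh[k:, :])
    return frozen, trainable

# Toy usage: a 512x512 layer with 25% of its singular directions trainable.
W = torch.randn(512, 512)
frozen_part, trainable_part = split_weight_for_osft(W, unfreeze_rank_ratio=0.25)
effective_weight = frozen_part + trainable_part    # what a forward pass would use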

🎥 Learn More

Watch our technical deep-dive on Orthogonal Subspace Learning

🚀 Using OSFT

Enable OSFT in your training runs with the --osft flag:

torchrun --nnodes=1 --nproc-per-node=8 -m mini_trainer.train \
    --model-name-or-path meta-llama/Llama-3.1-8B-Instruct \
    --data-path ./data.jsonl \
    --output-dir ./checkpoints \
    --osft \
    --osft-unfreeze-rank-ratio 0.25  # train the 25% least important parameters

The --osft-unfreeze-rank-ratio parameter controls how much of the model to update (0.0 = everything frozen, 1.0 = full training).
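
As a rough intuition: with --osft-unfreeze-rank-ratio 0.25 on a weight matrix whose smaller dimension is 4096, on the order of a quarter of its singular directions (about 1,024 of 4,096) remain trainable while the rest stay frozen.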


📦 Installation

From PyPI

# Install base package
pip install rhai-innovation-mini-trainer

# Install CUDA dependencies (required for GPU training)
pip install rhai-innovation-mini-trainer[cuda] --no-build-isolation

From Source (Editable)

# Clone the repository
git clone https://github.com/Red-Hat-AI-Innovation-Team/mini_trainer.git
cd mini_trainer

# Install in editable mode
pip install -e .

# Install CUDA dependencies
pip install -e .[cuda] --no-build-isolation

🎯 Usage

Training is orchestrated through the api_train.py module, which provides a programmatic interface for launching training jobs. You can run training using torchrun for distributed setups:

torchrun --nnodes=1 --nproc-per-node=8 -m mini_trainer.train \
    --output-dir ./checkpoints \
    --data-path ./data.jsonl \
    --model-name-or-path meta-llama/Llama-3.1-8B-Instruct \
    --batch-size 128 \
    --max-tokens-per-gpu 128000 \
    --learning-rate 5e-6 \
    --use-liger-kernels

Key Parameters

  • --model-name-or-path - HuggingFace model identifier or local path
  • --data-path - Path to tokenized training data (JSONL format)
  • --batch-size - Target batch size for training
  • --max-tokens-per-gpu - Maximum tokens per GPU (auto-balances minibatches)
  • --output-dir - Directory for checkpoints and logs
  • --use-liger-kernels - Enable memory-efficient Liger kernels
  • --osft - Enable Orthogonal Subspace Fine-Tuning mode
  • --osft-unfreeze-rank-ratio - Ratio of model parameters to train with OSFT (0.0-1.0)
  • --block-size - Enables pretraining mode with the given block length

For the complete list of arguments and advanced configuration options, see src/mini_trainer/api_train.py.
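
To make --max-tokens-per-gpu and the LPT packing behaviour concrete, here is an illustrative simplification (not Mini Trainer's numba-optimized implementation): sequences are taken longest first, and each is placed on the currently least-loaded rank that still has room under the token budget.

import heapq

def lpt_pack(seq_lens, num_gpus, max_tokens_per_gpu):
    """Toy LPT packing: longest sequences first, each assigned to the
    least-loaded GPU that stays under the token budget."""
    # Each heap entry is (current token load, gpu index, assigned lengths).
    heap = [(0, gpu, []) for gpu in range(num_gpus)]
    heapq.heapify(heap)
    leftover = []
    for length in sorted(seq_lens, reverse=True):
        load, gpu, assigned = heapq.heappop(heap)
        if load + length <= max_tokens_per_gpu:
            assigned.append(length)
            load += length
        else:
            leftover.append(length)   # would roll over to the next minibatch
        heapq.heappush(heap, (load, gpu, assigned))
    return heap, leftover

buckets, overflow = lpt_pack([900, 750, 620, 400, 380, 200, 120],
                             num_gpus=2, max_tokens_per_gpu=2000)
for load, gpu, assigned in sorted(buckets, key=lambda b: b[1]):
    print(f"GPU {gpu}: {load} tokens -> {assigned}")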


🛠️ Contributors – Looking for the lazy-init + FSDP2 loading flow?
See docs/distributed_initialization.md for diagrams and a detailed walkthrough of the SFT and OSFT pipelines.

📊 Data Format

Mini Trainer expects pre-tokenized data in JSONL format with the following structure:

{"input_ids": [1, 2, 3, ...], "labels": [1, 2, 3, ...], "len": 128}
{"input_ids": [4, 5, 6, ...], "labels": [-100, -100, 6, ...], "len": 256}

Each line should contain:

  • input_ids - Tokenized input sequence
  • labels - Target labels (use -100 for tokens to ignore in loss computation)
  • len - Sequence length (optional, computed automatically if missing)
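
For a quick smoke test you can produce a file in this format yourself; the snippet below is only a sketch (the chat-template handling is simplified) and is not a substitute for the instructlab-training pipelines mentioned in the next section.

import json
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

samples = [("What is 2 + 2?", "2 + 2 equals 4.")]

with open("data.jsonl", "w") as f:
    for prompt, answer in samples:
        # Tokenize the prompt using the model's chat template.
        prompt_ids = tokenizer.apply_chat_template(
            [{"role": "user", "content": prompt}], add_generation_prompt=True
        )
        answer_ids = tokenizer(answer, add_special_tokens=False)["input_ids"]
        input_ids = prompt_ids + answer_ids
        # Mask prompt tokens with -100 so the loss is computed only on the answer.
        labels = [-100] * len(prompt_ids) + answer_ids
        f.write(json.dumps(
            {"input_ids": input_ids, "labels": labels, "len": len(input_ids)}
        ) + "\n")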

🔄 Data Processing

Mini Trainer does not include data processing utilities. For tokenization and data preparation, please use the instructlab-training APIs, which provide robust data processing pipelines compatible with Mini Trainer's input format.

🧱 Pretraining Mode

Mini Trainer supports pretraining on tokenized document corpora. Pass a --block-size to enable the document pipeline (the input JSONL is expected to have an input_ids column):

torchrun --nnodes=1 --nproc-per-node=4 -m mini_trainer.train \
    --model-name-or-path qwen/Qwen2.5-1.5B-Instruct \
    --data-path ./documents.jsonl \
    --output-dir ./checkpoints \
    --batch-size 16 \
    --max-tokens-per-gpu 8192 \
    --block-size 512

  • --block-size (required for pretraining mode) enables the document pipeline and defines the token length of each block.

Programmatic usage mirrors the CLI via PretrainingConfig:

from mini_trainer import TrainingArgs, PretrainingConfig

args = TrainingArgs(
    model_name_or_path="mistralai/Mistral-7B-v0.1",
    data_path="documents.jsonl",
    output_dir="./checkpoints",
    batch_size=128,
    max_tokens_per_gpu=40000,
    pretraining_config=PretrainingConfig(
        block_size=4096,
    ),
)
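
Conceptually, pretraining mode slices each document's input_ids into fixed-length blocks of block_size tokens. The toy sketch below illustrates that block formation only; Mini Trainer's actual pipeline handles packing, batching, and labels internally and may treat document boundaries differently.

import json

def iter_blocks(path: str, block_size: int):
    """Illustrative block formation: yield fixed-length chunks of token ids
    from each document in a pre-tokenized JSONL file."""
    with open(path) as f:
        for line in f:
            ids = json.loads(line)["input_ids"]
            for start in range(0, len(ids) - block_size + 1, block_size):
                yield ids[start:start + block_size]

# e.g. count how many 512-token blocks a corpus would yield
print(sum(1 for _ in iter_blocks("documents.jsonl", block_size=512)))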

🐛 Bug Reports & Issues

Found a bug or have a feature request? We'd love to hear from you! Please open an issue on GitHub with:

  • A clear description of the problem
  • Steps to reproduce
  • Expected vs. actual behavior
  • Environment details (Python version, GPU type, etc.)

📝 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


🙏 Acknowledgments

Built with ❤️ by the Red Hat AI Innovation Team.

Mini Trainer is part of a broader ecosystem of LLM tools developed by the AI Innovation Team. Check out our other projects:

  • training_hub - Post-training algorithms for LLMs
  • its_hub - Inference-time scaling for LLMs
  • sdg_hub - Synthetic data generation pipelines
  • reward_hub - State-of-the-art reward models

Visit ai-innovation.team to explore all our open-source tools and research.

Special thanks to the open-source community for contributions and feedback!
