
Transformers Distillation is a Python library for knowledge distillation of Hugging Face Transformers. Train smaller, faster student models from large teacher models while retaining performance. Supports PyTorch, tokenizers, and datasets for efficient model compression and deployment.


🧪 HF Distiller — Knowledge Distillation for Hugging Face Models


HF Distiller is an open-source toolkit for performing knowledge distillation on Hugging Face Transformers models. It allows developers to train smaller, faster student models from large pre-trained teacher models while maintaining high performance.


📖 Overview

Knowledge Distillation (KD) compresses a large model into a smaller one by transferring the “knowledge” learned by the teacher to the student. HF Distiller wraps around Hugging Face’s Trainer to make KD accessible, modular, and intuitive.
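For intuition, the standard soft-target KD objective combines ordinary cross-entropy on the ground-truth labels with a temperature-scaled KL divergence between the teacher's and student's output distributions. The snippet below is a minimal PyTorch sketch of that idea; the kd_loss function and its arguments are illustrative only (the names mirror the kd_alpha and temperature arguments shown in the Quick Start), and the library's actual loss implementation may differ.

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, kd_alpha=0.5, temperature=2.0):
    # student_logits, teacher_logits: (batch, num_classes); labels: (batch,)
    # Hard-label term: ordinary cross-entropy against the ground-truth labels
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence between temperature-softened distributions,
    # scaled by T^2 as in the classic soft-target formulation
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # kd_alpha balances imitating the teacher against fitting the labels
    return kd_alpha * kl + (1.0 - kd_alpha) * ce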

Key Features:

  • ✅ Load any teacher model from the Hugging Face Hub
  • ✅ Create smaller student models from scratch
  • ✅ Works with any Hugging Face tokenizer
  • ✅ Seamless integration with the datasets library
  • ✅ Transparent logging and checkpointing
  • ✅ Fully compatible with PyTorch and Transformers

🖼 Architecture

           ┌────────────────────────┐
           │     Teacher Model      │  Pretrained Hugging Face LM
           └───────────┬────────────┘
                       │
                       ▼
           ┌────────────────────────┐
           │ Knowledge Distillation │  Transfer teacher knowledge + KD loss
           └───────────┬────────────┘
                       │
                       ▼
           ┌────────────────────────┐
           │     Student Model      │  Smaller, efficient model trained from scratch
           └────────────────────────┘

⚡ Installation

# Install transformers_distillation (recommended)
pip install --no-deps git+https://github.com/Dhiraj309/transformers_distillation.git

# OR

# Clone the repository
git clone https://github.com/Dhiraj309/transformers_distillation.git
cd transformers_distillation

# Install dependencies
pip install -r requirements.txt
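
A quick sanity check after installation is to import the entry points used in the Quick Start below; if these imports succeed, the package is available on your path.

# Sanity check (run in Python): these imports should succeed after installation
from transformers_distillation.models import load_teacher, load_student
from transformers_distillation.trainer import DistillTrainer
print("transformers_distillation imported successfully")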

🏃 Quick Start

from transformers_distillation.models import load_teacher, load_student
from transformers_distillation.trainer import DistillTrainer
from transformers import AutoTokenizer, TrainingArguments
from datasets import Dataset

# Example dataset
dataset = Dataset.from_dict({"text": ["Hello world!", "AI is amazing."]})

# Load teacher
teacher = load_teacher("google-bert/bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")

# Create student model
student = load_student(
    model_name_or_path="google-bert/bert-base-uncased",
    from_scratch=True,
    n_layers=4,
    n_heads=4,
    n_embd=256,
    is_pretrained=False
)

# Tokenize
def tokenize(batch):
    return tokenizer(batch["text"], max_length=128, padding=True, truncation=True)

tokenized = dataset.map(tokenize, remove_columns=["text"])

# Training arguments
training_args = TrainingArguments(
    output_dir="./student-llm",
    per_device_train_batch_size=1,
    num_train_epochs=1,
    learning_rate=2e-4,
    report_to="none"
)

# Train student with KD
trainer = DistillTrainer(
    teacher_model=teacher,
    student_model=student,
    train_dataset=tokenized,
    tokenizer=tokenizer,
    training_args=training_args,
    kd_alpha=0.5,
    temperature=2.0
)
trainer.train()
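
After training, you will usually want to persist the distilled student for deployment. Assuming load_student returns a standard Transformers model (the snippet below is a sketch under that assumption, not a documented API of this library), the usual save_pretrained / from_pretrained round trip should work:

# Save the distilled student and its tokenizer (assumes a standard
# Transformers model object; adjust if the library exposes its own save API)
student.save_pretrained("./student-llm/final")
tokenizer.save_pretrained("./student-llm/final")

# Reload later with plain Transformers for inference
from transformers import AutoModel, AutoTokenizer
reloaded_model = AutoModel.from_pretrained("./student-llm/final")
reloaded_tokenizer = AutoTokenizer.from_pretrained("./student-llm/final")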

📂 Project Status

Stage                  Status
Core Development       ✅ Complete
Documentation          ✅ Complete
Community Feedback     🚧 In Progress
Tutorials & Examples   🚧 In Progress

🤝 Collaboration

We welcome contributions from the community, including:

  • Pull requests for new KD strategies
  • Bug reports and feature requests
  • Tutorials and example scripts
  • Optimization for faster student training

🔗 GitHub: Dhiraj309 🔗 Hugging Face: dignity045


📜 License

Released under the MIT License — free to use, modify, and distribute. See LICENSE for full terms.
