A comprehensive, production-ready pipeline for fine-tuning Large Language Models using Parameter-Efficient Fine-Tuning (PEFT) techniques with MLflow experiment tracking and a user-friendly GUI.
- Parameter-Efficient Fine-tuning: Support for LoRA and QLoRA techniques
- Multiple Model Support: RoBERTa, Llama 2, Mistral, GPT models, and more
- Experiment Tracking: Full MLflow integration for reproducible experiments
- Flexible Configuration: YAML-based configuration system for easy customization
- Data Pipeline: Robust data loading, preprocessing, and validation
- Model Evaluation: Comprehensive metrics and custom evaluation functions
- Web GUI: Simple interface for model selection and training configuration
- Production Ready: Docker support, CI/CD pipelines, and deployment scripts
- Resume-Friendly: Industry best practices and professional documentation
| Model Type | Model Names | Use Cases |
|---|---|---|
| Encoder Models | RoBERTa, BERT, DistilBERT | Classification, NER, QA |
| Decoder Models | Llama 2, Mistral, Falcon | Text generation, Chat |
| Encoder-Decoder | T5, BART, Flan-T5 | Summarization, Translation |
```
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│  Data Pipeline  │      │  Training Core  │      │   Evaluation    │
│                 │      │                 │      │                 │
│ • Data Loading  │  ->  │ • LoRA/QLoRA    │  ->  │ • Metrics       │
│ • Preprocessing │      │ • MLflow Track. │      │ • Benchmarking  │
│ • Validation    │      │ • Model Manag.  │      │ • Reporting     │
└─────────────────┘      └─────────────────┘      └─────────────────┘
         │                        │                        │
         ▼                        ▼                        ▼
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│     Web GUI     │      │  Configuration  │      │   Deployment    │
│                 │      │                 │      │                 │
│ • Model Select  │      │ • YAML Configs  │      │ • Docker        │
│ • Training UI   │      │ • Hyperparams   │      │ • API Server    │
│ • Monitoring    │      │ • Data Splits   │      │ • Model Serving │
└─────────────────┘      └─────────────────┘      └─────────────────┘
```
```
llm-finetuning-pipeline/
├── README.md
├── requirements.txt
├── setup.py
├── .env.example
├── docker-compose.yml
├── Dockerfile
│
├── configs/                 # Configuration files
│   ├── models/              # Model-specific configs
│   ├── datasets/            # Dataset-specific configs
│   └── experiments/         # Experiment configs
│
├── src/                     # Source code
│   ├── __init__.py
│   ├── core/                # Core pipeline components
│   ├── models/              # Model implementations
│   ├── data/                # Data processing
│   ├── training/            # Training logic
│   ├── evaluation/          # Evaluation metrics
│   └── utils/               # Utility functions
│
├── gui/                     # Web interface
│   ├── app.py               # Main Flask/Streamlit app
│   ├── components/          # UI components
│   └── static/              # Static assets
│
├── notebooks/               # Jupyter notebooks
│   ├── examples/            # Usage examples
│   └── experiments/         # Research notebooks
│
├── tests/                   # Unit tests
│   ├── unit/
│   ├── integration/
│   └── e2e/
│
├── docs/                    # Documentation
│   ├── api/                 # API documentation
│   ├── guides/              # User guides
│   └── best_practices/      # Best practices
│
├── scripts/                 # Utility scripts
│   ├── setup_env.sh
│   ├── download_models.py
│   └── benchmark.py
│
└── deployments/             # Deployment configurations
    ├── kubernetes/
    ├── docker/
    └── cloud/
```
- Python 3.8+
- CUDA-capable GPU (optional but recommended)
- Docker (optional, for containerized deployment)
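Before installing, you can sanity-check the environment. The snippet below is a minimal sketch: the Python version assertion matches the stated prerequisite, and the GPU check is guarded because PyTorch is only available after installing the requirements.

```python
import sys

# The pipeline targets Python 3.8+
assert sys.version_info >= (3, 8), f"Python 3.8+ required, found {sys.version.split()[0]}"

# GPU check is optional; torch is only installed via requirements.txt
try:
    import torch
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch not installed yet; skipping GPU check")
```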
```bash
# Clone the repository
git clone https://github.com/mominalix/LLM-Finetuning-Pipeline-LoRA-QLoRA.git
cd LLM-Finetuning-Pipeline-LoRA-QLoRA

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Start MLflow tracking server
mlflow server --host 0.0.0.0 --port 5000
```

Option A: Using the Web GUI (Recommended for beginners)
```bash
# Start the web interface
streamlit run src/gui/app.py

# Open your browser to http://localhost:8501
```

Option B: Using the Command Line
```bash
# Run RoBERTa sentiment analysis example
python examples/roberta_sentiment.py

# Or use the CLI directly
python -m src.cli train --model roberta-base --dataset imdb --epochs 3
```

Option C: Using the Python API
```python
from src import FineTuningPipeline, load_config

# Load configuration
config = load_config("configs/experiments/roberta_sentiment_analysis.yaml")

# Run training
pipeline = FineTuningPipeline(config)
results = pipeline.run()
print(f"Training completed! F1 Score: {results['eval_f1']:.4f}")
```

The pipeline uses YAML configuration files for maximum flexibility:
```yaml
# Basic configuration structure
model:
  name: "roberta-base"
  task_type: "sequence_classification"
  num_labels: 2

dataset:
  name: "imdb"
  train_split: "train"
  test_split: "test"

training:
  num_train_epochs: 3
  per_device_train_batch_size: 16
  learning_rate: 2e-5

lora:
  r: 8
  alpha: 16
  dropout: 0.1
```

RoBERTa (encoder) configuration:

```yaml
model:
  name: "roberta-base"              # or "roberta-large" for better performance
  task_type: "sequence_classification"
  num_labels: 2                     # Adjust based on your dataset
  use_lora: true
  use_qlora: false                  # Not needed for RoBERTa base/large

lora:
  r: 8                              # Good starting point
  alpha: 16                         # Usually 2x the rank
  dropout: 0.1
```

Llama 2 (decoder) configuration with QLoRA:

```yaml
model:
  name: "meta-llama/Llama-2-7b-hf"
  task_type: "text_generation"
  max_length: 2048
  use_lora: true
  use_qlora: true                   # Essential for 7B+ models

lora:
  r: 16                             # Higher rank for complex tasks
  alpha: 32
  dropout: 0.05
  target_modules: ["q_proj", "v_proj", "k_proj", "o_proj"]

training:
  per_device_train_batch_size: 2    # Smaller due to memory constraints
  gradient_accumulation_steps: 8    # Maintain effective batch size
```

Using a HuggingFace Hub dataset:

```yaml
dataset:
  name: "imdb"                      # Any dataset from HuggingFace Hub
  text_column: "text"
  label_column: "label"
  validation_size: 0.1
```

Using a local CSV dataset:

```yaml
dataset:
  name: "/path/to/your/dataset.csv"
  text_column: "review_text"
  label_column: "sentiment"
  validation_size: 0.2
```

Create a CSV file with your data:

```csv
text,label
"I love this movie!",1
"This was terrible.",0
"Great acting and plot.",1
```

High-memory GPU:

```yaml
training:
  per_device_train_batch_size: 32
  gradient_accumulation_steps: 1
  bf16: true
  gradient_checkpointing: false     # Can disable for speed

# Can use larger models without QLoRA
model:
  name: "roberta-large"             # or even "meta-llama/Llama-2-7b-hf" with QLoRA
```

Mid-range GPU:

```yaml
training:
  per_device_train_batch_size: 16
  gradient_accumulation_steps: 2
  bf16: true
  gradient_checkpointing: true

# Use QLoRA for models >1B parameters
use_qlora: true                     # If using large models
```

Limited GPU memory:

```yaml
training:
  per_device_train_batch_size: 4
  gradient_accumulation_steps: 8
  bf16: true
  gradient_checkpointing: true

# Must use QLoRA for any model >500M parameters
use_qlora: true

model:
  max_length: 256                   # Reduce sequence length
```

End-to-end example (customer review classification):

```bash
# 1. Prepare your data (CSV format)
echo "text,label
Great product quality!,positive
Terrible customer service,negative
Amazing value for money,positive" > customer_reviews.csv

# 2. Create configuration
python -m src.cli create-config \
  --model roberta-base \
  --dataset customer_reviews.csv \
  --task sequence_classification \
  --output configs/customer_reviews.yaml

# 3. Train the model
python -m src.cli train --config configs/customer_reviews.yaml
```

Named entity recognition (token classification):

```yaml
model:
  name: "roberta-base"
  task_type: "token_classification"
  num_labels: 9                     # B-PER, I-PER, B-ORG, I-ORG, B-LOC, I-LOC, B-MISC, I-MISC, O

dataset:
  name: "conll2003"
  text_column: "tokens"
  label_column: "ner_tags"

lora:
  r: 16                             # Higher rank for sequence labeling
  alpha: 32
  target_modules: ["query", "value", "key"]  # Include key projection
```

Question answering with T5:

```yaml
model:
  name: "google/flan-t5-base"
  task_type: "text_generation"
  max_length: 512

dataset:
  name: "squad"
  # Custom preprocessing function will handle QA format

training:
  num_train_epochs: 5
  learning_rate: 3e-4               # Higher for T5

lora:
  r: 32
  alpha: 64
  target_modules: ["q", "v", "k", "o", "wi_0", "wi_1", "wo"]  # T5 modules
```

A complete experiment configuration ties these pieces together:
```yaml
# configs/roberta_sentiment.yaml
model:
  name: "roberta-base"
  task_type: "sequence_classification"
  num_labels: 2

lora:
  r: 8
  alpha: 16
  dropout: 0.1
  target_modules: ["query", "value"]

dataset:
  name: "imdb"
  train_split: "train"
  test_split: "test"
  max_length: 512

training:
  epochs: 3
  batch_size: 16
  learning_rate: 2e-5
  warmup_steps: 100

mlflow:
  experiment_name: "roberta_sentiment_analysis"
  tracking_uri: "http://localhost:5000"
```

All experiments are automatically tracked with MLflow:
- Parameters: Model configs, hyperparameters, data splits
- Metrics: Training/validation loss, accuracy, F1-score, custom metrics
- Artifacts: Model checkpoints, training plots, evaluation reports
- Models: Versioned model registry with staging/production promotion
Viewing results in the MLflow UI:
- Start MLflow: `mlflow server --host 0.0.0.0 --port 5000`
- Open browser: `http://localhost:5000`
- Browse experiments and compare runs
- Download trained models

Monitoring in the web GUI:
- Start GUI: `streamlit run src/gui/app.py`
- Open browser: `http://localhost:8501`
- Navigate to "Monitoring" tab
- View real-time training progress
```python
import mlflow

# Search for the best run by F1 score
best_run = mlflow.search_runs(
    experiment_ids=["1"],
    order_by=["metrics.eval_f1 DESC"],
    max_results=1
).iloc[0]

print(f"Best F1 Score: {best_run['metrics.eval_f1']}")
print(f"Best Run ID: {best_run['run_id']}")

# Load best model
model_uri = f"runs:/{best_run['run_id']}/model"
model = mlflow.transformers.load_model(model_uri)
```

Local and containerized usage:

```bash
python -m src.gui.app                               # Start web interface
python -m src.train configs/roberta_sentiment.yaml  # CLI training
docker-compose up                                   # Full stack with MLflow and GUI
```

Cloud deployment options:
- AWS SageMaker: Pre-configured training jobs
- Google Cloud AI Platform: Vertex AI integration
- Azure ML: Azure Machine Learning pipelines
- Kubernetes: Scalable distributed training
```python
from src.evaluation import MetricsComputer

def custom_accuracy(predictions, labels):
    """Custom accuracy that ignores label 0."""
    mask = labels != 0
    return (predictions[mask] == labels[mask]).mean()

# Add to metrics computer
computer = MetricsComputer(task_type="sequence_classification")
computer.add_custom_metric("custom_accuracy", custom_accuracy)
```

Custom data loading:

```python
from src.data import DataLoader
from datasets import Dataset
import pandas as pd

# Load custom data
def load_custom_dataset():
    df = pd.read_json("custom_data.jsonl", lines=True)
    return Dataset.from_pandas(df)

# Use in pipeline
from src.core.config import DatasetConfig

config = DatasetConfig(name="custom", ...)
loader = DataLoader(config)
# Override with custom loader
```

Hyperparameter sweeps:

```python
import mlflow
from itertools import product

# Define parameter grid
param_grid = {
    'learning_rate': [1e-5, 2e-5, 5e-5],
    'lora_r': [8, 16, 32],
    'batch_size': [8, 16, 32]
}

# Run sweep over all combinations
for lr, r, batch_size in product(*param_grid.values()):
    with mlflow.start_run():
        # Update config
        config.training.learning_rate = lr
        config.lora.r = r
        config.training.per_device_train_batch_size = batch_size

        # Run training
        pipeline = FineTuningPipeline(config)
        results = pipeline.run()
```

- Memory Efficient: QLoRA with 4-bit quantization for large models
- Optimal Parameters: Research-backed default values for rank, alpha, dropout
- Target Module Selection: Automatic identification of optimal layers
- Mixed Precision: BF16/FP16 training for faster convergence
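As a back-of-the-envelope illustration of why LoRA is memory efficient, the number of trainable parameters it adds can be computed directly. The figures below assume roberta-base (hidden size 768, 12 layers) with the `target_modules: ["query", "value"]` and `r: 8` defaults used in the configs above:

```python
def lora_param_count(hidden_size: int, num_layers: int, num_target_modules: int, r: int) -> int:
    """Each adapted weight matrix gains two low-rank factors: A (r x d) and B (d x r)."""
    per_module = 2 * hidden_size * r
    return per_module * num_target_modules * num_layers

# roberta-base: hidden 768, 12 layers, target_modules ["query", "value"], r = 8
added = lora_param_count(hidden_size=768, num_layers=12, num_target_modules=2, r=8)
print(added)                         # 294912
print(f"{added / 125_000_000:.2%}")  # ~0.24% of roberta-base's ~125M parameters
```

This is why the full model can stay frozen (and, with QLoRA, quantized) while only the small adapter matrices receive gradients.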
- Type Safety: Full type hints and Pydantic validation
- Error Handling: Comprehensive exception handling and logging
- Testing: Unit, integration, and end-to-end tests
- Documentation: Detailed API docs and user guides
- Monitoring: Real-time training metrics and alerts
- Validation: Automatic data quality checks and statistics
- Caching: Efficient data loading with HuggingFace datasets
- Preprocessing: Configurable tokenization and augmentation
- Streaming: Support for large datasets that don't fit in memory
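Streaming corresponds to `load_dataset(..., streaming=True)` in HuggingFace datasets; conceptually it just iterates examples lazily so the full dataset never sits in memory. A self-contained stdlib sketch of the idea (the column names match the CSV format shown earlier, but are otherwise illustrative):

```python
import csv
import os
import tempfile

def stream_examples(path):
    """Yield one example at a time instead of materializing the whole dataset."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield {"text": row["text"], "label": int(row["label"])}

# Demo on a tiny CSV written to a temp file
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["text", "label"])
    writer.writerow(["I love this movie!", 1])
    writer.writerow(["This was terrible.", 0])

labels = [ex["label"] for ex in stream_examples(path)]
print(labels)  # [1, 0]
os.remove(path)
```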
Problem: CUDA out of memory error

Solutions:
- Enable QLoRA: `use_qlora: true`
- Reduce batch size: `per_device_train_batch_size: 1`
- Increase gradient accumulation: `gradient_accumulation_steps: 16`
- Enable gradient checkpointing: `gradient_checkpointing: true`
- Reduce sequence length: `max_length: 256`

Problem: Training is very slow

Solutions:
- Enable mixed precision: `bf16: true` or `fp16: true`
- Increase batch size if memory allows
- Use gradient accumulation instead of small batches
- Enable dataloader optimizations: `dataloader_pin_memory: true`, `dataloader_num_workers: 4`

Problem: Model accuracy is low

Solutions:
- Increase LoRA rank: `r: 32` or higher
- Adjust learning rate: try `1e-4` for LoRA
- Add more epochs: `num_train_epochs: 5`
- Check data quality and preprocessing
- Try different target modules: `target_modules: ["query", "value", "key", "dense"]`

Problem: Cannot connect to MLflow server

Solutions:
- Start MLflow server: `mlflow server --host 0.0.0.0 --port 5000`
- Check firewall settings
- Use a local SQLite backend in the `mlflow:` config: `tracking_uri: "sqlite:///mlflow.db"`
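Several of the out-of-memory remedies above trade per-device batch size for gradient accumulation; the optimizer still sees the same effective batch size per update step:

```python
def effective_batch_size(per_device: int, accum_steps: int, num_gpus: int = 1) -> int:
    """Batch size seen by the optimizer per update step."""
    return per_device * accum_steps * num_gpus

# The earlier hardware presets are all equivalent to a batch of 32 on one GPU:
print(effective_batch_size(32, 1))  # 32 (high-memory GPU)
print(effective_batch_size(16, 2))  # 32 (mid-range GPU)
print(effective_batch_size(4, 8))   # 32 (limited memory)
```

So shrinking `per_device_train_batch_size` to fit memory need not change training dynamics, as long as `gradient_accumulation_steps` is scaled up to compensate.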
- Experiment with different models: Try various architectures for your task
- Optimize hyperparameters: Use grid search or Bayesian optimization
- Deploy to production: Use MLflow model serving or containerization
- Scale up: Use distributed training for very large datasets
- Custom extensions: Add your own metrics, callbacks, and preprocessing
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Commit changes: `git commit -m 'Add amazing feature'`
- Push to branch: `git push origin feature/amazing-feature`
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Hugging Face for transformers and PEFT libraries
- MLflow for experiment tracking
- LoRA Paper by Hu et al.
- QLoRA Paper by Dettmers et al.