Releases: peremartra/optipfair
v0.3.0 - Fairness-Aware Pruning
🎉 New Features
Fairness-Aware Pruning
- New Function: `analyze_neuron_bias()` - Analyze per-neuron bias contributions across multiple demographic prompt pairs
- Computes activation-based bias scores for individual neurons
- Supports multiple aggregation methods (mean, max) across sequence positions
- Works with GLU-architecture MLP layers (`gate_proj`, `up_proj`)
- New Function: `compute_fairness_pruning_scores()` - Combine bias and importance scores for balanced pruning
- Configurable `bias_weight` parameter (0.0 to 1.0) to adjust the fairness vs. performance trade-off
- Returns fairness pruning scores for each layer
- Enables fairness-aware neuron selection strategies
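To illustrate how a `bias_weight` blend of the two score types might behave, here is a minimal, hypothetical sketch: the actual formula inside `compute_fairness_pruning_scores()` may normalize or combine the scores differently. `combine_scores` is an illustrative name, not the library's API.

```python
def combine_scores(importance, bias, bias_weight=0.5):
    """Blend normalized importance and bias into one pruning score per neuron.

    Hypothetical illustration only. Lower scores mark neurons as better
    pruning candidates: high importance protects a neuron, high bias makes
    it a candidate for removal.
    """
    max_imp = max(importance) or 1.0
    max_bias = max(bias) or 1.0
    return [
        (1.0 - bias_weight) * (imp / max_imp) - bias_weight * (b / max_bias)
        for imp, b in zip(importance, bias)
    ]

# bias_weight=0.0 ranks purely by importance; 1.0 ranks purely by bias.
scores = combine_scores([2.0, 4.0, 1.0], [0.1, 0.9, 0.3], bias_weight=0.5)
```

With `bias_weight=0.5`, the most important neuron here is also the most biased one, so its two contributions cancel out and a less important but less biased neuron can outrank it.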
Enhanced Pruning Integration
- Modified: `prune_model_mlp_glu()` - Improved compatibility with fairness-aware workflows
- Documentation: Added comprehensive fairness-aware pruning guide with examples
📚 Documentation Enhancements
- Complete guide to fairness-aware pruning workflow
- Step-by-step tutorials for new functions
- Understanding the `bias_weight` parameter, with recommended configurations
- Complete end-to-end examples
- New example notebook: `fairness_aware_pruning_demo.ipynb`
🧪 Testing & Quality
- Compatible with existing pruning functionality
- No breaking changes to existing API
- All existing tests continue to pass
📦 Installation
```bash
pip install optipfair==0.3.0
```
OptiPFair v0.2.4 - L2 & Universal DataLoader Support
Bug Fix: Hybrid Pruning
v0.2.2 - Selective Layer Width Pruning
🚀 OptiPFair v0.2.2 - Selective Layer Width Pruning
We're excited to announce OptiPFair v0.2.2, bringing powerful new capabilities for fine-grained control over model pruning!
🎯 Headline Features
1️⃣ Selective Layer Width Pruning
The `layer_indices` parameter now works for both DEPTH and MLP_GLU pruning, giving you unprecedented control over which layers to optimize:
```python
from optipfair import prune_model

# Prune neurons ONLY in specific layers (preserve first & last)
pruned_model = prune_model(
    model=model,
    pruning_type="MLP_GLU",
    pruning_percentage=30,
    layer_indices=[5, 10, 15, 20],  # Only these layers are pruned
    show_progress=True
)
```
Key Benefits:
- 🛡️ Preserve Critical Layers: Keep embedding and output layers at full capacity
- 🎯 Targeted Optimization: Prune only the layers that matter
- 🔬 Data-Driven Selection: Combine with layer importance analysis
- ⚡ Full Feature Support: Works with `expansion_rate`, `expansion_divisor`, `dataloader`, and all neuron selection methods
2️⃣ Optimized Hybrid Importance Calculation
We've streamlined the data-driven pruning algorithm for better performance:
- Simplified `gate_proj` & `up_proj`: Now use the same fast MAW method as static pruning
- Focused Complexity: Activation-weighted calculation only where it matters (`down_proj`)
- Faster Execution: Reduced computational overhead while maintaining effectiveness
- Consistent Methodology: Same MAW formula across static and hybrid approaches
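As a rough illustration of the Maximum Absolute Weight (MAW) idea referenced above, the sketch below scores each neuron by the largest absolute weight in its row. This is a simplified, assumed reading of the method: optipfair's actual MAW formula (including how `gate_proj` and `up_proj` rows are combined per neuron pair) may differ, and `maw_scores` is an illustrative name, not the library's API.

```python
def maw_scores(weight_rows):
    """Maximum-Absolute-Weight importance sketch: one score per neuron,
    taken as the largest |weight| in that neuron's row of the projection
    matrix. Neurons with uniformly small weights score low and are
    pruned first.
    """
    return [max(abs(w) for w in row) for row in weight_rows]

# Two neurons, three input dimensions each.
scores = maw_scores([[0.1, -0.9, 0.2], [0.05, 0.02, -0.01]])
```

Because it needs only the weights, a score like this can be computed without any forward passes, which is why restricting the activation-weighted calculation to `down_proj` reduces calibration overhead.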
📊 What's New
Extended API
- ✅ `layer_indices` parameter is now contextual: removes layers for DEPTH, prunes neurons for MLP_GLU
- ✅ Comprehensive validation: checks for invalid indices, duplicates, empty lists, and type errors
- ✅ Enhanced statistics: reports `pruned_layers` and `total_layers` for selective pruning
Improved Performance
- ⚡ Faster hybrid importance calculation
- 💾 Selective hook registration (only on specified layers)
- 🎯 More efficient calibration with `layer_indices`
Better Documentation
- 📖 Complete "Selective Layer Width Pruning" guide in README
- 📝 Extended reference manual with 4+ detailed examples
- 💻 New example file with 5 practical use cases
- 🧪 12 comprehensive test cases
💡 Common Use Cases
Use Case 1: Preserve Embedding Layers
```python
# Prune all middle layers, preserving the first and last 5
num_layers = len(model.model.layers)
middle_layers = list(range(5, num_layers - 5))

pruned_model = prune_model(
    model=model,
    pruning_type="MLP_GLU",
    pruning_percentage=25,
    layer_indices=middle_layers
)
```
Use Case 2: Importance-Based Pruning
```python
from optipfair import analyze_layer_importance, prune_model

# Step 1: Analyze which layers are least important
importance_scores = analyze_layer_importance(model, dataloader)
sorted_layers = sorted(importance_scores.items(), key=lambda x: x[1])
least_important = [idx for idx, score in sorted_layers[:10]]

# Step 2: Prune only those layers
pruned_model = prune_model(
    model=model,
    pruning_type="MLP_GLU",
    pruning_percentage=30,
    layer_indices=least_important
)
```
Use Case 3: Data-Driven Selective Pruning
```python
# Combine calibration data with selective pruning
pruned_model = prune_model(
    model=model,
    pruning_type="MLP_GLU",
    neuron_selection_method="MAW",
    pruning_percentage=20,
    dataloader=calibration_dataloader,  # Hybrid importance
    layer_indices=[5, 10, 15, 20],      # Only these layers
    show_progress=True
)
```
🔧 Technical Highlights
Modified Core Functions
- `prune_model()`: Now passes `layer_indices` to MLP_GLU pruning
- `prune_model_mlp_glu()`: Full selective pruning implementation with validation
- `setup_mlp_hooks_for_importance()`: Selective hook registration
- `compute_neuron_pair_importance_maw_hybrid()`: Simplified and optimized
- `get_pruning_statistics()`: Detects and reports selective pruning
Enhanced CLI
```bash
# The CLI now supports layer_indices for both pruning types
optipfair prune \
  --model-path meta-llama/Llama-3.2-1B \
  --pruning-type MLP_GLU \
  --pruning-percentage 30 \
  --layer-indices "5,10,15,20" \
  --output-path ./pruned-model
```
🧪 Testing & Validation
- ✅ 12 comprehensive test cases in `tests/test_selective_layer_pruning.py`
- ✅ Tested with all neuron selection methods (MAW, VOW, PON)
- ✅ Verified compatibility with `expansion_rate`, `expansion_divisor`, and `dataloader`
- ✅ Validated error handling and edge cases
- ✅ Confirmed backward compatibility with v0.2.1
📦 Installation
```bash
pip install --upgrade optipfair
```
Or with visualization support:
```bash
pip install --upgrade "optipfair[viz]"
```
📚 Resources
- Documentation: https://peremartra.github.io/optipfair/
- GitHub: https://github.com/peremartra/optipfair
- Examples: Check out `examples/selective_layer_width_pruning.py`
- Tests: See `tests/test_selective_layer_pruning.py`
🙏 Acknowledgments
Thank you to our community for the feedback and suggestions that made this release possible!
📝 Full Changelog
See CHANGELOG.md for detailed changes.
Upgrade today and take control of your model optimization! 🚀
Questions or issues? Open an issue on GitHub.
Hardware-Optimized Width Pruning
🎉 OptiPFair v0.2.1 - Hardware-Optimized Pruning
This release introduces the `expansion_divisor` parameter for hardware-optimized model pruning, enabling better GPU/TPU performance through aligned tensor dimensions.
✨ What's New
Hardware-Optimized Pruning with expansion_divisor
The new `expansion_divisor` parameter allows you to round intermediate layer sizes to specific multiples (32, 64, 128, or 256), optimizing pruned models for modern GPU and TPU architectures.
Quick Example:
```python
from optipfair import prune_model

pruned_model = prune_model(
    model=model,
    pruning_percentage=20,
    expansion_divisor=128,  # Round to multiple of 128
    show_progress=True
)
```
Key Benefits:
- 🚀 Better GPU performance through optimized memory access patterns
- ⚡ Improved tensor core efficiency with aligned dimensions
- 🎯 Flexible integration with both `pruning_percentage` and `expansion_rate`
- 🔧 Simple to use - just one parameter
📚 New Resources
- Example Notebook: `expansion_divisor_example.ipynb` - Complete tutorial with comparisons
- Test Suite: Comprehensive tests in `tests/test_expansion_divisor.py`
- Documentation: Updated README, LLM reference manual, and API docs
🔧 Technical Details
New Functions:
- `round_to_divisor()`: Utility function for precise rounding to the nearest multiple
Modified Functions:
- `prune_model()`: Added `expansion_divisor` parameter
- `prune_model_mlp_glu()`: Integrated validation and rounding logic
- `prune_neuron_pairs()`: Applies rounding after the pruning calculation
Validation:
- Valid values: `None` (default), `32`, `64`, `128`, `256`
- Requires either `pruning_percentage` or `expansion_rate`
- Maintains bounds: the result is always ≥ 1 and ≤ the original size
🔄 Compatibility
- ✅ Fully backward compatible with v0.2.0
- ✅ Works with all neuron selection methods (MAW, VOW, PON)
- ✅ Compatible with both static and data-driven pruning
- ✅ No breaking changes
📦 Installation
```bash
pip install --upgrade optipfair
# or
pip install optipfair==0.2.1
```
📖 Documentation
🙏 Acknowledgments
Thank you to the community for your feedback and contributions!
Full Changelog: https://github.com/peremartra/optipfair/blob/main/CHANGELOG.md
v0.2.0 - Data-Driven Width Pruning
OptiPFair v0.2.0 - Data-Driven Width Pruning
🌟 Major Features
Data-Driven Width Pruning
This release introduces hybrid importance calculation for neuron pruning, combining static weight analysis with dynamic activation statistics from calibration data.
Key capabilities:
- Activation-aware pruning: Uses real data to guide neuron selection
- Domain adaptation: Optimize pruning for your specific use case
- Research-backed: Based on CFSP methodology (arXiv:2409.13199v2)
- Easy integration: Just add a dataloader parameter
What's New
API Changes
- Added `dataloader` parameter to the `prune_model()` function
- Automatic switching between static and hybrid pruning
- Compatible with MAW neuron selection method
New Functions
- `compute_neuron_pair_importance_maw_hybrid()`: Hybrid importance calculation
- `setup_mlp_hooks_for_importance()`: Activation capture via PyTorch hooks
- `run_calibration_forward_passes()`: Calibration workflow with progress tracking
- `get_activation_norms()`: Retrieve accumulated activation statistics
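The hook-based activation capture works roughly as sketched below: register a forward hook, accumulate per-neuron statistics over calibration batches, then detach. This is a minimal illustration assuming squared-L2 norms as the statistic; optipfair's `setup_mlp_hooks_for_importance()` may track different quantities, and `register_norm_hook` is an illustrative name, not the library's API.

```python
import torch
import torch.nn as nn

def register_norm_hook(module: nn.Module, store: dict, key: str):
    """Accumulate per-neuron squared-L2 output norms over calibration
    batches via a PyTorch forward hook."""
    def hook(mod, inputs, output):
        # Sum over every dimension except the last (the neuron dimension).
        norms = output.detach().pow(2).sum(dim=tuple(range(output.dim() - 1)))
        store[key] = store.get(key, torch.zeros_like(norms)) + norms
    return module.register_forward_hook(hook)

# Usage: attach to one projection, run calibration batches, then detach.
proj = nn.Linear(16, 8)
store = {}
handle = register_norm_hook(proj, store, "proj_output")
for _ in range(3):                   # stand-in for calibration batches
    proj(torch.randn(4, 10, 16))     # (batch, seq_len, hidden)
handle.remove()
# store["proj_output"] now holds one accumulated norm per output neuron.
```

Registering hooks only on the layers being pruned (as the selective hook registration in v0.2.2 does) keeps calibration memory and compute proportional to the targeted layers rather than the whole model.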
Documentation
- Complete usage guide for data-driven pruning
- Updated API reference with examples
- Best practices for calibration data selection
- Comprehensive CHANGELOG
Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from torch.utils.data import DataLoader, TensorDataset
from optipfair import prune_model

# Load model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

# Prepare calibration data
texts = ["Your domain-specific examples..."] * 500
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
dataset = TensorDataset(inputs['input_ids'], inputs['attention_mask'])
dataloader = DataLoader(dataset, batch_size=8)

# Prune with the data-driven method
pruned_model = prune_model(
    model=model,
    neuron_selection_method="MAW",
    pruning_percentage=20,
    dataloader=dataloader,  # ← NEW: enables hybrid pruning
    show_progress=True
)
```
Installation
```bash
pip install --upgrade optipfair
```
Breaking Changes
None - This release is fully backward compatible with v0.1.x
🔗 Documentation
Acknowledgments
This implementation is based on the CFSP paper:
"CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information" (arXiv:2409.13199v2)
Full Changelog: https://github.com/peremartra/optipfair/blob/main/CHANGELOG.md
OptiPFair v0.1.5 - Layer Importance Analysis
New Features
Layer Importance Analysis
- Added `analyze_layer_importance()` function for analyzing transformer layer importance using cosine similarity
- Multi-architecture support: automatic detection of layer paths for LLaMA, Qwen, Mistral, GPT-2, and other architectures
- Integration with depth pruning workflows to inform layer removal decisions
- Progress tracking and robust error handling
Improvements
- Enhanced documentation with layer analysis examples
- Updated API reference with new functionality
Usage Example
```python
from optipfair import analyze_layer_importance

importance_scores = analyze_layer_importance(model, dataloader)
```
OptiPFair v0.1.4 - Depth Pruning Support
[0.1.4] - 2025-01-18
Added
- Depth pruning functionality for removing entire transformer layers
- Enhanced documentation with complete depth pruning guide
- Automated documentation deployment via GitHub Actions
Changed
- Updated examples to include depth pruning demonstrations
- Improved API documentation structure
Fixed
- Documentation deployment workflow permissions
- Missing dependencies in CI/CD pipeline
Added Bias Activation Visualizations
This release adds a comprehensive bias visualization module for analyzing how transformer models process information differently based on protected attributes (race, gender, etc.).
New features:
- Visualization of activation differences across model layers
- Heatmap analysis for detailed inspection of bias patterns
- PCA visualization showing demographic effect on token representations
- Quantitative bias metrics for consistent evaluation
- Integration with existing pruning functionality
- Documentation and examples for bias analysis
This update enables researchers and practitioners to understand where bias manifests in model architectures and evaluate how pruning affects fairness.
Fixes
- Fixed a bug in the creation of new layers.