Commit ca9d69e

Release v0.3.0: Fairness-Aware Pruning

- Add analyze_neuron_bias() for per-neuron bias analysis
- Add compute_fairness_pruning_scores() for balanced pruning
- Enhanced documentation with comprehensive fairness examples
- Updated API reference and usage guides
- Improved prune_model_mlp_glu compatibility

1 parent 3258659 commit ca9d69e

File tree

6 files changed: +574 −7 lines changed


CHANGELOG.md

Lines changed: 41 additions & 0 deletions

@@ -2,6 +2,47 @@

---

## [0.3.0] - 2026-03-02

### 🎉 New Features

#### Fairness-Aware Pruning

- **New Function**: `analyze_neuron_bias()` - Analyze per-neuron bias contributions across multiple demographic prompt pairs
  - Computes activation-based bias scores for individual neurons
  - Supports multiple aggregation methods (mean, max) across sequence positions
  - Works with GLU-architecture MLP layers (gate_proj, up_proj)
- **New Function**: `compute_fairness_pruning_scores()` - Combine bias and importance scores for balanced pruning
  - Configurable `bias_weight` parameter (0.0 to 1.0) to adjust the fairness vs. performance trade-off
  - Returns fairness pruning scores for each layer
  - Enables fairness-aware neuron-selection strategies

#### Enhanced Pruning Integration

- **Modified**: `prune_model_mlp_glu()` - Improved compatibility with fairness-aware workflows
- **Documentation**: Added a comprehensive fairness-aware pruning guide with examples

### 📚 Documentation Enhancements

#### New Fairness-Aware Pruning Section

- Complete guide to the fairness-aware pruning workflow, including:
  - Step-by-step tutorial for `analyze_neuron_bias()`
  - Step-by-step tutorial for `compute_fairness_pruning_scores()`
  - Understanding the `bias_weight` parameter, with recommended configurations
  - Complete end-to-end example combining bias analysis with pruning
  - Common patterns for fairness-aware analysis
- New example notebook: `fairness_aware_pruning_demo.ipynb`

#### Updated API Documentation

- Added `analyze_neuron_bias()` to the API reference
- Added `compute_fairness_pruning_scores()` to the API reference
- Enhanced the usage guide with fairness workflows

### 🧪 Testing & Quality

- Compatible with existing pruning functionality
- No breaking changes to the existing API
- All existing tests remain passing

---

## [0.2.4] - 2026-01-10

### 🎉 New Features

docs/api.md

Lines changed: 117 additions & 0 deletions

@@ -231,6 +231,123 @@ def calculate_bias_metrics(act1: Dict[str, torch.Tensor], act2: Dict[str, torch.
    """
```
## Fairness-Aware Pruning Analysis (NEW in v0.3.0)

### `analyze_neuron_bias`

```python
def analyze_neuron_bias(
    model: PreTrainedModel,
    tokenizer: Any,
    prompt_pairs: List[Tuple[str, str]],
    target_layers: Optional[List[str]] = None,
    aggregation: str = "mean",
    batch_size: int = 4,
    show_progress: bool = True,
) -> Dict[str, torch.Tensor]:
    """
    Analyze bias contributions at the individual-neuron level across multiple demographic prompt pairs.

    Computes per-neuron bias scores by comparing activations across prompt pairs that differ
    only in demographic attributes. This makes it possible to identify which specific neurons
    contribute most to bias in the model.

    Args:
        model: HuggingFace PreTrainedModel with a GLU architecture (e.g., LLaMA, Mistral)
        tokenizer: Matching tokenizer for encoding prompts
        prompt_pairs: List of (prompt1, prompt2) tuples where each pair differs only in
            the demographic attribute being tested (e.g., gender, ethnicity)
        target_layers: List of layer projection types to analyze. Options:
            ["gate_proj", "up_proj"] (both), or any subset.
            Default: ["gate_proj", "up_proj"]
        aggregation: How to aggregate bias across sequence positions:
            - "mean": average bias across all tokens (default, more stable)
            - "max": maximum bias at any token position (more sensitive)
        batch_size: Batch size for processing prompt pairs. Adjust based on GPU memory.
        show_progress: Whether to display a progress bar

    Returns:
        Dict[str, torch.Tensor]: Mapping of layer names to bias-score tensors,
            e.g. {"gate_proj_layer_5": tensor([...]), "up_proj_layer_5": tensor([...]), ...},
            where each tensor has shape [intermediate_hidden_dim] with one bias score per neuron.

    Example:
        >>> from transformers import AutoModelForCausalLM, AutoTokenizer
        >>> from optipfair.bias import analyze_neuron_bias
        >>>
        >>> model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
        >>> tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
        >>>
        >>> prompt_pairs = [
        ...     ("The male nurse helped the patient.", "The female nurse helped the patient."),
        ...     ("White scientist discovered X.", "Black scientist discovered X."),
        ... ]
        >>>
        >>> bias_scores = analyze_neuron_bias(
        ...     model=model,
        ...     tokenizer=tokenizer,
        ...     prompt_pairs=prompt_pairs,
        ...     target_layers=["gate_proj", "up_proj"],
        ...     aggregation="mean",
        ... )
    """
```
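The returned dictionary can be explored with ordinary tensor operations. For instance, to list the most biased neurons per layer — using a tiny mock of the documented return shape, since the real scores require running a model:

```python
import torch

# Tiny mock of the documented return shape: layer name -> per-neuron scores.
# Real values come from analyze_neuron_bias(); these are illustrative only.
bias_scores = {
    "gate_proj_layer_0": torch.tensor([0.01, 0.42, 0.05, 0.33]),
    "up_proj_layer_0": torch.tensor([0.12, 0.02, 0.51, 0.07]),
}

# Rank neurons by bias score within each layer
for layer_name, scores in bias_scores.items():
    top = torch.topk(scores, k=2)
    print(f"{layer_name}: top-biased neurons {top.indices.tolist()}")
# gate_proj_layer_0: top-biased neurons [1, 3]
# up_proj_layer_0: top-biased neurons [2, 0]
```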
### `compute_fairness_pruning_scores`

```python
def compute_fairness_pruning_scores(
    model: PreTrainedModel,
    bias_scores: Dict[str, torch.Tensor],
    bias_weight: float = 0.4,
) -> Dict[int, torch.Tensor]:
    """
    Compute fairness-aware pruning scores by combining bias and importance metrics.

    Creates balanced pruning scores that account for both:

    1. **Bias**: neurons with high bias should be preserved (not pruned)
    2. **Importance**: neurons that are unimportant should be pruned

    The resulting scores indicate which neurons are "safe" to prune:
    high scores = low bias + low importance = good to prune;
    low scores = high bias or high importance = risky to prune.

    Args:
        model: HuggingFace PreTrainedModel with a GLU architecture
        bias_scores: Dictionary returned by analyze_neuron_bias(),
            i.e. Dict[str, torch.Tensor] with layer names as keys
        bias_weight: Weight balancing fairness vs. performance (float in [0.0, 1.0]):
            - 0.0: pure importance (standard pruning, ignore bias)
            - 0.4-0.5: balanced (RECOMMENDED) - good compression + fairness
            - 0.7: fairness-critical - prioritize reducing bias
            - 1.0: pure bias (preserve important but biased neurons)

    Returns:
        Dict[int, torch.Tensor]: Mapping of layer indices to fairness pruning scores,
            e.g. {0: tensor([...]), 1: tensor([...]), ...},
            where each tensor has shape [intermediate_hidden_dim].

    Raises:
        ValueError: If bias_weight is not in the [0.0, 1.0] range
        ValueError: If bias_scores has an invalid format or is missing expected layers

    Example:
        >>> from optipfair.bias import compute_fairness_pruning_scores
        >>>
        >>> # After computing bias_scores with analyze_neuron_bias()
        >>> fairness_scores = compute_fairness_pruning_scores(
        ...     model=model,
        ...     bias_scores=bias_scores,
        ...     bias_weight=0.45,  # balanced approach
        ... )
        >>>
        >>> # Identify neurons that are safe to prune
        >>> for layer_idx, scores in fairness_scores.items():
        ...     safe_neurons = (scores > 0.75).sum().item()
        ...     print(f"Layer {layer_idx}: {safe_neurons} safe neurons")
    """
```
## Pruning Module

### MLP GLU Pruning

docs/usage.md

Lines changed: 167 additions & 0 deletions

@@ -737,4 +737,171 @@ When using tuple or list batches, elements are automatically mapped to standard

**Note**: All formats are fully backward compatible. Existing code continues to work without modifications.

---

## Fairness-Aware Pruning (NEW in v0.3.0)

OptiPFair v0.3.0 introduces fairness-aware pruning, which combines bias analysis with pruning decisions to create models that are both smaller and potentially less biased.

### Overview

Traditional pruning focuses solely on minimizing performance loss. Fairness-aware pruning adds an additional dimension: identifying and potentially removing neurons that contribute to demographic bias.

The workflow consists of two main steps:

1. **Analyze neuron bias**: identify which neurons contribute most to bias across demographic groups
2. **Compute fairness scores**: combine bias scores with importance scores for balanced pruning decisions
### Step 1: Analyze Neuron Bias

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from optipfair.bias import analyze_neuron_bias

# Load model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

# Define prompt pairs that differ only in a single demographic attribute
prompt_pairs = [
    ("The male nurse was helpful.", "The female nurse was helpful."),
    ("White doctor examined the patient.", "Black doctor examined the patient."),
    ("Young engineer designed the system.", "Old engineer designed the system."),
]

# Analyze per-neuron bias
bias_scores = analyze_neuron_bias(
    model=model,
    tokenizer=tokenizer,
    prompt_pairs=prompt_pairs,
    target_layers=["gate_proj", "up_proj"],  # MLP components to analyze
    aggregation="mean",                      # "mean" or "max" across tokens
    batch_size=4,
    show_progress=True,
)

# bias_scores maps layer names to per-neuron bias tensors
print(f"Analyzed {len(bias_scores)} layers")
```
### Step 2: Compute Fairness Pruning Scores

```python
from optipfair.bias import compute_fairness_pruning_scores

# Combine bias with importance
fairness_scores = compute_fairness_pruning_scores(
    model=model,
    bias_scores=bias_scores,
    bias_weight=0.45,  # balance fairness (0.0-1.0) vs. performance
)

# fairness_scores maps layer indices to pruning-score tensors
# Higher scores = safer to prune (low bias + low importance)
for layer_idx, scores in fairness_scores.items():
    safe_neurons = (scores > 0.75).sum().item()
    print(f"Layer {layer_idx}: {safe_neurons} neurons safe to prune")
```
### Understanding the bias_weight Parameter

The `bias_weight` parameter controls the trade-off between fairness and performance:

| bias_weight | Use case |
|-------------|----------|
| **0.0** | Pure performance - ignore bias (standard pruning) |
| **0.2** | Performance-critical - fairness is a secondary concern |
| **0.4-0.5** | **Balanced - good compression + fairness (RECOMMENDED)** |
| **0.7** | Fairness-critical - reduce bias even at a performance cost |
| **1.0** | Pure fairness - prioritize bias reduction above all else |
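The exact scoring formula is internal to OptiPFair, but conceptually `bias_weight` interpolates between importance-only and bias-aware scoring. A minimal sketch of one such blend, assuming min-max-normalized scores (the `combine_scores` helper below is illustrative, not the library's implementation):

```python
import torch

def combine_scores(importance: torch.Tensor, bias: torch.Tensor,
                   bias_weight: float) -> torch.Tensor:
    """Hypothetical sketch: blend inverted importance and inverted bias.

    High output = low importance AND low bias = safer to prune, matching
    the documented score semantics. The real OptiPFair formula may differ.
    """
    def norm(x: torch.Tensor) -> torch.Tensor:
        return (x - x.min()) / (x.max() - x.min() + 1e-8)

    prune_by_importance = 1.0 - norm(importance)  # unimportant -> high score
    prune_by_bias = 1.0 - norm(bias)              # unbiased -> high score
    return (1 - bias_weight) * prune_by_importance + bias_weight * prune_by_bias

importance = torch.tensor([0.9, 0.1, 0.5, 0.05])
bias = torch.tensor([0.05, 0.8, 0.1, 0.02])
scores = combine_scores(importance, bias, bias_weight=0.45)

# The neuron with both the lowest importance and the lowest bias
# ends up with the highest "safe to prune" score.
print(scores.argmax().item())  # 3
```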
### Complete Fairness-Aware Pruning Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import optipfair as opf
from optipfair.bias import analyze_neuron_bias, compute_fairness_pruning_scores

# 1. Load model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

# 2. Define demographic test pairs
prompt_pairs = [
    ("The Christian employee worked hard.", "The Muslim employee worked hard."),
    ("The wealthy student studied diligently.", "The poor student studied diligently."),
]

# 3. Analyze neuron-level bias
print("Step 1: Analyzing neuron bias...")
bias_scores = analyze_neuron_bias(
    model=model,
    tokenizer=tokenizer,
    prompt_pairs=prompt_pairs,
    target_layers=["gate_proj", "up_proj"],
    aggregation="mean",
    show_progress=True,
)

# 4. Compute fairness pruning scores
print("Step 2: Computing fairness scores...")
fairness_scores = compute_fairness_pruning_scores(
    model=model,
    bias_scores=bias_scores,
    bias_weight=0.45,  # balanced approach
)

# 5. Analyze which neurons are safe to prune
print("\nStep 3: Analyzing results...")
for layer_idx, scores in fairness_scores.items():
    high_score = (scores > 0.75).sum().item()
    print(f"Layer {layer_idx}: {high_score} neurons are safe to prune (score > 0.75)")

# 6. Perform standard pruning
# (Current implementation: use the fairness analysis to guide understanding)
print("\nStep 4: Pruning model...")
pruned_model, stats = opf.prune_model(
    model=model,
    pruning_type="MLP_GLU",
    neuron_selection_method="MAW",
    pruning_percentage=15,
    show_progress=True,
    return_stats=True,
)

print(f"\nPruning complete: {stats['percentage_reduction']:.2f}% reduction")
print("Next: evaluate bias metrics to measure fairness improvement")

# 7. Re-evaluate bias after pruning (optional):
# re-run analyze_neuron_bias on pruned_model to compare
```
### Practical Tips

**1. Choosing Prompt Pairs**

Create prompt pairs that:

- Differ in exactly ONE demographic attribute
- Are otherwise identical in structure and content
- Cover the demographic dimensions you care about (gender, race, age, religion, etc.)
- Use natural, realistic language
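To keep pairs minimal and consistent, it can help to generate them from templates. A small sketch of that idea (the `make_prompt_pairs` helper and attribute lists are illustrative, not part of the OptiPFair API):

```python
from itertools import combinations

def make_prompt_pairs(template: str, attributes: list) -> list:
    """Fill one template with each pair of attribute values.

    Illustrative helper, not part of OptiPFair. Each returned pair
    differs only in the demographic term, as recommended above.
    """
    prompts = [template.format(attr=a) for a in attributes]
    return list(combinations(prompts, 2))

pairs = make_prompt_pairs(
    "The {attr} engineer designed the system.",
    ["male", "female"],
)
print(pairs)
# [('The male engineer designed the system.',
#   'The female engineer designed the system.')]
```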
**2. Selecting bias_weight**

Start with `bias_weight` in the 0.4-0.5 range for balanced results, then adjust:

- If performance drops too much, decrease `bias_weight` (e.g., 0.3)
- If bias reduction is insufficient, increase `bias_weight` (e.g., 0.6)

**3. Interpreting Fairness Scores**

- **High scores (> 0.75)**: safe to prune - low bias AND low importance
- **Medium scores (0.4-0.75)**: moderate risk - evaluate case by case
- **Low scores (< 0.4)**: risky to prune - high bias OR high importance
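These three bands can be computed directly from a score tensor; a small sketch (the `bucket_neurons` helper is illustrative, not part of the OptiPFair API):

```python
import torch

def bucket_neurons(scores: torch.Tensor):
    """Split neurons into the three documented risk bands.

    Illustrative helper, not part of OptiPFair.
    Returns boolean masks: (safe, moderate, risky).
    """
    safe = scores > 0.75
    risky = scores < 0.4
    moderate = ~(safe | risky)
    return safe, moderate, risky

scores = torch.tensor([0.9, 0.5, 0.1, 0.8, 0.3])
safe, moderate, risky = bucket_neurons(scores)
print(safe.sum().item(), moderate.sum().item(), risky.sum().item())
# 2 1 2
```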
**4. Example Notebook**

For a complete working example with visualizations, see:

- [examples/fairness_aware_pruning_demo.ipynb](../examples/fairness_aware_pruning_demo.ipynb)

---

optipfair/__init__.py

Lines changed: 1 addition & 1 deletion

@@ -19,7 +19,7 @@
    get_model_layers,
)

-__version__ = "0.2.4"
+__version__ = "0.3.0"

# Configure logging
logging.basicConfig(
