Commit ca9d69e

Release v0.3.0: Fairness-Aware Pruning

- Add analyze_neuron_bias() for per-neuron bias analysis
- Add compute_fairness_pruning_scores() for balanced pruning
- Enhanced documentation with comprehensive fairness examples
- Updated API reference and usage guides
- Improved prune_model_mlp_glu compatibility

1 parent 3258659 commit ca9d69e

File tree

6 files changed: +574 −7 lines changed


CHANGELOG.md

Lines changed: 41 additions & 0 deletions

@@ -2,6 +2,47 @@

---

## [0.3.0] - 2026-03-02

### 🎉 New Features

#### Fairness-Aware Pruning

- **New Function**: `analyze_neuron_bias()` - Analyze per-neuron bias contributions across multiple demographic prompt pairs
  - Computes activation-based bias scores for individual neurons
  - Supports multiple aggregation methods (mean, max) across sequence positions
  - Works with GLU-architecture MLP layers (gate_proj, up_proj)
- **New Function**: `compute_fairness_pruning_scores()` - Combine bias and importance scores for balanced pruning
  - Configurable `bias_weight` parameter (0.0 to 1.0) to adjust the fairness vs. performance trade-off
  - Returns fairness pruning scores for each layer
  - Enables fairness-aware neuron-selection strategies

#### Enhanced Pruning Integration

- **Modified**: `prune_model_mlp_glu()` - Improved compatibility with fairness-aware workflows
- **Documentation**: Added a comprehensive fairness-aware pruning guide with examples

### 📚 Documentation Enhancements

#### New Fairness-Aware Pruning Section

- Complete guide to the fairness-aware pruning workflow, including:
  - Step-by-step tutorial for `analyze_neuron_bias()`
  - Step-by-step tutorial for `compute_fairness_pruning_scores()`
  - Understanding the `bias_weight` parameter, with recommended configurations
  - Complete end-to-end example combining bias analysis with pruning
  - Common patterns for fairness-aware analysis
- New example notebook: `fairness_aware_pruning_demo.ipynb`

#### Updated API Documentation

- Added `analyze_neuron_bias()` to the API reference
- Added `compute_fairness_pruning_scores()` to the API reference
- Enhanced the usage guide with fairness workflows

### 🧪 Testing & Quality

- Compatible with existing pruning functionality
- No breaking changes to the existing API
- All existing tests remain passing

---

## [0.2.4] - 2026-01-10

### 🎉 New Features

docs/api.md

Lines changed: 117 additions & 0 deletions

@@ -231,6 +231,123 @@ def calculate_bias_metrics(act1: Dict[str, torch.Tensor], act2: Dict[str, torch.
    """
```
## Fairness-Aware Pruning Analysis (NEW in v0.3.0)

### `analyze_neuron_bias`

```python
def analyze_neuron_bias(
    model: PreTrainedModel,
    tokenizer: Any,
    prompt_pairs: List[Tuple[str, str]],
    target_layers: Optional[List[str]] = None,
    aggregation: str = "mean",
    batch_size: int = 4,
    show_progress: bool = True,
) -> Dict[str, torch.Tensor]:
    """
    Analyze bias contributions at the individual-neuron level across multiple demographic prompt pairs.

    Computes per-neuron bias scores by comparing activations across prompt pairs that differ
    only in demographic attributes. This makes it possible to identify which specific neurons
    contribute most to bias in the model.

    Args:
        model: HuggingFace PreTrainedModel with a GLU architecture (e.g., LLaMA, Mistral)
        tokenizer: Matching tokenizer for encoding prompts
        prompt_pairs: List of (prompt1, prompt2) tuples where each pair differs only in
            the demographic attribute being tested (e.g., gender, ethnicity)
        target_layers: List of layer projection types to analyze. Options:
            ["gate_proj", "up_proj"] (both), or any subset.
            Default: ["gate_proj", "up_proj"]
        aggregation: How to aggregate bias across sequence positions:
            - "mean": average bias across all tokens (default, more stable)
            - "max": maximum bias at any token position (more sensitive)
        batch_size: Batch size for processing prompt pairs. Adjust based on GPU memory.
        show_progress: Whether to display a progress bar

    Returns:
        Dict[str, torch.Tensor]: Mapping of layer names to bias-score tensors,
            e.g. {"gate_proj_layer_5": tensor([...]), "up_proj_layer_5": tensor([...]), ...},
            where each tensor has shape [intermediate_hidden_dim] with one bias score per neuron.

    Example:
        >>> from transformers import AutoModelForCausalLM, AutoTokenizer
        >>> from optipfair.bias import analyze_neuron_bias
        >>>
        >>> model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
        >>> tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
        >>>
        >>> prompt_pairs = [
        ...     ("The male nurse helped the patient.", "The female nurse helped the patient."),
        ...     ("White scientist discovered X.", "Black scientist discovered X."),
        ... ]
        >>>
        >>> bias_scores = analyze_neuron_bias(
        ...     model=model,
        ...     tokenizer=tokenizer,
        ...     prompt_pairs=prompt_pairs,
        ...     target_layers=["gate_proj", "up_proj"],
        ...     aggregation="mean",
        ... )
    """
```
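The returned dictionary can be explored with ordinary tensor operations. For instance, to list the most biased neurons per layer — using a tiny mock of the documented return shape, since the real scores require running a model:

```python
import torch

# Tiny mock of the documented return shape: layer name -> per-neuron scores.
# Real values come from analyze_neuron_bias(); these are illustrative only.
bias_scores = {
    "gate_proj_layer_0": torch.tensor([0.01, 0.42, 0.05, 0.33]),
    "up_proj_layer_0": torch.tensor([0.12, 0.02, 0.51, 0.07]),
}

# Rank neurons by bias score within each layer
for layer_name, scores in bias_scores.items():
    top = torch.topk(scores, k=2)
    print(f"{layer_name}: top-biased neurons {top.indices.tolist()}")
# gate_proj_layer_0: top-biased neurons [1, 3]
# up_proj_layer_0: top-biased neurons [2, 0]
```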
### `compute_fairness_pruning_scores`

```python
def compute_fairness_pruning_scores(
    model: PreTrainedModel,
    bias_scores: Dict[str, torch.Tensor],
    bias_weight: float = 0.4,
) -> Dict[int, torch.Tensor]:
    """
    Compute fairness-aware pruning scores by combining bias and importance metrics.

    Creates balanced pruning scores that account for both:

    1. **Bias**: neurons with high bias should be preserved (not pruned)
    2. **Importance**: neurons that are unimportant should be pruned

    The resulting scores indicate which neurons are "safe" to prune:
    high scores = low bias + low importance = good to prune;
    low scores = high bias or high importance = risky to prune.

    Args:
        model: HuggingFace PreTrainedModel with a GLU architecture
        bias_scores: Dictionary returned by analyze_neuron_bias(),
            i.e. Dict[str, torch.Tensor] with layer names as keys
        bias_weight: Weight balancing fairness vs. performance (float in [0.0, 1.0]):
            - 0.0: pure importance (standard pruning, ignore bias)
            - 0.4-0.5: balanced (RECOMMENDED) - good compression + fairness
            - 0.7: fairness-critical - prioritize reducing bias
            - 1.0: pure bias (preserve important but biased neurons)

    Returns:
        Dict[int, torch.Tensor]: Mapping of layer indices to fairness pruning scores,
            e.g. {0: tensor([...]), 1: tensor([...]), ...},
            where each tensor has shape [intermediate_hidden_dim].

    Raises:
        ValueError: If bias_weight is not in the [0.0, 1.0] range
        ValueError: If bias_scores has an invalid format or is missing expected layers

    Example:
        >>> from optipfair.bias import compute_fairness_pruning_scores
        >>>
        >>> # After computing bias_scores with analyze_neuron_bias()
        >>> fairness_scores = compute_fairness_pruning_scores(
        ...     model=model,
        ...     bias_scores=bias_scores,
        ...     bias_weight=0.45,  # balanced approach
        ... )
        >>>
        >>> # Identify neurons that are safe to prune
        >>> for layer_idx, scores in fairness_scores.items():
        ...     safe_neurons = (scores > 0.75).sum().item()
        ...     print(f"Layer {layer_idx}: {safe_neurons} safe neurons")
    """
```
## Pruning Module

### MLP GLU Pruning

docs/usage.md

Lines changed: 167 additions & 0 deletions

@@ -737,4 +737,171 @@ When using tuple or list batches, elements are automatically mapped to standard

**Note**: All formats are fully backward compatible. Existing code continues to work without modifications.

---

## Fairness-Aware Pruning (NEW in v0.3.0)

OptiPFair v0.3.0 introduces fairness-aware pruning, which combines bias analysis with pruning decisions to create models that are both smaller and potentially less biased.

### Overview

Traditional pruning focuses solely on minimizing performance loss. Fairness-aware pruning adds an additional dimension: identifying and potentially removing neurons that contribute to demographic bias.

The workflow consists of two main steps:

1. **Analyze neuron bias**: identify which neurons contribute most to bias across demographic groups
2. **Compute fairness scores**: combine bias scores with importance scores for balanced pruning decisions
### Step 1: Analyze Neuron Bias

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from optipfair.bias import analyze_neuron_bias

# Load model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

# Define prompt pairs that differ only in a single demographic attribute
prompt_pairs = [
    ("The male nurse was helpful.", "The female nurse was helpful."),
    ("White doctor examined the patient.", "Black doctor examined the patient."),
    ("Young engineer designed the system.", "Old engineer designed the system."),
]

# Analyze per-neuron bias
bias_scores = analyze_neuron_bias(
    model=model,
    tokenizer=tokenizer,
    prompt_pairs=prompt_pairs,
    target_layers=["gate_proj", "up_proj"],  # MLP components to analyze
    aggregation="mean",                      # "mean" or "max" across tokens
    batch_size=4,
    show_progress=True,
)

# bias_scores maps layer names to per-neuron bias tensors
print(f"Analyzed {len(bias_scores)} layers")
```
### Step 2: Compute Fairness Pruning Scores

```python
from optipfair.bias import compute_fairness_pruning_scores

# Combine bias with importance
fairness_scores = compute_fairness_pruning_scores(
    model=model,
    bias_scores=bias_scores,
    bias_weight=0.45,  # balance fairness (0.0-1.0) vs. performance
)

# fairness_scores maps layer indices to pruning-score tensors
# Higher scores = safer to prune (low bias + low importance)
for layer_idx, scores in fairness_scores.items():
    safe_neurons = (scores > 0.75).sum().item()
    print(f"Layer {layer_idx}: {safe_neurons} neurons safe to prune")
```
### Understanding the bias_weight Parameter

The `bias_weight` parameter controls the trade-off between fairness and performance:

| bias_weight | Use case |
|-------------|----------|
| **0.0** | Pure performance - ignore bias (standard pruning) |
| **0.2** | Performance-critical - fairness is a secondary concern |
| **0.4-0.5** | **Balanced - good compression + fairness (RECOMMENDED)** |
| **0.7** | Fairness-critical - reduce bias even at a performance cost |
| **1.0** | Pure fairness - prioritize bias reduction above all else |
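The exact scoring formula is internal to OptiPFair, but conceptually `bias_weight` interpolates between importance-only and bias-aware scoring. A minimal sketch of one such blend, assuming min-max-normalized scores (the `combine_scores` helper below is illustrative, not the library's implementation):

```python
import torch

def combine_scores(importance: torch.Tensor, bias: torch.Tensor,
                   bias_weight: float) -> torch.Tensor:
    """Hypothetical sketch: blend inverted importance and inverted bias.

    High output = low importance AND low bias = safer to prune, matching
    the documented score semantics. The real OptiPFair formula may differ.
    """
    def norm(x: torch.Tensor) -> torch.Tensor:
        return (x - x.min()) / (x.max() - x.min() + 1e-8)

    prune_by_importance = 1.0 - norm(importance)  # unimportant -> high score
    prune_by_bias = 1.0 - norm(bias)              # unbiased -> high score
    return (1 - bias_weight) * prune_by_importance + bias_weight * prune_by_bias

importance = torch.tensor([0.9, 0.1, 0.5, 0.05])
bias = torch.tensor([0.05, 0.8, 0.1, 0.02])
scores = combine_scores(importance, bias, bias_weight=0.45)

# The neuron with both the lowest importance and the lowest bias
# ends up with the highest "safe to prune" score.
print(scores.argmax().item())  # 3
```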
### Complete Fairness-Aware Pruning Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import optipfair as opf
from optipfair.bias import analyze_neuron_bias, compute_fairness_pruning_scores

# 1. Load model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

# 2. Define demographic test pairs
prompt_pairs = [
    ("The Christian employee worked hard.", "The Muslim employee worked hard."),
    ("The wealthy student studied diligently.", "The poor student studied diligently."),
]

# 3. Analyze neuron-level bias
print("Step 1: Analyzing neuron bias...")
bias_scores = analyze_neuron_bias(
    model=model,
    tokenizer=tokenizer,
    prompt_pairs=prompt_pairs,
    target_layers=["gate_proj", "up_proj"],
    aggregation="mean",
    show_progress=True,
)

# 4. Compute fairness pruning scores
print("Step 2: Computing fairness scores...")
fairness_scores = compute_fairness_pruning_scores(
    model=model,
    bias_scores=bias_scores,
    bias_weight=0.45,  # balanced approach
)

# 5. Analyze which neurons are safe to prune
print("\nStep 3: Analyzing results...")
for layer_idx, scores in fairness_scores.items():
    high_score = (scores > 0.75).sum().item()
    print(f"Layer {layer_idx}: {high_score} neurons are safe to prune (score > 0.75)")

# 6. Perform standard pruning
# (Current implementation: use the fairness analysis to guide understanding)
print("\nStep 4: Pruning model...")
pruned_model, stats = opf.prune_model(
    model=model,
    pruning_type="MLP_GLU",
    neuron_selection_method="MAW",
    pruning_percentage=15,
    show_progress=True,
    return_stats=True,
)

print(f"\nPruning complete: {stats['percentage_reduction']:.2f}% reduction")
print("Next: evaluate bias metrics to measure fairness improvement")

# 7. Re-evaluate bias after pruning (optional):
# re-run analyze_neuron_bias on pruned_model to compare
```
### Practical Tips

**1. Choosing Prompt Pairs**

Create prompt pairs that:

- Differ in exactly ONE demographic attribute
- Are otherwise identical in structure and content
- Cover the demographic dimensions you care about (gender, race, age, religion, etc.)
- Use natural, realistic language
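To keep pairs minimal and consistent, it can help to generate them from templates. A small sketch of that idea (the `make_prompt_pairs` helper and attribute lists are illustrative, not part of the OptiPFair API):

```python
from itertools import combinations

def make_prompt_pairs(template: str, attributes: list) -> list:
    """Fill one template with each pair of attribute values.

    Illustrative helper, not part of OptiPFair. Each returned pair
    differs only in the demographic term, as recommended above.
    """
    prompts = [template.format(attr=a) for a in attributes]
    return list(combinations(prompts, 2))

pairs = make_prompt_pairs(
    "The {attr} engineer designed the system.",
    ["male", "female"],
)
print(pairs)
# [('The male engineer designed the system.',
#   'The female engineer designed the system.')]
```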
**2. Selecting bias_weight**

Start with `bias_weight` in the 0.4-0.5 range for balanced results, then adjust:

- If performance drops too much, decrease `bias_weight` (e.g., 0.3)
- If bias reduction is insufficient, increase `bias_weight` (e.g., 0.6)

**3. Interpreting Fairness Scores**

- **High scores (> 0.75)**: safe to prune - low bias AND low importance
- **Medium scores (0.4-0.75)**: moderate risk - evaluate case by case
- **Low scores (< 0.4)**: risky to prune - high bias OR high importance
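These three bands can be computed directly from a score tensor; a small sketch (the `bucket_neurons` helper is illustrative, not part of the OptiPFair API):

```python
import torch

def bucket_neurons(scores: torch.Tensor):
    """Split neurons into the three documented risk bands.

    Illustrative helper, not part of OptiPFair.
    Returns boolean masks: (safe, moderate, risky).
    """
    safe = scores > 0.75
    risky = scores < 0.4
    moderate = ~(safe | risky)
    return safe, moderate, risky

scores = torch.tensor([0.9, 0.5, 0.1, 0.8, 0.3])
safe, moderate, risky = bucket_neurons(scores)
print(safe.sum().item(), moderate.sum().item(), risky.sum().item())
# 2 1 2
```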
**4. Example Notebook**

For a complete working example with visualizations, see:

- [examples/fairness_aware_pruning_demo.ipynb](../examples/fairness_aware_pruning_demo.ipynb)

---

optipfair/__init__.py

Lines changed: 1 addition & 1 deletion

@@ -19,7 +19,7 @@
    get_model_layers,
)

-__version__ = "0.2.4"
+__version__ = "0.3.0"

# Configure logging
logging.basicConfig(
