
Feat: Add Evaluation Metrics, W&B Logging, and Interleaved Evaluation #15

Open
psycoplankton wants to merge 3 commits into LLM-Interp:master from psycoplankton:metrics

Description:

This PR adds evaluation metrics for CLT training and logs them to Weights & Biases (W&B). It also runs evaluation and validation at configurable step intervals, interleaved with training.

Key Changes:

📊 Evaluation Metrics (eval_metrics.py)

  • Added the CLTEvaluator class utilizing transformer_lens.HookedTransformer.
  • Implemented evaluate_replacement_and_kl() to calculate the Replacement Score and KL Divergence from three forward passes: clean, ablated (zeroed), and transcoder-replaced.
  • Implemented evaluate_sparsity_and_l0() to calculate the L0 norm, training/pruning sparsity, and the pruned L0 norm at a threshold (tau).
  • Implemented evaluate_activation_density() to calculate mean activation density and track dead features.
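The replacement score and KL divergence can be derived from the three passes roughly as follows. This is a minimal pure-Python sketch for a single token position, not the code in eval_metrics.py. It assumes the replacement score is the fraction of the clean-vs-ablated loss gap that the transcoder-replaced pass recovers (a common definition, assumed here), and that KL divergence is measured from the clean next-token distribution to the replaced one. All helper names and the example logits are hypothetical.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Next-token loss: negative log-probability of the target token."""
    return -log_softmax(logits)[target]


def kl_divergence(clean_logits: Sequence[float], replaced_logits: Sequence[float]) -> float:
    """KL(P_clean || P_replaced) between the two next-token distributions."""
    log_p = log_softmax(clean_logits)
    log_q = log_softmax(replaced_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def replacement_score(loss_clean: float, loss_ablated: float, loss_replaced: float) -> float:
    """Fraction of the clean-vs-ablated loss gap recovered by the replaced pass.

    1.0 means the replaced pass matches the clean model;
    0.0 means it is no better than zero-ablation.
    """
    gap = loss_ablated - loss_clean
    if gap == 0:
        raise ValueError("clean and ablated losses are equal; score is undefined")
    return (loss_ablated - loss_replaced) / gap


# Hypothetical logits for one position from each pass (vocabulary of 3 tokens).
clean, ablated, replaced = [2.0, 0.5, -1.0], [0.0, 0.0, 0.0], [1.5, 0.5, -0.5]
target = 0
score = replacement_score(
    cross_entropy(clean, target),
    cross_entropy(ablated, target),
    cross_entropy(replaced, target),
)
kl = kl_divergence(clean, replaced)
```

With a real model, the losses and KL would be computed on batched logits and averaged over all evaluated token positions; the per-position formulas stay the same.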

📈 W&B Logging Integration (clt_trainer.py)

  • Integrated detailed W&B logging for evaluation and validation phases (_evaluate_and_log_metrics).
  • Logs essential metrics: replacement_score, kl_divergence, l0_norm, mean_activation_density, and dead_features_ratio.
  • Added custom W&B visualizations:
    • Pareto Frontier plot (L0 Norm vs. Replacement Score).
    • Pruning Sweep plot for different tau values.
    • Activation Density Histogram tracking feature distribution.
    • Layer Density Table for per-layer analysis.
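The quantities behind the L0 metrics, the pruning sweep, and the density statistics can be sketched as below. This is an illustrative pure-Python version, not the PR's implementation. It assumes activations arrive as a [tokens][features] matrix, that a feature counts as active when its absolute activation exceeds tau (tau = 0 gives the plain L0 norm), that a feature is dead when it never fires on the evaluation batch, and that the sweep uses evenly spaced thresholds; mapping the `steps` argument to tau_sweep_steps is likewise an assumption. All function names are hypothetical.

```python
from typing import Sequence

Acts = Sequence[Sequence[float]]  # hypothetical layout: [n_tokens][n_features]


def l0_norm(acts: Acts, tau: float = 0.0) -> float:
    """Mean number of features per token with |activation| > tau."""
    return sum(sum(1 for a in row if abs(a) > tau) for row in acts) / len(acts)


def activation_density(acts: Acts) -> list[float]:
    """Per-feature fraction of tokens on which the feature is non-zero."""
    n_tokens, n_features = len(acts), len(acts[0])
    return [sum(1 for row in acts if row[j] != 0.0) / n_tokens for j in range(n_features)]


def dead_features_ratio(density: Sequence[float]) -> float:
    """Fraction of features that never fired on this batch."""
    return sum(1 for d in density if d == 0.0) / len(density)


def tau_sweep(acts: Acts, tau_max: float, steps: int) -> list[list[float]]:
    """[tau, pruned L0] rows for `steps` evenly spaced thresholds in [0, tau_max]."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    taus = [tau_max * i / (steps - 1) for i in range(steps)]
    return [[t, l0_norm(acts, t)] for t in taus]


# Two tokens, four features (hypothetical values).
acts = [[0.0, 0.5, 2.0, 0.0],
        [0.1, 0.0, 1.5, 0.0]]
density = activation_density(acts)
sweep = tau_sweep(acts, tau_max=2.0, steps=3)
```

Each [tau, pruned L0] row is one point on a pruning-sweep curve, and pairing an L0 value with the replacement score measured at the same setting gives one point on the Pareto frontier. Rows in this shape could be logged with `wandb.Table(columns=["tau", "pruned_l0"], data=sweep)`.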

⚙️ Trainer & Config Updates

  • Added new training hyperparameters in CLTTrainingRunnerConfig: eval_step_size, val_step_size, and tau_sweep_steps.
  • Updated CLTTrainer and CLTTrainingRunner to accept a separate val_activations_store for validation evaluations.
  • Modified the training loop to trigger evaluation and validation runs dynamically at specified step intervals.
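Interleaving evaluation and validation with training amounts to checking the step counter against each interval. The sketch below is a hypothetical, framework-free illustration, not the CLTTrainer code: only the field names eval_step_size and val_step_size come from this PR. Treating an interval of 0 as "disabled", counting steps from 1, and running evaluation on the training activations while validation uses the separate val_activations_store are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Callable

StepFn = Callable[[int], None]


@dataclass
class IntervalConfig:
    # Field names mirror the new CLTTrainingRunnerConfig hyperparameters;
    # 0 meaning "disabled" is an assumption of this sketch.
    eval_step_size: int
    val_step_size: int


def due(step: int, interval: int) -> bool:
    """True when a periodic job should fire after `step` (steps counted from 1)."""
    return interval > 0 and step % interval == 0


def train(n_steps: int, cfg: IntervalConfig, train_step: StepFn,
          run_eval: StepFn, run_val: StepFn) -> list[tuple[int, str]]:
    """Run the training loop and return which periodic jobs fired, in order."""
    events: list[tuple[int, str]] = []
    for step in range(1, n_steps + 1):
        train_step(step)
        if due(step, cfg.eval_step_size):
            run_eval(step)  # assumed: metrics on the training activation store
            events.append((step, "eval"))
        if due(step, cfg.val_step_size):
            run_val(step)   # assumed: metrics on the separate validation store
            events.append((step, "val"))
    return events


def noop(step: int) -> None:
    pass


events = train(6, IntervalConfig(eval_step_size=2, val_step_size=3), noop, noop, noop)
```

Keeping the two intervals independent lets the cheaper evaluation run often while the validation pass, which reads from a different activation store, runs less frequently.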

🧪 Testing (tests/test_eval_metrics.py)

  • Added a test suite that uses mocked transformers and CLTs.
  • Tests the metric functions (evaluate_replacement_and_kl, evaluate_sparsity_and_l0, evaluate_activation_density) against these mocks.
