Feat: Add Evaluation Metrics, W&B Logging, and Interleaved Evaluation by psycoplankton · Pull Request #15 · LLM-Interp/CLT-Forge

psycoplankton · 2026-06-25T19:10:22Z

Description:

This PR introduces comprehensive evaluation metrics for CLT training and integrates them with Weights & Biases (W&B) logging. It also enables interleaved evaluation and validation during the training process.

Key Changes:

📊 Evaluation Metrics (`eval_metrics.py`)

Added the CLTEvaluator class, built on transformer_lens.HookedTransformer.
Implemented evaluate_replacement_and_kl() to calculate Replacement Score and KL Divergence from three forward passes: clean, ablated (zeroed), and transcoder-replaced.
Implemented evaluate_sparsity_and_l0() to calculate the L0 norm, training/pruning sparsity, and the pruned L0 norm at a given threshold (tau).
Implemented evaluate_activation_density() to calculate mean activation density and track dead features.
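
For reviewers unfamiliar with these metrics, here is a minimal, dependency-free sketch of how they are commonly defined. It is not the code in `eval_metrics.py`: the function names, the loss-recovered form of the Replacement Score, and the direction of the KL term are assumptions made for illustration.

```python
import math


def softmax(logits):
    # Numerically stable softmax over one token position.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(clean_logits, replaced_logits):
    # KL(p_clean || p_replaced) in nats for a single position (assumed direction).
    p = softmax(clean_logits)
    q = softmax(replaced_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def replacement_score(clean_loss, ablated_loss, replaced_loss):
    # Assumed definition: share of the clean-vs-zeroed loss gap that the
    # transcoder recovers. 1.0 matches the clean model, 0.0 matches zeroing.
    gap = ablated_loss - clean_loss
    if gap == 0:
        return float("nan")
    return (ablated_loss - replaced_loss) / gap


def l0_norm(acts, tau=0.0):
    # Mean number of features per token with activation above tau.
    # tau=0.0 gives the plain L0 norm; tau>0 gives a "pruned" L0 norm.
    return sum(sum(1 for a in row if a > tau) for row in acts) / len(acts)


def activation_density(acts):
    # Per-feature fraction of tokens on which the feature fires.
    n_tokens = len(acts)
    n_features = len(acts[0])
    return [sum(1 for row in acts if row[j] > 0) / n_tokens for j in range(n_features)]


def dead_features_ratio(density):
    # Fraction of features that never fired on the evaluated tokens.
    return sum(1 for d in density if d == 0) / len(density)


# Toy activations: 3 tokens x 4 features (hypothetical values).
acts = [
    [0.0, 1.5, 0.2, 0.0],
    [0.0, 0.0, 0.3, 0.0],
    [0.0, 2.0, 0.05, 0.0],
]
density = activation_density(acts)
print(replacement_score(clean_loss=2.0, ablated_loss=6.0, replaced_loss=3.0))  # 0.75
print(l0_norm(acts), l0_norm(acts, tau=0.1))
print(sum(density) / len(density), dead_features_ratio(density))
```

In the toy data, features 0 and 3 never fire, so half of the features count as dead, and raising `tau` to 0.1 prunes the weak 0.05 activation from the L0 count.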

📈 W&B Logging Integration (`clt_trainer.py`)

Integrated detailed W&B logging for evaluation and validation phases (_evaluate_and_log_metrics).
Logs essential metrics: replacement_score, kl_divergence, l0_norm, mean_activation_density, and dead_features_ratio.
Added rich custom visualizations to W&B:
- Pareto Frontier plot (L0 Norm vs. Replacement Score).
- Pruning Sweep plot for different tau values.
- Activation Density Histogram tracking feature distribution.
- Layer Density Table for per-layer analysis.
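
As a rough illustration of what the Pareto Frontier plot shows, the sketch below picks the non-dominated (L0 norm, Replacement Score) points, treating lower L0 and higher score as better. The helper name `pareto_frontier` and the sample numbers are hypothetical; how `clt_trainer.py` actually builds the plot is not shown here.

```python
def pareto_frontier(points):
    """Return the (l0, score) pairs that no other pair dominates.

    A pair is dominated if another pair has L0 no higher and score no lower,
    and is strictly better in at least one of the two.
    """
    frontier = []
    best_score = float("-inf")
    # Sort by L0 ascending; on ties, put the higher score first.
    for l0, score in sorted(points, key=lambda p: (p[0], -p[1])):
        if score > best_score:
            frontier.append((l0, score))
            best_score = score
    return frontier


# Hypothetical (l0_norm, replacement_score) results from several runs.
runs = [(10, 0.60), (20, 0.80), (15, 0.55), (40, 0.90), (30, 0.75)]
rows = pareto_frontier(runs)
print(rows)  # [(10, 0.6), (20, 0.8), (40, 0.9)]

# The rows could then be handed to W&B, for example (not executed here):
#   table = wandb.Table(data=[list(r) for r in rows],
#                       columns=["l0_norm", "replacement_score"])
#   wandb.log({"pareto": wandb.plot.scatter(table, "l0_norm", "replacement_score")})
```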

⚙️ Trainer & Config Updates

Added new training hyperparameters in CLTTrainingRunnerConfig: eval_step_size, val_step_size, and tau_sweep_steps.
Updated CLTTrainer and CLTTrainingRunner to accept a separate val_activations_store for validation evaluations.
Modified the training loop to trigger evaluation and validation runs at the configured step intervals (eval_step_size and val_step_size).
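
The interleaving can be pictured with the following sketch. It assumes `eval_step_size` and `val_step_size` are intervals measured in training steps; the `EvalSchedule` dataclass and the `due` and `run` helpers are hypothetical stand-ins, not the actual `CLTTrainingRunnerConfig` or `CLTTrainer` code.

```python
from dataclasses import dataclass


@dataclass
class EvalSchedule:
    # Hypothetical stand-in for the new config fields.
    eval_step_size: int = 100
    val_step_size: int = 250
    # Assumed to be the number of tau values in the pruning sweep; unused here.
    tau_sweep_steps: int = 5


def due(step, every):
    # True when `step` lands on a multiple of `every`; 0 disables the check.
    return every > 0 and step > 0 and step % every == 0


def run(total_steps, cfg):
    events = []
    for step in range(1, total_steps + 1):
        # A real loop would run one optimizer step here.
        if due(step, cfg.eval_step_size):
            events.append(("eval", step))  # metrics on training activations
        if due(step, cfg.val_step_size):
            events.append(("val", step))   # metrics on val_activations_store
    return events


print(run(500, EvalSchedule()))
```

With the default values above, evaluation fires at steps 100 through 500 and validation at steps 250 and 500, so both run on step 500.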

🧪 Testing (`tests/test_eval_metrics.py`)

Added a robust test suite using mocked transformers and CLTs.
Validated metric functions (evaluate_replacement_and_kl, evaluate_sparsity_and_l0, evaluate_activation_density) to ensure stability.
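
To give a flavour of the mock-based approach, here is a small self-contained sketch. The metric `dead_ratio`, the `encode` method on the mocked CLT, and the batch placeholders are all hypothetical; the actual tests in `tests/test_eval_metrics.py` may be structured differently.

```python
from unittest.mock import MagicMock


def dead_ratio(clt, batches):
    # Hypothetical metric: share of features that never fire on any batch.
    fired = None
    for batch in batches:
        for row in clt.encode(batch):
            active = [a > 0 for a in row]
            fired = active if fired is None else [f or a for f, a in zip(fired, active)]
    return sum(1 for f in fired if not f) / len(fired)


def test_dead_ratio_with_mocked_clt():
    # The mock returns fixed activations, so no real model is loaded.
    clt = MagicMock()
    clt.encode.side_effect = [
        [[0.0, 1.0, 0.0], [0.0, 0.5, 0.0]],  # batch 0: only feature 1 fires
        [[0.0, 0.0, 2.0]],                   # batch 1: only feature 2 fires
    ]
    ratio = dead_ratio(clt, batches=["batch0", "batch1"])
    assert abs(ratio - 1 / 3) < 1e-12        # feature 0 never fires
    assert clt.encode.call_count == 2


test_dead_ratio_with_mocked_clt()
```

Mocking the encoder keeps the test fast and deterministic, at the cost of not exercising the real forward pass.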

psycoplankton added 3 commits June 22, 2026 22:28

Added Tok-K CLT Training

e121101

Added Optimization of backward pass

5a8ae03

Evaluation metrics and WAB logging

29ee575

psycoplankton wants to merge 3 commits into LLM-Interp:master from psycoplankton:metrics
