Hi,
I've been building a mechanistic interpretability (MI) toolkit in Rust on top of candle. The project currently supports 6 model architectures (StarCoder2, Qwen2, Gemma, LLaMA, Phi-3, RWKV-6) with MI-specific operations built into the forward pass:
- Attention knockout: pre-softmax ablation with KL divergence measurement (see the sketch after this list)
- Attention steering: post-softmax scaling with calibration and dose-response curves
- Logit lens: per-layer vocabulary projection of hidden states
- State knockout and steering for recurrent architectures (RWKV-6)
- Effective attention: attention-equivalent matrices derived from RWKV's recurrence
- KV-cached generation with intervention: steered autoregressive generation
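For concreteness, here's a rough sketch of the knockout + KL pattern, assuming candle's `Tensor` API (the helper names are illustrative, not the toolkit's actual interface):

```rust
use candle_core::{Result, Tensor};
use candle_nn::ops::softmax;

/// Knock out attention from key position `src` to query position `dst`
/// by forcing the pre-softmax score to -inf, then softmaxing as usual.
fn knockout_softmax(scores: &Tensor, dst: usize, src: usize) -> Result<Tensor> {
    let (b, h, q, k) = scores.dims4()?; // [batch, heads, q_len, k_len], f32
    let mut mask = vec![0f32; b * h * q * k];
    for bi in 0..b {
        for hi in 0..h {
            mask[((bi * h + hi) * q + dst) * k + src] = f32::NEG_INFINITY;
        }
    }
    let mask = Tensor::from_vec(mask, (b, h, q, k), scores.device())?;
    softmax(&(scores + mask)?, 3) // softmax over the key dimension
}

/// KL(p || q) between the clean and ablated next-token distributions
/// (both assumed to be f32 probability vectors).
fn kl_div(p: &Tensor, q: &Tensor) -> Result<f32> {
    let log_ratio = (p.log()? - q.log()?)?;
    p.mul(&log_ratio)?.sum_all()?.to_scalar::<f32>()
}
```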
Everything runs on consumer hardware (tested on an RTX 5060 Ti 16GB).
As far as I can tell, no Rust crate or project currently exists for LLM mechanistic interpretability. The Python ecosystem has TransformerLens, nnsight, and pyvene, but nothing equivalent in Rust.
I'm now designing the next iteration as a publishable crate, with two main goals:
A config-driven generic transformer: one forward pass implementation covering LLaMA, Qwen2, Gemma, Mistral, Phi-3, StarCoder2, etc., parameterized by ~7 configuration axes (norm type, activation, QKV layout, MLP layout, bias, embedding scaling, LM head tying). Adding a new model family would mean writing a config parser (~30 lines), not a new forward pass.
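To make those axes concrete, they could be modeled as plain enums and fields roughly like this (illustrative names and variants, not a settled API):

```rust
/// Illustrative configuration axes; variants shown are not exhaustive.
enum Norm { RmsNorm, LayerNorm }
enum Act { Silu, Gelu, GeluTanh }
enum QkvLayout { Fused, Split }      // one packed qkv projection vs. separate q/k/v
enum MlpLayout { GatedSilu, UpDown } // gate/up/down (SwiGLU-style) vs. plain up/down

struct FamilyConfig {
    norm: Norm,
    act: Act,
    qkv: QkvLayout,
    mlp: MlpLayout,
    attn_bias: bool,          // e.g. Qwen2 uses QKV biases, LLaMA does not
    embed_scale: Option<f64>, // e.g. Gemma scales embeddings by sqrt(d_model)
    tie_lm_head: bool,        // reuse the embedding matrix as the output head
}
```

A model family's config file would be parsed into one `FamilyConfig` plus the usual dimension fields, and the single forward pass would branch only on these axes.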
A generic RWKV backend covering v6 and v7, with a `WkvKernel` trait abstracting the version-specific recurrence formula. The same pattern could eventually extend to GLA, RetNet, and other linear RNN architectures.
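One possible shape for that trait (signatures speculative):

```rust
use candle_core::{Device, Result, Tensor};

/// Version-specific WKV recurrence; the shared RWKV backend drives it
/// token by token. A speculative sketch, not a final design.
trait WkvKernel {
    /// Fresh per-sequence recurrent state, e.g. [heads, head_dim, head_dim].
    fn init_state(&self, heads: usize, head_dim: usize, dev: &Device) -> Result<Tensor>;

    /// One recurrence step for a single token: given the current state and
    /// the projected r/k/v (decay and other version-specific inputs would be
    /// held by the implementor), return (output, next_state).
    fn step(&self, state: &Tensor, r: &Tensor, k: &Tensor, v: &Tensor)
        -> Result<(Tensor, Tensor)>;
}

struct Wkv6; // v6: data-dependent decay via token shift + low-rank projections
struct Wkv7; // v7: generalized delta-rule update
```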
Both backends would have TransformerLens-style hook points built in (`resid_pre`, `attn_scores`, `attn_pattern`, `resid_mid`, `mlp_out`, `resid_post`, etc.), with zero overhead when hooks are inactive.
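The dispatch itself can be cheap: a minimal sketch of the idea (hypothetical types; "zero overhead" here means no tensor copies or extra kernels when no hook is registered, just a map lookup):

```rust
use std::collections::HashMap;
use candle_core::{Result, Tensor};

/// A hook observes an activation; returning Some(t) replaces it.
type Hook = Box<dyn Fn(&Tensor) -> Result<Option<Tensor>>>;

#[derive(Default)]
struct Hooks(HashMap<&'static str, Hook>);

impl Hooks {
    fn apply(&self, point: &'static str, t: Tensor) -> Result<Tensor> {
        match self.0.get(point) {
            Some(hook) => Ok(hook(&t)?.unwrap_or(t)), // active: run the hook
            None => Ok(t), // inactive: no copies, no extra tensor work
        }
    }
}

// Inside a layer's forward pass:
//   let x = hooks.apply("resid_pre", x)?;
//   let scores = hooks.apply("attn_scores", scores)?;
```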
Future extensions include CLT (cross-layer transcoder) and SAE (sparse autoencoder) support: loading pre-trained interpretability dictionaries and injecting features into the residual stream.
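The injection step itself is simple; a hedged sketch, where `w_dec` stands for a pre-trained SAE decoder matrix of shape [n_features, d_model] (names hypothetical):

```rust
use candle_core::{Result, Tensor};

/// Add `alpha` times the decoder direction of feature `idx` to the
/// residual stream. `resid` is [batch, seq, d_model]; the [1, d_model]
/// direction broadcasts over batch and sequence positions.
fn inject_feature(resid: &Tensor, w_dec: &Tensor, idx: usize, alpha: f64) -> Result<Tensor> {
    let dir = w_dec.narrow(0, idx, 1)?; // [1, d_model]
    resid.broadcast_add(&dir.affine(alpha, 0.0)?)
}
```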
My question: would `candle-mi` be an appropriate name for this crate, or would you prefer third-party projects not use the `candle-` prefix? I'd also be interested to know if this is something the candle team would consider part of the broader candle ecosystem. Happy to share the roadmap and current codebase if useful.
Thanks for candle! It's been a pleasure to work with for this kind of research. :)
-- Eric Jacopin.