Hi,
I've been building a mechanistic interpretability (MI) toolkit in Rust on top of candle. The project currently supports 6 model architectures (StarCoder2, Qwen2, Gemma, LLaMA, Phi-3, RWKV-6) with MI-specific operations built into the forward pass:
- Attention knockout: pre-softmax ablation with KL divergence measurement (see the sketch after this list)
- Attention steering: post-softmax scaling with calibration and dose-response curves
- Logit lens: per-layer vocabulary projection of hidden states
- State knockout and steering for recurrent architectures (RWKV-6)
- Effective attention: attention-equivalent matrices derived from RWKV's recurrence
- KV-cached generation with intervention: steered autoregressive generation
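For concreteness, here's a rough sketch of the knockout + KL pattern, assuming candle's `Tensor` API (the helper names are illustrative, not the toolkit's actual interface):

```rust
use candle_core::{Result, Tensor};
use candle_nn::ops::softmax;

/// Knock out attention from key position `src` to query position `dst`
/// by forcing the pre-softmax score to -inf, then softmaxing as usual.
fn knockout_softmax(scores: &Tensor, dst: usize, src: usize) -> Result<Tensor> {
    let (b, h, q, k) = scores.dims4()?; // [batch, heads, q_len, k_len], f32
    let mut mask = vec![0f32; b * h * q * k];
    for bi in 0..b {
        for hi in 0..h {
            mask[((bi * h + hi) * q + dst) * k + src] = f32::NEG_INFINITY;
        }
    }
    let mask = Tensor::from_vec(mask, (b, h, q, k), scores.device())?;
    softmax(&(scores + mask)?, 3) // softmax over the key dimension
}

/// KL(p || q) between the clean and ablated next-token distributions
/// (both assumed to be f32 probability vectors).
fn kl_div(p: &Tensor, q: &Tensor) -> Result<f32> {
    let log_ratio = (p.log()? - q.log()?)?;
    p.mul(&log_ratio)?.sum_all()?.to_scalar::<f32>()
}
```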
Everything runs on consumer hardware (tested on an RTX 5060 Ti 16GB).
As far as I can tell, no Rust crate or project currently exists for LLM mechanistic interpretability. The Python ecosystem has TransformerLens, nnsight, and pyvene, but nothing equivalent in Rust.
I'm now designing the next iteration as a publishable crate, with two main goals:
A config-driven generic transformer: one forward pass implementation covering LLaMA, Qwen2, Gemma, Mistral, Phi-3, StarCoder2, etc., parameterized by ~7 configuration axes (norm type, activation, QKV layout, MLP layout, bias, embedding scaling, LM head tying). Adding a new model family would mean writing a config parser (~30 lines), not a new forward pass.
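To make those axes concrete, they could be modeled as plain enums and fields roughly like this (illustrative names and variants, not a settled API):

```rust
/// Illustrative configuration axes; variants shown are not exhaustive.
enum Norm { RmsNorm, LayerNorm }
enum Act { Silu, Gelu, GeluTanh }
enum QkvLayout { Fused, Split }      // one packed qkv projection vs. separate q/k/v
enum MlpLayout { GatedSilu, UpDown } // gate/up/down (SwiGLU-style) vs. plain up/down

struct FamilyConfig {
    norm: Norm,
    act: Act,
    qkv: QkvLayout,
    mlp: MlpLayout,
    attn_bias: bool,          // e.g. Qwen2 uses QKV biases, LLaMA does not
    embed_scale: Option<f64>, // e.g. Gemma scales embeddings by sqrt(d_model)
    tie_lm_head: bool,        // reuse the embedding matrix as the output head
}
```

A model family's config file would be parsed into one `FamilyConfig` plus the usual dimension fields, and the single forward pass would branch only on these axes.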
A generic RWKV backend covering v6 and v7, with a `WkvKernel` trait abstracting the version-specific recurrence formula. The same pattern could eventually extend to GLA, RetNet, and other linear RNN architectures.
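One possible shape for that trait (signatures speculative):

```rust
use candle_core::{Device, Result, Tensor};

/// Version-specific WKV recurrence; the shared RWKV backend drives it
/// token by token. A speculative sketch, not a final design.
trait WkvKernel {
    /// Fresh per-sequence recurrent state, e.g. [heads, head_dim, head_dim].
    fn init_state(&self, heads: usize, head_dim: usize, dev: &Device) -> Result<Tensor>;

    /// One recurrence step for a single token: given the current state and
    /// the projected r/k/v (decay and other version-specific inputs would be
    /// held by the implementor), return (output, next_state).
    fn step(&self, state: &Tensor, r: &Tensor, k: &Tensor, v: &Tensor)
        -> Result<(Tensor, Tensor)>;
}

struct Wkv6; // v6: data-dependent decay via token shift + low-rank projections
struct Wkv7; // v7: generalized delta-rule update
```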
Both backends would have TransformerLens-style hook points built in (`resid_pre`, `attn_scores`, `attn_pattern`, `resid_mid`, `mlp_out`, `resid_post`, etc.), with zero overhead when hooks are inactive.
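The dispatch itself can be cheap: a minimal sketch of the idea (hypothetical types; "zero overhead" here means no tensor copies or extra kernels when no hook is registered, just a map lookup):

```rust
use std::collections::HashMap;
use candle_core::{Result, Tensor};

/// A hook observes an activation; returning Some(t) replaces it.
type Hook = Box<dyn Fn(&Tensor) -> Result<Option<Tensor>>>;

#[derive(Default)]
struct Hooks(HashMap<&'static str, Hook>);

impl Hooks {
    fn apply(&self, point: &'static str, t: Tensor) -> Result<Tensor> {
        match self.0.get(point) {
            Some(hook) => Ok(hook(&t)?.unwrap_or(t)), // active: run the hook
            None => Ok(t), // inactive: no copies, no extra tensor work
        }
    }
}

// Inside a layer's forward pass:
//   let x = hooks.apply("resid_pre", x)?;
//   let scores = hooks.apply("attn_scores", scores)?;
```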
Future extensions include CLT (cross-layer transcoder) and SAE (sparse autoencoder) support: loading pre-trained interpretability dictionaries and injecting features into the residual stream.
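The injection step itself is simple; a hedged sketch, where `w_dec` stands for a pre-trained SAE decoder matrix of shape [n_features, d_model] (names hypothetical):

```rust
use candle_core::{Result, Tensor};

/// Add `alpha` times the decoder direction of feature `idx` to the
/// residual stream. `resid` is [batch, seq, d_model]; the [1, d_model]
/// direction broadcasts over batch and sequence positions.
fn inject_feature(resid: &Tensor, w_dec: &Tensor, idx: usize, alpha: f64) -> Result<Tensor> {
    let dir = w_dec.narrow(0, idx, 1)?; // [1, d_model]
    resid.broadcast_add(&dir.affine(alpha, 0.0)?)
}
```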
My question: would `candle-mi` be an appropriate name for this crate, or would you prefer third-party projects not use the `candle-` prefix? I'd also be interested to know if this is something the candle team would consider part of the broader candle ecosystem. Happy to share the roadmap and current codebase if useful.
Thanks for candle! It's been a pleasure to work with for this kind of research. :)
-- Eric Jacopin.