Add open-source circuit-tracer CLT loading for CLT-Forge visualization #12

Open
HowardHsuuu wants to merge 5 commits into LLM-Interp:master from HowardHsuuu:add-circuit-tracer-clt-loading

@HowardHsuuu

Summary

This PR adds support for loading open-source circuit-tracer CLTs into CLT-Forge attribution workflows and visualizing the resulting graphs with the existing CLT-Forge visual interface.

It supports:

  • loading circuit-tracer CLTs from HuggingFace
  • loading local circuit-tracer safetensors CLTs
  • loading circuit-tracer CLTs from the circuit-tracer cache
  • converting circuit-tracer attribution graphs into the CLT-Forge frontend .pt graph schema
  • converting circuit-tracer feature metadata into the CLT-Forge frontend feature JSON layout
  • an end-to-end notebook using mntss/clt-gemma-2-2b-426k with google/gemma-2-2b
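
The three loading paths listed above imply some dispatch on what the caller passes as the CLT source. A minimal sketch of that decision follows. It assumes a hypothetical helper `classify_clt_source` and a hypothetical `cache_dir` argument; neither name nor the exact rules are taken from this PR or from circuit-tracer, so treat it as an illustration of the idea rather than the actual loader.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical helper: the name, arguments, and rules are illustrative only.
HF_REPO_ID = re.compile(r"^[\w.-]+/[\w.-]+$")


def classify_clt_source(source: str, cache_dir: Optional[str] = None) -> str:
    """Return 'local', 'cache', or 'huggingface' for a CLT source string."""
    # 1. Anything that already exists on disk is treated as a local checkpoint
    #    (for example a .safetensors file or a directory of them).
    if Path(source).exists():
        return "local"
    # 2. A repo-style id already present under the (assumed) cache directory.
    if cache_dir is not None and (Path(cache_dir) / source).exists():
        return "cache"
    # 3. Otherwise accept a HuggingFace-style repo id of the form "org/name".
    if HF_REPO_ID.match(source):
        return "huggingface"
    raise ValueError(f"Unrecognized CLT source: {source!r}")
```

Checking the disk first means a local directory that happens to look like `org/name` is never sent to the network, which is one plausible ordering; the PR may resolve sources differently.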

The visual interface itself is unchanged; the bridge is implemented in the library layer.

Notes

The existing CLT-Forge visual interface currently seems to hardcode the header label as GPT-2 Small; this PR does not change that UI behavior. The converted graph metadata records the actual model source, e.g. google/gemma-2-2b.

Tests

Added focused tests in tests/attribution/test_circuit_tracer_bridge.py.

These cover:

  • circuit-tracer graph conversion into the CLT-Forge frontend loader contract
  • selected feature indexing through active_features[selected_features]
  • (layer, pos, feature_idx) to (pos, layer, feature_idx) conversion
  • circuit-tracer feature metadata conversion into CLT-Forge feature JSON
  • binary feature record loading from features/index.json.gz + layer_*.bin
  • HuggingFace CLT loader dispatch through the circuit-tracer import alias
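
The two indexing points in the list above can be illustrated with a small sketch. It assumes, as the bullets suggest, that `active_features` holds `(layer, pos, feature_idx)` rows and that `selected_features` holds row indices into it. The function name `select_and_reorder` is hypothetical, and plain Python lists stand in for the tensors the real code presumably uses.

```python
from typing import List, Sequence, Tuple

Triple = Tuple[int, int, int]


def select_and_reorder(
    active_features: Sequence[Triple],
    selected_features: Sequence[int],
) -> List[Triple]:
    """Pick the selected rows, then swap (layer, pos, idx) -> (pos, layer, idx)."""
    reordered = []
    for row in selected_features:
        # List equivalent of the tensor indexing active_features[selected_features].
        layer, pos, feature_idx = active_features[row]
        reordered.append((pos, layer, feature_idx))
    return reordered


# Toy data, not taken from a real attribution run.
active = [(0, 3, 101), (5, 1, 7), (12, 4, 4096)]
selected = [2, 0]
print(select_and_reorder(active, selected))  # [(4, 12, 4096), (3, 0, 101)]
```

The order of `selected_features` is preserved in the output, so a node's position in the converted graph still matches its position in the selection.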

GPU smoke test on H100:

  • loaded mntss/clt-gemma-2-2b-426k
  • loaded google/gemma-2-2b
  • ran attribution for the prompt "The capital of France is"
  • generated a CLT-Forge-compatible attribution graph
  • converted 50 circuit-tracer feature metadata records
  • loaded the graph through the existing CLT-Forge DataLoader
  • started the existing Dash visual interface successfully
  • captured a rendered frontend screenshot
[Image: frontend_screenshot]
