From 7c1d1419e4e0855da0ea67f8e59dc6099d18b311 Mon Sep 17 00:00:00 2001
From: degenfabian
Date: Mon, 18 Aug 2025 19:19:48 +0200
Subject: [PATCH] updated loading in exploratory analysis demo to use
 transformer bridge

---
 demos/Exploratory_Analysis_Demo.ipynb | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/demos/Exploratory_Analysis_Demo.ipynb b/demos/Exploratory_Analysis_Demo.ipynb
index d7e29f11d..b12304844 100644
--- a/demos/Exploratory_Analysis_Demo.ipynb
+++ b/demos/Exploratory_Analysis_Demo.ipynb
@@ -100,7 +100,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -118,7 +118,8 @@
     "from jaxtyping import Float\n",
     "\n",
     "import transformer_lens.utils as utils\n",
-    "from transformer_lens import ActivationCache, HookedTransformer"
+    "from transformer_lens import ActivationCache\n",
+    "from transformer_lens.model_bridge import TransformerBridge"
    ]
   },
   {
@@ -245,12 +246,12 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The first step is to load in our model, GPT-2 Small, a 12 layer and 80M parameter transformer with `HookedTransformer.from_pretrained`. The various flags are simplifications that preserve the model's output but simplify its internals."
+    "The first step is to load in our model, GPT-2 Small, a 12 layer and 80M parameter transformer with `TransformerBridge.boot_transformers`. The various flags are simplifications that preserve the model's output but simplify its internals."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
    {
@@ -270,13 +271,14 @@
    ],
    "source": [
     "# NBVAL_IGNORE_OUTPUT\n",
-    "model = HookedTransformer.from_pretrained(\n",
-    "    \"gpt2-small\",\n",
+    "model = TransformerBridge.boot_transformers(\n",
+    "    \"gpt2\",\n",
     "    center_unembed=True,\n",
     "    center_writing_weights=True,\n",
     "    fold_ln=True,\n",
     "    refactor_factored_attn_matrices=True,\n",
     ")\n",
+    "model.enable_compatibility_mode()\n",
     "\n",
     "# Get the default device used\n",
     "device: torch.device = utils.get_device()"
@@ -372,7 +374,7 @@
     "\n",
     "We want models that can take in arbitrary text, but models need to have a fixed vocabulary. So the solution is to define a vocabulary of **tokens** and to deterministically break up arbitrary text into tokens. Tokens are, essentially, subwords, and are determined by finding the most frequent substrings - this means that tokens vary a lot in length and frequency! \n",
     "\n",
-    "Tokens are a *massive* headache and are one of the most annoying things about reverse engineering language models... Different names will be different numbers of tokens, different prompts will have the relevant tokens at different positions, different prompts will have different total numbers of tokens, etc. Language models often devote significant amounts of parameters in early layers to convert inputs from tokens to a more sensible internal format (and do the reverse in later layers). You really, really want to avoid needing to think about tokenization wherever possible when doing exploratory analysis (though, of course, it's relevant later when trying to flesh out your analysis and make it rigorous!). HookedTransformer comes with several helper methods to deal with tokens: `to_tokens, to_string, to_str_tokens, to_single_token, get_token_position`\n",
+    "Tokens are a *massive* headache and are one of the most annoying things about reverse engineering language models... Different names will be different numbers of tokens, different prompts will have the relevant tokens at different positions, different prompts will have different total numbers of tokens, etc. Language models often devote significant amounts of parameters in early layers to convert inputs from tokens to a more sensible internal format (and do the reverse in later layers). You really, really want to avoid needing to think about tokenization wherever possible when doing exploratory analysis (though, of course, it's relevant later when trying to flesh out your analysis and make it rigorous!). TransformerBridge comes with several helper methods to deal with tokens: `to_tokens, to_string, to_str_tokens, to_single_token, get_token_position`\n",
     "\n",
     "**Exercise:** I recommend using `model.to_str_tokens` to explore how the model tokenizes different strings. In particular, try adding or removing spaces at the start, or changing capitalization - these change tokenization!"
    ]
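
Taken together, the changes above amount to the following loading flow. This is a minimal sketch assuming only the API names that appear in the diff (`TransformerBridge.boot_transformers`, `enable_compatibility_mode`, and the token helpers listed in the markdown cell); the final `to_str_tokens` call is an illustrative usage added here, not a line from the patch.

```python
# Sketch of the post-patch loading flow; names come from the diff above.
import torch

import transformer_lens.utils as utils
from transformer_lens.model_bridge import TransformerBridge

# Boot GPT-2 Small via the bridge ("gpt2" replaces the old "gpt2-small" alias).
# The flags preserve the model's output while simplifying its internals.
model = TransformerBridge.boot_transformers(
    "gpt2",
    center_unembed=True,
    center_writing_weights=True,
    fold_ln=True,
    refactor_factored_attn_matrices=True,
)
# Expose the HookedTransformer-style interface (hooks, caching, token helpers).
model.enable_compatibility_mode()

# Get the default device used
device: torch.device = utils.get_device()

# One of the token helpers mentioned in the markdown cell, useful for seeing
# how a prompt is split into tokens (note the effect of a leading space).
print(model.to_str_tokens(" Mary and John went to the store"))
```

The design choice worth noting is that `enable_compatibility_mode()` is called immediately after booting: per this patch, that is what lets the rest of the demo keep using the `HookedTransformer`-style helper methods (`to_tokens`, `to_str_tokens`, etc.) unchanged.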