Sanity Check Fails #1

@good-epic

Description

I cloned the repo and am trying to run it locally to figure out some dimension issues I'm having in my refactor. In order to run it locally, I changed the model to "gpt2-small", and changed:

device = "cuda"
layers = [7, 14, 21, 40]
l0s = [92, 67, 129, 125]
saes = [SAE.from_pretrained(release="gemma-scope-9b-pt-res",
                            sae_id=f"layer_{layers[i]}/width_16k/average_l0_{l0s[i]}",
                            device=device)[0] for i in range(len(layers))]

to

device = "cuda"
layers = [3, 5, 7, 9]
saes = [SAE.from_pretrained(release="jbloom/GPT2-Small-SAEs-Reformatted",
                            sae_id=f"blocks.{layer}.hook_resid_pre",
                            device=device)[0] for layer in layers]

The sanity check runs the first ten batches of clean_tokens through the model twice: once directly through the forward function, and once with the hooks from build_hooks_list passed as fwd_hooks. These give very different values; the only similarity is that the signs match. Any idea why this might not be working? I tried limiting it to just two layers, both with 99.9% of variance explained by their SAEs, but the results are still not close.
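For what it's worth, sign agreement alone is a weak criterion: two output tensors can match in sign everywhere while being far apart in magnitude. A quick generic check (synthetic data only, not the repo's code) that reports both metrics:

```python
import numpy as np

def compare_outputs(baseline, hooked):
    """Fraction of matching signs vs. relative L2 error between two outputs."""
    sign_match = float(np.mean(np.sign(baseline) == np.sign(hooked)))
    rel_err = float(np.linalg.norm(baseline - hooked) / np.linalg.norm(baseline))
    return sign_match, rel_err

# Synthetic illustration: scale each entry by a random positive factor,
# so signs are preserved but magnitudes diverge.
rng = np.random.default_rng(0)
baseline = rng.normal(size=1000)
hooked = baseline * rng.uniform(1.0, 10.0, size=1000)

sign_match, rel_err = compare_outputs(baseline, hooked)
```

Here sign_match is 1.0 while the relative error is large, which is exactly the failure pattern described above.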
