Open Source XAI - Circuit tracer for Causal Explainability #200

@virajsharma2000

Description

In this talk, I propose to discuss the problem of building explainable AI by contrasting two approaches: causal vs. correlational.

I will talk about what mechanistic interpretability (mech interp) means for large language models like Gemma. It is a way to understand how models answer questions by looking inside them and checking which neurons activate, and when.
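To make "checking which neurons activate" concrete, here is a minimal sketch on a toy two-layer MLP (a hypothetical stand-in, not Gemma or any real model): we run a forward pass and record which hidden neurons fire.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer MLP: we record which hidden "neurons" fire (post-ReLU > 0)
# for a given input - the basic observation step of mech interp.
W1 = rng.normal(size=(8, 4))   # hidden x input
W2 = rng.normal(size=(2, 8))   # output x hidden

def forward_with_activations(x):
    hidden = np.maximum(W1 @ x, 0.0)   # ReLU activations
    out = W2 @ hidden
    return out, hidden

x = rng.normal(size=4)
out, hidden = forward_with_activations(x)
active = np.flatnonzero(hidden > 0)    # indices of firing neurons
print("active neurons:", active)
```

In a real transformer the same idea is implemented with forward hooks that capture intermediate activations instead of a hand-rolled forward pass.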

I will discuss circuit-tracer, the Python module that Anthropic open-sourced, and the Neuronpedia portal, which helps us find neurons linked to real-world concepts. We will examine specific prompts on transformers and trace the various paths and "thoughts" that make us reach the output. (It is very interesting - for me!)
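The "paths to the output" idea can be sketched on a toy linear network (this is only an illustration of the intuition, not circuit-tracer's actual API): in a linear model, the output decomposes exactly into one contribution per hidden neuron, i.e. per path input → neuron → output.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy linear network: out = W2 @ (W1 @ x). Because it is linear, the
# output splits exactly into one contribution per hidden neuron - one
# "path" through the network. Circuit tracing generalizes this
# path-attribution idea to real transformers.
W1 = rng.normal(size=(6, 3))
W2 = rng.normal(size=(1, 6))
x = rng.normal(size=3)

hidden = W1 @ x
out = W2 @ hidden

# Contribution of each hidden neuron's path to the scalar output.
path_contrib = W2[0] * hidden          # shape (6,)
print("per-path contributions:", path_contrib)
print("sum of paths:", path_contrib.sum(), "== output:", out[0])
```

Ranking `path_contrib` by magnitude tells you which paths matter most for this input, which is the spirit of the attribution graphs we will look at.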

I will also talk about my own work on mech interp tooling (modelrecon), including an "activation cube" data structure (not a standard - my own design) as a means to share and visualize activation data, and the "counterfactual" library I am working on to correctly implement intervention testing.
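The two ideas above can be sketched together; this is a hypothetical illustration (modelrecon's actual format and the counterfactual library's API may differ): an "activation cube" as a dense layer × token × neuron array, plus a toy zero-ablation intervention that measures the causal effect of one neuron on a downstream readout.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "activation cube": a dense array indexed as
# [layer, token_position, neuron] - one natural shape for sharing
# and visualizing activation data from a whole forward pass.
n_layers, n_tokens, n_neurons = 3, 5, 16
cube = rng.normal(size=(n_layers, n_tokens, n_neurons))

# Toy counterfactual intervention: zero-ablate one neuron at one
# layer/position and measure how a downstream readout changes.
readout = rng.normal(size=n_neurons)   # stand-in for a later-layer probe

def run_readout(c):
    # Read out from the last layer's last token, as a toy "output".
    return float(readout @ c[-1, -1])

baseline = run_readout(cube)
patched = cube.copy()
patched[-1, -1, 3] = 0.0               # intervene on neuron 3
effect = run_readout(patched) - baseline
print("causal effect of ablating neuron 3:", effect)
```

The point of intervention testing is exactly this difference: correlation only tells you neuron 3 fires alongside the output, while the patched-minus-baseline effect tells you whether it causally contributes to it.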

My code:

https://github.com/modelrecon

My slides:

https://docs.google.com/presentation/d/1FNd37jW3nB95lko2imfk6A7VGVG0S0H53hUgWocYJ1g/edit?usp=sharing
