A fork of vLLM with support for SAE (Sparse Autoencoder) steering and activation extraction.
Base vLLM commit: a2e6fa7e
The following files and modifications are specific to vllm-interp and not part of the original vLLM:
- `vllm/model_executor/models/goodfire_sae.py` - Goodfire SAE setup and loading
- `vllm/model_executor/models/llama_models_and_saes.py` - SAE configuration for different Llama models
- `example_scripts/async_example_llama.py` - Example script for running Llama with SAE steering
- `local_reqs/requirements.txt` - Additional requirements for SAE functionality
- `vllm/model_executor/models/llama.py` - Extended with SAE steering support (see details below)
- `vllm/v1/worker/gpu_model_runner.py` - Added `steer_positions` generation for steering
The main additions to the Llama model are:
- `init_sae_for_rank`: Initializes and loads the SAE for each worker
- `forward_sae`: Runs the intervention/steering using the SAE
The forward method is extended with two additional arguments:
- `interventions`: A list of intervention inputs, each with a feature ID and strength
- `steer_positions`: Generated in `gpu_model_runner.py` (not required when sending requests to the engine)
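As a rough illustration of the intervention input described above, a sketch is shown below. The class and field names (`InterventionInput`, `feature_id`, `strength`) and the example values are assumptions for illustration only, not the fork's actual API; see `example_scripts/async_example_llama.py` for the real interface.

```python
from dataclasses import dataclass


# Hypothetical sketch of an intervention input: the fork's actual type and
# field names may differ -- see example_scripts/async_example_llama.py.
@dataclass
class InterventionInput:
    feature_id: int  # index of the SAE feature to steer
    strength: float  # steering strength applied along that feature's direction


# A request carries a list of such interventions; steer_positions is
# generated internally by gpu_model_runner.py, so callers omit it.
interventions = [
    InterventionInput(feature_id=12034, strength=4.0),
    InterventionInput(feature_id=881, strength=-2.5),
]
```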
Key code locations:
- Forward signature extension: `vllm/model_executor/models/llama.py:L722`
- Output modification: `vllm/model_executor/models/llama.py:L509`
- Sparse output handling: `vllm/model_executor/models/llama.py:L356` and `L367`
- Final layer tensor subtraction: `vllm/model_executor/models/llama.py:L536`
The `intervention_enabled` flag determines the steering layer, while `feature_enabled` determines the layer for reading features. Goodfire SAEs use different layers for steering and for reading features (see `llama_models_and_saes.py`).
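The layer-role logic can be pictured with the minimal sketch below. The dictionary, model name, and layer indices are all hypothetical; the real per-model steering and feature layers are defined in `llama_models_and_saes.py`.

```python
# Illustrative sketch only: the actual steering/feature layer indices are
# configured per model in llama_models_and_saes.py, not here.
SAE_LAYERS = {
    # model name -> layer indices (values are made up for this example)
    "example-llama": {"steer_layer": 19, "feature_layer": 25},
}


def layer_role(model: str, layer_idx: int) -> dict:
    """Return which SAE hooks would be active at a given decoder layer."""
    cfg = SAE_LAYERS[model]
    return {
        # steering intervention runs at the steering layer...
        "intervention_enabled": layer_idx == cfg["steer_layer"],
        # ...while feature activations are read at a (possibly different) layer
        "feature_enabled": layer_idx == cfg["feature_layer"],
    }
```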
Prerequisites: an environment with CUDA 12.8 (the precompiled vLLM wheel used below is built against CUDA 12.8).
```shell
apt-get update -y
apt-get install -y python3.12 python3.12-venv python3.12-dev
apt-get install -y python3-dev build-essential ninja-build cmake jq zip
apt-get install -y libstdc++-12-dev libc6-dev gcc g++ gcc-multilib g++-multilib

python3.12 -m venv /tmp/vllm_env
source /tmp/vllm_env/bin/activate
pip install sae_lens==6.13.0
pip install pip wheel setuptools_scm setuptools --upgrade
pip install -r local_reqs/requirements.txt
export VLLM_PRECOMPILED_WHEEL_LOCATION=https://wheels.vllm.ai/a2e6fa7e035ff058fc37fdaaf014707efff2fcf3/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
pip install --editable .
```

The wheel URL corresponds to the base vLLM commit (a2e6fa7e) that vllm-interp is forked from.
Required for attention backend (especially for Gemma models):
```shell
export VLLM_ATTENTION_BACKEND=FLASHINFER
export VLLM_FLASH_ATTN_VERSION=3
```

Set your Hugging Face token to pull Llama models:

```shell
export HF_TOKEN=your_token_here
```

See `example_scripts/async_example_llama.py` for a complete example with batched steering inputs.
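To give a sense of the shape of batched steering inputs, a sketch follows: one interventions list per prompt in the batch. The dictionary keys, feature IDs, and strengths are illustrative assumptions; the authoritative request format is the one used in `example_scripts/async_example_llama.py`.

```python
# Hypothetical sketch of batched steering inputs: one list of
# (feature ID, strength) interventions per prompt in the batch.
# The fork's real request schema is shown in
# example_scripts/async_example_llama.py.
prompts = [
    "Write a short poem about the ocean.",
    "Explain what a sparse autoencoder is.",
]

batched_interventions = [
    # prompt 0: steer a single feature
    [{"feature_id": 12034, "strength": 4.0}],
    # prompt 1: steer two features at once
    [{"feature_id": 881, "strength": -2.5},
     {"feature_id": 407, "strength": 1.0}],
]

# Each prompt must have a matching interventions list.
assert len(prompts) == len(batched_interventions)
```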