
vllm-interp

A fork of vLLM with support for SAE (Sparse Autoencoder) steering and activation extraction.

Base vLLM commit: a2e6fa7e

What's Added (vllm-interp)

The following files and modifications are specific to vllm-interp and not part of the original vLLM:

New Files

  • vllm/model_executor/models/goodfire_sae.py - Goodfire SAE setup and loading
  • vllm/model_executor/models/llama_models_and_saes.py - SAE configuration for different Llama models
  • example_scripts/async_example_llama.py - Example script for running Llama with SAE steering
  • local_reqs/requirements.txt - Additional requirements for SAE functionality

Modified Files

  • vllm/model_executor/models/llama.py - Extended with SAE steering support (see details below)
  • vllm/v1/worker/gpu_model_runner.py - Added steer_positions generation for steering

Key Modifications to llama.py

The main additions to the Llama model are:

  1. init_sae_for_rank: Initializes and loads the SAE for each worker
  2. forward_sae: Runs the intervention/steering using the SAE

The forward method is extended with two additional arguments:

  • interventions: A list of intervention inputs with featureID and strength
  • steer_positions: Generated internally in gpu_model_runner.py, so it does not need to be supplied when sending requests to the engine
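As a rough illustration, each intervention entry pairs a SAE feature ID with a steering strength. The field names below are assumptions for the sketch, not the fork's verbatim schema; see example_scripts/async_example_llama.py for the actual request format:

```python
# Hypothetical sketch of the intervention inputs passed alongside a request.
# Field names ("feature_id", "strength") are illustrative only.
interventions = [
    {"feature_id": 12345, "strength": 4.0},   # amplify a feature
    {"feature_id": 67890, "strength": -2.0},  # suppress a feature
]

# steer_positions is produced per scheduled token inside gpu_model_runner.py,
# so request-side code only supplies the interventions list.
```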

Key code locations:

  • Forward signature extension: vllm/model_executor/models/llama.py:L722
  • Output modification: vllm/model_executor/models/llama.py:L509
  • Sparse output handling: vllm/model_executor/models/llama.py:L356 and L367
  • Final layer tensor subtraction: vllm/model_executor/models/llama.py:L536

The intervention_enabled flag determines the steering layer, while feature_enabled determines the layer for reading features. Goodfire SAEs have different steering and feature layers (see llama_models_and_saes.py).
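A per-model SAE configuration with distinct steering and feature layers could be sketched as follows. The layer indices and field names here are invented for illustration; the real mappings live in llama_models_and_saes.py:

```python
from dataclasses import dataclass

# Illustrative only: the actual model-to-SAE mappings are defined in
# vllm/model_executor/models/llama_models_and_saes.py.
@dataclass(frozen=True)
class SAEConfig:
    steering_layer: int  # layer where forward_sae applies the intervention
    feature_layer: int   # layer whose activations are read out as features

# Hypothetical layer indices -- not the real Goodfire values.
SAE_CONFIGS = {
    "meta-llama/Llama-3.1-8B-Instruct": SAEConfig(steering_layer=19, feature_layer=25),
}
```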


Installation

Prerequisites: an environment with CUDA 12.8 (the precompiled vLLM wheel is built against CUDA 12.8).

1. Prepare the environment

apt-get update -y
apt-get install -y python3.12 python3.12-venv python3.12-dev python3-dev
apt-get install -y build-essential gcc g++ gcc-multilib g++-multilib
apt-get install -y libstdc++-12-dev libc6-dev
apt-get install -y ninja-build cmake jq zip

2. Create and activate virtualenv

python3.12 -m venv /tmp/vllm_env
source /tmp/vllm_env/bin/activate

3. Install sae_lens first (it pins an outdated numpy version, so install it before the other requirements)

pip install sae_lens==6.13.0

4. Install build packages

pip install --upgrade pip wheel setuptools setuptools_scm

5. Install requirements

pip install -r local_reqs/requirements.txt

6. Install vllm-interp as an editable package using the precompiled vLLM wheel

export VLLM_PRECOMPILED_WHEEL_LOCATION=https://wheels.vllm.ai/a2e6fa7e035ff058fc37fdaaf014707efff2fcf3/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
pip install --editable .

The wheel URL corresponds to the base vLLM commit that vllm-interp is forked from.


Usage

Environment Variables

Required for attention backend (especially for Gemma models):

export VLLM_ATTENTION_BACKEND=FLASHINFER
export VLLM_FLASH_ATTN_VERSION=3

Set your Hugging Face token to pull Llama models:

export HF_TOKEN=your_token_here

Example Script

See example_scripts/async_example_llama.py for a complete example with batched steering inputs.
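The batched steering inputs the example script works with can be pictured as a list of prompts, each carrying its own interventions. The structure and field names below are a hypothetical sketch, not the script's exact format:

```python
# Hypothetical batched steering inputs: one interventions list per prompt.
# See example_scripts/async_example_llama.py for the real request format.
batch = [
    ("Tell me about the Golden Gate Bridge.",
     [{"feature_id": 12345, "strength": 8.0}]),
    ("Write a short poem.",
     []),  # no steering for this prompt
]

for prompt, interventions in batch:
    print(f"{prompt!r} -> {len(interventions)} intervention(s)")
```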