A fork of vLLM with support for SAE (Sparse Autoencoder) steering and activation extraction.
Base vLLM commit: a2e6fa7e
The following files and modifications are specific to vllm-interp and not part of the original vLLM:
- `vllm/model_executor/models/goodfire_sae.py` - Goodfire SAE setup and loading
- `vllm/model_executor/models/llama_models_and_saes.py` - SAE configuration for different Llama models
- `example_scripts/async_example_llama.py` - Example script for running Llama with SAE steering
- `local_reqs/requirements.txt` - Additional requirements for SAE functionality
- `vllm/model_executor/models/llama.py` - Extended with SAE steering support (see details below)
- `vllm/v1/worker/gpu_model_runner.py` - Added `steer_positions` generation for steering
The main additions to the Llama model are:
- `init_sae_for_rank`: Initializes and loads the SAE for each worker
- `forward_sae`: Runs the intervention/steering using the SAE
The forward method is extended with two additional arguments:
- `interventions`: A list of intervention inputs, each with a feature ID and strength
- `steer_positions`: Generated in `gpu_model_runner.py` (not required when sending requests to the engine)
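As a rough illustration of the intervention input described above, a sketch is shown below. The class and field names (`InterventionInput`, `feature_id`, `strength`) and the example values are assumptions for illustration only, not the fork's actual API; see `example_scripts/async_example_llama.py` for the real interface.

```python
from dataclasses import dataclass


# Hypothetical sketch of an intervention input: the fork's actual type and
# field names may differ -- see example_scripts/async_example_llama.py.
@dataclass
class InterventionInput:
    feature_id: int  # index of the SAE feature to steer
    strength: float  # steering strength applied along that feature's direction


# A request carries a list of such interventions; steer_positions is
# generated internally by gpu_model_runner.py, so callers omit it.
interventions = [
    InterventionInput(feature_id=12034, strength=4.0),
    InterventionInput(feature_id=881, strength=-2.5),
]
```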
Key code locations:
- Forward signature extension: `vllm/model_executor/models/llama.py:L722`
- Output modification: `vllm/model_executor/models/llama.py:L509`
- Sparse output handling: `vllm/model_executor/models/llama.py:L356` and `L367`
- Final layer tensor subtraction: `vllm/model_executor/models/llama.py:L536`
The `intervention_enabled` flag determines the steering layer, while `feature_enabled` determines the layer for reading features. Goodfire SAEs use different layers for steering and for reading features (see `llama_models_and_saes.py`).
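The layer-role logic can be pictured with the minimal sketch below. The dictionary, model name, and layer indices are all hypothetical; the real per-model steering and feature layers are defined in `llama_models_and_saes.py`.

```python
# Illustrative sketch only: the actual steering/feature layer indices are
# configured per model in llama_models_and_saes.py, not here.
SAE_LAYERS = {
    # model name -> layer indices (values are made up for this example)
    "example-llama": {"steer_layer": 19, "feature_layer": 25},
}


def layer_role(model: str, layer_idx: int) -> dict:
    """Return which SAE hooks would be active at a given decoder layer."""
    cfg = SAE_LAYERS[model]
    return {
        # steering intervention runs at the steering layer...
        "intervention_enabled": layer_idx == cfg["steer_layer"],
        # ...while feature activations are read at a (possibly different) layer
        "feature_enabled": layer_idx == cfg["feature_layer"],
    }
```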
Prerequisites: an environment with CUDA 12.8 (the precompiled vLLM wheel used below is built against CUDA 12.8).
```shell
apt-get update -y
apt-get install -y python3.12 python3.12-venv python3.12-dev
apt-get install -y python3-dev build-essential ninja-build cmake jq zip
apt-get install -y libstdc++-12-dev libc6-dev gcc g++ gcc-multilib g++-multilib

python3.12 -m venv /tmp/vllm_env
source /tmp/vllm_env/bin/activate
pip install sae_lens==6.13.0
pip install pip wheel setuptools_scm setuptools --upgrade
pip install -r local_reqs/requirements.txt
export VLLM_PRECOMPILED_WHEEL_LOCATION=https://wheels.vllm.ai/a2e6fa7e035ff058fc37fdaaf014707efff2fcf3/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
pip install --editable .
```

The wheel URL corresponds to the base vLLM commit (a2e6fa7e) that vllm-interp is forked from.
Required for attention backend (especially for Gemma models):
```shell
export VLLM_ATTENTION_BACKEND=FLASHINFER
export VLLM_FLASH_ATTN_VERSION=3
```

Set your Hugging Face token to pull Llama models:

```shell
export HF_TOKEN=your_token_here
```

See `example_scripts/async_example_llama.py` for a complete example with batched steering inputs.
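To give a sense of the shape of batched steering inputs, a sketch follows: one interventions list per prompt in the batch. The dictionary keys, feature IDs, and strengths are illustrative assumptions; the authoritative request format is the one used in `example_scripts/async_example_llama.py`.

```python
# Hypothetical sketch of batched steering inputs: one list of
# (feature ID, strength) interventions per prompt in the batch.
# The fork's real request schema is shown in
# example_scripts/async_example_llama.py.
prompts = [
    "Write a short poem about the ocean.",
    "Explain what a sparse autoencoder is.",
]

batched_interventions = [
    # prompt 0: steer a single feature
    [{"feature_id": 12034, "strength": 4.0}],
    # prompt 1: steer two features at once
    [{"feature_id": 881, "strength": -2.5},
     {"feature_id": 407, "strength": 1.0}],
]

# Each prompt must have a matching interventions list.
assert len(prompts) == len(batched_interventions)
```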