Command-V: Pasting LLM Behaviors via Activation Profiles

⌘V: Approximate another LLM’s activations on the fly and transplant its finetuned behaviors.


Finetune once, use on many LLMs. ⌘V is the first method to show that a finetuned adapter (for refusal suppression, jailbreak resistance, chain-of-thought reasoning, etc.) learned by one LLM can be "copy-pasted" to another LLM of a different size or architecture, without backpropagation.


How it works

⌘V relies on two properties of transformer LLMs: their activations (intermediate representations) can be intervened on with PEFT-style adapters (such as ReFT) to steer behavior, and one model's activations can be approximated from another model's. To get it running:

  1. Profile activations on both models using a small set of prompts
  2. Create converters that map between model representation spaces
  3. Transfer behaviors by applying the donor model's learned interventions to the recipient

See details in our paper.
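
Conceptually, step 2 reduces to fitting a linear map between the two models' activation spaces by least squares, solved in closed form with a pseudoinverse (see commandv/core/converters.py). The snippet below is a simplified sketch under assumed names and shapes, not the repository's actual API:

import numpy as np

# Assumed inputs: activations captured in step 1 on the same profiling prompts,
# A_recipient with shape (num_tokens, d_recipient), A_donor with shape (num_tokens, d_donor).

def derive_converter(A_recipient, A_donor):
    # Closed-form least-squares fit: find W such that A_recipient @ W ≈ A_donor.
    # No backpropagation or gradient steps are needed.
    return np.linalg.pinv(A_recipient) @ A_donor

def to_donor_space(h_recipient, W):
    # Project a recipient hidden state into the donor's representation space,
    # where the donor's finetuned intervention can then be applied.
    return h_recipient @ W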

⌘V Highlights

  • Low Resource: No backpropagation. No expert data. Low GPU memory usage.
  • Fast: Minutes to set up, not hours.
  • Cross-Family: Under certain conditions, works across Llama, Qwen, and other model families.

🪧 Demo

Check out the complete pipeline in ./command_v_demo.ipynb, or open it in Colab.

🎯 Quick Start

Installation

git clone https://github.com/GithuBarry/Command-V.git
cd Command-V
pip install -r requirements.txt

Three-Step Pipeline

# Profile models on LIMA dataset (once per model)
python step1_capture_activations.py --models meta-llama/Llama-3.2-1B-Instruct meta-llama/Llama-3.1-8B-Instruct
# Derive activation-space converters (takes seconds).
# Optional: converters are also created on the fly during inference.
python step2_derive_converters.py derive \
  --source-model meta-llama/Llama-3.1-8B-Instruct \
  --target-model meta-llama/Llama-3.2-1B-Instruct
# Transfer behaviors during inference
python step3_commandv_inference.py \
  --recipient-model meta-llama/Llama-3.2-1B-Instruct \
  --donor-model meta-llama/Llama-3.1-8B-Instruct \
  --adapter-folder "reft-adapters/jailbreak/Llama-3.1-8B-Instruct/NodireftIntervention/l1/walledai--AdvBench/L0;2;4;6;8;10;12;14;16;18;20;22;24;26;28;30" \
  --input-source prompts/AdvBench/test.txt \
  --first-n 5 --print-output-only
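
At inference time (step 3), each selected recipient hidden state is mapped into the donor's space, the donor's adapter intervention is applied there, and the resulting edit is mapped back into the recipient's space. The function below is a rough sketch of that idea with assumed names (W_to_donor, W_to_recipient, donor_intervention); see commandv/core/inference.py and the paper for the actual procedure.

def commandv_edit(h_recipient, W_to_donor, W_to_recipient, donor_intervention):
    # 1. Approximate the donor's activation from the recipient's hidden state.
    h_donor = h_recipient @ W_to_donor
    # 2. Apply the donor's finetuned intervention (e.g., a ReFT adapter) in donor space
    #    and take the resulting behavioral edit as a delta.
    delta = donor_intervention(h_donor) - h_donor
    # 3. Map the edit back and add it to the recipient's hidden state.
    return h_recipient + delta @ W_to_recipient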

📁 Project Structure

├── command_v_demo.ipynb         # Jupyter Demo
├── step1_capture_activations.py # Step 1: Activation profiling
├── step2_derive_converters.py   # Step 2: Converter derivation  
├── step3_commandv_inference.py  # Step 3: Behavioral transfer
├── commandv/                    # Core library
│   ├── core/                    # Main functionality
│   │   ├── capture.py           # Activation capture
│   │   ├── converters.py        # Pseudoinverse converters
│   │   └── inference.py         # Inference engine
│   ├── utils/                   # Utilities
│   └── data/                    # Data processing
├── outputs/                     # Generated files
│   ├── activations/             # Model activation profiles
│   ├── converters/              # Converter mappings (not saved by default)
│   └── inferences/              # Results
└── reft-adapters/               # Trained behavior adapters

📄 License

This project is licensed under the MIT License; see the LICENSE file for details. Certain artifacts are additionally gated because of their potential for misuse.
