Finetune once, use on many LLMs. ⌘V is the first to demonstrate that you can "copy-paste" a finetuned adapter (for refusal suppression, jailbreaking resistance, chain-of-thought reasoning, etc.) learned by one LLM to another of a different size or architecture, without backpropagation.
⌘V leverages two facts about transformer LLMs: their activations (intermediate representations) can be intervened on with PEFT to change behavior, and one LLM's activations can be approximated from another's. The pipeline has three steps:
- Profile activations on both models using a small set of prompts
- Create converters that map between model representation spaces
- Transfer behaviors by applying the donor model's learned interventions to the recipient
See details in our paper.
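The three steps above can be illustrated with a minimal NumPy sketch. It is not the repository's implementation: the function names `derive_converter` and `transfer_intervention`, the synthetic activations, the toy steering "adapter", and the choice to map only the intervention's change back to the recipient are all assumptions made for illustration. The only detail taken from this README is that converters are pseudoinverse-based.

```python
import numpy as np


def derive_converter(src_acts, dst_acts, rcond=1e-10):
    """Return W minimizing ||src_acts @ W - dst_acts|| via the pseudoinverse.

    src_acts: (n_rows, d_src), dst_acts: (n_rows, d_dst). Rows are assumed to be
    aligned, i.e. row i of both arrays was profiled on the same prompt position.
    rcond drops near-zero singular values for numerical stability.
    """
    return np.linalg.pinv(src_acts, rcond=rcond) @ dst_acts


def transfer_intervention(h_recipient, to_donor, to_recipient, donor_intervention):
    """Apply a donor-space intervention to recipient activations (assumed scheme)."""
    h_donor = h_recipient @ to_donor                    # recipient -> donor space
    delta_donor = donor_intervention(h_donor) - h_donor  # change made by the adapter
    return h_recipient + delta_donor @ to_recipient      # map the change back


# Step 1 (stand-in): synthetic "profiled" activations with hypothetical sizes.
rng = np.random.default_rng(0)
recipient_acts = rng.normal(size=(200, 8))
donor_acts = recipient_acts @ rng.normal(size=(8, 16))

# Step 2: converters in both directions.
to_donor = derive_converter(recipient_acts, donor_acts)      # shape (8, 16)
to_recipient = derive_converter(donor_acts, recipient_acts)  # shape (16, 8)

# Step 3: a toy donor "adapter" that adds a fixed steering vector in donor space.
steer = rng.normal(size=16)
patched = transfer_intervention(
    recipient_acts[:1], to_donor, to_recipient, lambda h: h + steer
)
```

In this toy setup the two activation spaces are exactly linearly related, so the converters reproduce the profiles exactly; with real models the fit is only approximate, and its quality depends on the prompts used for profiling.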
- Low Resource: No backpropagation. No expert data. Low GPU memory usage.
- Fast: Minutes to set up, not hours.
- Cross-Family: Under certain conditions, works between Llama, Qwen, and other model families.
Check out the complete pipeline in the demo notebook (./command_v_demo.ipynb) or on Colab. To run it locally:
git clone https://github.com/GithuBarry/Command-V.git
cd Command-V
pip install -r requirements.txt
# Profile models on LIMA dataset (once per model)
python step1_capture_activations.py --models meta-llama/Llama-3.2-1B-Instruct meta-llama/Llama-3.1-8B-Instruct
# Learn activation space mappings (takes seconds)
# (Mappings are created on the fly in practice, making this step optional.)
python step2_derive_converters.py derive \
--source-model meta-llama/Llama-3.1-8B-Instruct \
--target-model meta-llama/Llama-3.2-1B-Instruct
# Transfer behaviors during inference
python step3_commandv_inference.py \
--recipient-model meta-llama/Llama-3.2-1B-Instruct \
--donor-model meta-llama/Llama-3.1-8B-Instruct \
--adapter-folder 'reft-adapters/jailbreak/Llama-3.1-8B-Instruct/NodireftIntervention/l1/walledai--AdvBench/L0;2;4;6;8;10;12;14;16;18;20;22;24;26;28;30' \
--input-source prompts/AdvBench/test.txt \
--first-n 5 --print-output-only
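The adapter folder path contains semicolons, which a POSIX shell treats as command separators unless the path is quoted. When launching Step 3 from a script, one way to avoid quoting mistakes is to build the argument list in Python. The sketch below reuses only the flags and paths shown in the command above; nothing else about the script's interface is assumed.

```python
import shlex

# Same adapter path as in the shell command; note the semicolons.
adapter_folder = (
    "reft-adapters/jailbreak/Llama-3.1-8B-Instruct/NodireftIntervention/l1/"
    "walledai--AdvBench/L0;2;4;6;8;10;12;14;16;18;20;22;24;26;28;30"
)

argv = [
    "python", "step3_commandv_inference.py",
    "--recipient-model", "meta-llama/Llama-3.2-1B-Instruct",
    "--donor-model", "meta-llama/Llama-3.1-8B-Instruct",
    "--adapter-folder", adapter_folder,
    "--input-source", "prompts/AdvBench/test.txt",
    "--first-n", "5",
    "--print-output-only",
]

# shlex.join quotes each argument as needed, so the result is safe to paste into a shell.
command = shlex.join(argv)
print(command)
```

Passing `argv` directly to `subprocess.run(argv, check=True)` skips the shell entirely, so no quoting is needed in that case.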
├── command_v_demo.ipynb # Jupyter Demo
├── step1_capture_activations.py # Step 1: Activation profiling
├── step2_derive_converters.py # Step 2: Converter derivation
├── step3_commandv_inference.py # Step 3: Behavioral transfer
├── commandv/ # Core library
│ ├── core/ # Main functionality
│ │ ├── capture.py # Activation capture
│ │ ├── converters.py # Pseudoinverse converters
│ │ └── inference.py # Inference engine
│ ├── utils/ # Utilities
│ └── data/ # Data processing
├── outputs/ # Generated files
│ ├── activations/ # Model activation profiles
│ ├── converters/ # Converter mappings (not saved by default)
│ └── inferences/ # Results
└── reft-adapters/ # Trained behavior adapters
This project is licensed under the MIT License; see the LICENSE file for details. Certain artifacts are additionally gated because of their potential for misuse.