ReCoVeR (Reducing language Confusion with Vector Representations) is a lightweight representation steering method designed to mitigate language confusion in multilingual language models while maintaining strong task performance.
This repository provides tools to:
- Collect language-specific vectors
- Apply them at inference time for controlled steering
- Train enhanced vectors (ReCoVeR+) for improved performance
First, clone this repository and install the dependencies:
```shell
git clone https://github.com/hSterz/recover.git
cd recover
pip install -r requirements.txt
```

ReCoVeR requires average language vectors to perform representation steering.
These vectors can be collected using the `collect_language_vectors` script.
1. Clone or download the `collect_language_vectors` repository.
2. Adapt `collect_language_vectors/run.sh` to set:
   - your model name (e.g., `Qwen/Qwen2-7B-Instruct`)
   - the dataset(s) used for language representation extraction
   - the batch size and other parameters
3. Run the script!
This will output average representations per language, which can later be used for steering.
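Conceptually, these average representations are just per-language means over hidden-state vectors. The following is a minimal illustrative sketch (not the repository's actual collection code) of that averaging step, assuming hidden states have already been extracted into arrays of shape `(num_examples, hidden_dim)`:

```python
import numpy as np

def average_language_vectors(hidden_states_by_language):
    """Compute one average representation per language.

    `hidden_states_by_language` maps a language code to an array of
    shape (num_examples, hidden_dim), one hidden-state vector per example.
    Returns a dict mapping each language code to its mean vector.
    """
    return {
        lang: states.mean(axis=0)
        for lang, states in hidden_states_by_language.items()
    }

# Toy example with a hidden size of 4 and 8 examples per language.
rng = np.random.default_rng(0)
states = {
    "en": rng.normal(size=(8, 4)),
    "de": rng.normal(size=(8, 4)),
}
vectors = average_language_vectors(states)
print(vectors["en"].shape)  # (4,)
```

In the actual pipeline, the per-example vectors come from the model's hidden states on the configured dataset(s); this sketch only shows the aggregation.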
Once vectors are collected, they can be applied during inference.
Use the `Steering` class to load and apply vectors.
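The exact interface of the `Steering` class is defined in this repository; the core idea of a steering step with `alpha`, `beta`, and norm restoration can be sketched as follows. This is an illustrative toy implementation, not the repository's API, and the function name and formula here are assumptions:

```python
import numpy as np

def steer(hidden, source_vec, target_vec, alpha=1.0, beta=1.0, restore_norm=True):
    """Illustrative steering step on a single hidden-state vector:
    down-weight the source-language direction (scaled by beta), add the
    target-language direction (scaled by alpha), and optionally rescale
    so the steered vector keeps the original hidden state's norm."""
    original_norm = np.linalg.norm(hidden)
    steered = hidden - beta * source_vec + alpha * target_vec
    if restore_norm:
        steered = steered * (original_norm / np.linalg.norm(steered))
    return steered

# Toy usage with random 4-dimensional vectors.
rng = np.random.default_rng(0)
hidden = rng.normal(size=4)
en_vec = rng.normal(size=4)   # average vector for the source language
de_vec = rng.normal(size=4)   # average vector for the target language
steered = steer(hidden, en_vec, de_vec, alpha=1.0, beta=1.0)
```

With `restore_norm`, the steered hidden state keeps the magnitude of the original one, which mirrors the `--restore_norm` flag used in the evaluation commands below.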
For evaluation, use the Language Confusion Benchmark (LCB) with the `predict_lcb.py` script:

```shell
python predict_lcb.py \
    --crosslingual \
    --alpha 1 \
    --beta 1 \
    --scaling norm \
    --restore_norm \
    --version recover \
    --model_name Qwen/Qwen2-7B-Instruct
```
This outputs a CSV file that can be evaluated with the LCB evaluation code.
In addition to static vectors, ReCoVeR+ provides trained steering vectors for enhanced performance.
Run the training script:
```shell
cd recover
bash train.sh
```
Evaluate with the same `predict_lcb.py` script, but pass the path to the trained steering module:

```shell
python predict_lcb.py \
    --crosslingual \
    --alpha 1 \
    --beta 0.9 \
    --scaling norm \
    --restore_norm \
    --version recover_plus \
    --model_name Qwen/Qwen2-7B-Instruct \
    --path <PATH_TO_STEER>
```
If you use ReCoVeR in your work, please cite:

```bibtex
@article{sterz2025recover,
  title={ReCoVeR the Target Language: Language Steering without Sacrificing Task Performance},
  author={Hannah Sterz and Fabian David Schmidt and Goran Glavaš and Ivan Vulić},
  year={2025},
  journal={arXiv preprint arXiv:2509.14814},
}
```
