Repository Overview: This repository contains the code and data for the paper "Towards Effective Model Editing for LLM Personalization"
- data/: Contains the datasets used in Personalization Editing.
- code/: Includes scripts and code to perform Personalization Editing and reproduce the results in the paper.
To set up the environment for running the code, follow these steps:
- Clone the repository.
- Create a virtual environment and activate it:
conda create -n edit python=3.9 -y
conda activate edit
- Install the required dependencies:
pip install -r requirements.txt
Datasets are stored in the data/ directory, with the following layout:

data/
├── prefeval_pro
└── UPQA

Each generated entry contains:

- input_attribute: Original persona text
- attribute_type: High-level category (e.g., "hobby", "profession", "pet", "location")
- question: Direct question using the attribute_type (e.g., "What's my hobby?")
- question_paraphrased: Natural rewording of the direct question
- implicit_question: Conversational question that guides toward the target without naming the attribute
- product_recommendation_question: Product suggestion question relevant to the attribute_type
- target: Concise description extracted from the persona
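To sanity-check the format, here is a minimal sketch that prints the first UPQA record; it assumes the file is a JSON array of such entries (an assumption about the serialization) and is run from the repository root.

# Peek at the first entry of the UPQA subset used in the quick-start example below.
# Assumes balanced_subset.json is a JSON array of records with the fields above.
python3 -c "import json; print(json.dumps(json.load(open('data/UPQA/balanced_subset.json'))[0], indent=2))"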
Quick start test run: To get started (e.g., using ROME to edit llama3-8b on UPQA), run:
cd ./code
python3 edit_cluster.py \
--hparams_dir=ROME/llama3-8b \
--data_path=../data/UPQA/balanced_subset.json \
--device=0 \
--size=100

To run the multi-turn evaluation, here is an example:
cd ./code
python run_edit.py \
--hparams_dir=ROME/olmo2-7b \
--data_path=prefeval_pro/prefeval_pro_balanced.json \
--size=100 \
--inter_turns=2 \
--results_dir=prefeval_multi_turn \
--device=0

- Use --inter_turns to set the number of turns for multi-turn evaluation.
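For example, a minimal sketch of sweeping several turn counts (the specific values below are illustrative, not necessarily the settings used in the paper):

# Illustrative sweep over the number of intervening turns; whether these exact
# values are supported is an assumption, and the per-setting results_dir is an
# illustrative choice to keep outputs separate.
cd ./code
for t in 1 2 4; do
  python run_edit.py \
    --hparams_dir=ROME/olmo2-7b \
    --data_path=prefeval_pro/prefeval_pro_balanced.json \
    --size=100 \
    --inter_turns=${t} \
    --results_dir=prefeval_multi_turn_${t} \
    --device=0
done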
We use claude-3-7-sonnet as the evaluator to assess whether model responses match the labels; to switch to a local LLM (e.g., Llama3-8b) as the evaluator, set --model_eval accordingly, and select the evaluator's GPU with --device_eval (see the example below). For experiments, we recommend using at least one GPU with 48 GB of memory (e.g., NVIDIA RTX A6000).
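A hedged sketch of the multi-turn command with the evaluator flags added (the flag names --model_eval and --device_eval come from the note above; the values shown for them are assumptions and may need to be a model name or local path):

# Same multi-turn run as above, but with an explicit evaluator model and GPU.
# The values passed to --model_eval and --device_eval are illustrative.
python run_edit.py \
  --hparams_dir=ROME/olmo2-7b \
  --data_path=prefeval_pro/prefeval_pro_balanced.json \
  --size=100 \
  --inter_turns=2 \
  --results_dir=prefeval_multi_turn \
  --device=0 \
  --model_eval=llama3-8b \
  --device_eval=1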
For full experiments to reproduce the results in the paper:

- Experiments for clustering-based preference representations:
./run_edit_cluster.sh
- Experiments for multi-turn evaluation:
./run_edit.sh
./run_eval.sh
We evaluate models including Llama-3-8B-Instruct, OLMo-7B-Instruct-hf, Qwen3-8B, DeepSeek-R1-Distill-Qwen-7B, GPT-J-6B, and Mistral-7B-v0.3. All hyperparameters are in code/hparams/<method_name>/<model_name>.
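For example, the configs referenced by --hparams_dir in the commands above resolve under code/hparams/ (the listing below is illustrative; the exact file names and extensions may differ):

# Per-method, per-model hyperparameter configs; entry names are inferred from
# the --hparams_dir values used in the examples above.
ls code/hparams/ROME/
# e.g., llama3-8b, olmo2-7b, ...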
We gratefully acknowledge the use of code and data from the following projects: EasyEdit, ROME, and PrefEval.