Personalization Editing

Repository Overview: This repository contains the code and data for the paper "Towards Effective Model Editing for LLM Personalization".

Table of Contents

  1. Overview
  2. Repository Structure
  3. Installation
  4. Usage
  5. Citation

Repository Structure

  • data/: Contains the datasets used in Personalization Editing.
  • code/: Includes scripts and code to perform Personalization Editing and reproduce the results in the paper.

Installation

To set up the environment for running the code, follow these steps:

  1. Clone the repository

  2. Create a virtual environment and activate it:

    conda create -n edit python=3.9 -y
    conda activate edit
  3. Install the required dependencies:

    pip install -r requirements.txt
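
Optionally, confirm that the environment detects a GPU before launching any experiments. A minimal sanity check, assuming PyTorch is installed via requirements.txt:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"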

Usage

Data Preparation

  1. Datasets are stored in the data/ directory, organized as follows:

    data/
    ├── prefeval_pro
    └── UPQA

Data Format

Each generated entry contains the following fields (a loading sketch follows the list):

  • input_attribute: Original persona text
  • attribute_type: High-level category (e.g., "hobby", "profession", "pet", "location")
  • question: Direct question using the attribute_type (e.g., "What's my hobby?")
  • question_paraphrased: Natural rewording of the direct question
  • implicit_question: Conversational question that guides toward the target without naming the attribute
  • product_recommendation_question: Product suggestion question relevant to the attribute_type
  • target: Concise description extracted from the persona
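
For illustration, here is a minimal sketch of loading a dataset file and inspecting these fields. It assumes the files (e.g., data/UPQA/balanced_subset.json, used in the quick start below) are JSON lists of such entries; adjust if the actual format differs:

import json

# Load the UPQA subset referenced in the quick-start command below.
with open("data/UPQA/balanced_subset.json", encoding="utf-8") as f:
    entries = json.load(f)  # assumed: a list of dicts with the fields listed above

example = entries[0]
for field in ("input_attribute", "attribute_type", "question", "question_paraphrased",
              "implicit_question", "product_recommendation_question", "target"):
    print(f"{field}: {example.get(field)}")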

Running Experiments

Quick start test run: To get started (e.g., using ROME to edit llama3-8b on UPQA), run:

cd ./code
python3 edit_cluster.py \
    --hparams_dir=ROME/llama3-8b \
    --data_path=../data/UPQA/balanced_subset.json \
    --device=0 \
    --size=100

To run the multi-turn evaluation, here is an example:

cd ./code
python run_edit.py \
    --hparams_dir=ROME/olmo2-7b \
    --data_path=prefeval_pro/prefeval_pro_balanced.json \
    --size=100 \
    --inter_turns=2 \
    --results_dir=prefeval_multi_turn \
    --device=0

  • Use --inter_turns to set the number of intervening conversation turns in the multi-turn evaluation (see the sketch below).
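
For intuition, the sketch below shows one way a multi-turn evaluation prompt could be assembled: the persona/preference is stated first, a number of unrelated exchanges follow, and the preference question comes last. This is an illustrative reconstruction with hypothetical helper names and canned replies, not the repository's actual implementation:

# Hypothetical helper: persona first, `inter_turns` unrelated exchanges, then the question.
def build_multiturn_messages(persona, distractor_turns, question, inter_turns=2):
    messages = [
        {"role": "user", "content": persona},
        {"role": "assistant", "content": "Got it, I'll keep that in mind."},
    ]
    for user_msg, assistant_msg in distractor_turns[:inter_turns]:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    messages.append({"role": "user", "content": question})
    return messages

msgs = build_multiturn_messages(
    "My hobby is birdwatching.",
    [("Can you summarize today's news?", "Sure, here is a short summary..."),
     ("Thanks. Any tips for a quick pasta dinner?", "A simple aglio e olio works well...")],
    "What's my hobby?",
)
print(len(msgs))  # 2 + 2 * inter_turns + 1 messages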

We use claude-3-7-sonnet as the evaluator to assess whether model responses match the labels; to use a local LLM (e.g., Llama3-8b) instead, set the evaluation model with --model_eval and choose its GPU with --device_eval. For experiments, we recommend at least one GPU with 48 GB of memory (e.g., an NVIDIA RTX A6000).
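
As a rough illustration of this LLM-as-judge step, the sketch below calls claude-3-7-sonnet through the Anthropic Python SDK (requires ANTHROPIC_API_KEY in the environment). The prompt wording and yes/no parsing are assumptions for illustration, not the repository's evaluation code:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def judge_response(question, target, model_response):
    """Ask the evaluator whether the response conveys the reference answer."""
    prompt = (
        f"Question: {question}\n"
        f"Reference answer: {target}\n"
        f"Model response: {model_response}\n\n"
        "Does the model response convey the same answer as the reference? "
        "Reply with exactly 'yes' or 'no'."
    )
    reply = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=8,
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.content[0].text.strip().lower().startswith("yes")

print(judge_response("What's my hobby?", "birdwatching",
                     "You told me you enjoy birdwatching on weekends."))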

For full experiments to reproduce the results in the paper:

  1. Experiments for clustering-based preference representations:

    ./run_edit_cluster.sh
  2. Experiments for multi-turn evaluation:

    ./run_edit.sh
    ./run_eval.sh

We evaluate models including Llama-3-8B-Instruct, OLMo-7B-Instruct-hf, Qwen3-8B, DeepSeek-R1-Distill-Qwen-7B, GPT-J-6B, and Mistral-7B-v0.3. All hyperparameter configurations are in code/hparams/<method_name>/<model_name>.
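
To see which method/model configurations ship with the repository, you can simply list that directory; a small sketch, assuming it is run from the repository root:

from pathlib import Path

# Print every hyperparameter file as <method_name>/<model_name>.
for cfg in sorted(Path("code/hparams").glob("*/*")):
    if cfg.is_file():
        print(cfg.relative_to("code/hparams"))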

Acknowledgements

We gratefully acknowledge the use of code and data from the following projects: EasyEdit, ROME, and PrefEval.
