This repository contains the code for reproducing the experiments of the paper "Post-Training Denoising of User Profiles with LLMs in Collaborative Filtering Recommendation" by Ervin Dervishaj, Tuukka Ruotsalo, Maria Maistro and Christina Lioma, accepted as a full paper at ECIR 2026.
This repository requires Python 3.10.x. Create a Python environment and install the necessary packages:
```bash
pip install -r requirements.txt
```

The experiments have been prepared with RecBole v1.2.1 (included in this repository) with minimal changes:

```bash
pip install -e RecBole
```

The repository is structured as follows:
- `/experiments`: folder where all experimental configurations and results are collected.
- `/experiments/configs`: RecBole run configuration files for the 3 datasets used in our experiments.
- `/experiments/prompts`: the LLM prompts used for denoising.
- `/experiments/saved`: folder where raw/preprocessed datasets, LLM generations and results are saved.
- `/models`: collaborative filtering model used in the experiments.
- `/notebooks`: contains the evaluation Jupyter notebook.
- `/RecBole`: code for RecBole v1.2.1, including some minimal changes for our experiments.
- `/utils`: utility code for preparing data, prompting LLMs and evaluation.
- `/run.py`: main entry point for running the experiments.
You can replicate our results using the following commands (shown here for the Yelp dataset). Each command saves the necessary files locally (and loads them if they already exist), and these files are consumed by subsequent commands, so run the commands in the given order.
First, the dataset is prepared with RecBole:
```bash
python run.py --config experiments/configs/Yelp_CustomMultiVAE.yml --cmd get_dataset
```

where `--config` is the path to the RecBole run configuration file.
For Yelp (and Amazon CDs & Vinyl), we also sample 10000 users:
```bash
python run.py --config experiments/configs/Yelp_CustomMultiVAE.yml --cmd sample_users --n 10000
```

The following command prepares the item content that represents user profiles in the LLM prompts:
```bash
python run.py --config experiments/configs/Yelp_CustomMultiVAE.yml --cmd get_item2content
```

Next, prepare the dataloaders that will be used during training and validation. If replicating the experiments with few-shot examples in the prompts, include the flag `--is_few_shot True`:
```bash
python run.py --config experiments/configs/Yelp_CustomMultiVAE.yml --cmd get_dataloaders [--is_few_shot True]
```

Then, compute the user histories for the LLM prompts:
```bash
python run.py --config experiments/configs/Yelp_CustomMultiVAE.yml --cmd get_user_histories
```

Train the CF model:

```bash
python run.py --config experiments/configs/Yelp_CustomMultiVAE.yml --cmd get_trained_model
```

To prepare the prompt data, our denoising approach requires computing the candidate item ranks from the CF model:
```bash
# Compute validation ranks
python run.py --config experiments/configs/Yelp_CustomMultiVAE.yml --cmd get_ranks --ranks_type dev --topk 10

# Compute test ranks
python run.py --config experiments/configs/Yelp_CustomMultiVAE.yml --cmd get_ranks --ranks_type test --topk 10
```

The flag `--topk N` computes and saves the top-N recommendations for each user.
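Conceptually, the top-K step selects, for each user, the K highest-scoring candidate items from the CF model's score matrix. Below is a minimal NumPy sketch of that selection; the array names and shapes are illustrative placeholders, not the repository's actual data structures:

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random((4, 100))  # hypothetical user x item score matrix

K = 10
# argpartition finds the K largest scores per user in O(n_items);
# a final argsort then orders just those K columns.
part = np.argpartition(-scores, K, axis=1)[:, :K]
order = np.argsort(-np.take_along_axis(scores, part, axis=1), axis=1)
topk = np.take_along_axis(part, order, axis=1)  # shape (n_users, K), best item first
```

In practice the repository saves these ranks to disk so later prompt-construction steps can reuse them without re-running the model.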
Next, you need to compute the denoising prompts to the LLM for each user. In our paper we experiment with zero-shot and few-shot (ICL examples) prompting strategies:
```bash
# Zero-shot prompting data
python run.py --config experiments/configs/Yelp_CustomMultiVAE.yml --cmd get_prompt_samples
```

For ICL prompting with denoising examples:
```bash
# Compute denoising examples
python run.py --config experiments/configs/Yelp_CustomMultiVAE.yml --cmd get_examples --best 1
```

where `--best` indicates the number of items removed from the user profile in the denoising example. It can be set to 1 or 2.
Then, compute the prompt data:
```bash
# ICL prompting data
python run.py --config experiments/configs/Yelp_CustomMultiVAE.yml --cmd get_prompt_samples --best 1
```

For ICL prompt data with top-10 recommendations:
```bash
# ICL prompting with recommendation examples
python run.py --config experiments/configs/Yelp_CustomMultiVAE.yml --cmd get_prompt_samples --with_recs True
```

The following commands perform a denoising sweep over all the users in a dataset:
- zero-shot:

```bash
# Remove only 1 item from the user profile
python -m utils.denoise_LLM --local --LLM Qwen/Qwen3-8B --system-prompt experiments/prompts/Yelp_remove_1.txt --config experiments/configs/Yelp_CustomMultiVAE.yml --samples experiments/saved/2025/Yelp_CustomMultiVAE/dev-prompt-samples.pkl

# Remove 2 items from the user profile
python -m utils.denoise_LLM --local --LLM Qwen/Qwen3-8B --system-prompt experiments/prompts/Yelp_remove_2.txt --config experiments/configs/Yelp_CustomMultiVAE.yml --samples experiments/saved/2025/Yelp_CustomMultiVAE/dev-prompt-samples.pkl
```

- few-shot with (1/2 best) denoising examples:
```bash
# `dev-prompt-samples-1-fs.pkl` includes denoising examples of removing 1 item from the user profile
python -m utils.denoise_LLM --local --LLM Qwen/Qwen3-8B --system-prompt experiments/prompts/Yelp_remove_1.txt --config experiments/configs/Yelp_CustomMultiVAE.yml --samples experiments/saved/2025/Yelp_CustomMultiVAE/dev-prompt-samples-1-fs.pkl

# `dev-prompt-samples-2-fs.pkl` includes denoising examples of removing 2 items from the user profile
python -m utils.denoise_LLM --local --LLM Qwen/Qwen3-8B --system-prompt experiments/prompts/Yelp_remove_2.txt --config experiments/configs/Yelp_CustomMultiVAE.yml --samples experiments/saved/2025/Yelp_CustomMultiVAE/dev-prompt-samples-2-fs.pkl
```

- few-shot with top-10 recommendation examples:
```bash
# Remove only 1 item from the user profile
python -m utils.denoise_LLM --local --LLM Qwen/Qwen3-8B --system-prompt experiments/prompts/Yelp_remove_1_recs.txt --config experiments/configs/Yelp_CustomMultiVAE.yml --samples experiments/saved/2025/Yelp_CustomMultiVAE/dev-prompt-samples-recs.pkl

# Remove 2 items from the user profile
python -m utils.denoise_LLM --local --LLM Qwen/Qwen3-8B --system-prompt experiments/prompts/Yelp_remove_2_recs.txt --config experiments/configs/Yelp_CustomMultiVAE.yml --samples experiments/saved/2025/Yelp_CustomMultiVAE/dev-prompt-samples-recs.pkl
```

For the other datasets, change the `--config` YAML file, the `--system-prompt` file and the `--samples` prompt samples accordingly.
To clean some of the generated LLM output, the following regular expressions are applied in the given order, replacing all occurrences (e.g., in a code editor such as Visual Studio Code):
| Match | Replacement |
|---|---|
| `[\|"]{3,}` | `\"` |
| `\\"'([^,\[\]]*)'\\"` | `\"$1\"` |
| `\\"\\"([^,\]\[]+)\\"\\"` | `\"$1\"` |
| `[\|"]+([^,\]\[\\"]+)[\|"]+` | `\"$1\"` |
| `[\|"]+([^\]\[\\"]+)[\|"]+` | `\"$1\"` |
| `\\"\\\\\\"([^,]+)\\\\\\"\\"` | `\"$1\"` |
| `'\\"([^,\[\]]+)\\"'` | `\"$1\"` |
| `"([^\[\]\\]+)"` | `"[\"$1\"]"` |
| `"\\"(.*)\\""` | `"[\"$1\"]"` |
| `"\[([^,\\"]+)\]"` | `"[\"$1\"]"` |
| `"\[([^"]+)\]"` | `"[\"$1\"]"` |
| `"\['([^,\\"\]\[]+)',\s*'([^,\\"\]\[]+)'\]"` | `"[\"$1\", \"$2\"]"` |
| `"\[([^,\\"\]\[]+),\s*([^,\\"\]\[]+)\]"` | `"[\"$1\", \"$2\"]"` |
| `(?<=[" \[\]])[^\\\[\]",]+(?=[",\]\[])` | `\"$1\"` |
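If you prefer to apply the cleaning programmatically rather than in an editor, the same rules can be run with Python's `re` module. The sketch below converts only the first and third table rows to Python replacement syntax (`$1` becomes `\1`); extending `RULES` with the remaining rows, in the same order, is left as an exercise:

```python
import re

# First and third cleaning rules from the table, in Python syntax.
# The full pipeline would apply every table row in order.
RULES = [
    (r'[\|"]{3,}', r'\"'),                    # runs of pipes/quotes -> \"
    (r'\\"\\"([^,\]\[]+)\\"\\"', r'\"\1\"'),  # \"\"title\"\" -> \"title\"
]

def clean_llm_output(text: str) -> str:
    """Apply the regex cleaning rules sequentially to one LLM generation."""
    for pattern, repl in RULES:
        text = re.sub(pattern, repl, text)
    return text
```

Because the rules are order-dependent, applying them in a different sequence than the table lists may produce different output.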
Compute the UpperBoundOnVal baselines:

```bash
python run.py --config experiments/configs/Yelp_CustomMultiVAE.yml --cmd brute_force_cf --COMB 1
```

where `--COMB` indicates the number of items in each combination removed from the user profile. It can be set to 1 or 2.
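The brute-force step enumerates, for each user, every way of removing `--COMB` items from the profile and keeps the best-scoring removal on validation. The enumeration itself is plain `itertools.combinations`; the profile contents below are hypothetical placeholders, not the repository's actual data:

```python
from itertools import combinations

profile = ["pizzeria", "cafe", "bookshop", "ramen_bar"]  # hypothetical user profile
COMB = 1  # number of items removed per candidate, as in --COMB

# Every candidate profile with COMB items removed; each candidate would
# then be re-scored by the CF model to find the upper bound.
candidates = [
    [item for item in profile if item not in removed]
    for removed in combinations(profile, COMB)
]
```

With `COMB=2` the number of candidates grows quadratically in the profile length, which is why this baseline is limited to removing 1 or 2 items.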
To evaluate our LLM denoising approach and the baselines, run the notebook `eval.ipynb` in the `/notebooks` folder. Results are saved in the path specified by the `checkpoint_dir` property in the RecBole configuration file. Change the parameter `config_file_list` to one of the RecBole configuration files in the `/experiments/configs` folder:

```python
config = Config(config_file_list=['experiments/configs/Yelp_CustomMultiVAE.yml'])
```

and update the LLM result filenames/prompts accordingly.
