
Conversation


@YuzeHao2023 YuzeHao2023 commented Jan 6, 2026

[Paper reproduction: replacing the Energy-Based Reward Model (EBRM) from the paper with a QBM]

Task description: Reward models (RMs) are critical for aligning large language models (LLMs) with human preferences, yet they often struggle to capture complex human preferences and to generalize to unseen data. This task replaces the Energy Score module of the Energy-Based Reward Model (EBRM) from the paper "Energy-Based Reward Models for Robust Language Model Alignment" with a QBM, and validates the results by comparison on the datasets used in the paper.

In the paper's EBRM, the energy-score algorithm takes the feature vector 'embedding' produced by the RM and the RM's raw score 'r' as inputs, and returns 'r*' as the corrected score. We replace the energy score with a QBM that uses the same parameter-passing interface to produce the corrected score.
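To make that interface concrete, here is a minimal sketch of a module with the same (embedding, r) -> r* signature. The class name `QBMScorer`, the RBM-style quadratic coupling, and all hyperparameters below are illustrative assumptions, not the code added in this PR.

```python
# Hypothetical sketch of the drop-in interface; names and parameterization are
# assumptions for illustration, not the actual QBM implementation in this PR.
import torch
import torch.nn as nn


class QBMScorer(nn.Module):
    """Refines a raw RM score r given the RM embedding, mirroring the
    (embedding, r) -> r* interface of EBRM's energy-score module."""

    def __init__(self, embed_dim: int, n_hidden: int = 64):
        super().__init__()
        # Quadratic (Boltzmann-machine style) coupling between the embedding
        # and a set of hidden units; purely illustrative parameterization.
        self.W = nn.Parameter(torch.randn(embed_dim, n_hidden) * 0.01)
        self.b = nn.Parameter(torch.zeros(n_hidden))
        self.score_head = nn.Linear(n_hidden, 1)

    def forward(self, embedding: torch.Tensor, r: torch.Tensor) -> torch.Tensor:
        # embedding: (batch, embed_dim), r: (batch,)
        hidden = torch.tanh(embedding @ self.W + self.b)
        correction = self.score_head(hidden).squeeze(-1)
        return r + correction  # r*: the corrected reward score
```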

For training, we trained on both datasets for 5 epochs each and visualized the results (see the example/qbm_ebrm_results/imgs/ folder or example/qbm_ebrm_results/README.md).
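For reference, a rough sketch of what a 5-epoch pairwise training loop over the corrected scores could look like. The batch keys (`emb_chosen`, `r_chosen`, ...) and the Bradley-Terry-style loss are assumptions for illustration and do not reproduce run_train_qbm_ebm.py.

```python
# Hypothetical training sketch (5 epochs, pairwise preference loss); field names
# and loss form are assumptions, not the actual run_train_qbm_ebm.py code.
import torch
import torch.nn.functional as F


def train(scorer, loader, epochs: int = 5, lr: float = 1e-3):
    opt = torch.optim.Adam(scorer.parameters(), lr=lr)
    for epoch in range(epochs):
        for batch in loader:
            # Each batch is assumed to carry chosen/rejected embeddings and raw RM scores.
            r_star_chosen = scorer(batch["emb_chosen"], batch["r_chosen"])
            r_star_rejected = scorer(batch["emb_rejected"], batch["r_rejected"])
            # Bradley-Terry style pairwise loss on the corrected scores.
            loss = -F.logsigmoid(r_star_chosen - r_star_rejected).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
```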

Files changed:

├── README.md
├── imgs
│   ├── pairwire-training_plots.png
│   └── training_plots.png
├── model_final.pth
├── prepare_rmb_dataset.py
├── rmb_dataset.pt
├── rmb_dataset2.pt
├── rmb_dataset_pairwise.pt
├── rmb_dataset_train.pt
├── rmb_dataset_val.pt
├── run_diagnostic_train.py
├── run_qbm_ebm.py # smoke test
├── run_train_qbm_ebm.py
├── save_and_plot_results.py
└── training_metrics.npz

Closes #78

References:

@article{lochab2025energy,
  title={Energy-Based Reward Models for Robust Language Model Alignment},
  author={Lochab, Anamika and Zhang, Ruqi},
  journal={arXiv preprint arXiv:2504.13134},
  year={2025}
}
@inproceedings{lambert2025rewardbench,
  title={RewardBench: Evaluating reward models for language modeling},
  author={Lambert, Nathan and Pyatkin, Valentina and Morrison, Jacob and Miranda, Lester James Validad and Lin, Bill Yuchen and Chandu, Khyathi and Dziri, Nouha and Kumar, Sachin and Zick, Tom and Choi, Yejin and others},
  booktitle={Findings of the Association for Computational Linguistics: NAACL 2025},
  pages={1755--1797},
  year={2025}
}
