FCP (Feedback Conditional Policy)


This is the official repository for the paper Language Models Can Learn from Verbal Feedback Without Scalar Rewards.

It provides a training framework that implements Feedback Conditional Policy (FCP) for aligning large language models with verbal feedback.

📝 Updates

  • 2026-01-05: Simplified the codebase, documented our modifications in MODIFICATIONS_FCP.md, and released model checkpoints on Hugging Face.
  • 2025-09-25: Open-sourced this repository.

🚀 Quick Start

Prerequisites

  • verl framework
  • Set your OPENAI_API_KEY environment variable before training
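For example, the key can be exported in the shell that launches training (the variable name comes from the prerequisites above; the key value is a placeholder):

```shell
# Export the API key so it is visible to the training processes
# (the value here is a placeholder; substitute your real key).
export OPENAI_API_KEY="sk-your-key-here"

# Fail fast if the variable ended up empty or unset.
: "${OPENAI_API_KEY:?OPENAI_API_KEY is not set}"
```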

🏋️ Training

Offline FCP Training

Use LLaMA-Factory's built-in SFT training code with the SFT datasets mentioned below.
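As a rough illustration of the offline stage, a feedback-conditional SFT example can fold the verbal feedback into the conditioning context, so the policy learns p(response | prompt, feedback) without any scalar reward and a standard SFT trainer can consume it unchanged. The field names and template below are assumptions for illustration, not the repository's actual data schema:

```python
# Sketch of constructing one feedback-conditional SFT example.
# The instruction/output fields and the feedback template are hypothetical;
# consult the released SFT datasets for the real format.

def build_fcp_example(prompt: str, feedback: str, response: str) -> dict:
    """Fold verbal feedback into the instruction so an ordinary SFT
    trainer (e.g. LLaMA-Factory) can train on the conditioned prompt."""
    conditioned_prompt = (
        f"{prompt}\n\n"
        f"Feedback on the desired response: {feedback}"
    )
    return {"instruction": conditioned_prompt, "output": response}

example = build_fcp_example(
    prompt="Explain gradient clipping in one sentence.",
    feedback="The answer was accurate and concise.",
    response="Gradient clipping rescales gradients whose norm exceeds a "
             "threshold to stabilize training.",
)
```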

FCP Bootstrapping (Online) Training

Run the VERL training script:

./verl/recipe/fcp/run_fcp.sh

Configuration details can be found in verl/recipe/fcp/config/fcp_trainer.yaml.

📊 Datasets & Frameworks

We use different frameworks and datasets for different training stages:

Offline FCP Training

Framework: LLaMA-Factory
Datasets:

FCP Bootstrapping (Online) Training

Framework: verl
Datasets:

📖 Citation

If you find this code useful, please consider citing our paper:

@article{luo2025languagemodelslearnverbal,
      title={Language Models Can Learn from Verbal Feedback Without Scalar Rewards}, 
      author={Renjie Luo and Zichen Liu and Xiangyan Liu and Chao Du and Min Lin and Wenhu Chen and Wei Lu and Tianyu Pang},
      journal={arXiv preprint arXiv:2509.22638},
      year={2025}
}
