
Reducing Hallucinations in LLMs via Factuality-Aware Preference Learning

A Modular Training Framework for Factuality-Aware Direct Preference Optimization (F-DPO)

🌐 Website: vectorinstitute.github.io/Factual-Preference-Alignment | 📄 Paper: arxiv.org/abs/2601.03027 | 📊 Dataset: Hugging Face


🧭 About

Factuality-aware Direct Preference Optimization is a research and engineering framework for studying and improving factual alignment in preference-optimized Large Language Models (LLMs).

The project introduces F-DPO, a factuality-aware extension of Direct Preference Optimization (DPO) that incorporates:

  • Explicit factuality supervision
  • Synthetic hallucination inversion
  • Margin-based factual penalties

The repository provides end-to-end infrastructure for:

  • Dataset construction
  • Multi-model preference fine-tuning
  • Automated factuality evaluation

All components are config-driven, reproducible, and aligned with the Vector Institute AI Engineering Template.


✨ Key Contributions

  • 🔍 Binary factuality supervision integrated into preference learning
  • 🧪 Synthetic hallucination inversion pairs
  • 📏 Δ-margin factual penalties for controllable hallucination suppression
  • ⚙️ Fully config-driven data, training, and evaluation pipelines
  • 📊 Multi-model × multi-Δ benchmarking at scale

📦 Repository Structure

aixpert/
│
├── src/aixpert/
│   ├── config/                  # Central config.yaml
│   ├── data_construction/       # 8-stage factual dataset pipeline
│   ├── training/                # Original-DPO & F-DPO training
│   ├── evaluation/              # GPT-4o-mini judge evaluation
│   └── utils/                   # Shared helpers
│
├── README.md
└── pyproject.toml

🧠 What Is F-DPO?

Standard DPO aligns models to human preferences, but it does not explicitly discourage preferred responses that contain hallucinations.

F-DPO introduces a factuality-aware margin:

  • Each preference tuple includes (h_w, h_l) factuality indicators
  • A penalty λ is applied when the preferred response is less factual
  • Optimization pressure shifts toward factually correct preferences

➡️ Result: Lower hallucination rates without sacrificing preference alignment
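
The margin can be pictured as a small modification of the DPO logits. The sketch below is illustrative only: it assumes 0/1 factuality tensors h_w, h_l and a single scalar margin delta (the hyperparameter exposed as --delta during training); the exact objective is defined in the paper.

import torch
import torch.nn.functional as F

def f_dpo_loss(pi_chosen_logps, pi_rejected_logps,
               ref_chosen_logps, ref_rejected_logps,
               h_w, h_l, beta=0.1, delta=10.0):
    # h_w, h_l: 0/1 factuality tensors for the chosen / rejected responses.
    # Standard DPO implicit-reward difference between chosen and rejected.
    pi_logratios = pi_chosen_logps - pi_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    # Factual penalty: active only when the preferred response is the
    # less factual one (h_w < h_l); otherwise the term vanishes.
    penalty = delta * torch.clamp(h_l - h_w, min=0.0)
    logits = beta * (pi_logratios - ref_logratios) - penalty
    return -F.logsigmoid(logits).mean()

With delta = 0 the penalty disappears and the loss reduces to standard DPO, which is what the Original-DPO baseline trains.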


🔬 Skywork → F-DPO Data Construction Pipeline

This repository contains a complete eight-stage pipeline for converting the Skywork Reward-Preference-80K dataset into balanced, factuality-aware DPO datasets.

Pipeline Stages

Stage  Description
1      Skywork extraction & de-duplication
2      Preference pair conversion
3      Binary factuality scoring (GPT-4o-mini)
4      Canonical DPO transformation
5      Synthetic hallucination generation
6      Dataset merging
7      Balanced bucket construction
8      Optional preference flipping
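
The end product is a set of DPO-ready preference records carrying binary factuality labels. The record below is purely illustrative (field names follow the (h_w, h_l) notation used above; the released dataset's exact schema may differ):

example_record = {
    "prompt": "Who wrote 'On the Origin of Species'?",
    "chosen": "Charles Darwin wrote 'On the Origin of Species', published in 1859.",
    "rejected": "It was written by Alfred Russel Wallace in 1844.",
    "h_w": 1,  # chosen response judged factual by the GPT-4o-mini scorer
    "h_l": 0,  # rejected response judged non-factual
}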

All paths and parameters are defined in:

src/aixpert/config/config.yaml

βš™οΈ Configuration-Driven Design

Every component (datasets, models, hyperparameters, outputs, and evaluation) is controlled via:

src/aixpert/config/config.yaml

Loaded using:

from utils.config_loader import load_config
cfg = load_config()
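
Downstream scripts then read their settings from the returned object. The keys below are hypothetical and shown only to illustrate the access pattern; the authoritative schema lives in config.yaml:

cfg = load_config()
# Hypothetical keys, for illustration only.
model_id = cfg["training"]["model_id"]    # e.g. "google/gemma-2-9b-it"
delta = cfg["training"]["delta"]          # factual margin Δ
output_dir = cfg["paths"]["output_dir"]   # where checkpoints are written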

This enables:

  • Full reproducibility
  • Multi-model automation
  • Zero hard-coded paths

πŸ‹οΈ Training Pipelines

1️⃣ Original-DPO (Baseline)

python -m aixpert.training.run_dpo_training \
  --model "google/gemma-2-9b-it"

Trains standard DPO using Skywork preferences.


2️⃣ F-DPO (Δ-Margin Training)

python -m aixpert.training.run_factual_training \
  --model_id "google/gemma-2-9b-it" \
  --short "gemma2-9b" \
  --delta 10

Each Δ value produces a separate fine-tuned model.
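
Multi-Δ benchmarking can therefore be scripted as a simple sweep over the entry point above. The Δ values in this sketch are arbitrary examples, not the settings used in the paper:

import subprocess

# Each run produces its own fine-tuned checkpoint.
for delta in (1, 5, 10):
    subprocess.run(
        [
            "python", "-m", "aixpert.training.run_factual_training",
            "--model_id", "google/gemma-2-9b-it",
            "--short", "gemma2-9b",
            "--delta", str(delta),
        ],
        check=True,
    )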


📊 Evaluation Pipeline

Evaluation is performed using GPT-4o-mini as an LLM-as-a-Judge.

Metrics

Metric       Meaning
factuality   Mean factual score
halluc_rate  Share of outputs (%) below the factuality threshold
win_rate     Win rate of the Δ-model vs. the baseline
count        Number of prompts evaluated

Run evaluation:

python -m aixpert.evaluation.evaluations.run_all_evaluations

Outputs:

eval_results.json
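
A quick way to inspect the results is to load the JSON and print the per-run metrics. The snippet assumes eval_results.json maps each model/Δ run to the metric fields listed above; the actual layout may differ:

import json

with open("eval_results.json") as f:
    results = json.load(f)

# Assumed layout: {run_name: {"factuality": ..., "halluc_rate": ..., "win_rate": ..., "count": ...}}
for run, metrics in results.items():
    print(run, metrics["factuality"], metrics["halluc_rate"],
          metrics["win_rate"], metrics["count"])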

🧪 Supported Models

  • Gemma-2 (2B, 9B)
  • Qwen-2.5 / Qwen-3
  • LLaMA-3.x
  • Any TRL-compatible causal LLM

Models are registered centrally in config.yaml.


🧰 Frameworks & Tooling

  • Hugging Face TRL – DPO reference implementation
  • Unsloth – QLoRA optimization
  • BitsAndBytes – 4-bit quantization
  • Flash-Attention-2
  • Weights & Biases – experiment tracking
  • Accelerate – multi-GPU orchestration

📚 Dataset Attribution & Credits

This project builds upon and extends the Skywork Reward-Preference-80K dataset.

We do not claim ownership of the Skywork dataset. All credit belongs to the original authors.

If you use this repository, please cite Skywork:

@article{liu2024skywork,
  title={Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs},
  author={Liu, Chris Yuhao and Zeng, Liang and Liu, Jiacai and Yan, Rui and He, Jujie and Wang, Chaojie and Yan, Shuicheng and Liu, Yang and Zhou, Yahui},
  journal={arXiv preprint arXiv:2410.18451},
  year={2024}
}

For dataset-related concerns, please contact the Skywork authors via their paper or Hugging Face repository.


📖 Citation (Factuality-aware Direct Preference Optimization)

If you find this code or dataset useful for your research, please consider citing:

@article{FactualAlignment2026,
  title={Reducing Hallucinations in LLMs via Factuality-Aware Preference Learning},
  author={Chaduvula, Sindhuja and Radwan, Ahmed and Farooq, Azib and Ioannou, Yani and Raza, Shaina},
  journal={arXiv preprint arXiv:2601.03027},
  year={2026}
}

📬 Contact

For questions, collaborations, or issues:

  • Open a GitHub Issue
  • Or contact the maintainers via the Vector Institute

⚡ Factuality-aware Direct Preference Optimization reduces hallucinations and increases factuality.

We invite researchers and practitioners to build upon this framework.
