Website: vectorinstitute.github.io/Factual-Preference-Alignment | Paper: arxiv.org/abs/2601.03027 | Dataset: Hugging Face
Factuality-aware Direct Preference Optimization is a research and engineering framework for studying and improving factual alignment in preference-optimized Large Language Models (LLMs).
The project introduces F-DPO, a factuality-aware extension of Direct Preference Optimization (DPO) that incorporates:
- Explicit factuality supervision
- Synthetic hallucination inversion
- Margin-based factual penalties
The repository provides end-to-end infrastructure for:
- Dataset construction
- Multi-model preference fine-tuning
- Automated factuality evaluation
All components are config-driven, reproducible, and aligned with the Vector Institute AI Engineering Template.
- Binary factuality supervision integrated into preference learning
- Synthetic hallucination inversion pairs
- Δ-margin factual penalties for controllable hallucination suppression
- Fully config-driven data, training, and evaluation pipelines
- Multi-model × multi-Δ benchmarking at scale
```
aixpert/
│
├── src/aixpert/
│   ├── config/              # Central config.yaml
│   ├── data_construction/   # 8-stage factual dataset pipeline
│   ├── training/            # Original-DPO & F-DPO training
│   ├── evaluation/          # GPT-4o-mini judge evaluation
│   └── utils/               # Shared helpers
│
├── README.md
└── pyproject.toml
```
Standard DPO aligns models with human preferences, but it does not explicitly discourage responses that are preferred despite being hallucinated.
F-DPO introduces a factuality-aware margin:
- Each preference tuple includes factuality indicators (h_w, h_l)
- A penalty λ is applied when the preferred response is less factual
- Optimization pressure shifts toward factually correct preferences

Result: lower hallucination rates without sacrificing preference alignment.
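To make the margin concrete, here is a minimal sketch of one plausible form of the objective, not the repository's exact implementation: it assumes binary factuality indicators h_w and h_l per pair, plus a hinge penalty with weight λ (`lam`) and margin Δ (`delta`) that activates when the preferred response is less factual. Function and argument names are illustrative.

```python
import torch.nn.functional as F

def f_dpo_loss(policy_chosen_logps, policy_rejected_logps,
               ref_chosen_logps, ref_rejected_logps,
               h_w, h_l, beta=0.1, lam=1.0, delta=10.0):
    """Sketch of a factuality-aware DPO loss (one plausible formulation).

    Inputs are torch tensors: per-example response log-probabilities under
    the policy and reference models, and {0, 1} factuality labels h_w, h_l.
    """
    # Implicit DPO rewards for the chosen and rejected responses.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Standard DPO preference term.
    dpo_term = -F.logsigmoid(chosen_rewards - rejected_rewards)

    # Hinge penalty, active only when the preferred response is less factual:
    # discourage rewarding the hallucinated chosen response over the factual
    # rejected one by at least the margin delta.
    penalty_mask = (h_w < h_l).float()
    factual_penalty = penalty_mask * F.relu(delta + chosen_rewards - rejected_rewards)

    return (dpo_term + lam * factual_penalty).mean()
```

In this sketch, a larger Δ suppresses hallucinated-but-preferred pairs more aggressively, which is the knob swept in the Δ experiments described below.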
This repository contains a complete eight-stage pipeline for converting the Skywork Reward-Preference-80K dataset into balanced, factuality-aware DPO datasets.
| Stage | Description |
|---|---|
| 1 | Skywork extraction & de-duplication |
| 2 | Preference pair conversion |
| 3 | Binary factuality scoring (GPT-4o-mini) |
| 4 | Canonical DPO transformation |
| 5 | Synthetic hallucination generation |
| 6 | Dataset merging |
| 7 | Balanced bucket construction |
| 8 | Optional preference flipping |
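As a rough illustration of what the later stages operate on, the sketch below shows a hypothetical factuality-labelled pair and a minimal stage-8-style flip rule; the field names are assumptions, not the dataset's actual schema.

```python
# Hypothetical record after factuality scoring (stage 3) and canonical DPO
# transformation (stage 4). Field names are illustrative only.
example = {
    "prompt": "Who wrote the novel 'Beloved'?",
    "chosen": "Toni Morrison wrote 'Beloved', published in 1987.",
    "rejected": "'Beloved' was written by Alice Walker in 1982.",
    "h_w": 1,  # chosen response judged factual
    "h_l": 0,  # rejected response judged non-factual
}

def maybe_flip(pair: dict) -> dict:
    """Stage-8-style optional flip (sketch): if the rejected response is more
    factual than the chosen one, swap them so the factual answer is preferred."""
    if pair["h_l"] > pair["h_w"]:
        return {**pair,
                "chosen": pair["rejected"], "rejected": pair["chosen"],
                "h_w": pair["h_l"], "h_l": pair["h_w"]}
    return pair
```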
All paths and parameters are defined in:
`src/aixpert/config/config.yaml`
Every component – datasets, models, hyperparameters, outputs, and evaluation – is controlled via:
`src/aixpert/config/config.yaml`
Loaded using:

```python
from utils.config_loader import load_config

cfg = load_config()
```

This enables:
- Full reproducibility
- Multi-model automation
- Zero hard-coded paths
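For example, downstream scripts might read model identifiers, paths, and Δ grids from the loaded dictionary as sketched below; the keys shown are hypothetical and may not match the actual layout of config.yaml.

```python
from utils.config_loader import load_config

cfg = load_config()

# Hypothetical keys -- the real structure of config.yaml may differ.
model_id = cfg["models"]["gemma2-9b"]["hf_id"]   # e.g. "google/gemma-2-9b-it"
data_path = cfg["paths"]["dpo_dataset"]
delta_values = cfg["training"]["delta_values"]   # e.g. [1, 5, 10]
```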
```bash
python -m aixpert.training.run_dpo_training \
    --model "google/gemma-2-9b-it"
```

Trains standard DPO using Skywork preferences.
```bash
python -m aixpert.training.run_factual_training \
    --model_id "google/gemma-2-9b-it" \
    --short "gemma2-9b" \
    --delta 10
```

Each Δ value produces a separate fine-tuned model.
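Since every model/Δ combination yields its own checkpoint, a benchmarking sweep can be driven by a small loop. The snippet below is a hypothetical driver that shells out to the training entry point shown above; the model list and Δ grid are assumptions and would normally come from config.yaml.

```python
import subprocess

# Hypothetical sweep grid; in practice these would be read from config.yaml.
models = [
    ("google/gemma-2-9b-it", "gemma2-9b"),
    ("Qwen/Qwen2.5-7B-Instruct", "qwen2.5-7b"),
]
deltas = [1, 5, 10]

for model_id, short in models:
    for delta in deltas:
        # One F-DPO run per (model, delta) pair, mirroring the CLI above.
        subprocess.run(
            ["python", "-m", "aixpert.training.run_factual_training",
             "--model_id", model_id, "--short", short, "--delta", str(delta)],
            check=True,
        )
```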
Evaluation is performed using GPT-4o-mini as an LLM-as-a-Judge.
| Metric | Meaning |
|---|---|
| factuality | Mean factual score |
| halluc_rate | % of outputs below the factuality threshold |
| win_rate | Win rate of the Δ-model vs. the baseline |
| count | Number of prompts evaluated |
Run evaluation:
```bash
python -m aixpert.evaluation.evaluations.run_all_evaluations
```

Outputs:

`eval_results.json`
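To relate the metrics table above to the judge outputs, the sketch below shows one way the aggregates could be computed from per-prompt judge scores; the record schema and the 0.5 threshold are assumptions, since the actual format is defined by the evaluation scripts.

```python
# Hypothetical per-prompt judge outputs; the real schema may differ.
judge_outputs = [
    {"factuality": 0.9, "wins_baseline": True},
    {"factuality": 0.2, "wins_baseline": False},
    {"factuality": 0.8, "wins_baseline": True},
]

threshold = 0.5  # assumed hallucination threshold
count = len(judge_outputs)
factuality = sum(r["factuality"] for r in judge_outputs) / count
halluc_rate = 100 * sum(r["factuality"] < threshold for r in judge_outputs) / count
win_rate = 100 * sum(r["wins_baseline"] for r in judge_outputs) / count

print(f"factuality={factuality:.2f}  halluc_rate={halluc_rate:.1f}%  "
      f"win_rate={win_rate:.1f}%  count={count}")
```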
- Gemma-2 (2B, 9B)
- Qwen-2.5 / Qwen-3
- LLaMA-3.x
- Any TRL-compatible causal LLM
Models are registered centrally in config.yaml.
- Hugging Face TRL – DPO reference implementation
- Unsloth – QLoRA optimization
- BitsAndBytes – 4-bit quantization
- Flash-Attention-2 – efficient attention kernels
- Weights & Biases – experiment tracking
- Accelerate – multi-GPU orchestration
This project builds upon and extends the Skywork Reward-Preference-80K dataset.
We do not claim ownership of the Skywork dataset. All credit belongs to the original authors.
If you use this repository, please cite Skywork:
```bibtex
@article{liu2024skywork,
  title={Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs},
  author={Liu, Chris Yuhao and Zeng, Liang and Liu, Jiacai and Yan, Rui and He, Jujie and Wang, Chaojie and Yan, Shuicheng and Liu, Yang and Zhou, Yahui},
  journal={arXiv preprint arXiv:2410.18451},
  year={2024}
}
```

For dataset-related concerns, please contact the Skywork authors via their paper or Hugging Face repository.
If you find this code or dataset useful for your research, please consider citing:
```bibtex
@article{FactualAlignment2026,
  title={Reducing Hallucinations in LLMs via Factuality-Aware Preference Learning},
  author={Sindhuja Chaduvula and Ahmed Radwan and Azib Farooq and Yani Ioannou and Shaina Raza},
  journal={arXiv preprint arXiv:2601.03027},
  year={2026}
}
```

For questions, collaborations, or issues:
- Open a GitHub Issue
- Or contact the maintainers via the Vector Institute
Factuality-aware Direct Preference Optimization helps reduce hallucinations and increase factuality.
We invite researchers and practitioners to build upon this framework.
