**Factuality-aware Direct Preference Optimization** is a **research and engineering framework** for studying and improving **factual alignment in preference-optimized Large Language Models (LLMs)**.
The project introduces **F-DPO**, a factuality-aware extension of **Direct Preference Optimization (DPO)** that incorporates per-pair factuality indicators and a margin-based factuality penalty into the preference objective.
```
aixpert/
│ ├── training/     # Original-DPO & F-DPO training
│ ├── evaluation/   # GPT-4o-mini judge evaluation
│ └── utils/        # Shared helpers
│
```
---
## 🧠 What Is F-DPO?
Standard DPO aligns models to **human preferences**, but does not explicitly discourage **hallucinated yet preferred responses**.
**F-DPO** introduces a factuality-aware margin:
* Each preference tuple includes `(h_w, h_l)` factuality indicators
* A penalty λ is applied when the preferred response is less factual, as in the sketch below
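
As a rough sketch of how this margin could enter the loss, the snippet below modifies the standard DPO objective. The function name, signature, and default hyperparameters are assumptions for illustration, not the repository's actual training code.

```python
import torch
import torch.nn.functional as F

def f_dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l,
               h_w, h_l, beta=0.1, lam=1.0):
    """Illustrative factuality-aware DPO loss (hypothetical API).

    h_w, h_l -- per-pair factuality indicators (1 = factual, 0 = not)
    lam      -- penalty applied when the preferred response is less factual
    """
    # Standard DPO implicit-reward gap between chosen and rejected responses
    logits = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    # Subtract a margin wherever the chosen response is LESS factual than
    # the rejected one, so the model must earn a larger gap on those pairs
    margin = lam * (h_w < h_l).float()
    return -F.logsigmoid(logits - margin).mean()

# Toy batch: the second pair has a hallucinated-but-preferred response
h_w = torch.tensor([1.0, 0.0])
h_l = torch.tensor([0.0, 1.0])
loss = f_dpo_loss(torch.tensor([-1.0, -1.2]), torch.tensor([-2.0, -2.5]),
                  torch.tensor([-1.1, -1.3]), torch.tensor([-2.1, -2.4]),
                  h_w, h_l)
```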
---
## 🔬 Skywork → F-DPO Data Construction Pipeline
This repository contains a complete **eight-stage pipeline** for converting the **Skywork Reward-Preference-80K** dataset into **balanced, factuality-aware DPO datasets**.
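
For concreteness, a converted record might look like the following. The field names are assumptions based on the `(h_w, h_l)` notation above, not the pipeline's documented schema.

```python
# Hypothetical example of one converted preference record
record = {
    "prompt":   "Who wrote The Selfish Gene?",
    "chosen":   "The Selfish Gene was written by Richard Dawkins in 1976.",
    "rejected": "The Selfish Gene was written by Stephen Jay Gould in 1981.",
    "h_w": 1,   # chosen response judged factual
    "h_l": 0,   # rejected response judged non-factual
}
```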