
Commit be52160

Update README.md
1 parent 5227049 commit be52160

1 file changed: +7 −7 lines changed


README.md

Lines changed: 7 additions & 7 deletions
@@ -1,5 +1,5 @@
  # Reducing Hallucinations in LLMs via Factuality-Aware Preference Learning
- ### A Modular Training Framework for Factual-Aware DPO
+ ### A Modular Training Framework for Factuality-Aware Direct Preference Optimization(F-DPO)

  <p align="center" style="margin-top: -10px; margin-bottom: -10px;">
    <img src="docs/assets/factualDPO.png" width="320"/>
@@ -19,7 +19,7 @@

  **Factuality-aware Direct Preference Optimization** is a **research and engineering framework** for studying and improving **factual alignment in preference-optimized Large Language Models (LLMs)**.

- The project introduces **Factual-DPO**, a factuality-aware extension of **Direct Preference Optimization (DPO)** that incorporates:
+ The project introduces **F-DPO**, a factuality-aware extension of **Direct Preference Optimization (DPO)** that incorporates:

  * Explicit factuality supervision
  * Synthetic hallucination inversion
@@ -53,7 +53,7 @@ aixpert/
  ├── src/aixpert/
  │   ├── config/              # Central config.yaml
  │   ├── data_construction/   # 8-stage factual dataset pipeline
- │   ├── training/            # Original-DPO & Factual-DPO training
+ │   ├── training/            # Original-DPO & F-DPO training
  │   ├── evaluation/          # GPT-4o-mini judge evaluation
  │   └── utils/               # Shared helpers
@@ -63,11 +63,11 @@ aixpert/

  ---

- ## 🧠 What Is Factual-DPO?
+ ## 🧠 What Is F-DPO?

  Standard DPO aligns models to **human preferences**, but does not explicitly discourage **hallucinated yet preferred responses**.

- **Factual-DPO** introduces a factuality-aware margin:
+ **F-DPO** introduces a factuality-aware margin:

  * Each preference tuple includes `(h_w, h_l)` factuality indicators
  * A penalty λ is applied when the preferred response is less factual
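
For intuition, here is a minimal, illustrative sketch of how a factuality-aware margin of this kind could enter the DPO objective. This is not the repository's implementation: the function name `fdpo_loss`, the binary form of the `h_w`/`h_l` indicators, and the exact shape of the λ penalty are assumptions made for illustration only.

```python
import torch
import torch.nn.functional as F


def fdpo_loss(policy_logp_w, policy_logp_l,
              ref_logp_w, ref_logp_l,
              h_w, h_l, beta=0.1, lam=1.0):
    """Pairwise DPO loss with a hypothetical factuality penalty (sketch only).

    policy_logp_* : summed log-probs of the chosen (w) / rejected (l) responses under the policy
    ref_logp_*    : the same quantities under the frozen reference model
    h_w, h_l      : factuality indicators in {0, 1} for the chosen / rejected responses
    lam           : penalty weight applied when the preferred response is less factual
    """
    # Standard DPO implicit-reward difference between chosen and rejected responses
    delta = beta * ((policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l))
    # Shrink the margin only when the chosen response is less factual than the rejected one
    penalty = lam * torch.clamp(h_l - h_w, min=0.0)
    return -F.logsigmoid(delta - penalty).mean()
```

In this sketch the penalty is active only when `h_w < h_l`, i.e. when the preferred response is less factual than the rejected one, which matches the behaviour described in the bullets above.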
@@ -77,7 +77,7 @@ Standard DPO aligns models to **human preferences**, but does not explicitly dis

  ---

- ## 🔬 Skywork → Factual-DPO Data Construction Pipeline
+ ## 🔬 Skywork → F-DPO Data Construction Pipeline

  This repository contains a complete **eight-stage pipeline** for converting the **Skywork Reward-Preference-80K** dataset into **balanced, factual-aware DPO datasets**.

@@ -138,7 +138,7 @@ Trains standard DPO using Skywork preferences.

  ---

- ### 2️⃣ Factual-DPO (Δ-Margin Training)
+ ### 2️⃣ F-DPO (Δ-Margin Training)

  ```bash
  python -m aixpert.training.run_factual_training \
