
Commit a5508f7 — sindchad committed "updated readme.md" (1 parent: 02c82de)


README.md — 9 additions & 7 deletions

You do **not** need a `requirements.txt` or `uv.lock` file; everything is specified in the project's `pyproject.toml`.

```bash
git clone https://github.com/VectorInstitute/AIXpert-preference-alignment.git
cd AIXpert-preference-alignment
```

### 🛠️ 2️⃣ Create and Sync the Environment

Run the following command to automatically create a virtual environment and install all dependencies:

```bash
uv sync
source .venv/bin/activate
```

## 🧮 Dataset Construction – `Sky()` Function Overview (`preprocess_dataset.py`)

The `Sky()` function prepares a **pairwise preference dataset** used for **Direct Preference Optimization (DPO)** fine-tuning.
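
How `Sky()` builds each record is not shown in this excerpt; as a rough mental model, a pairwise preference example for DPO bundles one prompt with a preferred and a dispreferred completion. A minimal sketch, with all field and function names hypothetical:

```python
# Hypothetical shape of one pairwise preference record for DPO.
# Field names are illustrative; the repo's Sky() may differ.
import json

def build_pair(prompt: str, chosen: str, rejected: str) -> dict:
    """Bundle one DPO example: a prompt plus a preferred and a
    dispreferred completion."""
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

pair = build_pair(
    "Explain gradient descent in one sentence.",
    "It iteratively updates parameters in the direction that lowers the loss.",
    "Gradient descent is a kind of neural network.",
)
print(json.dumps(pair, indent=2))
```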

Once your environment is set up:

```bash
uv run inference_best_of_n.py
```
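
The selection logic inside `inference_best_of_n.py` is not shown here; conceptually, a best-of-N stage draws several candidate generations per prompt and keeps the highest-scoring one. A minimal sketch with stand-in `generate` and `score` functions (a real pipeline would call the LLM and a reward or ranking model):

```python
# Toy best-of-N loop: sample N candidates, keep the best-scoring one.
# generate() and score() are placeholders, not the repo's functions.
import random

def generate(prompt: str) -> str:
    return f"{prompt} -> candidate {random.randint(0, 999)}"

def score(text: str) -> float:
    return random.random()  # stand-in for a reward model

def best_of_n(prompt: str, n: int = 8) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

print(best_of_n("What is preference alignment?"))
```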

## 🧩 Inference – All Template Generation (Hint Sampling)

Once `inference_best_of_n.py` has generated the multi-sample outputs, this script runs generation across all hint templates.
It also includes built-in checkpointing and resume capabilities for long runs.

```bash
uv run inference_all.py
```
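
Checkpoint-and-resume for a long generation run usually amounts to appending each finished record to the output file and skipping already-done ids on restart. A sketch under that assumption (the file name and `id` field are guesses, not the repo's actual format):

```python
# Resume-safe generation loop: finished ids are read back from the
# output JSONL, so a restarted run skips completed examples.
import json, os

OUT = "output.jsonl"  # assumed output path

def done_ids(path: str) -> set:
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {json.loads(line)["id"] for line in f if line.strip()}

def run(examples: list) -> None:
    finished = done_ids(OUT)
    with open(OUT, "a") as f:
        for ex in examples:
            if ex["id"] in finished:
                continue              # resume: skip completed work
            ex["generation"] = "..."  # placeholder for the model call
            f.write(json.dumps(ex) + "\n")
            f.flush()                 # checkpoint after every example

run([{"id": i, "prompt": f"question {i}"} for i in range(10)])
```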

## 🧹 Post-Processing

After generating the structured outputs (`guide.jsonl`, `guide_reverse.jsonl`, and `output.jsonl`) from the inference scripts,
this script performs **final cleanup and reindexing** to ensure that all generated records are consistently indexed and aligned.

```bash
uv run post_processing.py
```
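
As an illustration of what cleanup-and-reindexing can look like (the repo's actual rules are not shown here): drop blank or malformed JSONL rows, then assign fresh sequential ids so the three files stay in step.

```python
# Sketch: clean each JSONL file and renumber its records from zero.
import json

def reindex(path: str) -> list:
    rows = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue          # skip blank lines
            try:
                rows.append(json.loads(line))
            except json.JSONDecodeError:
                continue          # drop malformed records
    for new_id, row in enumerate(rows):
        row["id"] = new_id        # fresh sequential index
    return rows

for name in ("guide.jsonl", "guide_reverse.jsonl", "output.jsonl"):
    cleaned = reindex(name)
    with open(name, "w") as f:
        f.writelines(json.dumps(r) + "\n" for r in cleaned)
```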

## 🧱 DPO Dataset Construction

After cleaning and aligning the inference outputs, this script **constructs the final dataset** required for **Direct Preference Optimization (DPO)** fine-tuning.
It parses the model’s reasoning outputs, identifies preference-aligned samples, and assembles them into **(chosen, rejected)** pairs.

```bash
uv run construct_dpo_dataset.py
```
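
A DPO dataset is conventionally three parallel columns: `prompt`, `chosen`, `rejected`. A minimal sketch of assembling and saving one with the Hugging Face `datasets` library (the input field names and pairing logic here are assumptions, not the repo's):

```python
# Sketch: turn post-processed records into a saved DPO dataset.
import json
from datasets import Dataset

def load_jsonl(path: str) -> list:
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

records = load_jsonl("output.jsonl")  # assumed to carry all three fields
ds = Dataset.from_dict({
    "prompt":   [r["prompt"] for r in records],
    "chosen":   [r["chosen"] for r in records],
    "rejected": [r["rejected"] for r in records],
})
ds.save_to_disk("dpo_dataset")        # hypothetical output path
```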

## 📊 DPO Dataset Inspection

After constructing the DPO dataset using `construct_dpo_dataset.py`,
this script provides a **quick inspection and summary** of the saved dataset.

```bash
uv run dpo_dataset.py
```
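
A quick inspection needs only a couple of lines, assuming the dataset was saved with `save_to_disk` as sketched above (the path is hypothetical):

```python
# Sketch: load the saved dataset and print a summary plus one example.
from datasets import load_from_disk

ds = load_from_disk("dpo_dataset")  # assumed save location
print(ds)                           # row count and column names
print(ds[0])                        # one full (prompt, chosen, rejected) record
```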

## 🧠 Direct Preference Optimization Training

This script performs **Direct Preference Optimization (DPO)** fine-tuning on the constructed dataset using **Qwen2-7B-Instruct**.
It aligns the model’s responses with human-preferred outputs by learning from **(chosen, rejected)** pairs generated earlier.

```bash
uv run dpo_training.py
```
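
For orientation, this is roughly what a DPO training run looks like with TRL's `DPOTrainer`; the hyperparameters and dataset path below are illustrative, not the repo's settings, and the exact argument names vary across `trl` versions:

```python
# Sketch: DPO fine-tuning of Qwen2-7B-Instruct on (chosen, rejected) pairs.
from datasets import load_from_disk
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

train_ds = load_from_disk("dpo_dataset")  # prompt/chosen/rejected columns

args = DPOConfig(
    output_dir="dpo-qwen2-7b",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    beta=0.1,                             # strength of the preference constraint
)
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    processing_class=tokenizer,           # `tokenizer=` in older trl releases
)
trainer.train()
```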

## 📊 Model Evaluation Script

This script evaluates the **Direct Preference Optimization (DPO)** fine-tuned model against the **base model (Qwen2-7B-Instruct)** on the validation split of the dataset.
It measures how well each model predicts the **preferred (chosen)** answers from the validation pairs.

```bash
uv run accuracy.py
```
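
One common way to score this (whether `accuracy.py` does exactly this is an assumption): count a pair as correct when the model assigns higher log-likelihood to the chosen answer than to the rejected one. Since both completions share the same prompt prefix, comparing total log-likelihoods effectively compares the answers:

```python
# Sketch: preference accuracy via chosen-vs-rejected log-likelihood.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2-7B-Instruct"  # swap in the DPO checkpoint to compare
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

@torch.no_grad()
def loglik(prompt: str, answer: str) -> float:
    ids = tok(prompt + answer, return_tensors="pt").input_ids
    out = model(ids, labels=ids)          # loss = mean NLL of shifted tokens
    return -out.loss.item() * (ids.shape[1] - 1)

def accuracy(pairs: list) -> float:
    hits = sum(
        loglik(p["prompt"], p["chosen"]) > loglik(p["prompt"], p["rejected"])
        for p in pairs
    )
    return hits / len(pairs)
```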
