Commit a8d0d1a

Sindhujach217 committed
update readme.md
1 parent ddc75a9 commit a8d0d1a

1 file changed: 2 additions, 1 deletion

src/aixpert/training/README.md

Lines changed: 2 additions & 1 deletion
@@ -25,6 +25,7 @@ m(x, y_w, y_l) =
 -
 \log \frac{\pi_{\text{ref}}(y_w \mid x)}{\pi_{\text{ref}}(y_l \mid x)}
 \]
+```
 
 The **Original DPO loss** is:
 
@@ -37,7 +38,7 @@ The **Original DPO loss** is:
 \log \sigma\left(\beta \cdot m(x,y_w,y_l)\right)
 \right]
 \]
-
+```
 where:
 - \(\pi_\theta\) is the trainable policy
 - \(\pi_{\text{ref}}\) is the frozen reference policy
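The README text touched by this commit defines the DPO margin m(x, y_w, y_l) and a loss built on log σ(β · m). Only fragments are visible in the hunks, so the sketch below assumes the standard DPO form: the elided first term of m is the policy log-ratio log[π_θ(y_w|x) / π_θ(y_l|x)], and the loss is the negated mean of log σ(β · m) over preference pairs. The function names, the use of per-response summed log-probabilities as inputs, and the default beta = 0.1 are illustrative assumptions, not taken from the aixpert code.

```python
import math
from typing import Sequence


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        # log(1 / (1 + e^-z)); e^-z cannot overflow here
        return -math.log1p(math.exp(-z))
    # log(e^z / (1 + e^z)); e^z cannot overflow here
    return z - math.log1p(math.exp(z))


def dpo_margin(policy_logp_w: float, policy_logp_l: float,
               ref_logp_w: float, ref_logp_l: float) -> float:
    """Hypothetical helper: m = log[pi_theta(y_w|x)/pi_theta(y_l|x)]
    - log[pi_ref(y_w|x)/pi_ref(y_l|x)], from summed sequence log-probs."""
    return (policy_logp_w - policy_logp_l) - (ref_logp_w - ref_logp_l)


def dpo_loss(policy_logp_w: Sequence[float], policy_logp_l: Sequence[float],
             ref_logp_w: Sequence[float], ref_logp_l: Sequence[float],
             beta: float = 0.1) -> float:
    """Hypothetical helper: batch mean of -log sigma(beta * m).
    beta = 0.1 is an assumed default, not a value from the README."""
    n = len(policy_logp_w)
    if n == 0 or not (len(policy_logp_l) == len(ref_logp_w) == len(ref_logp_l) == n):
        raise ValueError("all inputs must be non-empty and the same length")
    total = 0.0
    for pw, pl, rw, rl in zip(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l):
        total += -log_sigmoid(beta * dpo_margin(pw, pl, rw, rl))
    return total / n


if __name__ == "__main__":
    # Policy identical to the reference: margin 0, loss log(2)
    print(dpo_loss([-1.0], [-2.0], [-1.0], [-2.0]))
    # Policy favors y_w more than the reference does: loss below log(2)
    print(dpo_loss([-1.0], [-3.0], [-2.0], [-2.0], beta=1.0))
```

When π_θ equals π_ref the margin is zero and the loss is log 2 regardless of beta; pushing probability toward y_w relative to the reference drives the loss toward zero. Writing -log σ(z) through log1p keeps the computation finite even for very large positive or negative margins.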
