Commit 5a658f8

Set kl_coef = 0 explicitly for REC configs
1 parent 17f2a40 commit 5a658f8

2 files changed: 22 additions and 6 deletions

examples/rec_gsm8k/README.md

Lines changed: 20 additions & 6 deletions
@@ -83,11 +83,10 @@ algorithm:
     epsilon_high: 0.2
     clip_mode: "one-side"
     weight: "none"
-    temp: 1.0
-    regularizer: "none"
-    regularizer_coef: 0.0
   advantage_fn_args:
     std_normalize: false
+  kl_loss_fn_args:
+    kl_coef: 0.0
 ```
 
 **REC-OneSide-IS:**
@@ -100,11 +99,10 @@ algorithm:
     epsilon_high: 0.2
     clip_mode: "one-side"
     weight: "importance_sampling"
-    temp: 1.0
-    regularizer: "none"
-    regularizer_coef: 0.0
   advantage_fn_args:
     std_normalize: false
+  kl_loss_fn_args:
+    kl_coef: 0.0
 ```
 
 **REC-GSPO-NoIS:**
@@ -119,6 +117,8 @@ algorithm:
     weight: "none"
   advantage_fn_args:
     std_normalize: false
+  kl_loss_fn_args:
+    kl_coef: 0.0
 ```
 
 **REC-GSPO-IS:**
@@ -133,6 +133,8 @@ algorithm:
     weight: "gspo_importance_sampling"
   advantage_fn_args:
     std_normalize: false
+  kl_loss_fn_args:
+    kl_coef: 0.0
 ```
 
 **REC-TwoSide-IS:**
@@ -150,6 +152,8 @@ algorithm:
     regularizer_coef: 0.0
   advantage_fn_args:
     std_normalize: false
+  kl_loss_fn_args:
+    kl_coef: 0.0
 ```
 
 **REC-Ring-NoIS:**
@@ -169,6 +173,8 @@ algorithm:
     regularizer_coef: 0.0
   advantage_fn_args:
     std_normalize: false
+  kl_loss_fn_args:
+    kl_coef: 0.0
 ```
 
 ### REP family
@@ -187,6 +193,8 @@ algorithm:
     regularizer_coef: 0.1
   advantage_fn_args:
     std_normalize: false
+  kl_loss_fn_args:
+    kl_coef: 0.0
 ```
 
 
@@ -202,6 +210,8 @@ algorithm:
     regularizer_coef: 0.1
   advantage_fn_args:
     std_normalize: false
+  kl_loss_fn_args:
+    kl_coef: 0.0
 ```
 
 ### RED family
@@ -219,6 +229,8 @@ algorithm:
   advantage_fn_args:
     std_normalize: false
     drop: "balance"
+  kl_loss_fn_args:
+    kl_coef: 0.0
 ```
 
 
@@ -234,6 +246,8 @@ algorithm:
     temp: 1.0
   advantage_fn_args:
     std_normalize: false
+  kl_loss_fn_args:
+    kl_coef: 0.0
 ```
 
 ## Citation

examples/rec_gsm8k/gsm8k.yaml

Lines changed: 2 additions & 0 deletions
@@ -18,6 +18,8 @@ algorithm:
     weight: "none"
   advantage_fn_args:
     std_normalize: false
+  kl_loss_fn_args:
+    kl_coef: 0.0
 cluster:
   node_num: 1
   gpu_per_node: 8
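
Because the change is about stating `kl_coef` explicitly rather than relying on a default, a small check on the parsed config can help when reviewing similar configs. The following is a minimal sketch, assuming the YAML has already been loaded into a plain dict by whatever parser is in use. The helper name `kl_coef_is_zero` and the two sample dicts are hypothetical and are not part of the repository.

```python
from typing import Any, Mapping


def kl_coef_is_zero(config: Mapping[str, Any]) -> bool:
    """Return True only if algorithm.kl_loss_fn_args.kl_coef is present and equals 0."""
    algorithm = config.get("algorithm") or {}
    kl_args = algorithm.get("kl_loss_fn_args") or {}
    # A missing key counts as a failure: the check is for an explicit setting,
    # not for whatever default the framework might apply.
    if "kl_coef" not in kl_args:
        return False
    return float(kl_args["kl_coef"]) == 0.0


# Hypothetical dict mirroring the keys visible in the gsm8k.yaml hunk after the commit.
after_commit = {
    "algorithm": {
        "advantage_fn_args": {"std_normalize": False},
        "kl_loss_fn_args": {"kl_coef": 0.0},
    }
}

# The same section as it looked before the commit (no kl_loss_fn_args key).
before_commit = {"algorithm": {"advantage_fn_args": {"std_normalize": False}}}

print(kl_coef_is_zero(after_commit))   # True
print(kl_coef_is_zero(before_commit))  # False
```

The helper deliberately returns `False` when the key is absent, so a config that only happens to get a zero coefficient through a default would still be flagged.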
