2 files changed +22 -6 lines changed
@@ -83,11 +83,10 @@ algorithm:
     epsilon_high: 0.2
     clip_mode: "one-side"
     weight: "none"
-    temp: 1.0
-    regularizer: "none"
-    regularizer_coef: 0.0
   advantage_fn_args:
     std_normalize: false
+  kl_loss_fn_args:
+    kl_coef: 0.0
 ```

 **REC-OneSide-IS:**
@@ -100,11 +99,10 @@ algorithm:
     epsilon_high: 0.2
     clip_mode: "one-side"
     weight: "importance_sampling"
-    temp: 1.0
-    regularizer: "none"
-    regularizer_coef: 0.0
   advantage_fn_args:
     std_normalize: false
+  kl_loss_fn_args:
+    kl_coef: 0.0
 ```

 **REC-GSPO-NoIS:**
@@ -119,6 +117,8 @@ algorithm:
     weight: "none"
   advantage_fn_args:
     std_normalize: false
+  kl_loss_fn_args:
+    kl_coef: 0.0
 ```

 **REC-GSPO-IS:**
@@ -133,6 +133,8 @@ algorithm:
     weight: "gspo_importance_sampling"
   advantage_fn_args:
     std_normalize: false
+  kl_loss_fn_args:
+    kl_coef: 0.0
 ```

 **REC-TwoSide-IS:**
@@ -150,6 +152,8 @@ algorithm:
     regularizer_coef: 0.0
   advantage_fn_args:
     std_normalize: false
+  kl_loss_fn_args:
+    kl_coef: 0.0
 ```

 **REC-Ring-NoIS:**
@@ -169,6 +173,8 @@ algorithm:
     regularizer_coef: 0.0
   advantage_fn_args:
     std_normalize: false
+  kl_loss_fn_args:
+    kl_coef: 0.0
 ```

 ### REP family
@@ -187,6 +193,8 @@ algorithm:
     regularizer_coef: 0.1
   advantage_fn_args:
     std_normalize: false
+  kl_loss_fn_args:
+    kl_coef: 0.0
 ```


@@ -202,6 +210,8 @@ algorithm:
     regularizer_coef: 0.1
   advantage_fn_args:
     std_normalize: false
+  kl_loss_fn_args:
+    kl_coef: 0.0
 ```

 ### RED family
@@ -219,6 +229,8 @@ algorithm:
   advantage_fn_args:
     std_normalize: false
     drop: "balance"
+  kl_loss_fn_args:
+    kl_coef: 0.0
 ```


@@ -234,6 +246,8 @@ algorithm:
     temp: 1.0
   advantage_fn_args:
     std_normalize: false
+  kl_loss_fn_args:
+    kl_coef: 0.0
 ```

 ## Citation
@@ -18,6 +18,8 @@ algorithm:
     weight: "none"
   advantage_fn_args:
     std_normalize: false
+  kl_loss_fn_args:
+    kl_coef: 0.0
 cluster:
   node_num: 1
   gpu_per_node: 8
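Every hunk in this diff adds `kl_loss_fn_args` with `kl_coef: 0.0` to the `algorithm` block. The diff does not show how the trainer consumes this value. The sketch below assumes the common convention that a KL estimate against a reference policy is multiplied by `kl_coef` and added to the policy loss, so a coefficient of 0.0 switches the KL term off. The helpers `kl_estimate` and `total_loss`, and the choice of the "k3" estimator, are hypothetical illustrations, not the project's actual code.

```python
import math
from typing import Sequence


def kl_estimate(logp: Sequence[float], ref_logp: Sequence[float]) -> float:
    """Hypothetical helper: mean per-token "k3" KL estimate, exp(d) - d - 1,
    where d = ref_logp - logp for tokens sampled from the current policy."""
    if len(logp) != len(ref_logp) or not logp:
        raise ValueError("logp and ref_logp must be non-empty and equal length")
    terms = [math.exp(r - p) - (r - p) - 1.0 for p, r in zip(logp, ref_logp)]
    return sum(terms) / len(terms)


def total_loss(policy_loss: float, logp: Sequence[float],
               ref_logp: Sequence[float], kl_coef: float) -> float:
    """Hypothetical helper: policy loss plus kl_coef times the KL penalty.
    Assumes the convention kl_coef * KL; with kl_coef = 0.0 the penalty vanishes."""
    return policy_loss + kl_coef * kl_estimate(logp, ref_logp)


# Usage: log-probabilities of three sampled tokens under the policy and a reference.
logp = [-1.2, -0.7, -2.0]
ref_logp = [-1.0, -0.9, -2.1]
print(total_loss(0.35, logp, ref_logp, kl_coef=0.0))  # 0.35 (KL term disabled)
print(total_loss(0.35, logp, ref_logp, kl_coef=0.1))  # slightly above 0.35
```

With `kl_coef=0.0`, the returned loss equals the policy loss no matter how far the policy has drifted from the reference. Any positive coefficient adds a penalty that is never negative, because exp(d) - d - 1 >= 0 for every d.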