Commit d03cdea

update reward fn
Author: Tong Li
1 parent 678f5a9 commit d03cdea

1 file changed (+2 -2 lines changed)

applications/ColossalChat/coati/distributed/reward/reward_fn.py

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@ def math_reward_fn(input_ids, gt_answer, response_idx, **kwargs):
         return reward
 
     decoded_final_answer = tokenizer.decode(input_ids[s : e + 1], skip_special_tokens=True)
-    gt_answer = tokenizer.decode(gt_answer.squeeze(0))
+    gt_answer = tokenizer.decode(gt_answer.squeeze(0), skip_special_tokens=True)
     final_answer, processed_str = extract_solution(decoded_final_answer)
 
     format_valid = validate_response_structure(processed_str, kwargs["tags"])
@@ -20,7 +20,7 @@ def math_reward_fn(input_ids, gt_answer, response_idx, **kwargs):
     else:
         reward += 1.0
         if gt_answer.strip().replace(" ", "").lower() == final_answer.strip().replace(" ", "").lower():
-            reward = reward + 9.0
+            reward = reward + 2.0
         return reward
 
 
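The two changes above can be illustrated with a small standalone sketch. It is not the code in reward_fn.py. The helper names `normalize` and `sketch_reward` are hypothetical. The assumption that an invalid format yields a reward of 0.0 is not shown in the diff. The "</s>" marker is only an assumed example of a special token that a decode without skip_special_tokens=True might leave in the ground-truth string. Only the comparison expression and the constants 1.0, 2.0 and 9.0 come from the commit.

```python
FORMAT_REWARD = 1.0   # added when the response structure is valid (from the diff)
CORRECT_BONUS = 2.0   # added on an answer match; this was 9.0 before the commit


def normalize(text: str) -> str:
    """Same normalization as the comparison in the diff."""
    return text.strip().replace(" ", "").lower()


def sketch_reward(gt_answer: str, final_answer: str, format_valid: bool) -> float:
    """Hypothetical simplification of the reward tiers in math_reward_fn."""
    if not format_valid:
        return 0.0  # assumption: the initial reward is zero
    reward = FORMAT_REWARD
    if normalize(gt_answer) == normalize(final_answer):
        reward += CORRECT_BONUS
    return reward


# A leftover special token in the ground truth defeats the string match,
# so a correct answer only earns the format reward.
print(sketch_reward("42</s>", "42", True))   # 1.0
# With the marker stripped, the same answer earns the full reward.
print(sketch_reward("42", " 42 ", True))     # 3.0
print(sketch_reward("42", "42", False))      # 0.0
```

Under these assumptions the maximum reward drops from 10.0 to 3.0, so a correct answer is worth three times a well-formatted wrong one rather than ten times.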
