@zpqiu zpqiu commented Jan 20, 2026

When the reward model (RM), e.g., dapo, returns a dict instead of a scalar value, raise a clear error if neither reward_key nor eval_reward_key is set.

This prevents confusing errors like:

  TypeError: unsupported operand type(s) for +: 'int' and 'dict'

The error message now includes available keys and suggests the fix:

  Please specify --reward_key or --eval_reward_key (e.g., --reward_key score)
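
The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `resolve_reward`, its parameters, and the rule that `reward_key` takes precedence over `eval_reward_key` are all assumptions made for the example. Only the error text and the two option names come from the description.

```python
from typing import Any, Optional


def resolve_reward(
    reward: Any,
    reward_key: Optional[str] = None,
    eval_reward_key: Optional[str] = None,
) -> Any:
    """Hypothetical helper: turn an RM result into a scalar or fail loudly."""
    # Scalar rewards pass through untouched.
    if not isinstance(reward, dict):
        return reward

    # A dict reward with no key configured cannot be reduced to a scalar,
    # so report the available keys and the suggested fix.
    if reward_key is None and eval_reward_key is None:
        raise ValueError(
            f"RM returned a dict with keys {sorted(reward)} but no reward key is set. "
            "Please specify --reward_key or --eval_reward_key (e.g., --reward_key score)"
        )

    # Assumption for this sketch only: prefer reward_key when both are given.
    key = reward_key if reward_key is not None else eval_reward_key
    return reward[key]


def confusing_error_message() -> str:
    """Reproduce the unhelpful error that summing dict rewards produces."""
    try:
        sum([{"score": 1.0}, {"score": 0.0}])
    except TypeError as exc:
        return str(exc)
    return ""
```

Without such a check, the failure only surfaces later, when the rewards are aggregated, which is where the `TypeError` quoted above comes from.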

zpqiu and others added 2 commits January 20, 2026 00:45
zpqiu commented Jan 20, 2026

@zhuzilin could you take a look? Thank you.
