This repository was archived by the owner on Nov 8, 2024. It is now read-only.

Commit 2c51eea

fix: Use action key to select best scoring action (#62)
1 parent 70595d1 commit 2c51eea

1 file changed: +6 -2 lines changed

eppo_client/bandit.py

Lines changed: 6 additions & 2 deletions
@@ -160,8 +160,12 @@ def weigh_actions(
         self, action_scores, gamma, probability_floor
     ) -> Dict[str, float]:
         number_of_actions = len(action_scores)
-        best_action = max(action_scores, key=action_scores.get)
-        best_score = action_scores[best_action]
+        # Find the max score
+        best_score = max(action_scores.values())
+        # Get all the keys that have the same best score (if there's more than one)
+        best_action_keys = [k for k, v in action_scores.items() if v == best_score]
+        # Get the lowest lexicographically ordered key.
+        best_action = min(best_action_keys)

         # adjust probability floor for number of actions to control the sum
         min_probability = probability_floor / number_of_actions
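
The effect of this change on tied scores can be illustrated with a small standalone sketch. The helper names `best_action_old` and `best_action_new` and the sample score dictionaries are hypothetical and are not part of `eppo_client`; they only isolate the two selection strategies shown in the diff. The sketch relies on Python's built-in `max` returning the first maximal item it encounters, so the old selection depends on dictionary insertion order when scores tie, while the new selection does not.

```python
from typing import Dict


def best_action_old(action_scores: Dict[str, float]) -> str:
    # Hypothetical helper mirroring the removed lines: on a tie, max()
    # returns the first maximal key in dictionary iteration order.
    return max(action_scores, key=action_scores.get)


def best_action_new(action_scores: Dict[str, float]) -> str:
    # Hypothetical helper mirroring the added lines: collect every key
    # with the top score, then take the lexicographically smallest one.
    best_score = max(action_scores.values())
    best_action_keys = [k for k, v in action_scores.items() if v == best_score]
    return min(best_action_keys)


# Same scores, different insertion order (illustrative data only).
scores_a = {"banana": 1.0, "apple": 1.0, "cherry": 0.5}
scores_b = {"apple": 1.0, "banana": 1.0, "cherry": 0.5}

print(best_action_old(scores_a), best_action_old(scores_b))
print(best_action_new(scores_a), best_action_new(scores_b))
```

With the old selection the two dictionaries can yield different best actions even though their contents are identical; with the new selection both yield the same key, which makes the tie-break reproducible regardless of the order in which actions were supplied.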
