
Commit 310643a

Doc string, docs, and minor refactor for Bayesian bandits.
1 parent: 0f69159

File tree

3 files changed: +22 −8 lines changed

README.md

Lines changed: 1 addition & 2 deletions
@@ -89,7 +89,7 @@ for t in range(10000):
     regb.append(bb.regret())
     bc._run('ucb')
     regc.append(bc.regret())
-    bd._run('bayesian_bandit')
+    bd._run('bayesian')
     regd.append(bd.regret())
 
 
@@ -116,6 +116,5 @@ For documentation on the slots API, see [slots-docs.md](https://github.com/royco
 
 ### Todo list:
 - More MAB strategies
-- Bayesian bandits
 - Argument to save regret values after each trial in an array.
 - TESTS!

docs/slots-docs.md

Lines changed: 1 addition & 0 deletions
@@ -127,6 +127,7 @@ mab.strategy_info()
 - [x] Softmax
 - [ ] Softmax decreasing
 - [x] Upper credible bound
+- [x] Bayesian bandits
 
 ###Example: Running slots with a live website
 ```Python

slots/slots.py

Lines changed: 20 additions & 6 deletions
@@ -75,7 +75,7 @@ def __init__(self, num_bandits=3, probs=None, payouts=None, live=False,
         self.stop_value = stop_criterion.get('value', 0.1)
 
         # Bandit selection strategies
-        self.strategies = ['eps_greedy', 'softmax', 'ucb', 'bayesian_bandit']
+        self.strategies = ['eps_greedy', 'softmax', 'ucb', 'bayesian']
 
     def run(self, trials=100, strategy=None, parameters=None):
         '''
@@ -169,13 +169,27 @@ def max_mean(self):
 
         return np.argmax(self.wins / (self.pulls + 0.1))
 
-    def bayesian_bandit(self, params):
+    def bayesian(self, params=None):
         '''
-        Run the Bayesian Bandit algorithm which utilizes a beta distribution for exploration and exploitation.
-        :param params:
-        :return:
+        Run the Bayesian Bandit algorithm which utilizes a beta distribution
+        for exploration and exploitation.
+
+        Parameters
+        ----------
+        params : None
+            For API consistency, this function can take a parameters argument,
+            but it is ignored.
+
+        Returns
+        -------
+        int
+            Index of chosen bandit
         '''
-        p_success_arms = [np.random.beta(self.wins[i] + 1, self.pulls[i] - self.wins[i] + 1) for i in range(len(self.wins))]
+        p_success_arms = [
+            np.random.beta(self.wins[i] + 1, self.pulls[i] - self.wins[i] + 1)
+            for i in range(len(self.wins))
+        ]
+
         return np.array(p_success_arms).argmax()
 
     def eps_greedy(self, params):
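The `bayesian` method above is standard Thompson sampling with a Beta-Bernoulli model: each arm's success rate gets a Beta posterior (a uniform Beta(1, 1) prior updated with observed wins and losses), one sample is drawn per arm, and the arm with the largest draw is played. A minimal standalone sketch of the same selection rule, kept outside the class for clarity (the function name `bayesian_choice` is ours, not part of slots):

```python
import numpy as np

def bayesian_choice(wins, pulls):
    # Thompson sampling: draw one sample from each arm's Beta posterior
    # (Beta(wins + 1, losses + 1), i.e. a uniform prior) and play the
    # arm with the largest sampled success rate.
    samples = [np.random.beta(wins[i] + 1, pulls[i] - wins[i] + 1)
               for i in range(len(wins))]
    return int(np.argmax(samples))

# With no data, every arm has the same Beta(1, 1) posterior, so the
# choice is effectively uniform at random -- exploration is automatic.
# As evidence accumulates, the posterior of the best arm concentrates
# and it is chosen almost every time.
arm = bayesian_choice([0, 0, 0], [0, 0, 0])
```

Because the randomness lives in the posterior draws themselves, no explicit exploration parameter is needed, which is why the method ignores `params`.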
