
Commit 3b7e8dc

Updated README and docs
1 parent 7da8991 commit 3b7e8dc

3 files changed: +138 −19 lines changed

README.md

Lines changed: 109 additions & 6 deletions
@@ -1,13 +1,116 @@
-#slots
-###*A multi-armed bandit library for Python*
+# slots
+### *A multi-armed bandit library for Python*

Slots is intended to be a basic, very easy-to-use multi-armed bandit library for Python.

-See [slots-notes.md](https://github.com/roycoding/slots/blob/master/slots-notes.md) for design ideas.
-
-####Author
+#### Author
[Roy Keyes](https://roycoding.github.io) -- roy.coding@gmail

-####License: BSD
+#### License: BSD
See [LICENSE.txt](https://github.com/roycoding/slots/blob/master/LICENSE.txt)

+
+### Introduction
+slots is a Python library designed to allow the user to explore and use simple multi-armed bandit (MAB) strategies. The basic concept behind the multi-armed bandit problem is that you are faced with *n* choices (e.g. slot machines, medicines, or UI/UX designs), each of which results in a "win" with some unknown probability. Multi-armed bandit strategies are designed to let you quickly determine which choice will yield the highest result over time, while reducing the number of tests (or arm pulls) needed to make this determination. Typically, MAB strategies attempt to strike a balance between "exploration", testing different arms in order to find the best, and "exploitation", using the best known choice. There are many variations of this problem; see [here](https://en.wikipedia.org/wiki/Multi-armed_bandit) for more background.
+
+slots provides a hopefully simple API to allow you to explore, test, and use these strategies. Basic usage looks like this:
+
+```Python
+import slots
+
+# Try 3 bandits with arbitrary win probabilities
+b = slots.MAB()
+b.run()
+```
+
+To inspect the results and compare the estimated win probabilities versus the true win probabilities:
+```Python
+b.best
+> 0
+
+# Assuming payout of 1.0 for all "wins"
+b.est_payouts()
+> array([ 0.83888149, 0.78534031, 0.32786885])
+
+b.bandits.probs
+> [0.8020877268854065, 0.7185844454955193, 0.16348877912363646]
+```
+
+For "real world" (online) usage, test results can be sequentially fed into an `MAB` object. The tests will continue until a stopping criterion is met.
+
+Using slots to determine the best of 3 variations on a live website:
+```Python
+mab = slots.MAB(live=True, payouts=[0]*3)
+```
+
+Make the first choice randomly, record the response, and input the reward (here arm 2 was chosen and paid out). Run online trials (feeding in the most recent result each time) until the stopping criterion is met.
+```Python
+mab.online_trial(bandit=2, payout=1)
+```
+
+The response of `mab.online_trial()` is a dict of the form:
+```Python
+{'new_trial': boolean, 'choice': int, 'best': int}
+```
+Where:
+- If the stopping criterion is met, `new_trial` = `False`.
+- `choice` is the current choice of arm to try.
+- `best` is the current best estimate of the highest-payout arm.
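A minimal sketch of such an online loop, assuming a hypothetical `get_reward()` helper that returns the observed payout each time an arm is shown:

```Python
import slots

mab = slots.MAB(live=True, payouts=[0]*3)

choice = 0  # arbitrary first arm
while True:
    payout = get_reward(choice)  # hypothetical: observe the live result for this arm
    result = mab.online_trial(bandit=choice, payout=payout)
    if not result['new_trial']:  # stopping criterion met
        break
    choice = result['choice']  # arm the strategy wants to try next

print('Best arm:', result['best'])
```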
+
+By default, slots uses the epsilon greedy strategy. Besides epsilon greedy, the softmax and upper credible bound (UCB) strategies are also implemented.
+
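For example, per the API notes in slots-docs.md below, a specific strategy and its parameters can be passed to `run()` (the values here are illustrative):

```Python
import slots

b = slots.MAB()
# Epsilon greedy with a larger exploration rate and more trials
b.run(strategy='eps_greedy', params={'eps': 0.2}, trials=10000)
print(b.best)
```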
+#### Regret analysis
+A common metric used to evaluate the relative success of a MAB strategy is "regret". This reflects the fraction of payouts (wins) that has been lost by using the sequence of pulls versus the currently best known arm. The current regret value can be calculated by calling the `mab.regret()` method.
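As a rough numerical illustration of this definition (a sketch assuming per-trial normalization, consistent with the 0–0.2 range of the regret plot below; the pull and win counts are invented):

```Python
probs = [0.4, 0.9, 0.8]  # true win probabilities (payout = 1.0)
pulls = [10, 80, 10]     # invented: pulls per arm after 100 trials
wins = [4, 72, 8]        # invented: wins observed per arm

trials = sum(pulls)
best_expected = max(probs) * trials  # expected payout from always pulling the best arm
actual_payout = sum(wins)            # payout actually received
regret = (best_expected - actual_payout) / trials
print(regret)  # 0.06
```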
+
+For example, the regret curves for several different MAB strategies can be generated as follows:
+```Python
+import matplotlib.pyplot as plt
+import seaborn as sns
+import slots
+
+# Test multiple strategies for the same bandit probabilities
+probs = [0.4, 0.9, 0.8]
+
+ba = slots.MAB(probs=probs)
+bb = slots.MAB(probs=probs)
+bc = slots.MAB(probs=probs)
+
+# Run trials and calculate the regret after each trial
+rega = []
+regb = []
+regc = []
+for t in range(10000):
+    ba._run('eps_greedy')
+    rega.append(ba.regret())
+    bb._run('softmax')
+    regb.append(bb.regret())
+    bc._run('ucb')
+    regc.append(bc.regret())
+
+# Pretty plotting
+sns.set_style('whitegrid')
+sns.set_context('poster')
+
+plt.figure(figsize=(15,4))
+plt.plot(rega, label=r'$\epsilon$-greedy ($\epsilon$=0.1)')
+plt.plot(regb, label='Softmax ($T$=0.1)')
+plt.plot(regc, label='UCB')
+plt.legend()
+plt.xlabel('Trials')
+plt.ylabel('Regret')
+plt.title('Multi-armed bandit strategy performance (slots)')
+plt.ylim(0,0.2);
+```
+![](./misc/regret_plot.png)
+
+### API documentation
+For documentation on the slots API, see [slots-docs.md](https://github.com/roycoding/slots/blob/master/slots-docs.md).
+
+### Todo list:
+- More MAB strategies
+- Bayesian bandits
+- Argument to save regret values after each trial in an array.
+- TESTS!

misc/regret_plot.png

41.4 KB

slots-notes.md renamed to slots-docs.md

Lines changed: 29 additions & 13 deletions
@@ -1,6 +1,10 @@
-#Multi-armed bandit library notes
+# slots
+## Multi-armed bandit library in Python

-### What does the library need to do?
+## Documentation
+This document details the current and planned API for slots. Non-implemented features are noted as such.
+
+### What does the library need to do? An aspirational list.
1. Set up N bandits with probabilities, p_i, and payouts, pay_i.
2. Implement several MAB strategies, with kwargs as parameters, and consistent API.
3. Allow for T trials.
@@ -10,7 +14,8 @@
2. number of trials completed for each arm
3. scores for each arm
4. average payout per arm (payout*wins/trials?)
-5. Current regret. Regret = Trials*mean_max - sum^T_t=1(reward_t) See [ref](https://www.princeton.edu/~sbubeck/SurveyBCB12.pdf)
+5. Current regret. Regret = Trials * mean_max - sum_{t=1}^{T}(reward_t)
+    - See [ref](http://research.microsoft.com/en-us/um/people/sebubeck/SurveyBCB12.pdf)
6. Use sane defaults.
7. Be obvious and clean.

@@ -32,47 +37,53 @@ mab = slots.MAB(payouts = [1,10,15])

# Bandits with payouts specified by arrays (i.e. payout data with unknown probabilities)
# payouts is an N * T array, with N bandits and T trials
+# (Partially implemented)
mab = slots.MAB(live = True, payouts = [[0,0,0,0,1.2,0,0],[0,0.1,0,0,0.1,0.1,0]])
```

Running tests with strategy, S

```Python
-# Default: Epsilon-greedy, epsilon = 0.1, num_trials = 1000
+# Default: Epsilon-greedy, epsilon = 0.1, num_trials = 100
mab.run()

-# Run chosen strategy with specified parameters and trials
-mab.eps_greedy(eps = 0.2, trials = 10000)
+# Run chosen strategy with specified parameters and number of trials
mab.run(strategy = 'eps_greedy', params = {'eps':0.2}, trials = 10000)

# Run strategy, updating old trial data
+# (NOT YET IMPLEMENTED)
mab.run(continue = True)
```

Displaying / retrieving bandit properties

```Python
# Default: display number of bandits, probabilities and payouts
+# (NOT YET IMPLEMENTED)
mab.bandits.info()

# Display info for bandit i
+# (NOT YET IMPLEMENTED)
mab.bandits[i]

# Retrieve bandits' payouts, probabilities, etc
mab.bandits.payouts
mab.bandits.probs

# Retrieve count of bandits
+# (NOT YET IMPLEMENTED)
mab.bandits.count
```

Setting bandit properties

```Python
# Reset bandits to defaults
+# (NOT YET IMPLEMENTED)
mab.bandits.reset()

# Set probabilities or payouts
+# (NOT YET IMPLEMENTED)
mab.bandits.probs_set([0.1,0.05,0.2,0.15])
mab.bandits.payouts_set([1,1.5,0.5,0.8])
```
@@ -84,33 +95,38 @@ Displaying / retrieving test info
mab.best()

# Retrieve bandit probability estimates
+# (NOT YET IMPLEMENTED)
mab.prob_est()

# Retrieve bandit probability estimate of bandit i
+# (NOT YET IMPLEMENTED)
mab.prob_est(i)

# Retrieve bandit payout estimates (p * payout)
-mab.payout_est()
+mab.est_payout()

# Retrieve current bandit choice
+# (NOT YET IMPLEMENTED, use mab.choices[-1])
mab.current()

# Retrieve sequence of choices
mab.choices

-# Retrieve probabilty estimate history
+# Retrieve probability estimate history
+# (NOT YET IMPLEMENTED)
mab.prob_est_sequence

# Retrieve test strategy info (current strategy) -- a dict
+# (NOT YET IMPLEMENTED)
mab.strategy_info()
```

### Proposed MAB strategies
-1. Epsilon-greedy
-2. Epsilon decreasing
-3. Softmax
-4. Softmax decreasing
-5. Upper credible bound
+- [x] Epsilon-greedy
+- [ ] Epsilon decreasing
+- [x] Softmax
+- [ ] Softmax decreasing
+- [x] Upper credible bound

### Example: Running slots with a live website
```Python
