Commit cba257c

committed
Updated docs and readme for new defaults
1 parent ec5a8ec commit cba257c

File tree

2 files changed

+55
-59
lines changed

README.md

Lines changed: 25 additions & 24 deletions
@@ -15,11 +15,33 @@ slots is a Python library designed to allow the user to explore and use simple m
1515

1616
slots provides a hopefully simple API to allow you to explore, test, and use these strategies. Basic usage looks like this:
1717

18+
Using slots to determine the best of 3 variations on a live website.
1819
```Python
1920
import slots
2021

22+
mab = slots.MAB(3)
23+
```
24+
25+
Make the first choice randomly, record the response, and input the reward. In this example, arm 2 was chosen. Run online trials (inputting the most recent result each time) until the test criterion is met.
26+
```Python
27+
mab.online_trial(bandit=2,payout=1)
28+
```
29+
30+
The response of `mab.online_trial()` is a dict of the form:
31+
```Python
32+
{'new_trial': boolean, 'choice': int, 'best': int}
33+
```
34+
Where:
35+
- If the criterion is met, `new_trial` = `False`.
36+
- `choice` is the current choice of arm to try.
37+
- `best` is the current best estimate of the highest payout arm.
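
The request/response cycle described above can be sketched as a loop. Since the stopping criterion and internals of slots are not shown here, this sketch uses a hypothetical stand-in class, `StubMAB`, which only mimics the `online_trial()` interface (same keyword arguments, same dict keys). Its epsilon-greedy choice and fixed trial limit are assumptions for illustration, not the slots implementation; `run_live_test` and `get_payout` are likewise illustrative names.

```python
import random


class StubMAB:
    """Hypothetical stand-in that mimics the online_trial() interface.

    Not the slots implementation: it picks arms with plain epsilon-greedy
    and stops after a fixed number of trials.
    """

    def __init__(self, num_bandits=3, eps=0.1, max_trials=500, seed=0):
        self.wins = [0.0] * num_bandits
        self.pulls = [0] * num_bandits
        self.eps = eps
        self.max_trials = max_trials
        self.rng = random.Random(seed)

    def best(self):
        # Arm with the highest observed mean payout (unpulled arms count as 0).
        means = [w / p if p else 0.0 for w, p in zip(self.wins, self.pulls)]
        return means.index(max(means))

    def online_trial(self, bandit, payout):
        # Record the latest result, then suggest the next arm to try.
        self.wins[bandit] += payout
        self.pulls[bandit] += 1
        if self.rng.random() < self.eps:
            choice = self.rng.randrange(len(self.pulls))
        else:
            choice = self.best()
        new_trial = sum(self.pulls) < self.max_trials
        return {'new_trial': new_trial, 'choice': choice, 'best': self.best()}


def run_live_test(mab, get_payout, first_choice):
    """Feed results into mab until it reports that no new trial is needed."""
    choice = first_choice
    while True:
        result = mab.online_trial(bandit=choice, payout=get_payout(choice))
        if not result['new_trial']:
            return result['best']
        choice = result['choice']


# Simulated visitor responses; on a real site this would be a logged conversion.
true_rates = [0.05, 0.10, 0.60]
visitors = random.Random(1)
best_arm = run_live_test(
    StubMAB(3),
    get_payout=lambda arm: 1 if visitors.random() < true_rates[arm] else 0,
    first_choice=2,
)
print(best_arm)
```

With a real `slots.MAB(3)` object, the same loop shape should apply as long as it returns the dict described above: show `choice` to the next visitor, record the payout, and stop once `new_trial` is `False`.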
38+
39+
40+
To test strategies on arms with pre-set probabilities:
41+
42+
```Python
2143
# Try 3 bandits with arbitrary win probabilities
22-
b = slots.MAB()
44+
b = slots.MAB(3, live=False)
2345
b.run()
2446
```
2547

@@ -36,28 +58,7 @@ b.bandits.probs
3658
> [0.8020877268854065, 0.7185844454955193, 0.16348877912363646]
3759
```
3860

39-
For "real world" (online) usage, test results can be sequentially fed into an `MAB` object. The tests will continue until a stopping criterion is met.
40-
41-
Using slots to determine the best of 3 variations on a live website.
42-
```Python
43-
mab = slots.MAB(live=True, payouts=[]*3)
44-
```
45-
46-
Make the first choice randomly, record responses, and input reward 2 was chosen. Run online trial (input most recent result) until test criteria is met.
47-
```Python
48-
mab.online_trial(bandit=2,payout=1)
49-
```
50-
51-
The response of mab.online_trial() is a dict of the form:
52-
```Python
53-
{'new_trial': boolean, 'choice': int, 'best': int}
54-
```
55-
Where:
56-
- If the criterion is met, `new_trial` = `False`.
57-
- `choice` is the current choice of arm to try.
58-
- `best` is the current best estimate of the highest payout arm.
59-
60-
By default, slots uses the epsilon greedy strategy. Besides epsilon greedy, the softmax, upper confidence bound, and Bayesian bandit strategies are also implemented.
61+
By default, slots uses the epsilon greedy strategy. Besides epsilon greedy, the softmax, upper confidence bound (UCB1), and Bayesian bandit strategies are also implemented.
6162

6263
#### Regret analysis
6364
A common metric used to evaluate the relative success of a MAB strategy is "regret". This reflects the fraction of payouts (wins) that have been lost by using the actual sequence of pulls instead of always pulling the currently best known arm. The current regret value can be calculated by calling the `mab.regret()` method.
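
As a rough illustration of that definition, the sketch below computes regret from per-arm pull and win counts. This is one plausible reading of the sentence above, not necessarily how `mab.regret()` is implemented; the standalone function and its `pulls` and `wins` arguments are assumptions made for the example.

```python
def regret(pulls, wins):
    """Fraction of payout lost relative to always pulling the best known arm.

    pulls[i] is how many times arm i was pulled; wins[i] is its total payout.
    """
    total_pulls = sum(pulls)
    if total_pulls == 0:
        return 0.0
    # Best observed mean payout among arms that have been pulled at least once.
    best_mean = max(w / p for w, p in zip(wins, pulls) if p > 0)
    # Payout expected from the best known arm, minus what was actually won,
    # expressed per pull.
    return (total_pulls * best_mean - sum(wins)) / total_pulls


# Three arms: the third was pulled most often and has the best observed mean.
print(regret(pulls=[10, 10, 80], wins=[2, 5, 64]))
```

Under this reading, regret is zero when every pull went to the best known arm and grows as more pulls are spent on weaker arms.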
@@ -83,7 +84,7 @@ strategies = [{'strategy': 'eps_greedy', 'regret': [],
8384
]
8485

8586
for s in strategies:
86-
s['mab'] = slots.MAB(probs=probs)
87+
s['mab'] = slots.MAB(probs=probs, live=False)
8788

8889
# Run trials and calculate the regret after each trial
8990
for t in range(10000):

docs/slots-docs.md

Lines changed: 30 additions & 35 deletions
@@ -19,29 +19,43 @@ This documents details the current and planned API for slots. Non-implemented fe
1919
6. Use sane defaults.
2020
7. Be obvious and clean.
2121

22-
###Library API ideas:
23-
Creating a MAB test instance:
22+
### Library API ideas:
23+
#### Running slots with a live website
24+
```Python
25+
# Using slots to determine the best of 3 variations on a live website. 3 is the default.
26+
mab = slots.MAB(3)
27+
28+
# Make the first choice randomly, record responses, and input reward
29+
# 2 was chosen.
30+
# Run online trial (input most recent result) until test criteria is met.
31+
mab.online_trial(bandit=2,payout=1)
32+
33+
# Response of mab.online_trial() is a dict of the form:
34+
{'new_trial': boolean, 'choice': int, 'best': int}
35+
36+
# Where:
37+
# If the criterion is met, new_trial = False.
38+
# choice is the current choice of arm to try.
39+
# best is the current best estimate of the highest payout arm.
40+
```
41+
42+
#### Creating a MAB test instance:
2443

2544
```Python
2645
# Default: 3 bandits with random p_i and pay_i = 1
27-
mab = slots.MAB()
46+
mab = slots.MAB(live=False)
2847

2948
# Set up 4 bandits with random p_i and pay_i
30-
mab = slots.MAB(4)
49+
mab = slots.MAB(4, live=False)
3150

3251
# 4 bandits with specified p_i
33-
mab = slots.MAB(probs = [0.2,0.1,0.4,0.1])
52+
mab = slots.MAB(probs = [0.2,0.1,0.4,0.1], live=False)
3453

3554
# 3 bandits with specified pay_i
36-
mab = slots.MAB(payouts = [1,10,15])
37-
38-
# Bandits with payouts specified by arrays (i.e. payout data with unknown probabilities)
39-
# payouts is an N * T array, with N bandits and T trials
40-
# (Partially implemented)
41-
mab = slots.MAB(live = True, payouts = [[0,0,0,0,1.2,0,0],[0,0.1,0,0,0.1,0.1,0]]
55+
mab = slots.MAB(payouts = [1,10,15], live=False)
4256
```
4357

44-
Running tests with strategy, S
58+
#### Running tests with strategy, S
4559

4660
```Python
4761
# Default: Epsilon-greedy, epsilon = 0.1, num_trials = 100
@@ -55,7 +69,7 @@ mab.run(strategy = 'eps_greedy',params = {'eps':0.2}, trials = 10000)
5569
mab.run(continue = True)
5670
```
5771

58-
Displaying / retrieving bandit properties
72+
#### Displaying / retrieving bandit properties
5973

6074
```Python
6175
# Default: display number of bandits, probabilities and payouts
@@ -75,7 +89,7 @@ mab.bandits.probs
7589
mab.bandits.count
7690
```
7791

78-
Setting bandit properties
92+
#### Setting bandit properties
7993

8094
```Python
8195
# Reset bandits to defaults
@@ -88,7 +102,7 @@ mab.bandits.probs_set([0.1,0.05,0.2,0.15])
88102
mab.bandits.payouts_set([1,1.5,0.5,0.8])
89103
```
90104

91-
Displaying / retrieving test info
105+
#### Displaying / retrieving test info
92106

93107
```Python
94108
# Retrieve current "best" bandit
@@ -121,29 +135,10 @@ mab.prob_est_sequence
121135
mab.strategy_info()
122136
```
123137

124-
###Proposed MAB strategies
138+
### Proposed MAB strategies
125139
- [x] Epsilon-greedy
126140
- [ ] Epsilon decreasing
127141
- [x] Softmax
128142
- [ ] Softmax decreasing
129143
- [x] Upper confidence bound
130144
- [x] Bayesian bandits
131-
132-
###Example: Running slots with a live website
133-
```Python
134-
# Using slots to determine the best of 3 variations on a live website.
135-
mab = slots.MAB(live=True, payouts=[]*3)
136-
137-
# Make the first choice randomly, record responses, and input reward
138-
# 2 was chosen.
139-
# Run online trial (input most recent result) until test criteria is met.
140-
mab.online_trial(bandit=2,payout=1)
141-
142-
# Response of mab.online_trial() is a dict of the form:
143-
{'new_trial': boolean, 'choice': int, 'best': int}
144-
145-
# Where:
146-
# If the criterion is met, new_trial = False.
147-
# choice is the current choice of arm to try.
148-
# best is the current best estimate of the highest payout arm.
149-
```
