Commit 38f7681

Merge pull request #22 from roycoding/0.4.0
0.4.0
2 parents 16cfb43 + 6f99968 · commit 38f7681

File tree

8 files changed: +235 -137 lines changed


README.md

Lines changed: 34 additions & 15 deletions
````diff
@@ -1,46 +1,56 @@
 # slots
-### *A multi-armed bandit library for Python*
+
+## *A multi-armed bandit library for Python*

 Slots is intended to be a basic, very easy-to-use multi-armed bandit library for Python.

 [![PyPI](https://img.shields.io/pypi/v/slots)](https://pypi.org/project/slots/)
-[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/slots)](https://pypi.org/project/slots/)
 [![Downloads](https://pepy.tech/badge/slots)](https://pepy.tech/project/slots)

-#### Author
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+[![type hints with mypy](https://img.shields.io/badge/type%20hints-mypy-brightgreen)](http://mypy-lang.org/)
+
+### Author
+
 [Roy Keyes](https://roycoding.github.io) -- roy.coding@gmail

-#### License: MIT
-See [LICENSE.txt](https://github.com/roycoding/slots/blob/master/LICENSE.txt)
+### License: MIT

+See [LICENSE.txt](https://github.com/roycoding/slots/blob/master/LICENSE.txt)

 ### Introduction
+
 slots is a Python library designed to allow the user to explore and use simple multi-armed bandit (MAB) strategies. The basic concept behind the multi-armed bandit problem is that you are faced with *n* choices (e.g. slot machines, medicines, or UI/UX designs), each of which results in a "win" with some unknown probability. Multi-armed bandit strategies are designed to let you quickly determine which choice will yield the highest result over time, while reducing the number of tests (or arm pulls) needed to make this determination. Typically, MAB strategies attempt to strike a balance between "exploration", testing different arms in order to find the best, and "exploitation", using the best known choice. There are many variation of this problem, see [here](https://en.wikipedia.org/wiki/Multi-armed_bandit) for more background.

 slots provides a hopefully simple API to allow you to explore, test, and use these strategies. Basic usage looks like this:

 Using slots to determine the best of 3 variations on a live website.
+
 ```Python
 import slots

 mab = slots.MAB(3, live=True)
 ```

 Make the first choice randomly, record responses, and input reward 2 was chosen. Run online trial (input most recent result) until test criteria is met.
+
 ```Python
 mab.online_trial(bandit=2,payout=1)
 ```

 The response of `mab.online_trial()` is a dict of the form:
+
 ```Python
 {'new_trial': boolean, 'choice': int, 'best': int}
 ```
+
 Where:
+
 - If the criterion is met, `new_trial` = `False`.
 - `choice` is the current choice of arm to try.
 - `best` is the current best estimate of the highest payout arm.

-
 To test strategies on arms with pre-set probabilities:

 ```Python
@@ -50,28 +60,31 @@ b.run()
 ```

 To inspect the results and compare the estimated win probabilities versus the true win probabilities:
+
 ```Python
+# Current best guess
 b.best()
 > 0

-# Assuming payout of 1.0 for all "wins"
-b.est_payouts()
+# Estimate of the payout probabilities
+b.est_probs()
 > array([ 0.83888149, 0.78534031, 0.32786885])

+# Ground truth payout probabilities (if known)
 b.bandits.probs
 > [0.8020877268854065, 0.7185844454955193, 0.16348877912363646]
 ```

 By default, slots uses the epsilon greedy strategy. Besides epsilon greedy, the softmax, upper confidence bound (UCB1), and Bayesian bandit strategies are also implemented.

 #### Regret analysis
+
 A common metric used to evaluate the relative success of a MAB strategy is "regret". This reflects that fraction of payouts (wins) that have been lost by using the sequence of pulls versus the currently best known arm. The current regret value can be calculated by calling the `mab.regret()` method.

 For example, the regret curves for several different MAB strategies can be generated as follows:
-```Python

+```Python
 import matplotlib.pyplot as plt
-import seaborn as sns
 import slots

 # Test multiple strategies for the same bandit probabilities
@@ -97,8 +110,7 @@ for t in range(10000):
         s['regret'].append(s['mab'].regret())

 # Pretty plotting
-sns.set_style('whitegrid')
-sns.set_context('poster')
+plt.style.use(['seaborn-poster','seaborn-whitegrid'])

 plt.figure(figsize=(15,4))

@@ -111,22 +123,29 @@ plt.ylabel('Regret')
 plt.title('Multi-armed bandit strategy performance (slots)')
 plt.ylim(0,0.2);
 ```
-![](./misc/regret_plot.png)
+
+![Regret plot](./misc/regret_plot.png)

 ### API documentation
-For documentation on the slots API, see [slots-docs.md](https://github.com/roycoding/slots/blob/master/docs/slots-docs.md).

+For documentation on the slots API, see [slots-docs.md](https://github.com/roycoding/slots/blob/master/docs/slots-docs.md).

 ### Todo list:
+
 - More MAB strategies
 - Argument to save regret values after each trial in an array.
 - TESTS!

 ### Contributing

-I welcome contributions, though the pace of development is highly variable. Please file issues and sumbit pull requests as makes sense.
+I welcome contributions, though the pace of development is highly variable. Please file issues and submit pull requests as makes sense.

 The current development environment uses:

 - pytest >= 5.3 (5.3.2)
 - black >= 19.1 (19.10b0)
+- mypy = 0.761
+
+You can pip install these easily by including `dev-requirements.txt`.
+
+For mypy config, see `mypy.ini`. For black config, see `pyproject.toml`.
````
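The README changes above describe the `live=True` workflow only in fragments, so a minimal sketch of a complete online-trial loop may help. It uses only the calls shown in the README diff (`slots.MAB(3, live=True)`, `mab.online_trial()`, and the returned `new_trial`/`choice`/`best` keys); the `get_visitor_response` helper and its conversion rates are hypothetical stand-ins for a real website integration.

```Python
import random

import slots

# Three live arms (e.g. three website variations); no payout probabilities are known up front.
mab = slots.MAB(3, live=True)


def get_visitor_response(arm: int) -> int:
    # Hypothetical stand-in for a real conversion signal from the live site.
    return int(random.random() < [0.10, 0.15, 0.12][arm])


# Make the first choice randomly, then keep feeding results back in
# until the test criterion is met (online_trial returns new_trial=False).
choice = random.randrange(3)
while True:
    result = mab.online_trial(bandit=choice, payout=get_visitor_response(choice))
    if not result['new_trial']:
        break
    choice = result['choice']  # arm to show to the next visitor

print('Best arm so far:', result['best'])
```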

dev-requirements.txt

Lines changed: 3 additions & 0 deletions
```diff
@@ -0,0 +1,3 @@
+mypy>=0.761
+black>=19.10b0
+pytest>=5.3.2
```

docs/slots-docs.md

Lines changed: 19 additions & 16 deletions
````diff
@@ -13,46 +13,49 @@ This documents details the current and planned API for slots. Non-implemented fe
 1. Current choice
 2. number of trials completed for each arm
 3. scores for each arm
-4. average payout per arm (payout*wins/trials?)
+4. average payout per arm (wins/trials?)
 5. Current regret. Regret = Trials*mean_max - sum^T_t=1(reward_t)
 - See [ref](http://research.microsoft.com/en-us/um/people/sebubeck/SurveyBCB12.pdf)
 6. Use sane defaults.
 7. Be obvious and clean.
+8. For the time being handle only binary payouts.

 ### Library API ideas:
 #### Running slots with a live website
 ```Python
-# Using slots to determine the best of 3 variations on a live website. 3 is the default.
+# Using slots to determine the best of 3 variations on a live website. 3 is the default number of bandits and epsilon greedy is the default strategy.
 mab = slots.MAB(3, live=True)

 # Make the first choice randomly, record responses, and input reward
 # 2 was chosen.
-# Run online trial (input most recent result) until test criteria is met.
+# Update online trial (input most recent result) until test criteria is met.
 mab.online_trial(bandit=2,payout=1)

 # Repsonse of mab.online_trial() is a dict of the form:
 {'new_trial': boolean, 'choice': int, 'best': int}

 # Where:
 # If the criterion is met, new_trial = False.
-# choice is the current choice of arm to try.
+# choice is the current choice of arm to try next.
 # best is the current best estimate of the highest payout arm.
 ```

 #### Creating a MAB test instance:

 ```Python
-# Default: 3 bandits with random p_i and pay_i = 1
-mab = slots.MAB(live=False)
+# Default: 3 bandits with random probabilities, p_i.
+mab = slots.MAB()

-# Set up 4 bandits with random p_i and pay_i
-mab = slots.MAB(4, live=False)
+# Set up 4 bandits with random p_i.
+mab = slots.MAB(4)

 # 4 bandits with specified p_i
-mab = slots.MAB(probs = [0.2,0.1,0.4,0.1], live=False)
+mab = slots.MAB(probs = [0.2,0.1,0.4,0.1])

-# 3 bandits with specified pay_i
-mab = slots.MAB(payouts = [1,10,15], live=False)
+# Creating 3 bandits with histoprical payout data
+mab = slots.MAB(3, hist_payouts = np.array([[0,0,1,...],
+                                            [1,0,0,...],
+                                            [0,0,0,...]]))
 ```

 #### Running tests with strategy, S
@@ -98,8 +101,8 @@ mab.bandits.reset()

 # Set probabilities or payouts
 # (NOT YET IMPLEMENTED)
-mab.bandits.probs_set([0.1,0.05,0.2,0.15])
-mab.bandits.payouts_set([1,1.5,0.5,0.8])
+mab.bandits.set_probs([0.1,0.05,0.2,0.15])
+mab.bandits.set_hist_payouts([[1,1,0,0],[0,1,0,0]])
 ```

 #### Displaying / retrieving test info
@@ -114,10 +117,10 @@ mab.prob_est()

 # Retrieve bandit probability estimate of bandit i
 # (NOT YET IMPLEMENTED)
-mab.prob_est(i)
+mab.est_prob(i)

-# Retrieve bandit payout estimates (p * payout)
-mab.est_payout()
+# Retrieve bandit probability estimates
+mab.est_probs()

 # Retrieve current bandit choice
 # (NOT YET IMPLEMENTED, use mab.choices[-1])
````
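Tying together the calls from the updated docs above, here is a short offline sketch. It sticks to the documented defaults (epsilon greedy, `mab.run()` with no arguments) and the example probabilities from the docs; treat it as an illustration of the 0.4.0 API shape rather than canonical usage.

```Python
import slots

# Four arms with fixed (known) win probabilities, as in the docs above.
mab = slots.MAB(probs=[0.2, 0.1, 0.4, 0.1])

# Run the default epsilon greedy strategy for the default number of trials.
mab.run()

print(mab.est_probs())   # estimated payout probability per arm (renamed from est_payouts)
print(mab.best())        # index of the arm currently estimated to be best
print(mab.regret())      # cumulative regret of the pulls made so far
print(mab.choices[-1])   # most recent arm choice (mab.choice() not yet implemented)
```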

mypy.ini

Lines changed: 6 additions & 0 deletions
```diff
@@ -0,0 +1,6 @@
+[mypy]
+disallow_untyped_calls = True
+disallow_untyped_defs = True
+
+[mypy-numpy]
+ignore_missing_imports = True
```
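As a quick illustration of what the new `mypy.ini` enforces (not code from the repo): with `disallow_untyped_defs` every function definition needs annotations, and `disallow_untyped_calls` flags calls into unannotated code. Running `mypy` from the project root picks this config up automatically.

```Python
import random


# Accepted: parameters and return value are annotated.
def pull_arm(probability: float) -> int:
    return int(random.random() < probability)


# Rejected under disallow_untyped_defs: no annotations.
# def pull_arm_untyped(probability):
#     return int(random.random() < probability)
```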

pyproject.toml

Lines changed: 2 additions & 0 deletions
```diff
@@ -0,0 +1,2 @@
+[tool.black]
+line-length = 79
```

setup.cfg

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,4 +1,4 @@
 [bdist_wheel]
 # This flag says that the code is written to work on both Python 2 and Python
 # 3.
-universal=1
+
```

setup.py

Lines changed: 3 additions & 3 deletions
```diff
@@ -16,7 +16,7 @@
 setup(
     name='slots',

-    version='0.3.1',
+    version='0.4.0',

     description='A multi-armed bandit library for Python',
     long_description=long_description,
@@ -50,9 +50,9 @@

         # Specify the Python versions you support here. In particular, ensure
         # that you indicate whether you support Python 2, Python 3 or both.
-        'Programming Language :: Python :: 2.7',
-        'Programming Language :: Python :: 3.4',
         'Programming Language :: Python :: 3.5',
+        'Programming Language :: Python :: 3.6',
+        'Programming Language :: Python :: 3.7',
     ],

     # What does your project relate to?
```
