Updated README and added dev dependencies and configs.

- Removed Seaborn requirement from example code in README
- Added `dev-requirements.txt` for pip installation of dev tools
- Added references to dev tool config files
- Linted README

### License: MIT

See [LICENSE.txt](https://github.com/roycoding/slots/blob/master/LICENSE.txt)

### Introduction
slots is a Python library designed to allow the user to explore and use simple multi-armed bandit (MAB) strategies. The basic concept behind the multi-armed bandit problem is that you are faced with *n* choices (e.g. slot machines, medicines, or UI/UX designs), each of which results in a "win" with some unknown probability. Multi-armed bandit strategies are designed to let you quickly determine which choice will yield the highest payout over time, while reducing the number of tests (or arm pulls) needed to make this determination. Typically, MAB strategies attempt to strike a balance between "exploration", testing different arms in order to find the best one, and "exploitation", using the best known choice. There are many variations of this problem; see [here](https://en.wikipedia.org/wiki/Multi-armed_bandit) for more background.
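
For intuition, here is a minimal epsilon-greedy sketch of that exploration/exploitation trade-off in plain Python, independent of the slots API (all probabilities and parameters are illustrative):

```Python
import random

# Three "arms" with win probabilities hidden from the player
true_probs = [0.2, 0.5, 0.7]
wins = [0, 0, 0]
pulls = [0, 0, 0]

for _ in range(10000):
    if random.random() < 0.1:  # explore: try a random arm 10% of the time
        arm = random.randrange(3)
    else:  # exploit: pull the arm with the best estimated win rate so far
        arm = max(range(3), key=lambda a: wins[a] / pulls[a] if pulls[a] else 0.0)
    pulls[arm] += 1
    wins[arm] += random.random() < true_probs[arm]

# With high probability this prints 2, the index of the true best arm
print(max(range(3), key=lambda a: wins[a] / max(pulls[a], 1)))
```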
slots provides a hopefully simple API to allow you to explore, test, and use these strategies. Basic usage looks like this:
Using slots to determine the best of 3 variations on a live website.
```Python
import slots
mab = slots.MAB(3, live=True)
```
Make the first choice randomly, record the response, and input the reward (here, arm 2 was chosen and paid out 1). Run online trials (inputting the most recent result each time) until the test criteria are met.
```Python
mab.online_trial(bandit=2, payout=1)
```
The response of `mab.online_trial()` is a dict describing the current state of the test, in which `best` is the current best estimate of the highest payout arm.
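
As an end-to-end sketch of this online workflow, the loop below simulates live responses and feeds each observed payout back in. Only the `best` key is described above; the `choice` key read from the response, and running a fixed number of trials rather than checking the stopping criteria, are simplifying assumptions:

```Python
import random

import slots

mab = slots.MAB(3, live=True)
true_probs = [0.1, 0.3, 0.6]   # stand-in for real, unknown user behavior

choice = random.randrange(3)   # make the first choice randomly
for _ in range(1000):
    payout = int(random.random() < true_probs[choice])
    response = mab.online_trial(bandit=choice, payout=payout)
    choice = response['choice']  # assumed key: the arm to try next

print(response['best'])  # current best estimate of the highest payout arm
```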
To test strategies on arms with pre-set probabilities:
```Python
# Set up 3 bandits; with live=False the payout probabilities are
# pre-set internally rather than fed in from live responses
b = slots.MAB(3, live=False)
b.run()
```
To inspect the results and compare the estimated win probabilities versus the true win probabilities:
```Python
# Current best guess
b.best()

# Estimates of the payout probabilities
b.est_probs()

# Ground-truth payout probabilities (known here because live=False)
b.bandits.probs
```
By default, slots uses the epsilon-greedy strategy. Besides epsilon-greedy, the softmax, upper confidence bound (UCB1), and Bayesian bandit strategies are also implemented.
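
For example, a non-default strategy can be chosen when running a test. The `strategy` argument and its label below are assumptions about how the implemented strategies are exposed:

```Python
import slots

b = slots.MAB(3, live=False)
b.run(strategy='bayesian')  # assumed label for the Bayesian bandit strategy
```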
#### Regret analysis
A common metric used to evaluate the relative success of a MAB strategy is "regret". This reflects the fraction of potential payouts (wins) that has been lost by following the actual sequence of arm pulls instead of always pulling the best known arm. The current regret value can be calculated by calling the `mab.regret()` method.
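
As a concrete reading of that description (an illustration only, not necessarily the exact formula slots uses internally), regret can be computed as the shortfall relative to always pulling the best arm:

```Python
# Fraction of potential wins lost versus always pulling the best arm.
# Illustrative only; slots' internal definition may differ.
def fractional_regret(payouts, best_prob):
    expected_best = best_prob * len(payouts)  # wins expected from the best arm
    return (expected_best - sum(payouts)) / expected_best

print(fractional_regret([1, 0, 1, 1, 0], best_prob=0.8))  # 0.25
```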
For example, the regret curves for several different MAB strategies can be generated as follows:
```Python
import matplotlib.pyplot as plt

import slots

# Test multiple strategies for the same bandit probabilities.
# The probs argument, the strategy labels, and the incremental use of
# run() below are assumptions about the API, not documented behavior.
for strategy in ['eps_greedy', 'softmax', 'ucb', 'bayesian']:
    mab = slots.MAB(probs=[0.4, 0.9, 0.8], live=False)
    regret = []
    for _ in range(1000):
        mab.run(trials=1, strategy=strategy)  # one additional pull
        regret.append(mab.regret())
    plt.plot(regret, label=strategy)

plt.xlabel('Trials')
plt.ylabel('Regret')
plt.legend()
plt.show()
```