### License: MIT
See [LICENSE.txt](https://github.com/roycoding/slots/blob/master/LICENSE.txt)
### Introduction
slots is a Python library designed to allow the user to explore and use simple multi-armed bandit (MAB) strategies. The basic concept behind the multi-armed bandit problem is that you are faced with *n* choices (e.g. slot machines, medicines, or UI/UX designs), each of which results in a "win" with some unknown probability. Multi-armed bandit strategies are designed to let you quickly determine which choice will yield the highest payout over time, while reducing the number of tests (or arm pulls) needed to make this determination. Typically, MAB strategies attempt to strike a balance between "exploration", testing different arms in order to find the best one, and "exploitation", using the best known choice. There are many variations of this problem; see [here](https://en.wikipedia.org/wiki/Multi-armed_bandit) for more background.
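To make the exploration/exploitation trade-off concrete, here is a minimal sketch of the arm-selection step of one such strategy, epsilon-greedy. This is illustrative only and is independent of slots' own implementation:

```Python
import random

def eps_greedy_choice(wins, pulls, epsilon=0.1):
    """Pick an arm: with probability epsilon explore a random arm,
    otherwise exploit the arm with the best observed win rate."""
    if random.random() < epsilon:
        return random.randrange(len(pulls))  # explore
    rates = [w / p if p else 0.0 for w, p in zip(wins, pulls)]
    return max(range(len(rates)), key=lambda i: rates[i])  # exploit
```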
slots provides a hopefully simple API to allow you to explore, test, and use these strategies. Basic usage looks like this:
Using slots to determine the best of 3 variations on a live website.
```Python
import slots
mab = slots.MAB(3, live=True)
```
Make the first choice randomly, record the response, and input the reward (here, arm 2 was chosen and paid out 1). Run online trials (inputting the most recent result each time) until the test criterion is met.
```Python
mab.online_trial(bandit=2, payout=1)
```
The response of `mab.online_trial()` is a dict of the form:
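A sketch of that dict; the exact key names below are an assumption, not confirmed by this excerpt:

```Python
# Assumed response shape of mab.online_trial():
#   'new_trial' -- False once the test criterion has been met
#   'choice'    -- the arm to try next
#   'best'      -- the current best estimate of the highest-payout arm
{'new_trial': True, 'choice': 1, 'best': 1}
```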
By default, slots uses the epsilon greedy strategy. Besides epsilon greedy, the softmax, upper confidence bound (UCB1), and Bayesian bandit strategies are also implemented.
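For instance, one of the non-default strategies can be selected when running a simulated experiment. A minimal sketch, assuming `run()` accepts `trials` and a `strategy` name, and that `MAB` accepts a `probs` keyword (this excerpt only shows `slots.MAB(3, live=True)`):

```Python
import slots

# Simulate a 3-armed bandit with chosen payout probabilities
# (the probs keyword is an assumption)
mab = slots.MAB(probs=[0.2, 0.5, 0.8], live=False)

mab.run(trials=1000, strategy='ucb')  # assumed keyword for UCB1
print(mab.best())                     # assumed helper returning the best arm index
```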
#### Regret analysis
A common metric used to evaluate the relative success of a MAB strategy is "regret". This reflects the fraction of potential payouts (wins) that has been lost by using the actual sequence of arm pulls, compared with always pulling the best known arm. The current regret value can be calculated by calling the `mab.regret()` method.
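As a point of reference, one common way to compute such a fractional regret (slots' exact definition may differ) is sketched below:

```Python
# Hedged sketch of a fractional regret definition: after T pulls with
# true per-arm win probabilities `probs` and observed `payouts`,
# regret is the share of the best achievable expected payout missed.
def regret(payouts, probs):
    best_possible = len(payouts) * max(probs)
    return (best_possible - sum(payouts)) / best_possible
```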
For example, the regret curves for several different MAB strategies can be generated as follows:
```Python
import matplotlib.pyplot as plt
import seaborn as sns
import slots
# Test multiple strategies for the same bandit probabilities
```
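The snippet above breaks off after the imports. A hedged sketch of how the full comparison might look, assuming the `run(trials=..., strategy=...)` signature and `probs` keyword noted earlier, and that repeated `run()` calls continue a single experiment:

```Python
import matplotlib.pyplot as plt
import slots

probs = [0.4, 0.9, 0.8]  # hypothetical arm probabilities

for strategy in ['eps_greedy', 'softmax', 'ucb', 'bayesian']:
    mab = slots.MAB(probs=probs, live=False)
    regret = []
    for _ in range(1000):
        mab.run(trials=1, strategy=strategy)  # assumed to continue the same experiment
        regret.append(mab.regret())
    plt.plot(regret, label=strategy)

plt.xlabel('Trial')
plt.ylabel('Regret')
plt.legend()
plt.show()
```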