Python implementation of various Multi-armed bandit algorithms like Upper-confidence bound algorithm, Epsilon-greedy algorithm and Exp3 algorithm
- Implemented all algorithms for 2-armed bandit.
- Each algorithm has time horizon T as 10000.
- Each experiment is repeated for 100 times to get mean results.
- Ploted the cummulative regret at time t against the rounds t = 1,...,T.
- Ploted the percentage of times optimal arm played against the rounds t = 1,...,T.
- Final plots are given in
Figures/folder.
- All algorithms file is given in
Code/folder. - Input of each algorithm is mean of first arm and mean of second arm.
- Here please note that for simplicity, I assumed that mean of first arm is greater than mean of second arm.
- To check effect of epsilon on Epsilon-greedy algorithm, I have run the epsilon-greedy algorithm for epsilon = 0.01, 0.1.
- Figures of following problem is given in
Figures/folder.
| Problem | Arm 1 | Arm 2 |
|---|---|---|
| P1 | 0.9 | 0.6 |
| P2 | 0.9 | 0.8 |
| P3 | 0.55 | 0.45 |