Commit f111d14 (parent 49e0809)

Add HMM evolver and update docs

File tree

7 files changed: +311 −171 lines changed


README.md

Lines changed: 64 additions & 17 deletions
@@ -3,42 +3,59 @@
 This repository contains reinforcement learning training code for the following
 strategy types:
 * Lookup tables (LookerUp)
-* Particle Swarm algorithms (PSOGambler)
+* Particle Swarm algorithms (PSOGambler), a stochastic version of LookerUp
 * Feed Forward Neural Network (EvolvedANN)
 * Finite State Machine (FSMPlayer)
+* Hidden Markov Models (HMMPlayer), essentially a stochastic version of a finite state machine

-The training is done by evolutionary algorithms or particle swarm algorithms. There
-is another repository that trains Neural Networks with gradient descent. In this
-repository there are scripts for each strategy type:
+Model training is by [evolutionary algorithms](https://en.wikipedia.org/wiki/Evolutionary_algorithm)
+or [particle swarm algorithms](https://en.wikipedia.org/wiki/Particle_swarm_optimization).
+There is another repository in the [Axelrod project](https://github.com/Axelrod-Python/Axelrod)
+that trains Neural Networks with gradient descent (using TensorFlow)
+that will likely be incorporated here. In this repository there are scripts
+for each strategy type with a similar interface:
 
 * [looker_evolve.py](looker_evolve.py)
 * [pso_evolve.py](pso_evolve.py)
 * [ann_evolve.py](ann_evolve.py)
 * [fsm_evolve.py](fsm_evolve.py)
+* [hmm_evolve.py](hmm_evolve.py)
+
+See below for usage instructions.
 
 In the original iteration the strategies were run against all the default
-strategies in the Axelrod library. This is slow and probably not necessary. For
-example the Meta players are just combinations of the other players, and very
-computationally intensive; it's probably ok to remove those. So by default the
-training strategies are the `short_run_time_strategies` from the Axelrod library.
+strategies in the Axelrod library. This is slow and probably not necessary.
+By default the training strategies are the `short_run_time_strategies` from the
+Axelrod library. You may specify any other set of strategies for training.
+
+Basic testing is done by running the trained model against the full set of
+strategies in various tournaments. Depending on the optimization function,
+testing methods will vary.
 
 ## The Strategies
 
-The LookerUp strategies are based on lookup tables with three parameters:
-* n1, the number of rounds of trailing history to use and
+**LookerUp** is based on lookup tables with three parameters:
+* n1, the number of rounds of trailing history to use
 * n2, the number of rounds of trailing opponent history to use
 * m, the number of rounds of initial opponent play to use
 
-PSOGambler is a stochastic version of LookerUp, trained with a particle swarm
-algorithm. The resulting strategies are generalizations of memory-N strategies.
+These are generalizations of deterministic memory-N strategies.
+
+**PSOGambler** is a stochastic version of LookerUp, trained with a particle
+swarm algorithm. The resulting strategies are generalizations of memory-N
+strategies.
+
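The lookup-table idea behind LookerUp can be sketched as follows (a minimal illustration with hypothetical names, not the Axelrod library's actual LookerUp implementation):

```python
# Minimal sketch of a LookerUp-style decision rule.
C, D = "C", "D"

def lookup_key(my_history, opp_history, n1=1, n2=1, m=1):
    """Build the table key: the opponent's first m moves, the last n1 of
    my moves, and the last n2 of the opponent's moves."""
    return (tuple(opp_history[:m]),
            tuple(my_history[-n1:]),
            tuple(opp_history[-n2:]))

# A toy table: defect when the opponent opened with D and just played D.
table = {
    ((D,), (C,), (D,)): D,
    ((D,), (D,), (D,)): D,
}

def next_move(my_history, opp_history, table):
    # Keys not in the table default to cooperation in this sketch.
    return table.get(lookup_key(my_history, opp_history), C)
```

A PSOGambler-style table has the same keys but maps them to cooperation probabilities rather than fixed actions, which is what makes it the stochastic counterpart.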
-EvolvedANN is one hidden layer feed forward neural network based algorithm.
-Various features are derived from the history of play. The number of nodes in
-the hidden layer can be changed.
+**EvolvedANN** is based on a [feed forward neural network](https://en.wikipedia.org/wiki/Feedforward_neural_network)
+with a single hidden layer. Various features are derived from the history of play.
+The number of nodes in the hidden layer can also be changed.
 
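A single-hidden-layer forward pass of the kind EvolvedANN uses can be sketched generically (illustrative weights and features, not the script's actual ones):

```python
import math

def forward(features, w_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer feed-forward pass; the output is squashed to
    (0, 1) and read as a probability of cooperating."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1 / (1 + math.exp(-z))  # sigmoid output

# Two features, three hidden nodes (arbitrary illustrative weights).
w_hidden = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]]
b_hidden = [0.0, 0.1, -0.1]
w_out = [1.0, -0.5, 0.3]
p_cooperate = forward([1.0, 0.0], w_hidden, b_hidden, w_out, 0.0)
```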
-EvolvedFSM searches over finite state machines with a given number of states.
+**EvolvedFSM** searches over [finite state machines](https://en.wikipedia.org/wiki/Finite-state_machine)
+with a given number of states.
+
+**EvolvedHMM** implements a simple [hidden Markov model](https://en.wikipedia.org/wiki/Hidden_Markov_model)
+based strategy, a stochastic finite state machine.
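A stochastic finite state machine in this sense keeps a hidden state, samples the next state from a distribution chosen by the opponent's last action, and emits C or D probabilistically. A minimal sketch (hypothetical parameterization):

```python
import random

def hmm_step(state, opp_action, transitions, emissions, rng):
    """One move of an HMM strategy: sample the next hidden state from the
    row selected by (state, opponent's last action), then emit C with the
    new state's cooperation probability."""
    row = transitions[(state, opp_action)]  # distribution over next states
    r, next_state = rng.random(), 0
    for s, p in enumerate(row):
        r -= p
        if r <= 0:
            next_state = s
            break
    action = "C" if rng.random() < emissions[next_state] else "D"
    return next_state, action

# Two hidden states: state 0 is cooperative, state 1 retaliatory.
transitions = {
    (0, "C"): [1.0, 0.0], (0, "D"): [0.0, 1.0],
    (1, "C"): [0.5, 0.5], (1, "D"): [0.0, 1.0],
}
emissions = [0.9, 0.1]
state, action = hmm_step(0, "D", transitions, emissions, random.Random(0))
```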
 
-Note that large values of the parameters will make the strategies prone to
+Note that large values of some parameters will make the strategies prone to
 overfitting.
 
 ## Optimization Functions
@@ -183,6 +200,36 @@ Options:
   --states NUM_STATES          Number of FSM states [default: 8]
 ```
+
+### Hidden Markov Model
+
+```bash
+$ python hmm_evolve.py -h
+Hidden Markov Model Evolver
+
+Usage:
+    hmm_evolve.py [-h] [--generations GENERATIONS] [--population POPULATION]
+    [--mu MUTATION_RATE] [--bottleneck BOTTLENECK] [--processes PROCESSORS]
+    [--output OUTPUT_FILE] [--objective OBJECTIVE] [--repetitions REPETITIONS]
+    [--turns TURNS] [--noise NOISE] [--nmoran NMORAN]
+    [--states NUM_STATES]
+
+Options:
+    -h --help                    Show help
+    --generations GENERATIONS    Generations to run the EA [default: 500]
+    --population POPULATION      Population size [default: 40]
+    --mu MUTATION_RATE           Mutation rate [default: 0.1]
+    --bottleneck BOTTLENECK      Number of individuals to keep from each generation [default: 10]
+    --processes PROCESSES        Number of processes to use [default: 1]
+    --output OUTPUT_FILE         File to write data to [default: fsm_tables.csv]
+    --objective OBJECTIVE        Objective function [default: score]
+    --repetitions REPETITIONS    Repetitions in objective [default: 100]
+    --turns TURNS                Turns in each match [default: 200]
+    --noise NOISE                Match noise [default: 0.00]
+    --nmoran NMORAN              Moran Population Size, if Moran objective [default: 4]
+    --states NUM_STATES          Number of HMM states [default: 5]
+```
+
 ## Open questions
 
 * What's the best table for n1, n2, m for LookerUp and PSOGambler? What's the

ann_evolve.py

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@
   --nmoran NMORAN         Moran Population Size, if Moran objective [default: 4]
   --features FEATURES     Number of ANN features [default: 17]
   --hidden HIDDEN         Number of hidden nodes [default: 10]
-  --mu_distance DISTANCE  Delta max for weights updates [default: 5]
+  --mu_distance DISTANCE  Delta max for weights updates [default: 10]
 """
 
 import random
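This change doubles `--mu_distance`, the cap on how far a single mutation can move a weight. A bounded weight mutation under that interpretation can be sketched as follows (illustrative, not the script's exact update rule):

```python
import random

def mutate_weights(weights, mu=0.1, mu_distance=10, rng=random):
    """Perturb each weight, with probability mu, by a uniform step
    bounded by +/- mu_distance."""
    return [w + rng.uniform(-mu_distance, mu_distance)
            if rng.random() < mu else w
            for w in weights]

# mu=1.0 forces every weight to be perturbed, for demonstration.
new = mutate_weights([0.0, 1.0, -2.0], mu=1.0, rng=random.Random(0))
```

A larger `mu_distance` lets the evolver take bigger jumps through weight space at the cost of noisier search.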

evolve_utils.py

Lines changed: 7 additions & 5 deletions
@@ -74,11 +74,13 @@ def objective_score_diff(me, other, turns, noise, repetitions):
         scores_for_this_opponent.append(score_diff)
     return scores_for_this_opponent
 
-def objective_moran_win(me, other, turns, noise, repetitions):
+def objective_moran_win(me, other, turns, noise, repetitions, N=5):
     """Objective function to maximize Moran fixations over N matches"""
     assert(noise == 0)
-    # N = 4 population
-    population = (me, me.clone(), other, other.clone())
+    population = []
+    for _ in range(N):
+        population.append(me.clone())
+        population.append(other.clone())
     mp = axl.MoranProcess(population, turns=turns, noise=noise)
 
     scores_for_this_opponent = []
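The hunk above generalizes the objective from a hard-coded 4-player population to N clones of each player. A self-contained sketch of just the population construction (the `Player` class here is a hypothetical stand-in for an axelrod player; only `clone()` matters):

```python
# Stand-in for an axelrod player with a clone() method.
class Player:
    def __init__(self, name):
        self.name = name

    def clone(self):
        return Player(self.name)

def build_moran_population(me, other, N=5):
    # N clones of each player seed a 2N-individual starting population,
    # as in the updated objective_moran_win.
    population = []
    for _ in range(N):
        population.append(me.clone())
        population.append(other.clone())
    return population

pop = build_moran_population(Player("me"), Player("other"))
```

The real objective then hands this population to `axl.MoranProcess` and counts how often `me` fixates.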
@@ -117,7 +119,7 @@ def player(self):
     def params(self):
         pass
 
-    def crossover(self):
+    def crossover(self, other):
        pass
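The `crossover` hook now takes a second parent. For list-shaped genomes such as FSM transition tables, one common choice (an illustrative sketch, not the repository's actual operator) is single-point crossover:

```python
import random

def single_point_crossover(genome_a, genome_b, rng=random):
    """Splice two equal-length genomes at a random cut point, taking the
    head from one parent and the tail from the other."""
    assert len(genome_a) == len(genome_b)
    point = rng.randrange(1, len(genome_a))
    return genome_a[:point] + genome_b[point:]

child = single_point_crossover(list("CCCC"), list("DDDD"), random.Random(1))
```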
@@ -196,7 +198,7 @@ def evolve(self):
                 repr(self.population[results[0][1]]))
         # Write the data
         row = [self.generation, mean(scores), pstdev(scores), results[0][0],
-               repr(results[0][1])]
+               repr(self.population[results[0][1]])]
         self.outputer.write(row)
 
         ## Next Population

fsm_evolve.py

Lines changed: 0 additions & 2 deletions
@@ -48,7 +48,6 @@ class FSMParams(Params):
     def __init__(self, num_states, mutation_rate=None, rows=None,
                  initial_state=0, initial_action=C):
         self.PlayerClass = FSMPlayer
-        # Initialize to "zero" state?
         self.num_states = num_states
         if mutation_rate is None:
             self.mutation_rate = 1 / (2 * num_states)
@@ -117,7 +116,6 @@ def mutate(self):
         self.initial_action = flip_action(self.initial_action)
         if random.random() < self.mutation_rate / (10 * self.num_states):
             self.initial_state = randrange(self.num_states)
-        # return self
         # Change node size?
 
     @staticmethod
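The `mutate` lines above flip the opening action and, with a probability damped by `10 * num_states`, resample the opening state. That logic can be sketched standalone (hypothetical helper mirroring the shown rates; the guard around the action flip is assumed):

```python
import random

def mutate_initials(initial_action, initial_state, num_states,
                    mutation_rate, rng=random):
    """Flip the opening action with probability mutation_rate; resample
    the opening state at a rate damped by 10 * num_states."""
    if rng.random() < mutation_rate:
        initial_action = "D" if initial_action == "C" else "C"
    if rng.random() < mutation_rate / (10 * num_states):
        initial_state = rng.randrange(num_states)
    return initial_action, initial_state

action, state = mutate_initials("C", 0, 8, mutation_rate=1 / 16,
                                rng=random.Random(0))
```

Damping the initial-state rate keeps the opening state relatively stable while the transition table does most of the exploring.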
