Commit 63aa9a0 (1 parent f0fd3c2)

    Finite State Machine training algorithms

7 files changed: +366 -83 lines

README.md (135 additions, 39 deletions)
@@ -1,65 +1,161 @@
 # Axelrod Evolvers
 
-This repository contains training code for the strategies LookerUp, PSOGambler, and EvolvedANN (feed-forward neural network).
-There are three scripts, one for each strategy:
-* looker_evolve.py
-* pso_evolve.py
-* ann_evolve.py
-
-In the original iteration the strategies were run against all the default strategies in the Axelrod library. This is slow and probably not necessary. For example the Meta players are just combinations of the other players, and very computationally intensive; it's probably ok to remove those.
+This repository contains reinforcement learning training code for the following
+strategy types:
+* Lookup tables (LookerUp)
+* Particle Swarm algorithms (PSOGambler)
+* Feed Forward Neural Networks (EvolvedANN)
+* Finite State Machines (FSMPlayer)
+
+The training is done by evolutionary algorithms or particle swarm algorithms. There
+is another repository that trains Neural Networks with gradient descent. In this
+repository there are scripts for each strategy type:
+
+* [looker_evolve.py](looker_evolve.py)
+* [pso_evolve.py](pso_evolve.py)
+* [ann_evolve.py](ann_evolve.py)
+* [fsm_evolve.py](fsm_evolve.py)
+
+In the original iteration the strategies were run against all the default
+strategies in the Axelrod library. This is slow and probably not necessary. For
+example the Meta players are just combinations of the other players, and very
+computationally intensive; it's probably ok to remove those. So by default the
+training strategies are the `short_run_time_strategies` from the Axelrod library.
 
 ## The Strategies
 
-The LookerUp strategies are based on lookup tables with two parameters:
-* n, the number of rounds of trailing history to use and
+The LookerUp strategies are based on lookup tables with three parameters:
+* n1, the number of rounds of trailing player history to use,
+* n2, the number of rounds of trailing opponent history to use, and
 * m, the number of rounds of initial opponent play to use
 
-PSOGambler is a stochastic version of LookerUp, trained with a particle swarm algorithm.
+PSOGambler is a stochastic version of LookerUp, trained with a particle swarm
+algorithm. The resulting strategies are generalizations of memory-N strategies.
 
 EvolvedANN is a feed forward neural network algorithm with one hidden layer.
+Various features are derived from the history of play. The number of nodes in
+the hidden layer can be changed.
 
-All three strategies are trained with an evolutionary algorithm and are examples of reinforcement learning.
+EvolvedFSM searches over finite state machines with a given number of states.
 
-### Open questions
+Note that large values of the parameters will make the strategies prone to
+overfitting.
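
The parameterization above can be made concrete with a small sketch. This is a hypothetical illustration of the lookup-table scheme, not the Axelrod library's actual LookerUp API; all names here are invented:

```python
# Hypothetical sketch of a LookerUp-style table with n1 = n2 = m = 1.
# Keys: (my last n1 plays, opponent's last n2 plays, opponent's first m plays).
# 'C' = cooperate, 'D' = defect.
from itertools import product

n1, n2, m = 1, 1, 1
keys = list(product(
    product("CD", repeat=n1),   # my trailing history
    product("CD", repeat=n2),   # opponent's trailing history
    product("CD", repeat=m),    # opponent's opening plays
))
# One action per key; a genome for the evolutionary algorithm is just
# this table's list of values.
table = {k: "C" for k in keys}          # start as always-cooperate
table[(("C",), ("D",), ("D",))] = "D"   # e.g. retaliate against defection

print(len(table))  # 2**(n1 + n2 + m) = 8 keys
```

The table size grows exponentially in the parameters, which is why large values overfit: there are more entries to train than distinct situations the training opponents produce.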
 
-* What's the best table for n, m for LookerUp and PSOGambler?
-* What's the best table against parameterized strategies? For example, if the opponents are `[RandomPlayer(x) for x in np.arange(0, 1, 0.01)]`, what lookup table is best? Is it much different from the generic table?
-* Can we separate n into n1 and n2 where different amounts of history are used for the player and the opponent?
-* Are there other features that would improve the performance of EvolvedANN?
+## Optimization Functions
 
+There are three objective functions:
+* Maximize mean match score over all opponents with `objective_match_score`
+* Maximize mean match score difference over all opponents with `objective_match_score_difference`
+* Maximize Moran process fixation probability with `objective_match_moran_win`
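
As a simplified sketch of the first objective: the real `objective_match_score` plays full Axelrod matches, but the quantity being maximized is just a mean of per-match mean payoffs. This toy version scores fixed move sequences with the standard prisoner's dilemma payoffs for illustration only:

```python
# Toy illustration of a mean-match-score objective (NOT the real
# axelrod_utils code, which plays full Axelrod matches).
PAYOFFS = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def match_score(my_moves, opp_moves):
    """Mean payoff per round for a single match."""
    rounds = list(zip(my_moves, opp_moves))
    return sum(PAYOFFS[pair] for pair in rounds) / len(rounds)

def objective_match_score(my_moves, opponents):
    """Mean match score over all opponents -- the quantity maximized."""
    return sum(match_score(my_moves, opp) for opp in opponents) / len(opponents)

opponents = [["C", "C", "C"], ["D", "D", "D"]]
print(objective_match_score(["C", "D", "C"], opponents))  # (11/3 + 1/3) / 2 = 2.0
```

The score-difference and Moran objectives have the same shape; only the per-match statistic changes.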
 
 ## Running
 
-`python lookup-evolve.py -h`
-
-will display help. There are a number of options and you'll want to set the mutation rate appropriately. The number of keys defining the strategy is `2**(n + m + 1)` so you want a mutation rate in the neighborhood of `2**(-n-m)` so that there's enough variation introduced.
-
-
-Here are some recommended defaults:
+### Look up Tables
+
+```bash
+$ python lookup_evolve.py -h
+Lookup Evolve.
+
+Usage:
+    lookup_evolve.py [-h] [-p PLAYS] [-o OPP_PLAYS] [-s STARTING_PLAYS]
+    [-g GENERATIONS] [-k STARTING_POPULATION] [-u MUTATION_RATE] [-b BOTTLENECK]
+    [-i PROCESSORS] [-f OUTPUT_FILE] [-z INITIAL_POPULATION_FILE] [-n NOISE]
+
+Options:
+    -h --help                   show this
+    -p PLAYS                    number of recent plays in the lookup table [default: 2]
+    -o OPP_PLAYS                number of recent opponent plays in the lookup table [default: 2]
+    -s STARTING_PLAYS           number of opponent starting plays in the lookup table [default: 2]
+    -g GENERATIONS              how many generations to run the program for [default: 500]
+    -k STARTING_POPULATION      starting population size for the simulation [default: 20]
+    -u MUTATION_RATE            mutation rate i.e. probability that a given value will flip [default: 0.1]
+    -b BOTTLENECK               number of individuals to keep from each generation [default: 10]
+    -i PROCESSORS               number of processors to use [default: 1]
+    -f OUTPUT_FILE              file to write data to [default: tables.csv]
+    -z INITIAL_POPULATION_FILE  file to read an initial population from [default: None]
+    -n NOISE                    match noise [default: 0.00]
 ```
-python lookup_evolve.py -p 3 -s 3 -g 100000 -k 20 -u 0.01 -b 20 -i 4 -o evolve3-3.csv
 
-python lookup_evolve.py -p 3 -s 2 -g 100000 -k 20 -u 0.03 -b 20 -i 4 -o evolve3-2.csv
+There are a number of options and you'll want to set the
+mutation rate appropriately. The number of keys defining the strategy is
+`2**(n + m + 1)` so you want a mutation rate in the neighborhood of `2**(-n-m)`
+so that there's enough variation introduced.
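
The rule of thumb above amounts to aiming for a couple of flipped keys per individual per generation. Using the README's own formula (with `n = -p` plays and `m = -s` starting plays):

```python
# Mutation-rate rule of thumb from the text: with a table of
# 2**(n + m + 1) keys, a per-key flip probability near 2**(-n - m)
# gives on the order of two mutations per individual per generation.
n, m = 2, 2                      # trailing plays and starting plays
num_keys = 2 ** (n + m + 1)      # 32 keys
mutation_rate = 2.0 ** (-n - m)  # 0.0625
expected_flips = num_keys * mutation_rate
print(num_keys, mutation_rate, expected_flips)  # 32 0.0625 2.0
```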
 
-python lookup_evolve.py -p 3 -s 1 -g 100000 -k 20 -u 0.06 -b 20 -i 4 -o evolve3-1.csv
+### Particle Swarm
 
-python lookup_evolve.py -p 1 -s 3 -g 100000 -k 20 -u 0.03 -b 20 -i 4 -o evolve1-3.csv
+```bash
+$ python pso_evolve.py -h
+Particle Swarm strategy training code.
 
-python lookup_evolve.py -p 2 -s 3 -g 100000 -k 20 -u 0.03 -b 20 -i 4 -o evolve2-3.csv
-```
-### 2, 2 is the current winner:
-```
-python lookup_evolve.py -p 2 -s 2 -g 100000 -k 20 -u 0.06 -b 20 -i 4 -o evolve2-2.csv
-
-python lookup_evolve.py -p 1 -s 2 -g 100000 -k 20 -u 0.1 -b 20 -i 2 -o evolve1-2.csv
-
-python lookup_evolve.py -p 1 -s 2 -g 100000 -k 20 -u 0.1 -b 20 -i 2 -o evolve2-1.csv
+Usage:
+    pso_evolve.py [-h] [-p PLAYS] [-s STARTING_PLAYS] [-g GENERATIONS]
+    [-i PROCESSORS] [-o OPP_PLAYS] [-n NOISE]
 
+Options:
+    -h --help          show help
+    -p PLAYS           number of recent plays in the lookup table [default: 2]
+    -o OPP_PLAYS       number of recent opponent's plays in the lookup table [default: 2]
+    -s STARTING_PLAYS  number of opponent starting plays in the lookup table [default: 2]
+    -i PROCESSORS      number of processors to use [default: 1]
+    -n NOISE           match noise [default: 0.0]
 ```
-### 4, 4 (might take for ever / need a ton of ram)
+
+Note that to use the multiprocessor version you'll need to install pyswarm 0.70
+directly (pip installs 0.60, which lacks multiprocessing support).
+
+### Neural Network
+
+```bash
+$ python ann_evolve.py -h
+Training ANN strategies with an evolutionary algorithm.
+
+Usage:
+    ann_evolve.py [-h] [-g GENERATIONS] [-u MUTATION_RATE] [-b BOTTLENECK]
+    [-d MUTATION_DISTANCE] [-i PROCESSORS] [-o OUTPUT_FILE]
+    [-k STARTING_POPULATION] [-n NOISE]
+
+Options:
+    -h --help               show this
+    -g GENERATIONS          how many generations to run the program for [default: 10000]
+    -u MUTATION_RATE        mutation rate i.e. probability that a given value will flip [default: 0.4]
+    -d MUTATION_DISTANCE    amount of change a mutation will cause [default: 10]
+    -b BOTTLENECK           number of individuals to keep from each generation [default: 6]
+    -i PROCESSORS           number of processors to use [default: 4]
+    -o OUTPUT_FILE          file to write statistics to [default: weights.csv]
+    -k STARTING_POPULATION  starting population size for the simulation [default: 5]
+    -n NOISE                match noise [default: 0.0]
 ```
-python lookup_evolve.py -p 4 -s 4 -g 100000 -k 20 -u 0.002 -b 20 -i 4 -o evolve4-4.csv
+
+### Finite State Machines
+
+```bash
+$ python fsm_evolve.py -h
+FSM Evolve.
+
+Usage:
+    fsm_evolve.py [-h] [-s NUM_STATES] [-g GENERATIONS]
+    [-k STARTING_POPULATION] [-u MUTATION_RATE] [-b BOTTLENECK]
+    [-i PROCESSORS] [-f OUTPUT_FILE] [-n NOISE]
+
+Options:
+    -h --help               show this
+    -s NUM_STATES           number of FSM states [default: 16]
+    -g GENERATIONS          how many generations to run the program for [default: 500]
+    -k STARTING_POPULATION  starting population size for the simulation [default: 20]
+    -u MUTATION_RATE        mutation rate i.e. probability that a given value will flip [default: 0.1]
+    -b BOTTLENECK           number of individuals to keep from each generation [default: 10]
+    -i PROCESSORS           number of processors to use [default: 1]
+    -f OUTPUT_FILE          file to write data to [default: fsm_tables.csv]
+    -n NOISE                match noise [default: 0.00]
 ```
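
To make the FSM representation concrete, here is a hypothetical two-state machine in the usual (state, opponent's last move) → (next state, my move) form. The 16-state machines the script evolves are just bigger tables of this shape; this sketch is not the Axelrod library's FSMPlayer API:

```python
# Hypothetical two-state FSM strategy: tit-for-tat-like with a "grudge" state.
# Transition table: (state, opponent's last move) -> (next state, my move).
TRANSITIONS = {
    (0, "C"): (0, "C"),  # opponent cooperated: stay friendly, cooperate
    (0, "D"): (1, "D"),  # defection: move to grudge state and defect
    (1, "C"): (0, "C"),  # forgiven: back to the friendly state
    (1, "D"): (1, "D"),  # keep defecting while the opponent defects
}

def play(opponent_moves, initial_state=0, first_move="C"):
    """Run the machine against a fixed opponent move sequence."""
    state, moves = initial_state, [first_move]
    for opp in opponent_moves[:-1]:   # respond to all but the final move
        state, my_move = TRANSITIONS[(state, opp)]
        moves.append(my_move)
    return moves

print(play(["D", "D", "C", "C"]))  # ['C', 'D', 'D', 'C']
```

Evolving such a strategy means mutating the table's entries (next state and move) and the initial state, which is what the `-u` flip probability acts on.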
-## Analyzing
 
-The output files `evolve{n}-{m}.csv` can be easily sorted by `analyze_data.py`, which will output the best performing tables. These can be added back into Axelrod.
+## Open questions
+
+* What's the best table for n1, n2, m for LookerUp and PSOGambler? What's the
+smallest value of the parameters that gives good results?
+* Similarly, what's the optimal number of states for a finite state machine
+strategy?
+* What's the best table against parameterized strategies? For example, if the
+opponents are `[RandomPlayer(x) for x in np.arange(0, 1, 0.01)]`, what lookup
+table is best? Is it much different from the generic table?
+* Are there other features that would improve the performance of EvolvedANN?

ann_evolve.py (13 additions, 12 deletions)
@@ -11,18 +11,17 @@
 
 Options:
     -h --help               show this
-    -g GENERATIONS          how many generations to run the program for [default: 10000]
+    -g GENERATIONS          how many generations to run the program for [default: 1000]
     -u MUTATION_RATE        mutation rate i.e. probability that a given value will flip [default: 0.4]
-    -d MUTATION_DISTANCE    amount of change a mutation will cause [default: 10]
-    -b BOTTLENECK           number of individuals to keep from each generation [default: 6]
+    -d MUTATION_DISTANCE    amount of change a mutation will cause [default: 5]
+    -b BOTTLENECK           number of individuals to keep from each generation [default: 5]
     -i PROCESSORS           number of processors to use [default: 4]
     -o OUTPUT_FILE          file to write statistics to [default: weights.csv]
-    -k STARTING_POPULATION  starting population size for the simulation [default: 5]
+    -k STARTING_POPULATION  starting population size for the simulation [default: 10]
     -n NOISE                match noise [default: 0.0]
 """
 
 import csv
-from copy import deepcopy
 from itertools import repeat
 from multiprocessing import Pool
 import os
@@ -32,6 +31,7 @@
 from docopt import docopt
 import numpy as np
 
+import axelrod as axl
 from axelrod.strategies.ann import ANN, split_weights
 from axelrod_utils import score_for, objective_match_score, objective_match_moran_win
 
@@ -62,7 +62,7 @@ def crossover(weights_collection):
             if i == j:
                 continue
             crosspoint = random.randrange(len(w1))
-            new_weights = deepcopy(w1[0:crosspoint]) + deepcopy(w2[crosspoint:])
+            new_weights = list(w1[0:crosspoint]) + list(w2[crosspoint:])
             copies.append(new_weights)
     return copies
 
@@ -86,22 +86,24 @@ def evolve(starting_weights, mutation_rate, mutation_distance, generations,
 
     for generation in range(generations):
        print("Generation " + str(generation))
+        size = 19 * hidden_layer_size
+        random_weights = [get_random_weights(size) for _ in range(4)]
+        weights_to_copy = [list(x[1]) for x in current_bests]
+        weights_to_copy += random_weights
 
-        weights_to_copy = [x[1] for x in current_bests] + \
-            [get_random_weights(19 * hidden_layer_size) for _ in
-             range(2)]
         # Crossover
         copies = crossover(weights_to_copy)
         # Mutate
         copies = mutate(copies, mutation_rate)
 
-        population = copies + weights_to_copy
+        population = copies + [list(x[1]) for x in current_bests] + random_weights
 
         # map the population to get a list of (score, weights) tuples
        # this list will be sorted by score, best weights first
         results = score_all_weights(population, strategies, noise=noise,
                                     hidden_layer_size=hidden_layer_size)
 
+        results.sort(key=itemgetter(0), reverse=True)
        current_bests = results[0: bottleneck]
 
         # get all the scores for this generation
@@ -137,8 +139,7 @@ def evolve(starting_weights, mutation_rate, mutation_distance, generations,
     size = 19 * hidden_layer_size
 
     starting_weights = [get_random_weights(size) for _ in range(starting_population)]
-
-    # strategies = axl.short_run_time_strategies
+    strategies = axl.short_run_time_strategies
 
     evolve(starting_weights, mutation_rate, mutation_distance, generations,
            bottleneck, strategies, output_file, noise,
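
The crossover hunk above replaces `deepcopy` slices with `list` copies: the weight vectors are flat lists of floats, so shallow copies suffice. Single-point crossover itself can be sketched like this (helper names are mine, not the script's):

```python
import random

def crossover_pair(w1, w2, crosspoint):
    """Single-point crossover: the child takes w1 up to the crosspoint,
    then w2 from the crosspoint on.  Flat lists of floats, so shallow
    list copies are enough -- the reason deepcopy was dropped."""
    return list(w1[:crosspoint]) + list(w2[crosspoint:])

def crossover(weights_collection):
    """Cross every ordered pair of distinct parents, as in the script."""
    copies = []
    for i, w1 in enumerate(weights_collection):
        for j, w2 in enumerate(weights_collection):
            if i == j:
                continue
            copies.append(crossover_pair(w1, w2, random.randrange(len(w1))))
    return copies

child = crossover_pair([0.1, 0.2, 0.3], [0.7, 0.8, 0.9], 1)
print(child)  # [0.1, 0.8, 0.9]
```

With k parents this produces k·(k−1) children per generation, which the bottleneck then prunes back down.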

axelrod_utils.py (1 addition, 6 deletions)
@@ -33,18 +33,13 @@ def objective_match_score_difference(me, other, turns, noise):
         scores_for_this_opponent.append(score_diff)
     return scores_for_this_opponent
 
-def objective_match_moran_win(me, other, turns, noise=0):
+def objective_match_moran_win(me, other, turns, noise=0, repetitions=100):
     """Objective function to maximize Moran fixations over N=4 matches"""
     assert(noise == 0)
     # N = 4 population
     population = (me, me.clone(), other, other.clone())
     mp = axl.MoranProcess(population, turns=turns, noise=noise)
 
-    if mp._stochastic:
-        repetitions = 100
-    else:
-        repetitions = 1
-
     scores_for_this_opponent = []
 
     for _ in range(repetitions):
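
This hunk makes the repetition count a keyword argument rather than deriving it from the private `mp._stochastic` flag. The fixation probability being maximized is estimated as the fraction of repeated Moran runs the trained player wins, so more repetitions mean a tighter estimate. As a toy illustration only (not the axelrod `MoranProcess` API; each "run" here is a seeded biased coin flip):

```python
import random

def estimated_fixation(win_probability, repetitions, rng):
    """Monte Carlo estimate of fixation probability: the fraction of
    repeated stochastic Moran runs won by the focal player.  The real
    objective runs axl.MoranProcess; here each run is a biased coin
    flip purely for illustration."""
    wins = sum(rng.random() < win_probability for _ in range(repetitions))
    return wins / repetitions

rng = random.Random(0)
# With 1000 repetitions the estimate clusters tightly around the true 0.7.
print(estimated_fixation(0.7, 1000, rng))
```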
