Commit 63aa9a0 (1 parent f0fd3c2)

    Finite State Machine training algorithms

7 files changed: +366 -83 lines

README.md (135 additions, 39 deletions)
@@ -1,65 +1,161 @@
 # Axelrod Evolvers
 
-This repository contains training code for the strategies LookerUp, PSOGambler, and EvolvedANN (feed-forward neural network).
-There are three scripts, one for each strategy:
-* looker_evolve.py
-* pso_evolve.py
-* ann_evolve.py
-
-In the original iteration the strategies were run against all the default strategies in the Axelrod library. This is slow and probably not necessary. For example the Meta players are just combinations of the other players, and very computationally intensive; it's probably ok to remove those.
+This repository contains reinforcement learning training code for the following
+strategy types:
+* Lookup tables (LookerUp)
+* Particle Swarm algorithms (PSOGambler)
+* Feed Forward Neural Networks (EvolvedANN)
+* Finite State Machines (FSMPlayer)
+
+The training is done by evolutionary algorithms or particle swarm algorithms. There
+is another repository that trains Neural Networks with gradient descent. In this
+repository there are scripts for each strategy type:
+
+* [looker_evolve.py](looker_evolve.py)
+* [pso_evolve.py](pso_evolve.py)
+* [ann_evolve.py](ann_evolve.py)
+* [fsm_evolve.py](fsm_evolve.py)
+
+In the original iteration the strategies were run against all the default
+strategies in the Axelrod library. This is slow and probably not necessary. For
+example the Meta players are just combinations of the other players, and very
+computationally intensive; it's probably ok to remove those. So by default the
+training strategies are the `short_run_time_strategies` from the Axelrod library.
 
 ## The Strategies
 
-The LookerUp strategies are based on lookup tables with two parameters:
-* n, the number of rounds of trailing history to use and
+The LookerUp strategies are based on lookup tables with three parameters:
+* n1, the number of rounds of trailing player history to use,
+* n2, the number of rounds of trailing opponent history to use, and
 * m, the number of rounds of initial opponent play to use
 
-PSOGambler is a stochastic version of LookerUp, trained with a particle swarm algorithm.
+PSOGambler is a stochastic version of LookerUp, trained with a particle swarm
+algorithm. The resulting strategies are generalizations of memory-N strategies.
 
 EvolvedANN is a feed forward neural network algorithm with one hidden layer.
+Various features are derived from the history of play. The number of nodes in
+the hidden layer can be changed.
 
-All three strategies are trained with an evolutionary algorithm and are examples of reinforcement learning.
+EvolvedFSM searches over finite state machines with a given number of states.
 
-### Open questions
+Note that large values of the parameters will make the strategies prone to
+overfitting.
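
The parameterization above can be made concrete with a small sketch. This is a hypothetical illustration of the lookup-table scheme, not the Axelrod library's actual LookerUp API; all names here are invented:

```python
# Hypothetical sketch of a LookerUp-style table with n1 = n2 = m = 1.
# Keys: (my last n1 plays, opponent's last n2 plays, opponent's first m plays).
# 'C' = cooperate, 'D' = defect.
from itertools import product

n1, n2, m = 1, 1, 1
keys = list(product(
    product("CD", repeat=n1),   # my trailing history
    product("CD", repeat=n2),   # opponent's trailing history
    product("CD", repeat=m),    # opponent's opening plays
))
# One action per key; a genome for the evolutionary algorithm is just
# this table's list of values.
table = {k: "C" for k in keys}          # start as always-cooperate
table[(("C",), ("D",), ("D",))] = "D"   # e.g. retaliate against defection

print(len(table))  # 2**(n1 + n2 + m) = 8 keys
```

The table size grows exponentially in the parameters, which is why large values overfit: there are more entries to train than distinct situations the training opponents produce.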
 
-* What's the best table for n, m for LookerUp and PSOGambler?
-* What's the best table against parameterized strategies? For example, if the opponents are `[RandomPlayer(x) for x in np.arange(0, 1, 0.01)]`, what lookup table is best? Is it much different from the generic table?
-* Can we separate n into n1 and n2 where different amounts of history are used for the player and the opponent?
-* Are there other features that would improve the performance of EvolvedANN?
+## Optimization Functions
 
+There are three objective functions:
+* Maximize mean match score over all opponents with `objective_match_score`
+* Maximize mean match score difference over all opponents with `objective_match_score_difference`
+* Maximize Moran process fixation probability with `objective_match_moran_win`
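
As a simplified sketch of the first objective: the real `objective_match_score` plays full Axelrod matches, but the quantity being maximized is just a mean of per-match mean payoffs. This toy version scores fixed move sequences with the standard prisoner's dilemma payoffs for illustration only:

```python
# Toy illustration of a mean-match-score objective (NOT the real
# axelrod_utils code, which plays full Axelrod matches).
PAYOFFS = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def match_score(my_moves, opp_moves):
    """Mean payoff per round for a single match."""
    rounds = list(zip(my_moves, opp_moves))
    return sum(PAYOFFS[pair] for pair in rounds) / len(rounds)

def objective_match_score(my_moves, opponents):
    """Mean match score over all opponents -- the quantity maximized."""
    return sum(match_score(my_moves, opp) for opp in opponents) / len(opponents)

opponents = [["C", "C", "C"], ["D", "D", "D"]]
print(objective_match_score(["C", "D", "C"], opponents))  # (11/3 + 1/3) / 2 = 2.0
```

The score-difference and Moran objectives have the same shape; only the per-match statistic changes.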
 
 ## Running
 
-`python lookup-evolve.py -h`
-
-will display help. There are a number of options and you'll want to set the mutation rate appropriately. The number of keys defining the strategy is `2**(n + m + 1)` so you want a mutation rate in the neighborhood of `2**(-n-m)` so that there's enough variation introduced.
-
-
-Here are some recommended defaults:
+### Look up Tables
+
+```bash
+$ python lookup_evolve.py -h
+Lookup Evolve.
+
+Usage:
+    lookup_evolve.py [-h] [-p PLAYS] [-o OPP_PLAYS] [-s STARTING_PLAYS]
+    [-g GENERATIONS] [-k STARTING_POPULATION] [-u MUTATION_RATE] [-b BOTTLENECK]
+    [-i PROCESSORS] [-f OUTPUT_FILE] [-z INITIAL_POPULATION_FILE] [-n NOISE]
+
+Options:
+    -h --help                   show this
+    -p PLAYS                    number of recent plays in the lookup table [default: 2]
+    -o OPP_PLAYS                number of recent opponent plays in the lookup table [default: 2]
+    -s STARTING_PLAYS           number of opponent starting plays in the lookup table [default: 2]
+    -g GENERATIONS              how many generations to run the program for [default: 500]
+    -k STARTING_POPULATION      starting population size for the simulation [default: 20]
+    -u MUTATION_RATE            mutation rate i.e. probability that a given value will flip [default: 0.1]
+    -b BOTTLENECK               number of individuals to keep from each generation [default: 10]
+    -i PROCESSORS               number of processors to use [default: 1]
+    -f OUTPUT_FILE              file to write data to [default: tables.csv]
+    -z INITIAL_POPULATION_FILE  file to read an initial population from [default: None]
+    -n NOISE                    match noise [default: 0.00]
 ```
-python lookup_evolve.py -p 3 -s 3 -g 100000 -k 20 -u 0.01 -b 20 -i 4 -o evolve3-3.csv
 
-python lookup_evolve.py -p 3 -s 2 -g 100000 -k 20 -u 0.03 -b 20 -i 4 -o evolve3-2.csv
+There are a number of options and you'll want to set the
+mutation rate appropriately. The number of keys defining the strategy is
+`2**(n + m + 1)` so you want a mutation rate in the neighborhood of `2**(-n-m)`
+so that there's enough variation introduced.
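
The rule of thumb above amounts to aiming for a couple of flipped keys per individual per generation. Using the README's own formula (with `n = -p` plays and `m = -s` starting plays):

```python
# Mutation-rate rule of thumb from the text: with a table of
# 2**(n + m + 1) keys, a per-key flip probability near 2**(-n - m)
# gives on the order of two mutations per individual per generation.
n, m = 2, 2                      # trailing plays and starting plays
num_keys = 2 ** (n + m + 1)      # 32 keys
mutation_rate = 2.0 ** (-n - m)  # 0.0625
expected_flips = num_keys * mutation_rate
print(num_keys, mutation_rate, expected_flips)  # 32 0.0625 2.0
```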
 
-python lookup_evolve.py -p 3 -s 1 -g 100000 -k 20 -u 0.06 -b 20 -i 4 -o evolve3-1.csv
+### Particle Swarm
 
-python lookup_evolve.py -p 1 -s 3 -g 100000 -k 20 -u 0.03 -b 20 -i 4 -o evolve1-3.csv
+```bash
+$ python pso_evolve.py -h
+Particle Swarm strategy training code.
 
-python lookup_evolve.py -p 2 -s 3 -g 100000 -k 20 -u 0.03 -b 20 -i 4 -o evolve2-3.csv
-```
-### 2, 2 is the current winner:
-```
-python lookup_evolve.py -p 2 -s 2 -g 100000 -k 20 -u 0.06 -b 20 -i 4 -o evolve2-2.csv
-
-python lookup_evolve.py -p 1 -s 2 -g 100000 -k 20 -u 0.1 -b 20 -i 2 -o evolve1-2.csv
-
-python lookup_evolve.py -p 1 -s 2 -g 100000 -k 20 -u 0.1 -b 20 -i 2 -o evolve2-1.csv
+Usage:
+    pso_evolve.py [-h] [-p PLAYS] [-s STARTING_PLAYS] [-g GENERATIONS]
+    [-i PROCESSORS] [-o OPP_PLAYS] [-n NOISE]
 
+Options:
+    -h --help          show help
+    -p PLAYS           number of recent plays in the lookup table [default: 2]
+    -o OPP_PLAYS       number of recent opponent's plays in the lookup table [default: 2]
+    -s STARTING_PLAYS  number of opponent starting plays in the lookup table [default: 2]
+    -i PROCESSORS      number of processors to use [default: 1]
+    -n NOISE           match noise [default: 0.0]
 ```
-### 4, 4 (might take for ever / need a ton of ram)
+
+Note that to use the multiprocessor version you'll need to install pyswarm 0.70
+directly (pip installs 0.60, which lacks multiprocessing support).
+
+### Neural Network
+
+```bash
+$ python ann_evolve.py -h
+Training ANN strategies with an evolutionary algorithm.
+
+Usage:
+    ann_evolve.py [-h] [-g GENERATIONS] [-u MUTATION_RATE] [-b BOTTLENECK]
+    [-d MUTATION_DISTANCE] [-i PROCESSORS] [-o OUTPUT_FILE]
+    [-k STARTING_POPULATION] [-n NOISE]
+
+Options:
+    -h --help               show this
+    -g GENERATIONS          how many generations to run the program for [default: 10000]
+    -u MUTATION_RATE        mutation rate i.e. probability that a given value will flip [default: 0.4]
+    -d MUTATION_DISTANCE    amount of change a mutation will cause [default: 10]
+    -b BOTTLENECK           number of individuals to keep from each generation [default: 6]
+    -i PROCESSORS           number of processors to use [default: 4]
+    -o OUTPUT_FILE          file to write statistics to [default: weights.csv]
+    -k STARTING_POPULATION  starting population size for the simulation [default: 5]
+    -n NOISE                match noise [default: 0.0]
 ```
-python lookup_evolve.py -p 4 -s 4 -g 100000 -k 20 -u 0.002 -b 20 -i 4 -o evolve4-4.csv
+
+### Finite State Machines
+
+```bash
+$ python fsm_evolve.py -h
+FSM Evolve.
+
+Usage:
+    fsm_evolve.py [-h] [-s NUM_STATES] [-g GENERATIONS]
+    [-k STARTING_POPULATION] [-u MUTATION_RATE] [-b BOTTLENECK]
+    [-i PROCESSORS] [-f OUTPUT_FILE] [-n NOISE]
+
+Options:
+    -h --help               show this
+    -s NUM_STATES           number of FSM states [default: 16]
+    -g GENERATIONS          how many generations to run the program for [default: 500]
+    -k STARTING_POPULATION  starting population size for the simulation [default: 20]
+    -u MUTATION_RATE        mutation rate i.e. probability that a given value will flip [default: 0.1]
+    -b BOTTLENECK           number of individuals to keep from each generation [default: 10]
+    -i PROCESSORS           number of processors to use [default: 1]
+    -f OUTPUT_FILE          file to write data to [default: fsm_tables.csv]
+    -n NOISE                match noise [default: 0.00]
 ```
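
To make the FSM representation concrete, here is a hypothetical two-state machine in the usual (state, opponent's last move) → (next state, my move) form. The 16-state machines the script evolves are just bigger tables of this shape; this sketch is not the Axelrod library's FSMPlayer API:

```python
# Hypothetical two-state FSM strategy: tit-for-tat-like with a "grudge" state.
# Transition table: (state, opponent's last move) -> (next state, my move).
TRANSITIONS = {
    (0, "C"): (0, "C"),  # opponent cooperated: stay friendly, cooperate
    (0, "D"): (1, "D"),  # defection: move to grudge state and defect
    (1, "C"): (0, "C"),  # forgiven: back to the friendly state
    (1, "D"): (1, "D"),  # keep defecting while the opponent defects
}

def play(opponent_moves, initial_state=0, first_move="C"):
    """Run the machine against a fixed opponent move sequence."""
    state, moves = initial_state, [first_move]
    for opp in opponent_moves[:-1]:   # respond to all but the final move
        state, my_move = TRANSITIONS[(state, opp)]
        moves.append(my_move)
    return moves

print(play(["D", "D", "C", "C"]))  # ['C', 'D', 'D', 'C']
```

Evolving such a strategy means mutating the table's entries (next state and move) and the initial state, which is what the `-u` flip probability acts on.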
-## Analyzing
 
-The output files `evolve{n}-{m}.csv` can be easily sorted by `analyze_data.py`, which will output the best performing tables. These can be added back into Axelrod.
+## Open questions
+
+* What's the best table for n1, n2, m for LookerUp and PSOGambler? What's the
+smallest value of the parameters that gives good results?
+* Similarly, what's the optimal number of states for a finite state machine
+strategy?
+* What's the best table against parameterized strategies? For example, if the
+opponents are `[RandomPlayer(x) for x in np.arange(0, 1, 0.01)]`, what lookup
+table is best? Is it much different from the generic table?
+* Are there other features that would improve the performance of EvolvedANN?

ann_evolve.py (13 additions, 12 deletions)
@@ -11,18 +11,17 @@
 
 Options:
     -h --help               show this
-    -g GENERATIONS          how many generations to run the program for [default: 10000]
+    -g GENERATIONS          how many generations to run the program for [default: 1000]
     -u MUTATION_RATE        mutation rate i.e. probability that a given value will flip [default: 0.4]
-    -d MUTATION_DISTANCE    amount of change a mutation will cause [default: 10]
-    -b BOTTLENECK           number of individuals to keep from each generation [default: 6]
+    -d MUTATION_DISTANCE    amount of change a mutation will cause [default: 5]
+    -b BOTTLENECK           number of individuals to keep from each generation [default: 5]
     -i PROCESSORS           number of processors to use [default: 4]
     -o OUTPUT_FILE          file to write statistics to [default: weights.csv]
-    -k STARTING_POPULATION  starting population size for the simulation [default: 5]
+    -k STARTING_POPULATION  starting population size for the simulation [default: 10]
     -n NOISE                match noise [default: 0.0]
 """
 
 import csv
-from copy import deepcopy
 from itertools import repeat
 from multiprocessing import Pool
 import os
@@ -32,6 +31,7 @@
 from docopt import docopt
 import numpy as np
 
+import axelrod as axl
 from axelrod.strategies.ann import ANN, split_weights
 from axelrod_utils import score_for, objective_match_score, objective_match_moran_win
 
@@ -62,7 +62,7 @@ def crossover(weights_collection):
             if i == j:
                 continue
             crosspoint = random.randrange(len(w1))
-            new_weights = deepcopy(w1[0:crosspoint]) + deepcopy(w2[crosspoint:])
+            new_weights = list(w1[0:crosspoint]) + list(w2[crosspoint:])
             copies.append(new_weights)
     return copies
 
@@ -86,22 +86,24 @@ def evolve(starting_weights, mutation_rate, mutation_distance, generations,
 
     for generation in range(generations):
        print("Generation " + str(generation))
+        size = 19 * hidden_layer_size
+        random_weights = [get_random_weights(size) for _ in range(4)]
+        weights_to_copy = [list(x[1]) for x in current_bests]
+        weights_to_copy += random_weights
 
-        weights_to_copy = [x[1] for x in current_bests] + \
-            [get_random_weights(19 * hidden_layer_size) for _ in
-             range(2)]
         # Crossover
         copies = crossover(weights_to_copy)
         # Mutate
         copies = mutate(copies, mutation_rate)
 
-        population = copies + weights_to_copy
+        population = copies + [list(x[1]) for x in current_bests] + random_weights
 
         # map the population to get a list of (score, weights) tuples
        # this list will be sorted by score, best weights first
         results = score_all_weights(population, strategies, noise=noise,
                                     hidden_layer_size=hidden_layer_size)
 
+        results.sort(key=itemgetter(0), reverse=True)
        current_bests = results[0: bottleneck]
 
         # get all the scores for this generation
@@ -137,8 +139,7 @@ def evolve(starting_weights, mutation_rate, mutation_distance, generations,
     size = 19 * hidden_layer_size
 
     starting_weights = [get_random_weights(size) for _ in range(starting_population)]
-
-    # strategies = axl.short_run_time_strategies
+    strategies = axl.short_run_time_strategies
 
     evolve(starting_weights, mutation_rate, mutation_distance, generations,
            bottleneck, strategies, output_file, noise,
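
The crossover hunk above replaces `deepcopy` slices with `list` copies: the weight vectors are flat lists of floats, so shallow copies suffice. Single-point crossover itself can be sketched like this (helper names are mine, not the script's):

```python
import random

def crossover_pair(w1, w2, crosspoint):
    """Single-point crossover: the child takes w1 up to the crosspoint,
    then w2 from the crosspoint on.  Flat lists of floats, so shallow
    list copies are enough -- the reason deepcopy was dropped."""
    return list(w1[:crosspoint]) + list(w2[crosspoint:])

def crossover(weights_collection):
    """Cross every ordered pair of distinct parents, as in the script."""
    copies = []
    for i, w1 in enumerate(weights_collection):
        for j, w2 in enumerate(weights_collection):
            if i == j:
                continue
            copies.append(crossover_pair(w1, w2, random.randrange(len(w1))))
    return copies

child = crossover_pair([0.1, 0.2, 0.3], [0.7, 0.8, 0.9], 1)
print(child)  # [0.1, 0.8, 0.9]
```

With k parents this produces k·(k−1) children per generation, which the bottleneck then prunes back down.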

axelrod_utils.py (1 addition, 6 deletions)
@@ -33,18 +33,13 @@ def objective_match_score_difference(me, other, turns, noise):
         scores_for_this_opponent.append(score_diff)
     return scores_for_this_opponent
 
-def objective_match_moran_win(me, other, turns, noise=0):
+def objective_match_moran_win(me, other, turns, noise=0, repetitions=100):
     """Objective function to maximize Moran fixations over N=4 matches"""
     assert(noise == 0)
     # N = 4 population
     population = (me, me.clone(), other, other.clone())
     mp = axl.MoranProcess(population, turns=turns, noise=noise)
 
-    if mp._stochastic:
-        repetitions = 100
-    else:
-        repetitions = 1
-
     scores_for_this_opponent = []
 
     for _ in range(repetitions):
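
This hunk makes the repetition count a keyword argument rather than deriving it from the private `mp._stochastic` flag. The fixation probability being maximized is estimated as the fraction of repeated Moran runs the trained player wins, so more repetitions mean a tighter estimate. As a toy illustration only (not the axelrod `MoranProcess` API; each "run" here is a seeded biased coin flip):

```python
import random

def estimated_fixation(win_probability, repetitions, rng):
    """Monte Carlo estimate of fixation probability: the fraction of
    repeated stochastic Moran runs won by the focal player.  The real
    objective runs axl.MoranProcess; here each run is a biased coin
    flip purely for illustration."""
    wins = sum(rng.random() < win_probability for _ in range(repetitions))
    return wins / repetitions

rng = random.Random(0)
# With 1000 repetitions the estimate clusters tightly around the true 0.7.
print(estimated_fixation(0.7, 1000, rng))
```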
