This algorithm simulates a population of individuals with 4 initial traits and multiple secondary genes. The algorithm mutates the population through generations and visualizes the realistic behaviour of the society. It also saves and draws plots to make the results easier to interpret.
If you want, the modules Individual.py and DataAnalysisModule.py can be used as standalone libraries to be embedded in your personal projects.
If you have a question about the code or the hypotheses I made, do not hesitate to post a comment in the comment section below. If you also have a suggestion on how this notebook could be improved, please reach out to me.
The program is built according to some general rules and it uses the command line arguments in order to ensure a more automated usage. The arguments are as follows:
Arguments:
the defaults are taken from the main script, usually they are:
- nr_ind_start = 10
- nr_generations = 10
- genes = [-98, -69, -65, -46, 49, 50, 74, 91, 117, 178]
- length = 400
- width = 400
- nat_selection = False
- d_analysis = False
- d_analysis_option = ''
- file_path = ''
Running python NaturalSelection.py -h from the command line prints the following help list:
Start the program in the following order, where [-c] is optional
python Houses_prices.py [-p nr] [-g nr] [-l nr] [-w nr] [-n] [-a opt] -f location_input
-p / --pop | set the initial population, default 10
-g / --gen | set the number of generations, default 10
-l / --length | set the environment length, default 400
-w / --width | set the environment width, default 400
-n / --nat | run the natural selection algorithm
-a / --analysis | run the data analysis algorithm
| options:
| s -> save plots
| d -> display plots
| sd -> save and display plots
-f / --file | input/output file of the data
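The help list above can be reproduced with Python's standard argparse module. The following is a minimal sketch, not the code from get_arguments.py: the function names build_parser and parse are hypothetical, and the defaults are taken from the list of arguments given earlier.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the help list; not the real get_arguments.py."""
    parser = argparse.ArgumentParser(description="Natural selection simulation")
    parser.add_argument("-p", "--pop", type=int, default=10,
                        help="set the initial population, default 10")
    parser.add_argument("-g", "--gen", type=int, default=10,
                        help="set the number of generations, default 10")
    parser.add_argument("-l", "--length", type=int, default=400,
                        help="set the environment length, default 400")
    parser.add_argument("-w", "--width", type=int, default=400,
                        help="set the environment width, default 400")
    parser.add_argument("-n", "--nat", action="store_true",
                        help="run the natural selection algorithm")
    parser.add_argument("-a", "--analysis", choices=["s", "d", "sd"], default=None,
                        help="run the data analysis: s=save, d=display, sd=both")
    parser.add_argument("-f", "--file", default="",
                        help="input/output file of the data")
    return parser


def parse(argv):
    """Return the settings in the same order the main script unpacks them."""
    args = build_parser().parse_args(argv)
    option = args.analysis or ""   # '' means that no analysis was requested
    d_analysis = bool(option)      # assumption: -a alone switches the analysis on
    return (args.pop, args.gen, (args.length, args.width),
            args.nat, d_analysis, option, args.file)
```

For example, parse(["-p", "20", "-n", "-f", "data.csv"]) starts from 20 individuals, enables natural selection, and keeps every other setting at its default.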
First, we get the arguments from the command line:
nr_ind_start = 10
nr_generations = 10
environment = (400, 400)
nat_selection = False
d_analysis = False
d_analysis_option = ''
file_path = 'natural_selection_data.csv'
# override the defaults with the values parsed from the command line
nr_ind_start, nr_generations, environment, nat_selection, d_analysis, d_analysis_option, file_path = get_arguments()
The command line influences the workflow of the algorithm with the following interpretation:
if d_analysis_option == 's':
    plt.savefig('Images/' + file_path[-5] + str(nr_crt) + '_plot.png')
    nr_crt += 1
if d_analysis_option == 'd':
    plt.show()
if d_analysis_option == 'sd':
    plt.savefig('Images/' + file_path[-5] + str(nr_crt) + '_plot.png')
    nr_crt += 1
    plt.show()
For further information, please open get_arguments.py.
The main algorithm is split between 4 main modules and these are:
- get_arguments.py which has the main function of argument detection and decision
- DataAnalysisModule.py which has the class that displays data in form of plots
- Individual.py which has the class that has the individual's parameters, attributes and methods that make possible data manipulation
- NaturalSelection.py which is the main script and must be called in order to make the program functional
The goal is to give a mathematical explanation of a population's asymptotic logistic growth and to implement the corresponding algorithms in a working program.
The genetic algorithm cycle is as follows:
This simulation requires some mathematical background, because a population is interpreted as a set of individuals, each of which is in turn a set of traits. We can write this mathematically and denote the following identities:
We define a function named Fitness in order to select the fittest individuals. The fitness of an organism is a mathematical function that generates a number from the traits of the individual in order to establish its chance of survival:
In order to remove the least fit individuals from the population, we calculate the mean fitness of the population and then apply the rule that individuals whose fitness is below 90% of the population's mean fitness will not reproduce. This value was chosen to approximate the real-world behaviour of a population, where only the fittest reproduce. The mathematical functions are as follows:
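The selection rule can be written compactly as below. This is a reconstruction from the rule stated in the text and from the selection code shown later, not the original notation: the symbols N (population size), I_k (the k-th individual), F (the fitness function) and \bar{F} (the mean fitness) are assumed names.

```latex
\bar{F} = \frac{1}{N} \sum_{k=1}^{N} F(I_k),
\qquad
I_k \text{ is kept} \iff F(I_k) \ge 0.9\,\bar{F}
```

In the program, an individual must additionally have a remaining life expectancy greater than zero to be kept.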
If an individual is able to divide, a series of mutations takes place on the features of the child. These mutations are probabilistic and mirror the behaviour of DNA strands when they break in order to replicate. This step is implemented in a function defined as follows:
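As an illustration of what a probabilistic mutation can look like, here is a minimal sketch. It is not the mutation method from Individual.py: the function name mutate_traits, the uniform perturbation and the max_step parameter are all assumptions made for this example.

```python
import random


def mutate_traits(traits, probability, max_step, rng):
    """Return a mutated copy of `traits`; the parent list is left untouched.

    Hypothetical helper: each trait changes independently with the given
    probability, by a uniform step in [-max_step, max_step].
    """
    child = []
    for value in traits:
        if rng.random() < probability:
            value = value + rng.uniform(-max_step, max_step)
        child.append(value)
    return child


parent = [1.0, 2.0, 3.0, 4.0]
child = mutate_traits(parent, probability=0.5, max_step=0.1, rng=random.Random(42))
```

Passing an explicit random.Random instance keeps the example reproducible without reseeding the global generator.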
The theory of genetic algorithms includes the notion of a crowding effect. This effect is a factor in the population's ability to divide that quantifies the space the society lives on. It starts small, increases as the population increases, and stops the individuals from dividing and overpopulating the environment. Note that the environment was chosen as a rectangle with sides l and w, but it can be generated as a surface of any shape and size to accommodate a more realistic population. The general formula is the following:
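One simple way to obtain a factor with these properties is the ratio of the population size to the carrying capacity of the rectangle. The sketch below is only an assumption made for illustration and is not the crowding_effect function used by the program. The name crowding_effect_sketch and the area_per_individual parameter are hypothetical.

```python
def crowding_effect_sketch(population_size, environment, area_per_individual=100.0):
    """Hypothetical crowding factor in [0, 1] for a rectangular environment.

    Assumes each individual needs `area_per_individual` units of surface, so the
    factor grows linearly with the population and saturates at 1.
    """
    length, width = environment
    capacity = (length * width) / area_per_individual
    return min(1.0, population_size / capacity)


# A 400 x 400 environment holds 1600 individuals under this assumption.
half_full = crowding_effect_sketch(800, (400, 400))
```

Because the factor is subtracted from an individual's chance of dividing, divisions become rarer as the environment fills up, which is what produces the logistic plateau.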
The following equation decides whether an individual divides and increases the population. The number P(i) represents the death/life chances of an individual in the population:
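Read off the division code shown later in this document, the condition can be sketched as follows. The symbol C for the crowding effect, \mathcal{P} for the population and E for the environment are assumed notation. P(i) is taken here to be the number in [0, 1) drawn for individual i by a generator seeded with its fitness.

```latex
i \text{ divides} \iff P(i) - C(\mathcal{P}, E) \ge 0.1,
\qquad P(i) \in [0, 1)
```

In the program, C is rounded to 4 decimals before the comparison, and the resulting child is mutated when a second random draw is at least 0.3.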
The programming language used is Python, because it has many useful frameworks and libraries.
The probability that an individual's secondary gene mutates is controlled by the variable probability_gene_modification in the module Individual.py and can be modified.
The program has the following stages:
- calculate fitness and delete unfit individuals:
fitness_average = 0
for ind in population:
    fitness_average += ind.fitness_function()
fitness_average = fitness_average / len(population)
temp_ind = []
for ind in population:
    if ind.fitness_function() >= fitness_average * 0.9 and ind.life_exp > 0:
        ind.life_exp -= 1
        temp_ind.append(ind)
population = temp_ind
- divide the current generation:
# requires: import copy, import random, from datetime import datetime
mutant_temp = []
for ind in population:
    random.seed(ind.fitness_function())
    chance_of_child = random.random()
    if chance_of_child - round(crowding_effect(population, environment), 4) >= 0.1:
        # mutate the child if chance_of_mutation >= 0.3,
        # otherwise keep an unmodified copy of the parent
        random.seed(chance_of_child *
                    crowding_effect(population, environment) *
                    float(datetime.now().microsecond))
        chance_of_mutation = random.random()
        mutant = copy.copy(ind)
        if chance_of_mutation >= 0.3:
            mutant = ind.mutation(mutant, i)
        mutant.generation = i
        mutant_temp.append(mutant)
- eliminate old individuals, make the new generation and save it in the .csv:
temp_ind = []
for ind in population:
    if ind.life_exp > 0:
        ind.save_individual(file_path, i)
        temp_ind.append(ind)
population = temp_ind
temp_ind = []
# make the new generation
for ind in mutant_temp:
    ind.save_individual(file_path, i)
    population.append(ind)
- repeat the cycle
The simulation models the evolution of a simple population of bacteria or viruses with a small set of traits and can be used to illustrate the notion of a genetic algorithm.
To check the correctness of the mathematical identities and functions, the statistics of every individual are saved in a .csv file and rendered as plots. These plots summarize the evolution of the population through the generations and help verify that the parameters and notions are correctly implemented and designed. The most helpful plot is the following; it displays the evolution of the society over 348 generations and marks the total population that has ever lived:
Herbert Spencer's well-known phrase "survival of the fittest" should be interpreted as: "Survival of the form (phenotypic or genotypic) that will leave the most copies of itself in successive generations." In the simulation, it was difficult to implement a genotypic-like algorithm, so a more numerical fitness function was implemented, which resulted in an increase of the population's average fitness level, as Spencer theorized. The following plots show the relation between the level of fitness and the number of individuals that share the same level:
The following plot shows the relation between the current generation and its fitness level:
These levels of general fitness can be seen as the result of 4 distinct main traits. These will be plotted and analysed further, in order to establish the main factors of the evolution of the simulated population:
As expected, the population is divided into categories and no two individuals share the exact same set of traits across the generations. As the plots show, the distribution is approximately Gaussian, which supports the hypothesis and suggests that the functions were correctly implemented. Further analysis requires more generations and more diverse environments, which are left for future tests and improvements.
To better analyse the data set, one needs to know the most important features of the population, so the next plot displays the top 5 features that increase the fitness of an individual and the top 5 features that decrease it.
The algorithm was run for 5 distinct populations, each with 100 generations. The total population of each society follows logistic growth and can be compared side by side with a real population of slowly self-replicating individuals:
The algorithm produces a population that resembles the behaviour and mutations that happen in the real world and can be used to analyse abstract modifications in traits among the individuals in a species, by making mathematical assumptions and notations according to a general rule.
This program was made for fun from the idea of genetic mutations and can be used by anyone.






