Note: All references are from Hod's slides unless otherwise specified.
- No algorithm is universally better than any other algorithm (across all problems)
- An improvement to an algorithm makes it better on some problems and worse on others
- Use an algorithm suited to the problem at hand
- Inductive bias
- simpler problems, where a GA is overkill
- baseline, comparison/diagnostics
- "inner loop" of more complex algorithms
- not too simple (e.g. convex problem with gradient)
- not too hard (e.g. needle in the haystack)
- problems with substructure (sub-components that are sub-solutions to sub-problems)
- needle in the haystack
- deceptive gradient
1. sample n random points
2. rank the points and keep the top m
3. sample a new point within the convex hull of the m points
4. repeat from step 2
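A minimal sketch of this loop (the quadratic objective, the sampling bounds, and the Dirichlet trick for drawing a point inside the hull are illustrative choices, not from the slides):

```python
import numpy as np

def hull_search(f, dim, n=100, m=10, iters=200, seed=0):
    """Rank-and-resample search: repeatedly replace the worst point with a
    random convex combination of the top-m points (a point in their hull)."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(-5, 5, size=(n, dim))          # 1. sample n random points
    for _ in range(iters):
        pts = pts[np.argsort([f(p) for p in pts])]   # 2. rank (minimization)
        w = rng.dirichlet(np.ones(m))                # convex weights, sum to 1
        pts[-1] = w @ pts[:m]                        # 3. new point inside the hull
    return pts[0]                                    # 4. repeat; return the best

print(hull_search(lambda x: np.sum(x**2), dim=3))    # should approach the origin
```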
- reflection/contraction/expansion/shrinkage
- reflect the point with the highest objective function value through the centroid of the remaining simplex
- best point? Expansion
- good point? Reflection
- worst point? Shrinkage
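These four moves are the Nelder-Mead simplex method; rather than re-implementing the bookkeeping, a quick sanity check can lean on SciPy's built-in version (the Rosenbrock objective here is just a demo):

```python
import numpy as np
from scipy.optimize import minimize

# Nelder-Mead needs no gradients: it only reflects, expands, contracts,
# and shrinks a simplex of dim+1 points.
rosenbrock = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
result = minimize(rosenbrock, x0=np.array([-1.0, 2.0]), method="Nelder-Mead")
print(result.x)  # approaches [1, 1]
```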
ref:capsis
- Let $s = s_0$
- For $k = 0$ to $k_{max}$:
  - $T = \text{temperature}((k+1)/k_{max})$
  - Pick a random neighbour, $s_{new} = \text{neighbour}(s)$
  - If $e^{(f(s_{new}) - f(s))/T} \ge \text{random}(0, 1)$: $s \leftarrow s_{new}$
- Output: the final state $s$
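A direct transcription of the pseudocode (maximization form, matching the acceptance test above; the linear cooling schedule and the Gaussian neighbour in the demo are assumptions, not prescribed by the slides):

```python
import math, random

def simulated_annealing(f, s0, neighbour, k_max=10_000):
    s = s0
    for k in range(k_max):
        T = max(1.0 - (k + 1) / k_max, 1e-9)   # assumed: linear cooling toward 0
        s_new = neighbour(s)
        # exp((f_new - f_old)/T) >= 1 for any uphill move, so uphill moves are
        # always accepted; the exponent is clamped at 0 to avoid overflow.
        if math.exp(min((f(s_new) - f(s)) / T, 0.0)) >= random.random():
            s = s_new
    return s

best = simulated_annealing(lambda x: -(x - 3)**2, s0=0.0,
                           neighbour=lambda x: x + random.gauss(0, 0.5))
print(best)  # should approach 3
```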
- generate -> mutate -> combine and rank -> down-select -> repeat
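A skeleton of that loop (the fitness and operator callables are placeholders to be filled in per problem; crossover is assumed here to return a single child, and truncation is used for down-selection purely for brevity):

```python
import random

def evolve(fitness, random_individual, mutate, crossover,
           pop_size=100, generations=100):
    """Generic generate -> vary -> rank -> down-select loop."""
    pop = [random_individual() for _ in range(pop_size)]         # generate
    for _ in range(generations):
        children = [mutate(crossover(random.choice(pop),         # mutate/combine
                                     random.choice(pop)))
                    for _ in range(pop_size)]
        pop = sorted(pop + children, key=fitness, reverse=True)  # rank
        pop = pop[:pop_size]                                     # down-select
    return pop[0]
```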
- direct encoding: one gene for every DOF
- indirect encoding: evolves rules, more compact. e.g. evolve mass-spring parameters w.r.t centers
- mutation: change a chromosome, e.g. flip bits
- crossover: split & recombine parts from two individuals
- composition: concatenate two individuals
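Minimal bit-string versions of these three operators (a sketch; real systems use representation-specific variants):

```python
import random

def flip_mutation(bits, p=0.05):
    """Flip each bit independently with probability p."""
    return [b ^ (random.random() < p) for b in bits]

def one_point_crossover(a, b):
    """Split both parents at one random point and recombine the halves."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def composition(a, b):
    """Concatenate two individuals (the genome grows)."""
    return a + b
```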
- implicit parallelism (building block hypothesis)
- GAs implicitly identify and recombine "building blocks", i.e. low-order, short defining-length schemata with above-average fitness
- GA solutions are composed of building blocks that contain useful genes
- evolution recombines good building blocks.
- Schemata with a low order and a short defining length are called building blocks
- building blocks can be destroyed by crossover, e.g. uniform crossover
- bad representation: loose linkage
- building blocks inside the genome are competing
- a GA works if the representation and variation operators allow effective recombination of building blocks
- list of indices
- priority encoding
- connectivity matrix
- Species: related individuals that are able to breed among themselves, but are not able to breed with members of another species
- Allopatric speciation: physical separation (different geographical area) of a population, e.g., evolve TSP in islands
- Sympatric speciation: reproductive/behavioral separation (same geographic area), e.g. the same solution with reversed order
- fitness proportionate:
- roulette wheel
- SUS (stochastic universal sampling)
- rank based:
- truncation:top k% are replicated and replace bottom 100-k% with variation
- tournament: select k individuals at random; among those, select the best for variation (see the selection sketches after this list)
- $(\mu, \lambda)$: generate $\lambda$ new offspring and use the top $\mu$ to populate the next generation
- Elitism: keep the best k solutions around unmodified
- steady state selection
- Choose parent(s) at random
- create offspring
- Choose someone in population
- If child better than selected individual, replace it
Beware: Produces races on multiprocessor architectures
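Sketches of three of the schemes above, assuming a non-negative fitness to be maximized:

```python
import random

def roulette(pop, fitness):
    """Fitness-proportionate selection via weighted sampling."""
    return random.choices(pop, weights=[fitness(i) for i in pop], k=1)[0]

def tournament(pop, fitness, k=3):
    """Pick k individuals at random; the best of them wins."""
    return max(random.sample(pop, k), key=fitness)

def truncation(pop, fitness, keep=0.2):
    """Top keep-fraction of the population, to be replicated with variation."""
    ranked = sorted(pop, key=fitness, reverse=True)
    return ranked[:max(1, int(keep * len(pop)))]
```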
- roulette wheel selection may suffer from premature convergence
- rank selection may not perform well due to small variations in large fitness values
- normalization:
- gaussian: subtract mean, divide by std
- linear: bring min-max to [0,1]
- nonlinear: Boltzmann selection, $F = e^{f/T}$
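The three rescalings as one-liners over a NumPy fitness vector (the epsilon terms, an assumption here, guard against a fully converged population):

```python
import numpy as np

def gaussian_scale(f):
    return (f - f.mean()) / (f.std() + 1e-12)           # subtract mean, divide by std

def linear_scale(f):
    return (f - f.min()) / (f.max() - f.min() + 1e-12)  # map min..max to [0, 1]

def boltzmann_scale(f, T=1.0):
    return np.exp(f / T)   # low T sharpens selection pressure, high T flattens it
```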
- if selection pressure is too high, diversity is lost
- remedies: diversity maintenance, increasing the population size, lowering the selection pressure
- selection pressure
- crossover probability
- mutation probability
- population size ...
- The change in the frequency of an existing variant (allele) in the population due to random sampling
- mutation: incremental progress, refinement
- crossover: recombination of building blocks, discovering new areas (initially possibly inferior)
- allele: one of two or more alternative forms of a gene at a given genetic locus
- order of a schema = number of specified (non-*) alleles
- a schema of order $o$ over strings of length $N$ represents $2^{N-o}$ strings
| Schema | Order | Represented strings |
|---|---|---|
| `***` | 0 | 000 001 010 011 100 101 110 111 |
| `*1*` | 1 | 010 011 110 111 |
| `*10` | 2 | 010 110 |
| `101` | 3 | 101 |
- $p_m$ = probability of mutating any bit; a schema of order $o(H)$ survives mutation with probability $(1 - p_m)^{o(H)}$, so schemata with higher order have a higher probability of being disrupted by mutation
- $p_c$ = crossover probability
- $l$ = length of the bit string in the search space
- $d(H)$ = defining length: distance between the furthest two non-* symbols, e.g. `d(*10*) = 1`, `d(*1*0*) = 2`
- the survival probability under one-point crossover is at least $1 - p_c \, d(H)/(l-1)$ (only a lower bound, since crossover can also recreate a disrupted schema)
- schemata with a long defining length have a higher probability of being disrupted by crossover
- each position is one of 1, 0, or *, so there are $3^N$ possible schemata over strings of length $N$
- a single string of length $N$ is an instance of $2^N$ schemata
- a population of $P$ individuals of length $N$ therefore contains between $2^N$ and $\min(P \cdot 2^N, 3^N)$ distinct schemata
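Small helpers that make the schema definitions concrete (schemata as strings over {0, 1, *}; the function names are illustrative):

```python
def order(schema):
    """Number of specified (non-*) positions."""
    return sum(c != '*' for c in schema)

def defining_length(schema):
    """Distance between the outermost specified positions."""
    fixed = [i for i, c in enumerate(schema) if c != '*']
    return fixed[-1] - fixed[0] if fixed else 0

def matches(schema, s):
    return all(h in ('*', c) for h, c in zip(schema, s))

assert order('*10') == 2 and defining_length('*1*0*') == 2
# A schema of order o over length-N strings represents 2**(N-o) strings:
assert sum(matches('*1*', f'{i:03b}') for i in range(8)) == 2**(3 - order('*1*'))
```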
- genetic proximity of functionally related genes
- loose linkage -> good building blocks cannot be promoted
- tight linkage -> evolvability (the potential to keep improving)
- inversion: reorder the genes as we progress, orderings with tight linkage will prevail
- evolve crossover point: introns/probabilities/junk DNA
- infer directly and decouple: linkage learning
- trait: phenotypic characteristic
- pleiotropy: one gene might affect several traits
- polygeny: one trait might be affected by several genes
- epistasis: genetic interaction, when the action of one gene is modified by one or more other independent genes
- Supergenes: turn other genes on and off
- strings vs. trees (open-ended representation)
- alleles vs. building blocks
- GP has variable linkage, crossover is hierarchical
- mutation (small, random)
- change coefficient
- replace branch with constant
- crossover (large, non-random)
- swap sub-trees
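A sketch of subtree crossover on trees stored as nested lists (this list-of-lists representation and the helper names are illustrative, not a prescribed format):

```python
import copy, random

def all_paths(tree, path=()):
    """Yield an index-tuple path to every node in the tree."""
    yield path
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from all_paths(child, path + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def put(tree, path, sub):
    for i in path[:-1]:
        tree = tree[i]
    tree[path[-1]] = sub

def subtree_crossover(a, b):
    """Swap one random subtree of a with one random subtree of b."""
    a, b = copy.deepcopy(a), copy.deepcopy(b)
    pa = random.choice([p for p in all_paths(a) if p])  # skip the root
    pb = random.choice([p for p in all_paths(b) if p])
    sa, sb = get(a, pa), get(b, pb)
    put(a, pa, sb)
    put(b, pb, sa)
    return a, b

# Trees as [operator, child, child]; leaves are 'x' or constants:
print(subtree_crossover(['+', 'x', ['*', 'x', 3.0]],
                        ['-', ['+', 'x', 1.0], 'x']))
```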
- Representation: heap data structure
- element k has children 2k+1 and 2k+2 (zero indexed)
- evaluation (heap): evaluate at the bottom and work upward
- replace variables with values
- replace operator with the result of the operation
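A sketch of such a heap-backed evaluator (the operator table and the None padding convention are assumptions for the demo):

```python
import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def eval_heap(heap, x):
    """Node k has children at 2k+1 and 2k+2 (zero-indexed); the recursion
    bottoms out at the leaves, so values propagate from the bottom up."""
    def value(k):
        node = heap[k]
        if node == 'x':
            return x                                          # variable -> value
        if node in OPS:
            return OPS[node](value(2*k + 1), value(2*k + 2))  # operator -> result
        return node                                           # constant leaf

    return value(0)

# (x * 2) + 3 laid out as a heap; unused slots padded with None.
print(eval_heap(['+', '*', 3.0, 'x', 2.0, None, None], x=5.0))  # 13.0
```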
- bloat: solutions get unnecessarily large without adding any meaningful content, e.g. $F(x) = x + x - x$
- combated using operators that reduce the size of the solution: pruning/snipping
- snipping: replace a sub-branch with a constant (average)
- pruning: eliminate sub-branches with relatively low contribution
- parallel/serial/R/L/C ...
- variational operator should keep the DOF the same
- start from a four-bar mechanism
- T operator replaces a given link with two links that pass through a newly created node. The new node is also connected to another existing node
- D operator creates a new node and connects it to both the endpoints of a given link, forming a rigid triangle component
- ref: How to Draw a Straight Line Using a GP by Hod
- single or multiple populations where the relative ranking of two individuals depends on a third individual, e.g. playing chess against a co-evolving partner; this takes care of the gradient problem
- Collusion: e.g. a bad simulator scores high for a bad simulated robot
- large (infinite) search space
- no objective measures exist
- objective measures difficult to formalize/unknown
- certain types of structure in search space
- body-brain / morphography-controller
- Antagonistic: predator-prey
- Cooperative: symbiosis
- Asymmetric: teacher-learner, host-parasite
- the "extreme" form: symbiogenesis, where independent sybiont merge into single individual (reproduce together only) e.g. mitochondrial symbiogenisys
- Objective: Ground truth
- Subjective: Fitness as measured using co-evolving metric
- expression trees and data-points both subject to mutation and crossover
- subjective fitness can be
- misleading progress
- cycles of forgetting
- collusion
- joint fitness may lead to "hitchhikers."
- credit assignment problem
- "it takes all the running you can do, to keep in the same place"
- one population's progress is assessed using another population; relative progress matters
- divide and conquer
- fitness of combined sub-solution depends on the overall distance
- ref: How to Solve It: Modern Heuristics (p425)
- Make crossover effective
  - building block material
  - hold on to building blocks that are not currently used
- Find multiple optima, including useful sub-optima
- Combat deceptive gradients
- High selection pressure
- Selection noise: drift causes convergence even in the absence of fitness pressure
- Operator noise: High mutation and crossover may lose solutions, or lose critical building blocks, so population converges on what’s left
- fitness sharing: fitness is divided by the number of similar individuals (see the sketch after this list)
- crowding: replace individuals that are similar
  - stochastically: the more similar, the more likely to be replaced
  - deterministically: the more similar parent is replaced; compare $d(p_1, c_1) + d(p_2, c_2)$ vs. $d(p_1, c_2) + d(p_2, c_1)$
- Niching: evolve individuals in spatial/topological niches; migrate occasionally between niches. (paper)
- Sequential (temporal) niching: restart many times, flattening areas where previous optima were found. (paper)
  1. Initialize: equate the modified fitness function with the raw fitness function
  2. Run the GA or another search technique using the modified fitness function, keeping a record of the best individual found in the run
  3. Update the modified fitness function to give a depression in the region near the best individual, producing a new modified fitness function
  4. If the raw fitness of the best individual exceeds the solution threshold, display this as a solution
  5. If not all solutions have been found, return to step 2
- why? newly introduced random individuals are unlikely to be selected in a converged population
- Hierarchical Fair Competition. (paper)
- Age-Layered Population Structure. (paper)
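A sketch of the fitness-sharing rule from the list above (the triangular sharing kernel with radius sigma and shape alpha is the textbook choice; the distance function is problem-specific):

```python
def shared_fitness(fit, pop, distance, sigma=1.0, alpha=1.0):
    """Divide each raw fitness by its niche count: the summed similarity
    of all population members within radius sigma."""
    out = []
    for i, ind in enumerate(pop):
        niche = sum(max(0.0, 1.0 - (distance(ind, other) / sigma) ** alpha)
                    for other in pop)          # includes self, so niche >= 1
        out.append(fit[i] / niche)
    return out
```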
- weights for combining objectives are unknown
- Select based on distance from the Pareto front
- Thinning and sampling
- Pareto efficiency/optimality: a situation where no individual or preference criterion can be made better off without making at least one individual or preference criterion worse off
- Pareto frontier: the set of all Pareto-efficient allocations (source: Wikipedia)
- Select by Pareto-rank and Crowding distance
- Sort the population into a hierarchy of subpopulations based on the ordering of Pareto dominance.
- Evaluate similarity between members of each sub-group on the Pareto front
- The resulting groups and similarity measures are used to promote a diverse front of non-dominated solutions. (paper)
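A naive sketch of the Pareto-ranking step (NSGA-II computes the same fronts with better bookkeeping and adds the crowding-distance tiebreak; maximization is assumed here):

```python
def dominates(a, b):
    """a dominates b: no worse on every objective, strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_fronts(objs):
    """Peel off successive non-dominated fronts of objective vectors."""
    remaining = list(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

print(pareto_fronts([(1, 5), (5, 1), (3, 3), (2, 2)]))  # [[0, 1, 2], [3]]
```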
- Use other individuals as dimensions of multi-objective evolution
- Evolve partial solutions
- Keep a partial solution as long as there is no other partial solution better than it in all contexts (component is best in at least one context).
- evolvability: age-Pareto
- simplicity
- novelty/diversity
- e.g. max(corr(residual_errors_of_nearest_fitness_neighbor))
- robustness/sensitivity
- modularity
- cost of manufacturing
- ...
- synergistic learning: the synergy between learning and evolution
- genetic assimilation: plastic mechanisms are assimilated by the genotype; learned behaviors become instinctive
e.g. OneMax:
1. initialize a random population
2. select the top 50%
3. estimate the probability $p(x_i = 1 \mid \text{selected individuals})$
4. generate a new population based on the probability
5. repeat from step 2
source: Larrañaga and Lozano, Estimation of Distribution Algorithms, Kluwer 2002, p. 64
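A compact sketch of this loop for OneMax (UMDA-style; the probability clipping is a common safeguard against premature fixation, not part of the quoted algorithm):

```python
import numpy as np

def umda_onemax(n_bits=50, pop_size=200, generations=60, seed=0):
    rng = np.random.default_rng(seed)
    p = np.full(n_bits, 0.5)                                    # uniform model
    for _ in range(generations):
        pop = (rng.random((pop_size, n_bits)) < p).astype(int)  # sample population
        fitness = pop.sum(axis=1)                               # OneMax: count ones
        selected = pop[np.argsort(fitness)[-pop_size // 2:]]    # select top 50%
        p = selected.mean(axis=0).clip(0.02, 0.98)              # estimate p(x_i = 1)
    return p

print(umda_onemax().round(2))  # per-bit probabilities drift toward 1
```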