“Reuse knowledge. Reduce grind. Optimize evolution.”
Training Machine Learning models is a high-cost grind — large datasets, long training times, and heavy compute requirements.
This project applies Transfer Learning within a Genetic Programming (GP) framework to reduce training effort by reusing knowledge from a Source Task and adapting it to a Target Task.
A pre-trained GP population is transferred, refined, and evolved to solve a related problem more efficiently.
- Name: 227_cpu_small
- Observations: 8,192
- Variables: 13
- Attributes: Continuous
- Missing / NaN Values: None (zero values present)
- Duplicate Rows: None
- Name: 197_cpu_act
- Observations: 8,192
- Variables: 20
- Attributes: Continuous
- Missing / NaN Values: None (zero values present)
- Duplicate Rows: None
- Duplicate rows removed to reduce bias
- Min–Max Normalization applied to all features
Normalization prevents features with large numeric ranges (e.g. freemem) from overpowering smaller-scale features (e.g. runqsz), ensuring fair contribution during evolution.
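A minimal sketch of these two steps, assuming the data is loaded into a pandas DataFrame (the function name is illustrative, not the project's actual API):

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows, then Min-Max scale every column to [0, 1]."""
    df = df.drop_duplicates().reset_index(drop=True)
    # Min-Max normalization: x' = (x - min) / (max - min)
    return (df - df.min()) / (df.max() - df.min())
```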
The GP regressor is represented as an Expression Tree:
- Internal Nodes: Operators
- Leaf Nodes: Operands (terminals)
Each individual consists of:
- A root node selected from the functional set
- Child nodes selected from functional or terminal sets
Expression trees offer flexibility and allow easy manipulation during evolution.
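For illustration, a bare-bones node class along these lines is enough to represent an individual (the class name and fields are assumptions, not the project's actual API):

```python
class Node:
    """One node of a GP expression tree: an operator with children, or a terminal."""
    def __init__(self, value, children=None):
        self.value = value              # e.g. "+", "sin", "x3", or "c"
        self.children = children or []  # empty list for terminals (leaf nodes)

    def __str__(self):
        if not self.children:                        # terminal
            return self.value
        if len(self.children) == 1:                  # unary operator, e.g. sin(x1)
            return f"{self.value}({self.children[0]})"
        return f"({self.children[0]} {self.value} {self.children[1]})"
```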
- Generation Method: Growth Method
- Initial Tree Depth: 0
- Ensures simple expressions at the start of evolution
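A sketch of the Growth Method, reusing the Node class above. The arity table and the 50% function-vs-terminal coin flip are illustrative assumptions; the functional and terminal sets match those listed under Parameters below:

```python
import random

# Arity table for the functional set and the source-task terminals (13 features).
FUNCTIONS = {"+": 2, "-": 2, "*": 2, "/": 2, "sqrt": 1, "cos": 1, "sin": 1, "log": 1}
TERMINALS = [f"x{i}" for i in range(1, 14)] + ["c"]  # the target GP would use 20 features

def grow(depth: int, max_depth: int) -> Node:
    """Growth Method: below the depth limit each node is randomly a function
    or a terminal, so tree shapes vary across the initial population."""
    if depth >= max_depth or (depth > 0 and random.random() < 0.5):
        return Node(random.choice(TERMINALS))
    op = random.choice(list(FUNCTIONS))
    return Node(op, [grow(depth + 1, max_depth) for _ in range(FUNCTIONS[op])])
```

An initial population is then simply `[grow(0, max_depth) for _ in range(pop_size)]`.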
Mean Absolute Error (MAE) is used as the fitness function.
Why MAE?
- Always non-negative
- Robust to outliers
- Ideal for regression-based GP evaluation
Each individual expression is evaluated using MAE — lower is better.
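A sketch of the evaluation step, assuming X and y are NumPy arrays; `evaluate` is the recursive tree interpreter sketched under Safety Handling at the end of this README:

```python
import numpy as np

def mae_fitness(individual, X: np.ndarray, y: np.ndarray) -> float:
    """MAE = mean(|y - y_hat|) over the dataset; lower is better."""
    preds = np.array([evaluate(individual, row) for row in X])
    return float(np.mean(np.abs(preds - y)))
```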
Tournament Selection is used:
- Randomly selects k individuals
- The individual with the lowest fitness score wins
- One parent returned per tournament
Why Tournament Selection?
- Simple and efficient
- Computational complexity: O(n)
- Maintains balanced selection pressure
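A minimal sketch (the default k = 2 matches the tournament size listed under Parameters below):

```python
import random

def tournament_select(population, fitnesses, k: int = 2):
    """Draw k individuals at random; the lowest MAE wins and becomes a parent."""
    contenders = random.sample(range(len(population)), k)
    winner = min(contenders, key=lambda i: fitnesses[i])
    return population[winner]
```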
Two operators drive evolution:
- Crossover → Exploitation
- Mutation → Exploration
This balance ensures both refinement of strong individuals and discovery of new solutions.
- Random subtrees are selected from two parents
- Subtrees are swapped to form offspring
- Controlled by the Crossover Rate
- A random node in the tree is replaced
- Introduces novelty and diversity
- Controlled by the Mutation Rate
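Sketches of both operators, building on the Node helpers above. The arity-preserving replacement in point_mutate is an assumption about how "a random node is replaced":

```python
import copy
import random

def all_nodes(tree: Node) -> list:
    """Flatten a tree into a list of its nodes (for random node selection)."""
    return [tree] + [n for child in tree.children for n in all_nodes(child)]

def crossover(parent_a: Node, parent_b: Node) -> Node:
    """Subtree crossover: graft a random subtree of parent_b onto a random
    point of a copy of parent_a. Applied with probability = Crossover Rate."""
    child, donor = copy.deepcopy(parent_a), copy.deepcopy(parent_b)
    target = random.choice(all_nodes(child))
    source = random.choice(all_nodes(donor))
    target.value, target.children = source.value, source.children
    return child

def point_mutate(tree: Node) -> Node:
    """Point mutation: replace one random node with a symbol of the same arity,
    leaving the rest of the tree intact. Applied with probability = Mutation Rate."""
    mutant = copy.deepcopy(tree)
    node = random.choice(all_nodes(mutant))
    if node.children:
        same_arity = [f for f, a in FUNCTIONS.items() if a == len(node.children)]
        node.value = random.choice(same_arity)
    else:
        node.value = random.choice(TERMINALS)
    return mutant
```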
- Source GP is trained on the source dataset
- Final source population is extracted
- A portion is transferred to the Target GP
- Remaining population is generated via growth method
- Target GP evolves over g generations
- Final fitness is evaluated on the target dataset
The Transfer Rate determines how much knowledge is reused from the source population.
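A sketch of the seeding step. Which portion of the source population transfers is not specified above, so taking the best-first prefix is an assumption:

```python
import copy

def build_target_population(source_population, pop_size, transfer_rate, max_depth):
    """Seed the target GP: copy a fraction (the Transfer Rate) of the final
    source population, then fill the rest with fresh grow-method trees."""
    n_transfer = int(transfer_rate * pop_size)
    # Assumption: source_population is sorted best-first by MAE.
    seeded = [copy.deepcopy(ind) for ind in source_population[:n_transfer]]
    fresh = [grow(0, max_depth) for _ in range(pop_size - n_transfer)]
    return seeded + fresh
```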
- Evolution ends after a fixed number of generations
- Source GP: 50 generations (balances diversity and training efficiency)
- Target GP: 25 generations (encourages reuse of transferred individuals while allowing novelty)
- Crossover Rate 0.6 → if a random value is < 0.6, crossover occurs
- Otherwise, individual is copied unchanged
- Mutation rate set comparatively high to encourage exploration
- Reduces risk of premature convergence
- Point mutation affects only one node
- Elitism: the best individuals are preserved each generation
- Prevents loss of high-fitness solutions
- A depth limit prevents excessive tree growth
- Reduces noise and bloat
- Parameter values selected after multiple simulations
- Population initialization: Growth Method
- Tournament size k = 2 balances selection pressure and diversity
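Collected as a config sketch (key names are illustrative; the mutation rate, transfer rate, and depth limit are also used, but their exact values are not restated in this section):

```python
GP_PARAMS = {
    "source_generations": 50,  # source GP
    "target_generations": 25,  # target GP
    "crossover_rate": 0.6,     # crossover when a random draw is < 0.6
    "tournament_k": 2,         # tournament size
    "init_method": "grow",     # Growth Method initialization
}
```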
Functional Set: { +, -, *, /, sqrt, cos, sin, log }
- Includes both binary and unary operators
- Operators are stored as strings for expression tree construction
Terminal Set: `[f"x{i}" for i in range(1, num_features + 1)] + ["c"]`
- Represents all features in the dataset plus a constant c
- Varies depending on the GP and dataset used:
- Source GP: x1, x2, ..., x13 + constant c
- Target GP: x1, x2, ..., x20 + constant c
Safety Handling:
- Division by zero → returns 1
- Square root of negative values → absolute value applied first
This ensures robust evaluation and prevents crashes during evolution.
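A sketch of the protected operators plus the tree interpreter they plug into. The log guard and the default value for the constant c are assumptions beyond what is stated above:

```python
import math

def protected_div(a: float, b: float) -> float:
    """Division by zero returns 1 instead of raising."""
    return a / b if b != 0 else 1.0

def protected_sqrt(a: float) -> float:
    """Square root of a negative value takes the absolute value first."""
    return math.sqrt(abs(a))

def protected_log(a: float) -> float:
    """Assumed guard for log of non-positive inputs (not specified above)."""
    return math.log(abs(a)) if a != 0 else 0.0

def evaluate(node: Node, row, c: float = 1.0) -> float:
    """Recursively interpret an expression tree on one data row.
    Terminals 'x1'..'xN' index into row; the default for the constant c
    is illustrative, as its value is not specified above."""
    if not node.children:
        return c if node.value == "c" else row[int(node.value[1:]) - 1]
    args = [evaluate(child, row, c) for child in node.children]
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": protected_div,
           "sqrt": protected_sqrt, "cos": math.cos,
           "sin": math.sin, "log": protected_log}
    return ops[node.value](*args)
```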
🧩 Main Quest: Transfer Learning with Genetic Programming
🚀 Objective: Reduce training cost while maintaining performance
🏆 Reward: Efficient knowledge reuse across related tasks
Honours-level project — built for evolution, not brute force.