A biologically-grounded framework for energy-efficient generalization.
Generalization is not achieved by better regularization. It emerges from energy-bounded computation under survival pressure.
The loss function is not designed — it propagates from one physical constraint: energy is finite and entropy always wins.
Current deep learning optimizes a designed loss on fixed datasets. Biological intelligence evolved under a different regime: survival pressure in a dynamic environment, with no labels, no backpropagation, and strict energy constraints. This gap is the origin of the generalization problem.
The central thesis:
- Propagate surprise (prediction error), not raw activation.
- Pay for computation with energy — structure that does not contribute to survival is pruned away.
| Standard Deep Learning | SPEG |
|---|---|
| Fixed topology | Topology is the output of learning |
| Minimize designed loss | Survive energy depletion |
| Propagate activation | Propagate surprise only |
| Global gradient | Local causal correlation |
| Regularise by design | Regularise by physics (energy) |
| Generalise by data scale | Generalise by environment pressure |
| Complexity = parameter count | Complexity = K* (energy-bounded) |
| Transfer = fine-tuning | Transfer = structural overlap ΔK |
A Sparse Predictive Energy Graph is a dynamical system where structure and computation are the same thing, and energy efficiency is a first-class constraint.
G = (V, E, μ, ε, e, w, ρ, S)
| Symbol | Meaning |
|---|---|
| V | Dynamic node set (neurons) |
| E ⊆ V×V | Dynamic edge set (synapses) |
| μ_i(t) | Prediction held by node i |
| ε_i(t) | Prediction error at node i |
| e_i(t) | Energy budget of node i (≥ 0) |
| w_ij(t) | Synaptic weight |
| ρ_i(t) | Energy influx from environment |
| S(t) | Scalar survival signal (organism energy delta) |
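The state tuple above can be sketched as a small container. This is illustrative only, assuming dense NumPy arrays; the repository's `speg/graph.py` uses a COO sparse layout, and all names here are hypothetical:

```python
import numpy as np

class SPEGState:
    """Minimal sketch of G = (V, E, mu, eps, e, w, rho, S). Illustrative,
    not the speg/graph.py API."""
    def __init__(self, n_nodes, rng=None):
        rng = rng or np.random.default_rng(0)
        self.alive = np.ones(n_nodes, dtype=bool)        # V: dynamic node set
        self.w = np.zeros((n_nodes, n_nodes))            # w_ij: synaptic weights (E = nonzero entries)
        self.tau = np.zeros((n_nodes, n_nodes), dtype=int)  # tau_ij: transmission delays
        self.mu = 0.1 * rng.standard_normal(n_nodes)     # mu_i: predictions
        self.eps = np.zeros(n_nodes)                     # eps_i: prediction errors
        self.e = np.full(n_nodes, 1.0)                   # e_i: energy budgets (>= 0)
        self.rho = np.zeros(n_nodes)                     # rho_i: energy influx
        self.S = 0.0                                     # S: scalar survival signal
```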
Equation 1 — Prediction Error (replaces forward pass)
- Sensory: ε_i(t) = o_i(t) − μ_i(t)
- Internal: ε_i(t) = Σ_j w_ji · ε_j(t − τ_ji) − μ_i(t)
Transmission delay τ_ji ≥ 0 encodes causal direction. Negative weights produce inhibitory errors.
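Equation 1 can be sketched as follows, assuming a simple delay buffer where `eps_history[d][j]` holds ε_j from d steps in the past (function and variable names are illustrative, not the repository's API):

```python
import numpy as np

def prediction_error(o, mu, w, eps_history, tau, sensory_mask):
    """Equation 1 sketch: sensory nodes compare observation to prediction;
    internal nodes compare delayed, weighted incoming errors to prediction."""
    n = mu.shape[0]
    eps = np.empty(n)
    for i in range(n):
        if sensory_mask[i]:
            eps[i] = o[i] - mu[i]                  # eps_i = o_i - mu_i
        else:
            # eps_i = sum_j w_ji * eps_j(t - tau_ji) - mu_i
            drive = sum(w[j, i] * eps_history[tau[j, i]][j] for j in range(n))
            eps[i] = drive - mu[i]
    return eps
```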
Equation 2 — Prediction Update (replaces learning rule)
μ_i(t+1) = μ_i(t) + α · ε_i(t)
The prediction chases the error. A node where ε_i → 0 has learned its inputs and costs nothing further.
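A minimal sketch of Equation 2, showing the convergence claim for a constant sensory input (the value of α is illustrative):

```python
def update_prediction(mu, eps, alpha=0.2):
    # mu_i(t+1) = mu_i(t) + alpha * eps_i(t)
    return mu + alpha * eps

o, mu = 1.0, 0.0
for _ in range(100):
    eps = o - mu                  # sensory error (Equation 1)
    mu = update_prediction(mu, eps)
# mu has converged to o, so eps -> 0 and the node stops paying for surprise
```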
Equation 3 — Energy Dynamics (replaces the loss function)
e_i(t+1) = e_i(t) + ρ_i(t) − β|ε_i(t)| − γ Σ_j |w_ij|
Every unit of surprise and every maintained connection costs energy. This is the only regulariser — set by physics.
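Equation 3 as a vectorized sketch; β and γ values are illustrative, and `w` rows are taken as outgoing connections:

```python
import numpy as np

def update_energy(e, rho, eps, w, beta=0.1, gamma=0.01):
    """Equation 3 sketch: energy flows in from the environment and is spent
    on surprise and on maintaining connections."""
    maintenance = gamma * np.abs(w).sum(axis=1)        # gamma * sum_j |w_ij|
    return e + rho - beta * np.abs(eps) - maintenance  # e_i(t+1)
```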
Equation 4 — Weight Update (replaces backprop)
Δw_ij = η · ε_i(t) · ε_j(t − τ_ij) · S(t) − λ w_ij
Four forces: co-surprise strengthens (Hebbian on errors); causal direction from τ (STDP); survival signal S gates learning (neuromodulation); decay enforces use-it-or-lose-it (homeostasis).
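Equation 4 in a vectorized sketch. Here `eps_delayed[i, j]` is assumed to hold ε_j(t − τ_ij) read from a delay buffer; parameter values are illustrative:

```python
import numpy as np

def update_weights(w, eps, eps_delayed, S, eta=0.01, lam=0.001):
    """Equation 4 sketch: delta_w_ij = eta * eps_i(t) * eps_j(t - tau_ij) * S(t)
    - lam * w_ij. Co-surprise gated by survival, plus passive decay."""
    hebbian = eta * eps[:, None] * eps_delayed * S
    return w + hebbian - lam * w
```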
Equation 5 — Topology Dynamics (replaces fixed architecture)
- Remove node i if e_i(t) ≤ 0
- Remove edge (i,j) if |w_ij| < ε_min for T consecutive steps
- Add node at i = argmax_i [ ρ_i(t) · |ε_i(t)| ] when Σ_i e_i > E_threshold
- Add edge (i,j) when |ε_i(t) · ε_j(t)| > θ_new
New nodes spawn where energy is available and surprise is high.
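The pruning half of Equation 5 can be sketched like this, assuming a counter matrix that tracks how long each edge has stayed below threshold (names and values are illustrative, not `speg/topology.py`):

```python
import numpy as np

def prune(alive, e, w, low_count, eps_min=1e-3, T=10):
    """Equation 5 sketch (death rules): remove node i if e_i <= 0; remove
    edge (i,j) if |w_ij| < eps_min for T consecutive steps."""
    alive = alive & (e > 0)
    low_count = np.where(np.abs(w) < eps_min, low_count + 1, 0)
    w = np.where(low_count >= T, 0.0, w)   # edge death after T sub-threshold steps
    return alive, w, low_count
```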
| Property | Emerges From |
|---|---|
| Inhibition | Negative w_ij in Equation 1 |
| Homeostasis | λ decay + energy floor in Equations 3 & 4 |
| STDP asymmetry | Transmission delay τ in Equation 4 |
| Criticality | Balance point β|ε| = ρ |
| Scale-free topology | Preferential survival of high-|w| edges in Equation 5 |
| Oscillations | Resonance in delay-τ recurrent loops in Equation 1 |
- Energy-bounded complexity: At equilibrium, K* ≤ (Σ_i ρ_i) / γ. K* is not fixed at design time — it is determined by the energy-to-cost ratio. A harder environment automatically tightens the generalization bound.
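A two-line illustration of the bound, with made-up values for ρ and γ:

```python
# K* = (total energy influx) / (per-edge maintenance cost gamma, Equation 3).
rho = [0.4, 0.3, 0.3]        # per-node energy influx rho_i (illustrative)
gamma = 0.01                 # per-edge maintenance cost (illustrative)
K_star = sum(rho) / gamma    # maximum sustainable edge count at equilibrium
```

Halving the influx or doubling γ halves K*, which is how a harder environment tightens the bound without any design change.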
- PAC-Bayes: For a SPEG with at most K* active edges trained on n samples:

  E[error_test] ≤ E[error_train] + √( (K* ln|V| + ln(1/δ)) / 2n )

- Transfer via structural overlap: Define ΔK = |E_B \ E_A| + |E_A \ E_B|. Then transfer efficiency is 1 − ΔK/K*. If ΔK ≈ 0, the same structure solves both tasks (zero-shot transfer). If ΔK = K*, there is no shared structure. The structural overlap metric 1 − ΔK/K* predicts transfer efficiency — the central falsifiable prediction.
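A minimal sketch of the overlap metric over edge sets, taking K* as the size of the edge-set union for this illustration (an assumption; the document leaves the normalization implicit):

```python
def transfer_efficiency(edges_A, edges_B):
    """Delta-K = |E_B \\ E_A| + |E_A \\ E_B| (symmetric difference);
    predicted transfer efficiency = 1 - Delta-K / K*."""
    delta_K = len(edges_A ^ edges_B)       # symmetric difference of edge sets
    K_star = len(edges_A | edges_B)        # assumed normalization: union size
    return 1 - delta_K / K_star
```

Identical edge sets give efficiency 1 (zero-shot transfer); disjoint edge sets give 0.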
| Experiment | Goal | Success Criteria |
|---|---|---|
| 1 — Energy Pruning | Verify that energy-based pruning self-organises a sparse, functional graph | K(T) << K(0); H(ε) decreases; power-law degree distribution P(k) emerges |
| 2 — Seasonal Variance | Show that oscillating environment forces temporal memory vs. static | Survival_rate(B) > Survival_rate(A) on unseen season |
| 3 — Cross-Task Transfer | Validate that transfer efficiency T is predicted by 1 − ΔK/K* | Linear relationship T vs (1 − ΔK/K*) across task pairs |
| 4 — Cross-Modal | SPEG learns modality-invariant causal nodes without multimodal objective | I(V_Z ; M1), I(V_Z ; M2) > 0 for shared cause nodes |
| 5 — Zero-Shot Transfer | Evolved structure generalises to never-seen environment without fine-tuning | SPEG zero-shot > baseline; small ΔK for full adaptation |
This repository implements Experiments 1–3 (Phase 1–3 of the roadmap).
├── speg/ # Core SPEG implementation
│ ├── graph.py # SPEGGraph (COO sparse, delays, energy, μ, ε)
│ ├── core.py # SPEGEngine — Equations 1–5 in one step()
│ ├── topology.py # Equation 5: node/edge birth and death
│ └── metrics.py # K(t), H(ε), degree distribution, power-law fit, ΔK, T
├── experiments/
│ ├── config.py # SPEGParams
│ ├── exp1_pruning.py # Experiment 1 — energy pruning, P(k), power-law
│ ├── exp2_seasonal.py # Experiment 2 — seasonal ρ, survival comparison
│ ├── exp3_transfer.py # Experiment 3 — task pairs, ΔK, transfer efficiency
│ └── plotting.py # Shared plots (edge count, entropy, degree, transfer)
├── tests/
│ ├── test_equations.py # Unit tests for Eqs 1–4
│ └── test_topology.py # Unit tests for Eq 5
├── SPEG_Experiment_Design.docx # Full research design (source)
├── EXPERIMENT1_WORK_LOG.md # Detailed work log for Exp 1 (19 takes, bugs, results)
└── pyproject.toml
Requires Python ≥ 3.10.
git clone https://github.com/Techotkxy/SPEG.git
cd SPEG
pip install -e .
# Optional: pip install -e ".[dev]"   # for pytest

Dependencies: torch, torchvision, torchaudio, matplotlib, numpy, scipy.
pytest tests/ -v

python -m experiments.exp1_pruning

Outputs: results/exp1/edge_count.png, entropy.png, degree_dist.png, results.pt. Success: K(T) << K(0), power-law P(k) with a plausible fit (KS < 0.1).
python -m experiments.exp2_seasonal
python -m experiments.exp3_transfer

- Paper 1 (NeurIPS / ICML): Prediction Error Propagation with Energy-Bounded Topology Improves Sample Efficiency and Transfer in Sparse Networks — Experiments 1–3, PAC-Bayes bound, T vs 1 − ΔK/K*.
- Paper 2 (NeurIPS / Nature MI): Full SPEG with game environment, co-evolving predators, cross-modal and zero-shot transfer (Experiments 4–5).
The structural overlap metric 1 − ΔK/K* predicts transfer efficiency across modalities — without any explicit multimodal training objective.
If the plot of T vs (1 − ΔK/K*) is clean and linear across 4+ task pairs spanning different modalities, that single figure is the paper. Everything else is scaffolding around it.
Research design & experiment plan — 2025. See SPEG_Experiment_Design.docx for the full design document and EXPERIMENT1_WORK_LOG.md for detailed Experiment 1 debugging and results.