Research Note: This is an experimental simulation probing whether neuro-chemical modulation (Dopamine, Serotonin, Cortisol) provides useful inductive biases for RL agents or merely acts as stochastic noise.
This repository serves as a testbed for a "dual-process" cognitive architecture. It simulates internal hormonal dynamics to drive agent behavior, attempting to model intrinsic motivation and homeostasis rather than purely extrinsic reward maximization.
Can an agent driven by internal homeostatic regulation (balancing "boredom" and "satisfaction") explore and learn in sparse-reward environments more effectively than standard epsilon-greedy or entropy-regularized baselines?
Current Status: The system demonstrates distinct behavioral modes (exploration vs. exploitation) driven by simulated hormones, but it remains unproven whether this complexity yields a statistically significant advantage over standard meta-learning approaches on general tasks.
The system modularizes "cognition" into four interacting components. Note that the anthropomorphic naming conventions (Soul, Heart, etc.) are internal metaphors for the code modules, not claims of biological fidelity.
- Function: Maps latent states to continuous action parameters.
- Mechanism: Dual-head neural network outputting logits (discrete action type) and parameters (continuous x, y, scale).
- Goal: Test end-to-end learning of dexterity without pre-defined action templates.
- Function: Relational reasoning.
- Mechanism: GAT (Graph Attention Network) processing object-oriented representations of the visual input.
- Goal: Infer causal relationships between objects to inform decision-making.
- Function: Intrinsic reward shaping.
- Mechanism: Simulated hormone levels acting as dynamic hyperparameters.
  - Dopamine: Correlates with prediction error/surprise. Modulates learning rate and exploration.
  - Serotonin: Correlates with stability/low-energy states. Modulates "satisfaction" (stopping criteria).
  - Cortisol: Correlates with stagnation or high entropy. Increases randomness/escape behavior.
- Function: Catastrophic forgetting mitigation.
- Mechanism: Triggered when "Serotonin" thresholds are met (representing a stable, solved state), locking weights to preserve current capabilities.
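
To make the hormone roles described above concrete, here is a minimal sketch of how three scalar signals could modulate learning rate, exploration, and the consolidation trigger. It is not taken from the repository: the `HormoneState` class, its method names, the exponential-smoothing update, and every constant are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


def clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Keep a value inside [lo, hi]."""
    return max(lo, min(hi, x))


@dataclass
class HormoneState:
    """Hypothetical container; all names and constants are illustrative."""

    dopamine: float = 0.5
    serotonin: float = 0.5
    cortisol: float = 0.0
    decay: float = 0.9  # fraction of the previous level kept each step

    def update(self, prediction_error: float, energy: float, stagnated: bool) -> None:
        mix = 1.0 - self.decay
        # Dopamine tracks surprise, taken here as absolute prediction error.
        self.dopamine = clamp(self.decay * self.dopamine + mix * abs(prediction_error))
        # Serotonin tracks stability, taken here as low energy in [0, 1].
        self.serotonin = clamp(self.decay * self.serotonin + mix * (1.0 - clamp(energy)))
        # Cortisol builds up while the agent stagnates and decays otherwise.
        self.cortisol = clamp(self.decay * self.cortisol + mix * (1.0 if stagnated else 0.0))

    def learning_rate(self, base_lr: float = 1e-3) -> float:
        # Higher dopamine scales the learning rate up (assumed linear form).
        return base_lr * (0.5 + self.dopamine)

    def exploration_rate(self) -> float:
        # Dopamine and cortisol both push toward more random actions.
        return clamp(0.05 + 0.5 * self.dopamine + 0.4 * self.cortisol)

    def should_consolidate(self, threshold: float = 0.9) -> bool:
        # Serotonin above the threshold stands in for a "stable, solved state".
        return self.serotonin >= threshold


if __name__ == "__main__":
    state = HormoneState()
    for _ in range(100):
        state.update(prediction_error=0.0, energy=0.0, stagnated=False)
    print(state.exploration_rate(), state.should_consolidate())
```

In this sketch a long run of low-energy, low-surprise steps drives exploration toward its floor and eventually triggers consolidation, while sustained stagnation raises cortisol and pushes the agent back toward random actions.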
We evaluated configurations on a mini-ARC task suite.
Ablation Study (N=10 seeds):
| Config | Description | Energy Reduction | Notes |
|---|---|---|---|
| Baseline | Random policy | 0% | Reference |
| System-1 | Energy minimization | ~13% | Heuristic driven |
| System-1+2 | With planner rollouts | ~26% | Classical planning helps significantly |
| Full System | With meta-learner | ~30% | Marginal gain over Planner, higher variance |
Observation: While the full system achieves the largest energy reduction, the "Meta-Learning" component (System 3) adds significant complexity for a relatively small marginal gain (roughly 4 to 5 percentage points, from ~26% to ~30%) over the standard Planner (System 2), and it does so with higher variance. This suggests the neuro-chemical modulation might be over-parameterized.
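
The marginal contribution of each stage can be read off the ablation table with a few lines of arithmetic. The values below are the rounded table entries (the table marks them as approximate), and the helper name `marginal_gains` is hypothetical.

```python
# Approximate energy reductions (percent), copied from the ablation table.
energy_reduction = {
    "Baseline": 0,
    "System-1": 13,
    "System-1+2": 26,
    "Full System": 30,
}


def marginal_gains(results: dict) -> dict:
    """Difference in percentage points between each config and the previous one."""
    names = list(results)
    return {
        f"{prev} -> {curr}": results[curr] - results[prev]
        for prev, curr in zip(names, names[1:])
    }


gains = marginal_gains(energy_reduction)
for step, delta in gains.items():
    print(f"{step}: +{delta} percentage points")
```

With the rounded figures, the heuristic and the planner each contribute about 13 points, while the meta-learner adds about 4.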
- Complexity Overhead: The interaction between three hormone signals creates a chaotic internal state space that is difficult to tune. The agent often oscillates between "panic" (high Cortisol) and "apathy" (low Dopamine) without finding a stable learning groove.
- Anthropomorphic Bias: The architecture assumes that biological metaphors (like "boredom") map cleanly to mathematical optimization. This assumption is strong and often leads to opaque failure modes where the agent "refuses" to act due to internal state rather than environmental constraints.
- Scalability: The Graph Attention Manifold (`manifold.py`) scales quadratically with the number of visual objects, making it slow for complex scenes.
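
The quadratic scaling noted above follows from pairwise attention: if every object attends to every other object, the number of scored pairs grows with the square of the object count. The sketch below assumes a fully connected object graph, which may not match how `manifold.py` actually builds its edges; `attention_pair_count` is a hypothetical helper.

```python
def attention_pair_count(num_objects: int, include_self: bool = True) -> int:
    """Number of (query, key) pairs one attention head scores on a fully connected graph."""
    if num_objects < 0:
        raise ValueError("num_objects must be non-negative")
    if include_self:
        return num_objects * num_objects
    return num_objects * (num_objects - 1)


for n in (5, 10, 20, 40):
    # Doubling the number of objects quadruples the number of scored pairs.
    print(n, attention_pair_count(n))
```

Under this assumption, sparsifying the graph (for example, connecting only nearby objects) would be one way to reduce the cost, at the risk of missing long-range relations.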
    pip install torch numpy pytest pytest-cov matplotlib

Run the basic simulation:

    python main_system.py

Run the deterministic experiment harness:

    python experiments/run_experiment.py --steps 50 --seed 1

If you use this code for research into intrinsic motivation or bio-inspired RL, please cite:

@software{kwag2024advanced,
author = {Kwag, Sung Hun},
title = {Advanced AI Meta-Cognition System: Experimental Neuro-Chemical RL},
year = {2024},
url = {https://github.com/sunghunkwag/Advanced-AI-Meta-Cognition-System}
}

MIT License