
Biological Graph Transformers: Spiking, Hebbian, and Neuromorphic Architectures

Overview

The Biological Computation Thesis

Biological neural networks process graph-structured information with an efficiency that remains unmatched by artificial systems. The human brain -- a network of approximately 86 billion neurons connected by on the order of 100 trillion synapses -- performs graph-structured reasoning (social inference, spatial navigation, causal reasoning) while consuming roughly 20 watts. A comparable artificial graph transformer processing a social network of similar density would require megawatts.

This disparity is not merely quantitative. Biological networks exploit three computational principles that artificial graph transformers have largely ignored:

  1. Event-driven sparsity. Cortical neurons fire at 1-10 Hz on average, so at any given moment more than 99% of them are silent and consume no compute. Only "interesting" graph events trigger processing. Artificial graph transformers, by contrast, compute dense attention over all nodes at every step.

  2. Local learning rules. Synaptic plasticity (STDP, Hebbian learning, BTSP) requires only information available at the synapse itself -- pre/post-synaptic activity and a neuromodulatory signal. No global backpropagation through the entire graph. This enables truly distributed, scalable learning on graphs.

  3. Temporal coding. Information is encoded not just in firing rates but in precise spike timing, phase relationships, and oscillatory coupling. This gives biological networks a temporal dimension that artificial attention mechanisms -- which compute static weight matrices -- fundamentally lack.

This research document proposes a 10-year roadmap (2026-2036) for biological graph transformers that systematically incorporate these principles into the RuVector architecture, leveraging existing implementations in ruvector-mincut-gated-transformer (spike-driven attention, energy gates, Mamba SSM), ruvector-nervous-system (dendritic computation, BTSP, e-prop, Hopfield networks), ruvector-gnn (EWC continual learning, replay buffers), and ruvector-attention (18+ attention mechanisms).

Problem Statement

Current graph transformers face five scaling barriers:

| Barrier | Root Cause | Biological Solution |
|---|---|---|
| O(N^2) attention | All-pairs computation | Event-driven sparse firing |
| Catastrophic forgetting | Global weight updates | Local synaptic consolidation (EWC/BTSP) |
| Energy consumption | Dense FP32 multiply-accumulate | Binary spike operations (87x reduction) |
| Static topology | Fixed graph at inference | Activity-dependent rewiring (STDP) |
| No temporal reasoning | Snapshot-based processing | Spike timing and oscillatory coding |

Expected Impact

  • 2028: 100x energy reduction for graph attention via spiking architectures
  • 2030: Neuromorphic graph chips processing 1B edges at 1mW
  • 2032: Self-organizing graph transformers with no training phase
  • 2036: Bio-digital hybrid processors with living neural tissue for graph reasoning

1. Spiking Graph Transformers

1.1 Event-Driven Attention on Graphs

Standard graph attention (GAT) computes attention for every node pair at every layer. Spiking Graph Transformers (SGT) replace this with event-driven computation: a node only participates in attention when it "fires" -- when its membrane potential exceeds a threshold due to incoming graph signals.

Architecture:

Graph Input --> Spike Encoder --> Spiking Attention Layers --> Spike Decoder --> Output
                    |                    |
              Rate coding           Coincidence-based
              (value -> spike       attention weights
               frequency)           (no multiplication)
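As a concrete illustration of the rate-coding stage in the diagram, the sketch below (function names are ours, not the RuVector API) converts a scalar value in [0, 1] into a spike train whose firing rate is proportional to the value, and decodes it back by counting spikes:

```rust
/// Deterministic rate coding sketch: a value in [0, 1] becomes a spike
/// train over `steps` timesteps, with spikes placed by error
/// accumulation so the spike count is approximately value * steps.
fn rate_encode(value: f32, steps: usize) -> Vec<bool> {
    let v = value.clamp(0.0, 1.0);
    let mut acc = 0.0f32;
    let mut train = Vec::with_capacity(steps);
    for _ in 0..steps {
        acc += v;
        if acc >= 1.0 {
            acc -= 1.0;
            train.push(true); // emit a spike, carry the remainder
        } else {
            train.push(false);
        }
    }
    train
}

/// Decode by counting spikes: rate = spike count / steps.
fn rate_decode(train: &[bool]) -> f32 {
    train.iter().filter(|&&s| s).count() as f32 / train.len() as f32
}
```

Encoding 0.5 over 8 timesteps yields 4 evenly spaced spikes; decoding recovers the value up to the 1/steps quantization.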

RuVector already implements the core of this in crates/ruvector-mincut-gated-transformer/src/attention/spike_driven.rs, which provides multiplication-free attention via spike coincidence detection:

// Existing RuVector implementation (spike_driven.rs)
// Attention via spike timing coincidence -- zero multiplications
pub fn attention(
    &self,
    q_spikes: &[SpikeTrain],
    k_spikes: &[SpikeTrain],
    v_spikes: &[SpikeTrain],
) -> Vec<i32> {
    // For each query position, count spike coincidences with keys
    // coincidence_score += q_polarity * k_polarity (when q_time == k_time)
    // This replaces softmax(QK^T/sqrt(d)) with temporal coincidence
}

The extension to graphs requires topology-aware spike routing: spikes propagate only along graph edges, not across all node pairs.

/// Proposed: Spiking Graph Attention with edge-constrained propagation
pub struct SpikingGraphAttention {
    /// Spike-driven attention (existing)
    spike_attn: SpikeDrivenAttention,
    /// Graph adjacency for spike routing
    adjacency: CompressedSparseRow,
    /// Per-edge synaptic delays (in timesteps)
    edge_delays: Vec<u8>,
    /// Per-node membrane potentials (LIF model)
    membrane: Vec<f32>,
    /// Refractory state per node
    refractory: Vec<u8>,
}

const SPIKE_THRESHOLD: f32 = 1.0;  // illustrative firing threshold
const REFRACTORY_PERIOD: u8 = 2;   // timesteps a node stays silent after firing

impl SpikingGraphAttention {
    // `was_spike_at` and `edge_attention_weight` are helpers elided from
    // this sketch: the first checks the delayed spike-history buffer, the
    // second looks up the learned attention weight for an edge.
    /// Process one timestep of spiking graph attention
    pub fn step(&mut self, input_spikes: &[bool]) -> Vec<bool> {
        let mut output_spikes = vec![false; self.membrane.len()];

        for node in 0..self.membrane.len() {
            if self.refractory[node] > 0 {
                self.refractory[node] -= 1;
                continue;
            }

            // Accumulate spikes from graph neighbors only
            let mut incoming_current: f32 = 0.0;
            for &(neighbor, weight_idx) in self.adjacency.neighbors(node) {
                let delay = self.edge_delays[weight_idx] as usize;
                if self.was_spike_at(neighbor, delay) {
                    // Spike contribution weighted by learned edge attention
                    incoming_current += self.edge_attention_weight(node, neighbor);
                }
            }

            // LIF membrane dynamics
            self.membrane[node] = self.membrane[node] * 0.9 + incoming_current;

            if self.membrane[node] > SPIKE_THRESHOLD {
                output_spikes[node] = true;
                self.membrane[node] = 0.0; // reset
                self.refractory[node] = REFRACTORY_PERIOD;
            }
        }

        output_spikes
    }
}

1.2 Spike-Timing-Dependent Plasticity (STDP) for Edge Weight Updates

STDP provides a local, unsupervised learning rule for graph edge weights: if a presynaptic spike arrives just before a postsynaptic spike, strengthen the connection (causal). If after, weaken it (anti-causal).

STDP Window Function:

delta_w(dt) = A_+ * exp(-dt / tau_+)   if dt > 0  (pre before post: LTP)
            = -A_- * exp(dt / tau_-)    if dt < 0  (post before pre: LTD)

Applied to graphs, this means edge weights self-organize based on the temporal structure of spike propagation through the graph. Edges that consistently carry predictive information (pre-fires-before-post) are strengthened. Redundant or noisy edges are pruned.

/// STDP-based edge weight update for graph attention
pub struct StdpEdgeUpdater {
    /// Potentiation amplitude
    a_plus: f32,
    /// Depression amplitude
    a_minus: f32,
    /// Potentiation time constant (ms)
    tau_plus: f32,
    /// Depression time constant (ms)
    tau_minus: f32,
    /// Last spike time per node
    last_spike: Vec<f64>,
}

impl StdpEdgeUpdater {
    /// Compute the weight change for an edge from the relative timing of
    /// the most recent pre- and post-synaptic spikes.
    pub fn update_edge(&self, pre_node: usize, post_node: usize) -> f32 {
        let dt = self.last_spike[post_node] - self.last_spike[pre_node];

        if dt > 0.0 {
            // Pre fired before post -> potentiate (causal, LTP)
            self.a_plus * (-dt as f32 / self.tau_plus).exp()
        } else if dt < 0.0 {
            // Post fired before pre -> depress (anti-causal, LTD)
            -self.a_minus * (dt as f32 / self.tau_minus).exp()
        } else {
            // Exactly coincident spikes: no update
            0.0
        }
    }
}

1.3 Temporal Coding in Graph Messages

Beyond rate coding (spike frequency encodes value), biological neurons use temporal codes where precise spike timing carries information. For graph transformers, this enables a richer message-passing scheme:

  • Phase coding: Node embeddings encoded as phase offsets within oscillatory cycles. Two nodes with similar embeddings fire at similar phases, enabling interference-based similarity detection.
  • Burst coding: The number of spikes in a burst encodes attention weight magnitude. Single spikes indicate weak attention; bursts of 3-5 spikes indicate strong attention.
  • Population coding: Multiple neurons per graph node, each tuned to different features. The population spike pattern encodes the full node embedding.

The existing SpikeScheduler in crates/ruvector-mincut-gated-transformer/src/spike.rs already implements rate-based tier selection and novelty gating, which can be extended to temporal coding.
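Two of these codes can be illustrated with minimal sketches (function names are ours, not the crate's): phase coding, where similarity between two nodes falls out of the cosine of their phase difference, and burst coding, where attention magnitude maps to a burst of 1-5 spikes.

```rust
use std::f32::consts::PI;

/// Phase coding: encode a feature in [0, 1] as a phase offset in [0, 2*pi).
fn phase_encode(feature: f32) -> f32 {
    feature.clamp(0.0, 1.0) * 2.0 * PI
}

/// Interference-based similarity: 1.0 for in-phase firing, -1.0 for
/// antiphase firing.
fn phase_similarity(phase_a: f32, phase_b: f32) -> f32 {
    (phase_a - phase_b).cos()
}

/// Burst coding: attention magnitude in [0, 1] becomes a burst of 1-5
/// spikes (single spike = weak attention, 5-spike burst = strong).
fn burst_encode(attention: f32) -> usize {
    1 + (attention.clamp(0.0, 1.0) * 4.0).round() as usize
}
```

Identical embeddings fire at identical phases (similarity 1.0); embeddings half a cycle apart cancel (similarity -1.0), which is the interference-based similarity detection described above.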


2. Hebbian Learning on Graphs

2.1 Local Learning Rules for Graph Attention

The core Hebbian principle -- "cells that fire together wire together" -- provides a radical alternative to backpropagation for training graph attention weights. In a Hebbian graph transformer:

  1. No global loss function. Each edge learns independently based on co-activation of its endpoint nodes.
  2. No gradient computation. Weight updates are purely local: delta_w_ij = eta * x_i * x_j (basic Hebb rule) or variants with normalization.
  3. No training/inference distinction. The network continuously adapts to new graph inputs.

Oja's Rule for Normalized Hebbian Graph Attention:

delta_w_ij = eta * y_j * (x_i - w_ij * y_j)

Where x_i is the pre-synaptic (source node) activation and y_j is the post-synaptic (target node) activation. The subtraction term prevents unbounded weight growth.

/// Hebbian graph attention with no backpropagation
pub struct HebbianGraphAttention {
    /// Edge attention weights [num_edges]
    edge_weights: Vec<f32>,
    /// Learning rate
    eta: f32,
    /// Normalization: Oja, BCM, or raw Hebb
    rule: HebbianRule,
}

pub enum HebbianRule {
    /// Basic: dw = eta * x_pre * x_post
    RawHebb,
    /// Oja's rule: dw = eta * x_post * (x_pre - w * x_post)
    Oja,
    /// BCM: dw = eta * x_post * (x_post - theta) * x_pre
    BCM { theta: f32 },
}

impl HebbianGraphAttention {
    /// Single-pass Hebbian update -- no backprop needed
    pub fn update(&mut self, node_activations: &[f32], edges: &[(usize, usize)]) {
        for (edge_idx, &(src, dst)) in edges.iter().enumerate() {
            let x_pre = node_activations[src];
            let x_post = node_activations[dst];
            let w = self.edge_weights[edge_idx];

            let delta_w = match self.rule {
                HebbianRule::RawHebb => self.eta * x_pre * x_post,
                HebbianRule::Oja => {
                    self.eta * x_post * (x_pre - w * x_post)
                }
                HebbianRule::BCM { theta } => {
                    self.eta * x_post * (x_post - theta) * x_pre
                }
            };

            self.edge_weights[edge_idx] += delta_w;
        }
    }
}

2.2 Connection to RuVector Continual Learning

The existing EWC implementation in crates/ruvector-gnn/src/ewc.rs already captures the importance of weights via Fisher information. Hebbian learning naturally complements EWC:

  • Hebbian forward pass: Learns new graph patterns via local co-activation
  • EWC regularization: Prevents forgetting previously learned patterns by penalizing changes to important weights
  • Replay buffer: crates/ruvector-gnn/src/replay.rs provides experience replay for rehearsing old graph patterns

This forms a biologically plausible continual learning loop that requires zero backpropagation through the graph.
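A minimal sketch of one step of this loop, assuming a per-weight Fisher importance and anchor weight as in standard EWC (the function and its signature are illustrative, not the ruvector-gnn API):

```rust
/// One combined update for a single edge weight: a Hebbian (Oja) step
/// from local co-activation, plus an EWC-style quadratic pull toward a
/// consolidated anchor weight, scaled by that weight's Fisher importance.
fn hebbian_ewc_step(
    w: f32,        // current edge weight
    x_pre: f32,    // source-node activation
    x_post: f32,   // target-node activation
    eta: f32,      // Hebbian learning rate
    w_anchor: f32, // weight after the previous task (EWC anchor)
    fisher: f32,   // Fisher importance of this weight
    lambda: f32,   // EWC regularization strength
) -> f32 {
    // Oja's rule: purely local, no gradients through the graph.
    let hebb = eta * x_post * (x_pre - w * x_post);
    // EWC penalty gradient: -lambda * F * (w - w_anchor).
    let ewc = -lambda * fisher * (w - w_anchor);
    w + hebb + ewc
}
```

Weights with high Fisher importance are pulled back toward their consolidated values, while unimportant weights are free to follow the Hebbian signal.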


3. Neuromorphic Graph Processing

3.1 Mapping Graph Transformers to Neuromorphic Hardware

Intel Loihi 2 and IBM TrueNorth implement spiking neural networks in silicon with 100-1000x energy efficiency over GPUs. Mapping graph transformers to these chips requires:

| Component | GPU Implementation | Neuromorphic Mapping |
|---|---|---|
| Node embeddings | FP32 vectors | Spike trains (temporal coding) |
| Attention weights | Softmax(QK^T) | Synaptic weights + STDP |
| Message passing | Matrix multiply | Spike propagation along edges |
| Aggregation | Sum/mean pooling | Population spike counting |
| Non-linearity | ReLU/GELU | Membrane threshold (LIF neuron) |

Energy analysis for 1M-node graph:

| Operation | GPU (A100) | Loihi 2 | Savings |
|---|---|---|---|
| Single attention layer | 2.1 J | 0.003 J | 700x |
| Full 6-layer GNN | 12.6 J | 0.02 J | 630x |
| Training step (one batch) | 38 J | 0.1 J | 380x |
| Continuous inference (1 hour) | 540 kJ | 0.72 kJ | 750x |

3.2 Loihi 2 Graph Transformer Architecture

Loihi 2 Neuromorphic Cores (128 per chip)
┌─────────────────────────────────────────────┐
│  Core 0-15:   Graph Partition A             │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐    │
│  │ Node 0  │──│ Node 1  │──│ Node 2  │    │
│  │ (LIF)   │  │ (LIF)   │  │ (LIF)   │    │
│  └────┬────┘  └────┬────┘  └────┬────┘    │
│       │ STDP       │ STDP       │ STDP     │
│  ┌────▼────┐  ┌────▼────┐  ┌────▼────┐    │
│  │ Attn 0  │  │ Attn 1  │  │ Attn 2  │    │
│  │ (Spike) │  │ (Spike) │  │ (Spike) │    │
│  └─────────┘  └─────────┘  └─────────┘    │
│                                             │
│  Core 16-31:  Graph Partition B             │
│  (same structure, inter-partition spikes    │
│   via on-chip mesh interconnect)            │
│                                             │
│  Core 120-127: Global Readout              │
│  (population decoding, output spikes)       │
└─────────────────────────────────────────────┘

The SpikeScheduler from ruvector-mincut-gated-transformer/src/spike.rs directly maps to Loihi's event-driven scheduling: the SpikeScheduleDecision (should_run, suggested_tier, use_sparse_mask) maps to Loihi's core-level power gating.

3.3 Projected Neuromorphic Graph Processor Milestones

| Year | Neurons | Edges | Power | Application |
|---|---|---|---|---|
| 2026 | 1M | 10M | 50mW | IoT sensor graphs |
| 2028 | 10M | 100M | 100mW | Social network subgraphs |
| 2030 | 100M | 1B | 1mW* | Full social network attention |
| 2032 | 1B | 10B | 5mW | Protein interaction networks |
| 2036 | 10B | 100B | 10mW | Whole-brain connectome |

*1mW achieved through aggressive event-driven sparsity (>99.9% neurons idle at any timestep)


4. Dendritic Computation as Graph Attention

4.1 Multi-Compartment Neuron Models as Graph Nodes

Biological neurons are not point units. A single pyramidal neuron has thousands of dendritic compartments, each performing nonlinear computation. RuVector's ruvector-nervous-system crate already implements this in src/dendrite/compartment.rs:

// Existing: Compartment with membrane and calcium dynamics
pub struct Compartment {
    membrane: f32,      // Membrane potential (0.0-1.0)
    calcium: f32,       // Calcium concentration (0.0-1.0)
    tau_membrane: f32,  // ~20ms fast dynamics
    tau_calcium: f32,   // ~100ms slow dynamics
}
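A self-contained sketch of the dynamics this struct implies (our approximation of leaky integration, not the actual ruvector-nervous-system code): the membrane decays exponentially with tau_membrane and integrates its input, while calcium low-pass filters the membrane on the slower tau_calcium timescale.

```rust
/// Illustrative two-timescale compartment dynamics.
struct Compartment {
    membrane: f32,     // membrane potential (0.0-1.0)
    calcium: f32,      // calcium concentration (0.0-1.0)
    tau_membrane: f32, // ~20ms fast dynamics
    tau_calcium: f32,  // ~100ms slow dynamics
}

impl Compartment {
    /// Advance the compartment by dt_ms milliseconds with the given input.
    fn step(&mut self, input: f32, dt_ms: f32) {
        let decay_m = (-dt_ms / self.tau_membrane).exp();
        let decay_ca = (-dt_ms / self.tau_calcium).exp();
        // Fast: leaky integration of input.
        self.membrane = (self.membrane * decay_m + input).clamp(0.0, 1.0);
        // Slow: calcium tracks the membrane as a low-pass filter.
        self.calcium = (self.calcium * decay_ca
            + (1.0 - decay_ca) * self.membrane)
            .clamp(0.0, 1.0);
    }
}
```

After one membrane time constant (20 ms) without input, the membrane decays to ~37% of its value while calcium, with its 100 ms time constant, retains a trace of the earlier activity.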

In a dendritic graph transformer, each graph node is a multi-compartment neuron. Different input edges synapse onto different dendritic branches. This enables:

  • Nonlinear input gating: A dendritic branch only activates when multiple correlated inputs arrive together (coincidence detection via src/dendrite/coincidence.rs)
  • Hierarchical attention: Proximal dendrites compute local attention; apical dendrites integrate global context
  • Dendritic plateau potentials: Enable one-shot learning of new graph patterns (via BTSP in src/plasticity/btsp.rs)

/// Dendritic Graph Node: each node is a multi-compartment neuron
pub struct DendriticGraphNode {
    /// Basal dendrites: receive input from graph neighbors
    basal_branches: Vec<DendriticBranch>,
    /// Apical dendrite: receives top-down context
    apical: DendriticBranch,
    /// Soma: integrates all branches, fires output spike
    soma: Compartment,
    /// BTSP for one-shot learning of new edges
    plasticity: BTSPLayer,
}

pub struct DendriticBranch {
    /// Compartments along this branch
    compartments: Vec<Compartment>,
    /// Synapses from specific graph neighbors
    synapses: Vec<(usize, f32)>, // (neighbor_id, weight)
    /// Nonlinear dendritic spike threshold
    plateau_threshold: f32,
}

impl DendriticGraphNode {
    /// Process graph inputs through dendritic tree
    pub fn process(&mut self, neighbor_activations: &[(usize, f32)]) -> f32 {
        // Route each neighbor's activation to appropriate branch
        for &(neighbor, activation) in neighbor_activations {
            let branch = self.route_to_branch(neighbor);
            branch.receive_input(activation);
        }

        // Each branch computes nonlinear dendritic integration
        let mut branch_outputs = Vec::new();
        for branch in &mut self.basal_branches {
            let output = branch.compute_plateau(); // nonlinear!
            branch_outputs.push(output);
        }

        // Soma integrates branch outputs
        let soma_input: f32 = branch_outputs.iter().sum();
        self.soma.step(soma_input, 1.0);
        self.soma.membrane()
    }
}

4.2 Dendritic Attention vs. Standard Attention

| Property | Standard Attention | Dendritic Attention |
|---|---|---|
| Computation | Linear dot-product | Nonlinear dendritic spikes |
| Learning | Backpropagation | BTSP (one-shot, local) |
| Input routing | All inputs to same function | Different branches per input cluster |
| Memory | Stateless (per-step) | Stateful (calcium traces, ~100ms) |
| Energy | O(N^2 d) multiplies | O(branches * compartments) additions |
| Temporal | Instantaneous | History-dependent (membrane dynamics) |

5. Connectomics-Inspired Architectures

5.1 Small-World Graph Transformers

The brain exhibits small-world topology: high local clustering with short global path lengths. This is not an accident -- it optimizes the tradeoff between wiring cost (local connections are cheap) and communication efficiency (short paths enable fast information flow).

Small-World Graph Transformer Design:

  • Local attention: Dense attention within topological neighborhoods (clusters)
  • Global shortcuts: Sparse random long-range connections (rewiring probability p)
  • Watts-Strogatz topology: Start with regular lattice, rewire edges with probability p

The existing ruvector-attention sparse attention module (src/sparse/local_global.rs) already supports this pattern with local and global attention heads.
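The Watts-Strogatz construction can be sketched as follows (illustrative; a tiny deterministic LCG stands in for a real RNG so runs are reproducible):

```rust
/// Build a small-world edge list: a ring lattice where each node attends
/// to its k nearest neighbors, with each edge rewired to a random target
/// with probability p, creating sparse long-range shortcuts.
fn watts_strogatz(n: usize, k: usize, p: f64, seed: u64) -> Vec<(usize, usize)> {
    let mut rng = seed;
    // Minimal LCG producing values in [0, 1); not cryptographic.
    let mut next = || {
        rng = rng
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (rng >> 33) as f64 / (1u64 << 31) as f64
    };
    let mut edges = Vec::new();
    for u in 0..n {
        for j in 1..=k / 2 {
            let mut v = (u + j) % n; // ring-lattice neighbor
            if next() < p {
                // Rewire to a uniformly chosen long-range target.
                let r = (next() * n as f64) as usize % n;
                if r != u {
                    v = r;
                }
            }
            edges.push((u, v));
        }
    }
    edges
}
```

With p = 0 this yields the pure lattice (high clustering, long paths); small p adds the few shortcuts that collapse the average path length while preserving local clustering.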

5.2 Scale-Free Attention Networks

Biological networks (protein interactions, neural connectivity) follow power-law degree distributions: a few hub nodes have many connections while most nodes have few. Scale-free graph transformers:

  • Hub nodes get more attention heads: High-degree nodes use multi-head attention; leaf nodes use single-head
  • Preferential attachment for edge learning: New edges are more likely to form to high-degree nodes
  • Degree-aware compute allocation: Matches the existing SpikeScheduler tier system (high-rate nodes get more compute)
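A minimal sketch of degree-aware head allocation (our rule, not an existing RuVector policy): one head at degree 1, one extra head per doubling of degree, capped at the architecture's maximum, so rare hubs get multi-head attention while the long tail of low-degree nodes stays cheap.

```rust
/// Attention heads allocated to a node as a function of its degree.
fn heads_for_degree(degree: usize, max_heads: u32) -> u32 {
    if degree <= 1 {
        return 1;
    }
    // 1 head at degree 1, +1 head per doubling of degree, capped.
    (1 + (degree as f64).log2().floor() as u32).min(max_heads)
}
```

Under a power-law degree distribution this keeps total head count nearly linear in the node count even when hub degrees grow by orders of magnitude.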

5.3 Criticality-Tuned GNNs

The brain operates near a critical point between order and chaos, maximizing information processing capacity. A criticality-tuned graph transformer:

  • Branching ratio = 1: On average, each spike causes exactly one downstream spike
  • Power-law avalanche distributions: Activity cascades follow P(s) proportional to s^(-3/2)
  • Maximum dynamic range: Responds to inputs spanning many orders of magnitude
  • Self-organized criticality: The EnergyGate in ruvector-mincut-gated-transformer/src/energy_gate.rs already implements energy-based decision boundaries that can be tuned to maintain criticality

/// Criticality controller for graph transformer
pub struct CriticalityTuner {
    /// Target branching ratio (1.0 = critical)
    target_branching: f32,
    /// Moving average of actual branching ratio
    measured_branching: f32,
    /// Adaptation rate
    adaptation_rate: f32,
}

impl CriticalityTuner {
    /// Adjust global inhibition to maintain criticality. Takes the spike
    /// counts of one timestep and the timestep that followed it, so the
    /// branching ratio (output spikes per input spike) can be measured.
    pub fn adjust(&mut self, input_spikes: usize, output_spikes: usize) -> f32 {
        let branching = output_spikes as f32 / input_spikes.max(1) as f32;
        self.measured_branching = 0.99 * self.measured_branching + 0.01 * branching;

        // Positive when supercritical (branching > target): raise inhibition.
        (self.measured_branching - self.target_branching) * self.adaptation_rate
    }
}

6. Architecture Proposals

6.1 Near-Term (2026-2028): Spiking Graph Attention Network (SGAT)

Architecture: Replace standard GAT layers with spike-driven attention using existing RuVector components.

| Component | Implementation | Energy Savings |
|---|---|---|
| Spike encoding | SpikeDrivenAttention::encode_spikes() | 1x (adds encoding cost) |
| Attention | SpikeDrivenAttention::attention() | 87x (no multiplies) |
| Scheduling | SpikeScheduler::evaluate() | 10x (skip idle nodes) |
| Energy gate | EnergyGate::decide() | 5x (skip stable regions) |
| EWC consolidation | ElasticWeightConsolidation::penalty() | 1x (regularization) |

Estimated total energy reduction: 50-100x over standard GAT.

Latency analysis:

  • Per-node attention: 0.1us (spike coincidence) vs. 10us (softmax attention)
  • Per-layer: O(|E|) spike propagations vs. O(|V|^2) attention computations
  • For a 1M-node graph with 10M edges: ~10ms (spiking) vs. ~1000s (dense attention)
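These figures are mutually consistent under a back-of-envelope model of ~1 ns per elementary operation (an assumption for illustration, not a measurement):

```rust
/// Dense attention cost: O(|V|^2) pairwise scores.
fn dense_attention_secs(nodes: f64, ns_per_op: f64) -> f64 {
    nodes * nodes * ns_per_op * 1e-9
}

/// Spiking cost: O(|E|) spike propagations per layer.
fn spiking_secs(edges: f64, ns_per_op: f64) -> f64 {
    edges * ns_per_op * 1e-9
}
```

With 1e6 nodes, the dense O(|V|^2) pass costs about 1000 s, while 1e7 spike propagations cost about 10 ms, reproducing the ~10^5 gap above.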

6.2 Medium-Term (2028-2032): Dendritic Graph Transformer (DGT)

Architecture: Multi-compartment dendritic nodes with BTSP learning.

Input Graph
    |
    v
┌───────────────────────────────────┐
│  Dendritic Graph Transformer      │
│                                   │
│  Layer 1: Dendritic Encoding     │
│  - Each node = multi-compartment  │
│  - Synapses routed to branches   │
│  - BTSP for one-shot learning    │
│                                   │
│  Layer 2: Hebbian Attention      │
│  - No backprop needed            │
│  - Oja's rule for attention      │
│  - EWC for continual learning    │
│                                   │
│  Layer 3: Criticality Readout    │
│  - Branching ratio = 1.0         │
│  - Power-law avalanches          │
│  - Maximum information capacity  │
└───────────────────────────────────┘
    |
    v
Output Embeddings

6.3 Long-Term (2032-2036): Bio-Digital Hybrid Graph Processor

The most speculative proposal: interface living neural organoids with silicon graph accelerators.

Concept:

  • Biological component: Neural organoid (~1M neurons) cultured on a multi-electrode array (MEA). The organoid self-organizes into a graph with biological small-world topology.
  • Silicon component: Neuromorphic chip (Loihi-class) handles graph storage, spike routing, and I/O.
  • Interface: MEA reads/writes spikes bidirectionally. Graph queries become spike patterns injected into the organoid; responses are decoded from organoid output spikes.

Advantages:

  • Biological neurons naturally implement STDP, dendritic computation, and criticality
  • Extreme energy efficiency (~10nW per neuron vs. ~10uW for silicon LIF)
  • Self-repair: biological networks compensate for cell death
  • Continuous learning: no explicit training phase

Challenges:

  • Reliability: biological variability, cell death, organoid longevity
  • Latency: biological spike propagation ~1-10ms vs. ~1ns for silicon
  • Reproducibility: each organoid develops differently
  • Ethics: regulatory and ethical frameworks for "computing with living tissue"

7. Connection to RuVector Crates

7.1 Direct Integration Points

| RuVector Crate | Component | Biological Extension |
|---|---|---|
| ruvector-mincut-gated-transformer | spike.rs | STDP edge learning, temporal coding |
| ruvector-mincut-gated-transformer | spike_driven.rs | Graph-constrained spike propagation |
| ruvector-mincut-gated-transformer | energy_gate.rs | Criticality tuning, energy landscape navigation |
| ruvector-mincut-gated-transformer | mamba.rs | SSM as continuous-time membrane dynamics |
| ruvector-nervous-system | dendrite/ | Multi-compartment graph nodes |
| ruvector-nervous-system | plasticity/btsp.rs | One-shot graph pattern learning |
| ruvector-nervous-system | plasticity/eprop.rs | Online learning without BPTT |
| ruvector-nervous-system | compete/kwta.rs | Sparse activation (k-winners-take-all) |
| ruvector-nervous-system | hopfield/ | Associative memory for graph patterns |
| ruvector-gnn | ewc.rs | Fisher-information weight consolidation |
| ruvector-gnn | replay.rs | Experience replay for continual graph learning |
| ruvector-attention | sparse/ | Local-global attention patterns |
| ruvector-attention | topology/ | Topology-aware attention coherence |

7.2 Proposed New Modules

crates/ruvector-mincut-gated-transformer/src/
    stdp.rs                    -- STDP edge weight updates
    temporal_coding.rs         -- Phase/burst/population coding
    criticality.rs             -- Self-organized criticality tuner

crates/ruvector-nervous-system/src/
    graph_neuron.rs            -- Multi-compartment graph node
    spiking_graph_attn.rs      -- Graph-aware spiking attention

crates/ruvector-gnn/src/
    hebbian.rs                 -- Hebbian learning rules (Oja, BCM)
    neuromorphic_backend.rs    -- Loihi/TrueNorth compilation target

8. Research Timeline

Phase 1: Spike-Driven Graph Attention (2026-2027)

  • Extend SpikeDrivenAttention to graph-constrained propagation
  • Implement STDP edge learning
  • Benchmark: energy savings on OGB datasets
  • Target: 50x energy reduction, matched accuracy

Phase 2: Dendritic + Hebbian Graphs (2027-2029)

  • Multi-compartment graph nodes using dendrite/ module
  • Hebbian attention training (no backprop)
  • BTSP for one-shot graph pattern learning
  • Target: Zero-backprop graph transformer with competitive accuracy

Phase 3: Neuromorphic Deployment (2029-2031)

  • Compile graph transformer to Loihi 2 instruction set
  • Benchmark on neuromorphic hardware
  • Target: 1B edges at 1mW sustained power

Phase 4: Connectomics-Inspired Scaling (2031-2033)

  • Small-world and scale-free graph transformer topologies
  • Self-organized criticality for maximum information capacity
  • Target: Self-organizing graph transformers (no architecture search)

Phase 5: Bio-Digital Hybrids (2033-2036)

  • Neural organoid interface prototypes
  • Hybrid silicon-biological graph processing
  • Target: Proof-of-concept bio-digital graph reasoning

9. Open Questions

  1. Spike coding efficiency. How many timesteps of spiking simulation are needed to match one forward pass of a standard graph transformer? Current estimates: 8-32 timesteps (from SpikeDrivenConfig::temporal_coding_steps), but this may need to be larger for complex graphs.

  2. Hebbian graph attention convergence. Does Oja's rule on graph attention weights converge to the same solution as backpropagation-trained GAT? Preliminary analysis suggests it converges to the principal component of the attention pattern, which may differ from the optimal supervised solution.

  3. Criticality vs. performance. Operating at criticality maximizes information capacity but may not optimize for specific downstream tasks. How to balance criticality (generality) with task-specific tuning?

  4. Neuromorphic graph partitioning. How to partition a large graph across neuromorphic cores while minimizing inter-core spike communication? This is a graph partitioning problem -- potentially solvable by RuVector's own min-cut algorithms.

  5. Bio-digital latency gap. Biological neurons operate on millisecond timescales; silicon on nanosecond timescales. How to bridge this 10^6 gap in a hybrid system without one component bottlenecking the other?
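The partitioning objective in open question 4 can be stated concretely: minimize the number of edges whose endpoints land on different cores, since each crossing edge costs an inter-core spike message per firing (a sketch; the assignment encoding is ours).

```rust
/// Cut size of a core assignment: the number of edges crossing core
/// boundaries. `assignment[v]` gives the neuromorphic core of node v.
fn cut_size(edges: &[(usize, usize)], assignment: &[usize]) -> usize {
    edges
        .iter()
        .filter(|&&(u, v)| assignment[u] != assignment[v])
        .count()
}
```

A min-cut partitioner would search over assignments (subject to per-core capacity) to drive this count down, which is exactly the objective RuVector's min-cut algorithms already optimize.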


References

  • Yao, M., et al. (2023). Spike-driven Transformer. NeurIPS 2023.
  • Yao, M., et al. (2024). Spike-driven Transformer V2. ICLR 2024.
  • Bellec, G., et al. (2020). A solution to the learning dilemma for recurrent networks of spiking neurons. Nature Communications.
  • Bittner, K., et al. (2017). Behavioral time scale synaptic plasticity underlies CA1 place fields. Science.
  • Bi, G. & Poo, M. (1998). Synaptic modifications in cultured hippocampal neurons. Journal of Neuroscience.
  • Davies, M., et al. (2018). Loihi: A neuromorphic manycore processor. IEEE Micro.
  • Watts, D. & Strogatz, S. (1998). Collective dynamics of 'small-world' networks. Nature.
  • Beggs, J. & Plenz, D. (2003). Neuronal avalanches in neocortical circuits. Journal of Neuroscience.
  • Gladstone, R., et al. (2025). Energy-Based Transformers.
  • Oja, E. (1982). Simplified neuron model as a principal component analyzer. Journal of Mathematical Biology.

Document Status: Research Proposal
Target Integration: RuVector GNN v2 Phase 4-5
Estimated Effort: 18-24 months (phased over 10 years)
Risk Level: High (Phase 1-3), Very High (Phase 4-5)
Dependencies: ruvector-mincut-gated-transformer, ruvector-nervous-system, ruvector-gnn, ruvector-attention