This document surveys cutting-edge GNN architectures that push beyond traditional message passing to better capture the interplay between latent space representations and graph topology. We focus on architectures particularly relevant to hierarchical graphs like HNSW.
Key Themes: Graph Transformers, Hyperbolic GNNs, Neural ODEs, Equivariant Networks, Generative Models
Limitations of Message Passing:
- Limited receptive field (k-hop with k layers)
- Over-smoothing with many layers
- Difficulty capturing long-range dependencies
Solution: Replace message passing with full attention
Key Innovation: Structural encodings + Transformer attention
Paper: Ying et al. (2021) - "Do Transformers Really Perform Bad for Graph Representation?"
Architecture:
Input: Graph G = (V, E) with features X
Output: Node embeddings H
1. Centrality Encoding:
z_v = Embed(degree(v))
2. Spatial Encoding (Shortest Path):
b_ij = Embed(SP_distance(i, j))
3. Edge Encoding:
e_ij = Embed(edge_features(i, j))
4. Transformer Attention:
Attention(Q, K, V) = softmax((QK^T + B) / √d) V
where B[i,j] = b_ij (spatial bias)
5. Multi-layer stacking with LayerNorm
Implementation Sketch:
pub struct Graphormer {
num_layers: usize,
hidden_dim: usize,
num_heads: usize,
// Encoding layers
centrality_embedding: Embedding,
spatial_embedding: Embedding,
edge_embedding: Embedding,
// Transformer layers
transformer_layers: Vec<GraphTransformerLayer>,
}
pub struct GraphTransformerLayer {
attention: MultiHeadAttention,
ffn: FeedForwardNetwork,
norm1: LayerNorm,
norm2: LayerNorm,
}
impl Graphormer {
fn forward(
&self,
node_features: &[Vec<f32>],
edge_index: &[(usize, usize)],
edge_features: &[Vec<f32>],
shortest_paths: &Array2<usize>, // Precomputed SP distances
degrees: &[usize],
) -> Vec<Vec<f32>> {
// 1. Add centrality encoding
let mut h: Vec<Vec<f32>> = node_features.iter()
.zip(degrees.iter())
.map(|(feat, °)| {
let cent_enc = self.centrality_embedding.forward(deg);
concatenate(feat, ¢_enc)
})
.collect();
// 2. Compute spatial and edge biases
let spatial_bias = self.compute_spatial_bias(shortest_paths);
let edge_bias = self.compute_edge_bias(edge_index, edge_features);
// 3. Transformer layers
for layer in &self.transformer_layers {
h = layer.forward(&h, &spatial_bias, &edge_bias);
}
h
}
fn compute_spatial_bias(&self, shortest_paths: &Array2<usize>) -> Array2<f32> {
let n = shortest_paths.nrows();
let mut bias = Array2::zeros((n, n));
for i in 0..n {
for j in 0..n {
let sp_dist = shortest_paths[(i, j)];
let sp_encoding = self.spatial_embedding.forward(sp_dist);
bias[(i, j)] = sp_encoding[0]; // Scalar bias
}
}
bias
}
fn compute_edge_bias(
&self,
edge_index: &[(usize, usize)],
edge_features: &[Vec<f32>],
) -> HashMap<(usize, usize), f32> {
edge_index.iter()
.zip(edge_features.iter())
.map(|(&(i, j), feat)| {
let edge_enc = self.edge_embedding.forward_features(feat);
((i, j), edge_enc[0])
})
.collect()
}
}
impl GraphTransformerLayer {
fn forward(
&self,
x: &[Vec<f32>],
spatial_bias: &Array2<f32>,
edge_bias: &HashMap<(usize, usize), f32>,
) -> Vec<Vec<f32>> {
// 1. Multi-head attention with structural biases
let attn_out = self.attention.forward_with_bias(x, spatial_bias, edge_bias);
// 2. Residual + Norm
let x_norm1 = self.norm1.forward(&add_residual(x, &attn_out));
// 3. Feed-forward
let ffn_out = self.ffn.forward(&x_norm1);
// 4. Residual + Norm
self.norm2.forward(&add_residual(&x_norm1, &ffn_out))
}
}Benefits for HNSW:
- Global Attention: All nodes can attend to all others
- Structural Encoding: Shortest paths encode HNSW layer information
- Edge Features: Naturally incorporates edge weights/attributes
Challenges:
- O(n²) complexity: Expensive for large graphs
- Memory: Quadratic attention matrix
- Loss of Inductive Bias: Needs more data than message passing
Paper: Rampášek et al. (2022)
Key Idea: Combine message passing + attention
GPS Layer = Message Passing + Global Attention + FFN
h_v^{l+1} = h_v^l + MLP(MP(h_v^l) || GlobalAttn(h_v^l))
Advantages:
- Best of both worlds (local + global)
- More efficient than pure attention
- Strong inductive bias from message passing
Implementation:
pub struct GPSLayer {
local_mp: RuvectorLayer, // Local message passing
global_attn: MultiHeadAttention, // Global attention
fusion: Linear, // Combine local + global
ffn: FeedForwardNetwork,
norm: LayerNorm,
}
impl GPSLayer {
fn forward(
&self,
node_features: &[Vec<f32>],
neighbor_indices: &[Vec<usize>],
all_node_features: &[Vec<f32>], // For global attention
) -> Vec<Vec<f32>> {
let n = node_features.len();
let mut outputs = Vec::new();
for (i, features) in node_features.iter().enumerate() {
// 1. Local message passing
let neighbors: Vec<Vec<f32>> = neighbor_indices[i].iter()
.map(|&j| all_node_features[j].clone())
.collect();
let local_out = self.local_mp.forward(
features,
&neighbors,
&vec![1.0; neighbors.len()],
);
// 2. Global attention (attend to all nodes)
let global_out = self.global_attn.forward(
features,
all_node_features,
all_node_features,
);
// 3. Fusion
let combined = self.fusion.forward(
&concatenate(&local_out, &global_out)
);
// 4. FFN + Residual
let ffn_out = self.ffn.forward(&combined);
let output = self.norm.forward(&add(features, &ffn_out));
outputs.push(output);
}
outputs
}
}Hierarchical Graphs (like HNSW) are better represented in hyperbolic space:
- Tree-like structures
- Exponential growth (volume ∝ e^r)
- Low distortion embeddings
Euclidean Space: Volume ∝ r³ (polynomial) Hyperbolic Space: Volume ∝ e^r (exponential)
Paper: Chami et al. (2019) - "Hyperbolic Graph Convolutional Neural Networks"
Key Operations in Poincaré Ball:
1. Möbius Addition (⊕):
x ⊕ y = [(1 + 2⟨x,y⟩ + ||y||²)x + (1 - ||x||²)y] / [1 + 2⟨x,y⟩ + ||x||²||y||²]
2. Exponential Map (exp_x):
exp_x(v) = x ⊕ [tanh(λ_x ||v|| / 2) · v / ||v||]
where λ_x = 2 / (1 - ||x||²) # Conformal factor
3. Logarithmic Map (log_x):
log_x(y) = (2 / λ_x) · arctanh(||−x ⊕ y||) · (−x ⊕ y) / ||−x ⊕ y||
Implementation:
pub struct HyperbolicGCN {
curvature: f32, // Negative curvature (e.g., -1.0)
layers: Vec<HyperbolicLayer>,
}
pub struct HyperbolicLayer {
weight: Array2<f32>,
bias: Vec<f32>,
curvature: f32,
}
impl HyperbolicLayer {
// Hyperbolic linear transformation
fn linear(&self, x: &[f32]) -> Vec<f32> {
// 1. Map to tangent space at origin
let x_tangent = self.log_map_origin(x);
// 2. Apply Euclidean linear transformation
let y_tangent = self.weight.dot(&Array1::from_vec(x_tangent)).to_vec();
// 3. Map back to hyperbolic space
self.exp_map_origin(&y_tangent)
}
// Hyperbolic aggregation
fn aggregate(&self, neighbors: &[Vec<f32>]) -> Vec<f32> {
if neighbors.is_empty() {
return vec![0.0; neighbors[0].len()];
}
// Use Einstein midpoint (hyperbolic mean)
self.einstein_midpoint(neighbors)
}
// Exponential map from origin
fn exp_map_origin(&self, v: &[f32]) -> Vec<f32> {
let norm = l2_norm(v);
let c = self.curvature.abs();
if norm < 1e-10 {
return v.to_vec();
}
let coef = (c.sqrt() * norm).tanh() / (c.sqrt() * norm);
v.iter().map(|&x| coef * x).collect()
}
// Logarithmic map to origin
fn log_map_origin(&self, x: &[f32]) -> Vec<f32> {
let norm = l2_norm(x);
let c = self.curvature.abs();
if norm < 1e-10 {
return x.to_vec();
}
let coef = (c.sqrt() * norm).atanh() / (c.sqrt() * norm);
x.iter().map(|&xi| coef * xi).collect()
}
// Möbius addition
fn mobius_add(&self, x: &[f32], y: &[f32]) -> Vec<f32> {
let c = self.curvature.abs();
let x_norm_sq = l2_norm_squared(x);
let y_norm_sq = l2_norm_squared(y);
let xy_dot = dot_product(x, y);
let numerator_x_coef = 1.0 + 2.0 * c * xy_dot + c * y_norm_sq;
let numerator_y_coef = 1.0 - c * x_norm_sq;
let denominator = 1.0 + 2.0 * c * xy_dot + c * c * x_norm_sq * y_norm_sq;
x.iter().zip(y.iter())
.map(|(&xi, &yi)| {
(numerator_x_coef * xi + numerator_y_coef * yi) / denominator
})
.collect()
}
// Einstein midpoint (hyperbolic mean)
fn einstein_midpoint(&self, points: &[Vec<f32>]) -> Vec<f32> {
if points.is_empty() {
return vec![];
}
let dim = points[0].len();
let mut mean = vec![0.0; dim];
for point in points {
mean = self.mobius_add(&mean, point);
}
// Scale by 1/n in tangent space
let mean_tangent = self.log_map_origin(&mean);
let scaled_tangent: Vec<f32> = mean_tangent.iter()
.map(|&x| x / points.len() as f32)
.collect();
self.exp_map_origin(&scaled_tangent)
}
fn forward(
&self,
node_embedding: &[f32],
neighbor_embeddings: &[Vec<f32>],
) -> Vec<f32> {
// 1. Aggregate neighbors in hyperbolic space
let aggregated = self.aggregate(neighbor_embeddings);
// 2. Combine with self (Möbius addition)
let combined = self.mobius_add(node_embedding, &aggregated);
// 3. Hyperbolic linear transformation
let transformed = self.linear(&combined);
// 4. Hyperbolic activation (e.g., identity, or hyperbolic ReLU)
transformed
}
}Benefits for HNSW:
- Natural Hierarchies: Higher HNSW layers = closer to origin (root)
- Exponential Capacity: Fit large trees with low distortion
- Distance Preservation: Hyperbolic distance ≈ tree distance
Challenges:
- Numerical Instability: Near boundary (||x|| → 1)
- Complex Gradients: Riemannian optimization required
- Full Pipeline: All operations must be hyperbolic-aware
Idea: Different graph components have different geometries
Embedding space: R^d₁ × H^d₂ × S^d₃
where:
R^d₁: Euclidean (local, grid-like structures)
H^d₂: Hyperbolic (hierarchies)
S^d₃: Spherical (cyclic, clustered)
Implementation:
pub enum ManifoldComponent {
Euclidean(Vec<f32>),
Hyperbolic(Vec<f32>),
Spherical(Vec<f32>),
}
pub struct ProductManifoldEmbedding {
components: Vec<ManifoldComponent>,
}
impl ProductManifoldEmbedding {
fn distance(&self, other: &Self) -> f32 {
self.components.iter()
.zip(other.components.iter())
.map(|(c1, c2)| match (c1, c2) {
(ManifoldComponent::Euclidean(x), ManifoldComponent::Euclidean(y)) =>
l2_distance(x, y),
(ManifoldComponent::Hyperbolic(x), ManifoldComponent::Hyperbolic(y)) =>
hyperbolic_distance(x, y, -1.0),
(ManifoldComponent::Spherical(x), ManifoldComponent::Spherical(y)) =>
spherical_distance(x, y),
_ => panic!("Mismatched manifold types"),
})
.sum::<f32>()
}
// Aggregate in product space
fn aggregate(&self, embeddings: &[ProductManifoldEmbedding]) -> ProductManifoldEmbedding {
let mut aggregated_components = Vec::new();
for (i, component) in self.components.iter().enumerate() {
let component_values: Vec<_> = embeddings.iter()
.map(|emb| &emb.components[i])
.collect();
let aggregated = match component {
ManifoldComponent::Euclidean(_) =>
ManifoldComponent::Euclidean(euclidean_mean(&component_values)),
ManifoldComponent::Hyperbolic(_) =>
ManifoldComponent::Hyperbolic(hyperbolic_mean(&component_values)),
ManifoldComponent::Spherical(_) =>
ManifoldComponent::Spherical(spherical_mean(&component_values)),
};
aggregated_components.push(aggregated);
}
ProductManifoldEmbedding {
components: aggregated_components,
}
}
}Motivation: GNN layers are discrete steps of a continuous diffusion process
Standard GNN:
h^{(l+1)} = h^{(l)} + GNN(h^{(l)}, G)
Neural ODE:
dh/dt = f(h(t), G, θ)
h(T) = h(0) + ∫₀^T f(h(t), G, θ) dt
Benefits:
- Adaptive Depth: Network learns optimal "time" T
- Memory Efficient: Backprop via adjoint method
- Smooth Representations: Continuous trajectory in latent space
Implementation:
pub struct GraphNeuralODE {
dynamics: RuvectorLayer, // f(h, G, θ)
ode_solver: ODESolver,
}
impl GraphNeuralODE {
fn forward(
&self,
initial_embeddings: &[Vec<f32>],
graph_structure: &GraphStructure,
time_horizon: f32,
) -> Vec<Vec<f32>> {
// Solve ODE: h(T) = h(0) + ∫₀^T f(h(t), G) dt
self.ode_solver.solve(
initial_embeddings,
|h, t| self.dynamics_function(h, graph_structure),
0.0,
time_horizon,
)
}
fn dynamics_function(
&self,
h: &[Vec<f32>],
graph: &GraphStructure,
) -> Vec<Vec<f32>> {
// dh/dt = GNN(h, G)
h.iter()
.enumerate()
.map(|(i, embedding)| {
let neighbors: Vec<_> = graph.neighbors(i)
.iter()
.map(|&j| h[j].clone())
.collect();
self.dynamics.forward(
embedding,
&neighbors,
&vec![1.0; neighbors.len()],
)
})
.collect()
}
}
// ODE Solver (e.g., Runge-Kutta 4th order)
pub struct ODESolver;
impl ODESolver {
fn solve<F>(
&self,
y0: &[Vec<f32>],
f: F,
t0: f32,
tf: f32,
) -> Vec<Vec<f32>>
where
F: Fn(&[Vec<f32>], f32) -> Vec<Vec<f32>>,
{
let num_steps = 10;
let dt = (tf - t0) / num_steps as f32;
let mut y = y0.to_vec();
for step in 0..num_steps {
let t = t0 + step as f32 * dt;
// RK4: k1 = f(t, y)
let k1 = f(&y, t);
// k2 = f(t + dt/2, y + k1*dt/2)
let y_k1 = add_scaled(&y, &k1, dt / 2.0);
let k2 = f(&y_k1, t + dt / 2.0);
// k3 = f(t + dt/2, y + k2*dt/2)
let y_k2 = add_scaled(&y, &k2, dt / 2.0);
let k3 = f(&y_k2, t + dt / 2.0);
// k4 = f(t + dt, y + k3*dt)
let y_k3 = add_scaled(&y, &k3, dt);
let k4 = f(&y_k3, t + dt);
// y_{n+1} = y_n + (dt/6) * (k1 + 2k2 + 2k3 + k4)
y = add_rk4_increment(&y, &k1, &k2, &k3, &k4, dt);
}
y
}
}Adjoint Method for Memory-Efficient Backprop:
// Instead of storing all intermediate states, solve backwards ODE
fn backward_ode(
&self,
final_state: &[Vec<f32>],
adjoint_final: &[Vec<f32>],
time_horizon: f32,
) -> (Vec<Vec<f32>>, Vec<f32>) { // (gradients, parameter gradients)
// Solve backward: da/dt = -∂f/∂h · a
// where a is adjoint variable
self.ode_solver.solve_backward(adjoint_final, time_horizon, 0.0)
}Motivation: Geometric graphs (molecules, point clouds) require invariance to rotations/translations
Equivariance Property:
f(T(x)) = T(f(x))
where T is a transformation (e.g., rotation)
EGNN (E(n) Equivariant GNN):
Paper: Satorras et al. (2021)
Node features: h_v ∈ R^d (invariant)
Node positions: x_v ∈ R^3 (equivariant)
Message: m_ij = φ_e(h_i, h_j, ||x_i - x_j||², a_ij)
Aggregation: m_i = Σ_j m_ij
Update features: h'_i = φ_h(h_i, m_i)
Update positions: x'_i = x_i + Σ_j (x_i - x_j) φ_x(m_ij)
Key: Distances and relative positions are used (rotationally invariant)
Implementation:
pub struct EquivariantGNN {
message_mlp: MLP,
node_mlp: MLP,
coord_mlp: MLP,
}
impl EquivariantGNN {
fn forward(
&self,
node_features: &[Vec<f32>],
node_positions: &[Vec<f32>], // 3D coordinates
edges: &[(usize, usize)],
) -> (Vec<Vec<f32>>, Vec<Vec<f32>>) { // (updated features, updated positions)
let n = node_features.len();
let mut new_features = node_features.to_vec();
let mut new_positions = node_positions.to_vec();
for &(i, j) in edges {
// 1. Compute edge features (rotationally invariant)
let rel_pos = subtract(&node_positions[i], &node_positions[j]);
let dist_sq = l2_norm_squared(&rel_pos);
let edge_input = concatenate3(
&node_features[i],
&node_features[j],
&[dist_sq],
);
// 2. Message
let message = self.message_mlp.forward(&edge_input);
// 3. Update features (invariant)
new_features[i] = self.node_mlp.forward(
&concatenate(&node_features[i], &message)
);
// 4. Update positions (equivariant)
let coord_weight = self.coord_mlp.forward(&message)[0];
for k in 0..3 {
new_positions[i][k] += coord_weight * rel_pos[k];
}
}
(new_features, new_positions)
}
}Application to HNSW:
- If embeddings have geometric interpretation
- Preserves graph structure under transformations
- Useful for 3D data (e.g., protein structures)
Goal: Learn latent distribution of graphs, enable generation
Encoder: G → q(z | G)
Decoder: z → p(G | z)
Loss: ELBO = E[log p(G | z)] - β KL(q(z | G) || p(z))
Implementation:
pub struct GraphVAE {
encoder: RuvectorLayer,
mu_layer: Linear,
logvar_layer: Linear,
decoder: GraphDecoder,
}
impl GraphVAE {
fn encode(&self, graph: &Graph) -> (Vec<f32>, Vec<f32>) {
// Encode each node
let node_embeddings: Vec<_> = (0..graph.num_nodes())
.map(|v| {
let neighbors = graph.neighbor_embeddings(v);
self.encoder.forward(&graph.features(v), &neighbors, &[])
})
.collect();
// Pool to graph-level
let graph_embedding = mean_pool(&node_embeddings);
// Reparameterization parameters
let mu = self.mu_layer.forward(&graph_embedding);
let logvar = self.logvar_layer.forward(&graph_embedding);
(mu, logvar)
}
fn reparameterize(&self, mu: &[f32], logvar: &[f32]) -> Vec<f32> {
let std: Vec<f32> = logvar.iter().map(|&lv| (lv / 2.0).exp()).collect();
let eps: Vec<f32> = (0..mu.len())
.map(|_| rand::thread_rng().sample(StandardNormal))
.collect();
mu.iter().zip(std.iter()).zip(eps.iter())
.map(|((&m, &s), &e)| m + s * e)
.collect()
}
fn decode(&self, z: &[f32], num_nodes: usize) -> AdjacencyMatrix {
// Generate node embeddings from latent z
let node_embeddings = self.decoder.generate_node_embeddings(z, num_nodes);
// Generate edges via pairwise scoring
let mut adj = AdjacencyMatrix::new(num_nodes);
for i in 0..num_nodes {
for j in i+1..num_nodes {
let score = dot_product(&node_embeddings[i], &node_embeddings[j]);
let prob = sigmoid(score);
if rand::random::<f32>() < prob {
adj.add_edge(i, j);
}
}
}
adj
}
fn loss(
&self,
graph: &Graph,
beta: f32, // KL weight
) -> f32 {
// Encode
let (mu, logvar) = self.encode(graph);
// Sample
let z = self.reparameterize(&mu, &logvar);
// Decode
let reconstructed_adj = self.decode(&z, graph.num_nodes());
// Reconstruction loss
let recon_loss = bce_loss(&reconstructed_adj, graph.adjacency());
// KL divergence
let kl_loss: f32 = mu.iter().zip(logvar.iter())
.map(|(&m, &lv)| -0.5 * (1.0 + lv - m*m - lv.exp()))
.sum();
recon_loss + beta * kl_loss
}
}Idea: Gradually denoise from Gaussian noise to graph structure
Forward process: G → G_1 → ... → G_T (pure noise)
Reverse process: G_T → ... → G_1 → G (denoised graph)
Graph Denoising Diffusion:
pub struct GraphDiffusionModel {
denoiser: RuvectorLayer,
num_steps: usize,
beta_schedule: Vec<f32>, // Noise schedule
}
impl GraphDiffusionModel {
fn forward_diffusion(
&self,
graph: &Graph,
t: usize,
) -> NoisyGraph {
// Add noise to graph structure
let beta_t = self.beta_schedule[t];
let alpha_t = 1.0 - beta_t;
// Perturb adjacency matrix
let noisy_adj = graph.adjacency()
.mapv(|a| a * alpha_t.sqrt() + rand::random::<f32>() * beta_t.sqrt());
NoisyGraph::new(noisy_adj, t)
}
fn reverse_diffusion_step(
&self,
noisy_graph: &NoisyGraph,
) -> NoisyGraph {
// Predict noise using GNN
let predicted_noise = self.denoiser.forward_graph(noisy_graph);
// Denoise
let t = noisy_graph.timestep;
let beta_t = self.beta_schedule[t];
let alpha_t = 1.0 - beta_t;
let denoised_adj = (noisy_graph.adjacency - predicted_noise * beta_t.sqrt())
/ alpha_t.sqrt();
NoisyGraph::new(denoised_adj, t - 1)
}
fn generate(&self, num_nodes: usize) -> Graph {
// Start from pure noise
let mut noisy_graph = NoisyGraph::random(num_nodes, self.num_steps);
// Iteratively denoise
for t in (0..self.num_steps).rev() {
noisy_graph = self.reverse_diffusion_step(&noisy_graph);
}
noisy_graph.to_graph()
}
}| Architecture | Receptive Field | Complexity | Geometry | Use Case |
|---|---|---|---|---|
| RuVector (Current) | K-hop | O(d·h²) | Euclidean | General HNSW |
| Graphormer | Global | O(n²·h) | Euclidean | Small-medium graphs |
| GPS | Global + Local | O(n²·h + d·h²) | Euclidean | Best of both |
| HGCN | K-hop | O(d·h²) | Hyperbolic | Hierarchical HNSW |
| Mixed-Curvature | K-hop | O(d·h²) | Mixed | Heterogeneous |
| Neural ODE | Continuous | O(T·d·h²) | Euclidean | Smooth dynamics |
| EGNN | 1-hop | O(d·h²) | Geometric | 3D geometric data |
| GVAE | K-hop | O(d·h²) | Euclidean | Generative tasks |
| Diffusion | Iterative | O(T·d·h²) | Euclidean | High-quality generation |
Immediate (1-2 months):
- GPS Layers: Add global attention to current message passing
- Hyperbolic Embeddings: For HNSW higher layers (hierarchical structure)
Short-Term (3-6 months): 3. Graph Transformers: Full Graphormer implementation for comparison 4. Neural ODE: Continuous-depth variant for adaptive receptive field
Long-Term (6-12 months): 5. Mixed-Curvature: Product manifolds for heterogeneous graph patterns 6. Generative Models: GVAE for data augmentation, anomaly detection
// Extend RuvectorLayer with global attention
pub struct GPSRuvectorLayer {
local: RuvectorLayer,
global: MultiHeadAttention,
fusion: Linear,
}// Hyperbolic version for upper HNSW layers
pub enum RuvectorLayerVariant {
Euclidean(RuvectorLayer),
Hyperbolic(HyperbolicLayer),
}// Continuous-depth GNN
pub struct ContinuousRuvector {
dynamics: RuvectorLayer,
ode_solver: ODESolver,
learnable_time: f32, // Adaptive depth
}Graph Transformers:
- Ying et al. (2021) - "Do Transformers Really Perform Bad for Graph Representation?" (Graphormer)
- Rampášek et al. (2022) - "Recipe for a General, Powerful, Scalable Graph Transformer" (GPS)
- Kreuzer et al. (2021) - "Rethinking Graph Transformers with Spectral Attention"
Hyperbolic GNNs: 4. Chami et al. (2019) - "Hyperbolic Graph Convolutional Neural Networks" (HGCN) 5. Liu et al. (2019) - "Hyperbolic Graph Attention Network" 6. Gu et al. (2018) - "Learning Mixed-Curvature Representations in Product Spaces"
Neural ODEs: 7. Chen et al. (2018) - "Neural Ordinary Differential Equations" 8. Poli et al. (2019) - "Graph Neural Ordinary Differential Equations"
Equivariant Networks: 9. Satorras et al. (2021) - "E(n) Equivariant Graph Neural Networks" (EGNN) 10. Thomas et al. (2018) - "Tensor Field Networks"
Generative Models: 11. Kipf & Welling (2016) - "Variational Graph Auto-Encoders" (GVAE) 12. Vignac et al. (2022) - "DiGress: Discrete Denoising Diffusion for Graph Generation"
Document Version: 1.0 Last Updated: 2025-11-30 Author: RuVector Research Team