Framework of Hypformer. Input data (text, images, graphs) are projected onto the Lorentz model, then transformed via HTC. The result passes through the hyperbolic linear attention block with positional encoding, followed by a feedforward layer (built by HTC) and LayerNorm (built by HRC). This serves as an encoder, which can optionally incorporate a GNN.
For classification tasks in this study, the decoder is a fully connected layer. Dropout, activation functions, and residual connections are omitted for brevity.
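
The first step of this pipeline, projecting Euclidean inputs onto the Lorentz model, can be sketched with the standard exponential map at the origin. The snippet below is a minimal PyTorch illustration assuming unit curvature; it is not the paper's full HTC/HRC machinery, and the function name is a placeholder.

```python
import torch

def lorentz_exp_map0(v: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Map Euclidean feature vectors onto the Lorentz (hyperboloid) model
    via the exponential map at the origin, assuming curvature -1."""
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    time = torch.cosh(norm)              # time-like coordinate x_0
    space = torch.sinh(norm) * v / norm  # space-like coordinates
    return torch.cat([time, space], dim=-1)

x = torch.randn(4, 16)   # e.g. raw node or token features
h = lorentz_exp_map0(x)  # points satisfying <h, h>_L = -1
# sanity check: the Lorentz inner product should be -1 for every row
print(-h[:, 0] ** 2 + (h[:, 1:] ** 2).sum(dim=-1))
```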
## 3. Experiments
### 3.1 Experiments on Large-scale Graphs
We first evaluate Hypformer on node classification over diverse large-scale graphs, with node counts ranging from millions to billions, including ogbn-arxiv, ogbn-proteins, and ogbn-papers100M.
Hypformer consistently outperforms other models across these large-scale graph datasets, demonstrating substantial improvements. It is worth noting that models such as GraphFormer, GraphTrans, GraphGPS, HAN, HNN++, and F-HNN have difficulty operating effectively on large-scale graph data.
In addition, our method significantly outperforms recent approaches such as SGFormer and NodeFormer across all tested scenarios, highlighting its effectiveness. Importantly, Hypformer exhibits robust scalability, maintaining its performance advantage even on the largest dataset, ogbn-papers100M, where previous Transformer-based models have encountered limitations.
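
For reference, this node-classification setup can be reproduced with the standard OGB loaders. The snippet below is a minimal sketch assuming the `ogb` package and its library-agnostic loader; it only illustrates how the benchmarks are accessed, not the authors' training pipeline.

```python
from ogb.nodeproppred import NodePropPredDataset

# Swap in "ogbn-proteins" or "ogbn-papers100M" for the larger benchmarks
# (the latter requires substantial memory and disk space).
dataset = NodePropPredDataset(name="ogbn-arxiv")
split_idx = dataset.get_idx_split()  # standard train/valid/test node indices
graph, labels = dataset[0]           # graph dict plus per-node labels

print(graph["num_nodes"])            # number of nodes
print(graph["node_feat"].shape)      # node feature matrix
print(graph["edge_index"].shape)     # (2, num_edges) edge list
train_idx = split_idx["train"]
```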
### 3.2 Experiments on Medium/Small-scale Graphs
To complement our large-scale evaluations, we assess Hypformer on small- and medium-scale graph datasets. This additional testing allows for a more comprehensive comparison against current state-of-the-art models, including GNNs, graph transformers, and hyperbolic approaches that may not scale effectively to larger datasets. By expanding our evaluation scope, we aim to isolate Hypformer's effectiveness in graph learning from its scalability advantages.

Our findings show that the proposed method surpasses both standard GNNs and hyperbolic GNN models by a substantial margin.
Importantly, the method is effective not only on hyperbolic datasets (such as Disease and Airport) but also on non-hyperbolic datasets (such as Cora, CiteSeer, and PubMed).
### 3.3 Comparisons on Text and Vision Datasets
Additionally, we apply our model to semi-supervised image and text classification tasks on the Mini-ImageNet and 20News-Groups datasets. We also construct a graph using k-NN on the input node features so that graph-based models can be utilized. These experiments closely follow the setup used in NodeFormer.
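
The k-NN graph construction can be sketched with scikit-learn as follows. This is a generic construction, not the exact preprocessing script used in the experiments; the feature matrix and the choice of k here are placeholders.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

# Toy stand-in for the input node features (e.g. image or text embeddings).
features = np.random.rand(1000, 128)
k = 10  # the experiments sweep several k values

# Sparse adjacency matrix connecting each node to its k nearest neighbors.
adj = kneighbors_graph(features, n_neighbors=k, mode="connectivity", include_self=False)

# Convert to a (2, num_edges) edge list, the format most graph models expect.
rows, cols = adj.nonzero()
edge_index = np.stack([rows, cols])
print(edge_index.shape)
```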
Hypformer outperforms the baselines in seven out of eight cases. Moreover, the performance of competing baseline models varies significantly with different k values, whereas our method demonstrates greater stability.
**Scalability**
We conducted additional tests on the model’s scalability with respect to the number of nodes in a single batch. The Amazon2M dataset was used, and we randomly selected subsets of nodes, with the node count varying from 10K to 200K. We compared the softmax attention defined by Equation (3) with linear attention, keeping all other parameters the same. As depicted in Figure 5, the memory usage of the proposed method increases linearly with the size of the graph. When the node count exceeds 40K, softmax attention runs out of memory (OOM), whereas the proposed method continues to function effectively, resulting in a 10x reduction in GPU cost.
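
To make the memory argument concrete, the sketch below contrasts vanilla softmax attention, which materializes the full (N, N) score matrix, with a generic kernelized linear attention that only forms (d, d) summaries. This is a simplified Euclidean illustration with an assumed ReLU feature map, not Hypformer's actual hyperbolic linear attention.

```python
import torch

def softmax_attention(q, k, v):
    # Materializes the full (N, N) score matrix: memory grows quadratically with N.
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    # Kernelized attention: keys and values are aggregated into a (d, d) summary
    # first, so peak memory grows linearly with N.
    phi_q = torch.relu(q) + eps  # positive feature map (an assumption for this sketch)
    phi_k = torch.relu(k) + eps
    kv = phi_k.transpose(-2, -1) @ v                                        # (d, d)
    normalizer = phi_q @ phi_k.sum(dim=-2, keepdim=True).transpose(-2, -1)  # (N, 1)
    return (phi_q @ kv) / normalizer

n, d = 4096, 64
q, k, v = (torch.randn(n, d) for _ in range(3))
out = linear_attention(q, k, v)  # scales to node counts where softmax attention would OOM
```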