Framework of Hypformer. Input data (text, images, graphs) are projected onto the Lorentz model, then transformed via HTC. The result passes through the hyperbolic linear attention block with positional encoding, followed by a feedforward layer (built by HTC) and LayerNorm (built by HRC). This serves as an encoder, which can optionally incorporate a GNN.
For classification tasks in this study, the decoder is a fully connected layer. Dropout, activation functions, and residual connections are omitted for brevity.
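
The first step of this pipeline, projecting Euclidean inputs onto the Lorentz model, can be sketched with the standard exponential map at the origin. The snippet below is a minimal PyTorch illustration assuming unit curvature; it is not the paper's full HTC/HRC machinery, and the function name is a placeholder.

```python
import torch

def lorentz_exp_map0(v: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Map Euclidean feature vectors onto the Lorentz (hyperboloid) model
    via the exponential map at the origin, assuming curvature -1."""
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    time = torch.cosh(norm)              # time-like coordinate x_0
    space = torch.sinh(norm) * v / norm  # space-like coordinates
    return torch.cat([time, space], dim=-1)

x = torch.randn(4, 16)   # e.g. raw node or token features
h = lorentz_exp_map0(x)  # points satisfying <h, h>_L = -1
# sanity check: the Lorentz inner product should be -1 for every row
print(-h[:, 0] ** 2 + (h[:, 1:] ** 2).sum(dim=-1))
```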
## 3. Experiments
### 3.1 Experiments on Large-scale Graphs
We first evaluate Hypformer on node classification over diverse large-scale graphs, with node counts ranging from millions to billions, including ogbn-arxiv, ogbn-proteins, and ogbn-papers100M.
Hypformer consistently outperforms other models across these large-scale graph datasets, demonstrating substantial improvements. It is worth noting that models such as GraphFormer, GraphTrans, GraphGPS, HAN, HNN++, and F-HNN have difficulty operating effectively on large-scale graph data.
In addition, our method significantly outperforms recent approaches such as SGFormer and NodeFormer across all tested scenarios, highlighting its effectiveness. Importantly, Hypformer exhibits robust scalability, maintaining its performance advantage even on the largest dataset, ogbn-papers100M, where previous Transformer-based models have encountered limitations.
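
For reference, this node-classification setup can be reproduced with the standard OGB loaders. The snippet below is a minimal sketch assuming the `ogb` package and its library-agnostic loader; it only illustrates how the benchmarks are accessed, not the authors' training pipeline.

```python
from ogb.nodeproppred import NodePropPredDataset

# Swap in "ogbn-proteins" or "ogbn-papers100M" for the larger benchmarks
# (the latter requires substantial memory and disk space).
dataset = NodePropPredDataset(name="ogbn-arxiv")
split_idx = dataset.get_idx_split()  # standard train/valid/test node indices
graph, labels = dataset[0]           # graph dict plus per-node labels

print(graph["num_nodes"])            # number of nodes
print(graph["node_feat"].shape)      # node feature matrix
print(graph["edge_index"].shape)     # (2, num_edges) edge list
train_idx = split_idx["train"]
```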
### 3.2 Experiments on Medium/Small-scale Graphs
To complement our large-scale evaluations, we assess Hypformer on small- and medium-scale graph datasets. This additional testing allows for a more comprehensive comparison against current state-of-the-art models, including GNNs, graph transformers, and hyperbolic approaches that may not scale effectively to larger datasets. By expanding our evaluation scope, we aim to isolate Hypformer's effectiveness in graph learning from its scalability advantages.

Our findings show that the proposed method surpasses both standard GNNs and hyperbolic GNN models by a substantial margin.
Importantly, the method is effective not only on hyperbolic datasets (such as Disease and Airport) but also on non-hyperbolic datasets (such as Cora, CiteSeer, and PubMed).
### 3.3 Comparisons on Text and Vision Datasets
Additionally, we apply our model to semi-supervised image and text classification tasks on the Mini-ImageNet and 20News-Groups datasets. We also construct a graph using k-NN on the input node features so that graph-based models can be utilized. These experiments closely follow the setup used in NodeFormer.
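
The k-NN graph construction can be sketched with scikit-learn as follows. This is a generic construction, not the exact preprocessing script used in the experiments; the feature matrix and the choice of k here are placeholders.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

# Toy stand-in for the input node features (e.g. image or text embeddings).
features = np.random.rand(1000, 128)
k = 10  # the experiments sweep several k values

# Sparse adjacency matrix connecting each node to its k nearest neighbors.
adj = kneighbors_graph(features, n_neighbors=k, mode="connectivity", include_self=False)

# Convert to a (2, num_edges) edge list, the format most graph models expect.
rows, cols = adj.nonzero()
edge_index = np.stack([rows, cols])
print(edge_index.shape)
```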
Hypformer outperforms the baselines in seven out of eight cases. Moreover, the performance of competing baseline models varies significantly with different k values, whereas our method demonstrates greater stability.
**Scalability**
We conducted additional tests on the model’s scalability with respect to the number of nodes in a single batch. The Amazon2M dataset was used, and we randomly selected subsets of nodes, with the node count varying from 10K to 200K. We compared the softmax attention defined by Equation (3) with linear attention, keeping all other parameters the same. As depicted in Figure 5, the memory usage of the proposed method increases linearly with the size of the graph. When the node count exceeds 40K, softmax attention runs out of memory (OOM), whereas the proposed method continues to function effectively, resulting in a 10x reduction in GPU cost.
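
To make the memory argument concrete, the sketch below contrasts vanilla softmax attention, which materializes the full (N, N) score matrix, with a generic kernelized linear attention that only forms (d, d) summaries. This is a simplified Euclidean illustration with an assumed ReLU feature map, not Hypformer's actual hyperbolic linear attention.

```python
import torch

def softmax_attention(q, k, v):
    # Materializes the full (N, N) score matrix: memory grows quadratically with N.
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    # Kernelized attention: keys and values are aggregated into a (d, d) summary
    # first, so peak memory grows linearly with N.
    phi_q = torch.relu(q) + eps  # positive feature map (an assumption for this sketch)
    phi_k = torch.relu(k) + eps
    kv = phi_k.transpose(-2, -1) @ v                                        # (d, d)
    normalizer = phi_q @ phi_k.sum(dim=-2, keepdim=True).transpose(-2, -1)  # (N, 1)
    return (phi_q @ kv) / normalizer

n, d = 4096, 64
q, k, v = (torch.randn(n, d) for _ in range(3))
out = linear_attention(q, k, v)  # scales to node counts where softmax attention would OOM
```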