This is a benchmark for evaluating learning algorithms on hybrid graphs (a unified definition of higher-order graphs that covers both hypergraphs and hierarchical graphs). It contains:
- 23 real-world hybrid graph datasets from the domains of biology, social media, and e-commerce
- Built-in functionalities for preprocessing hybrid graphs
- An extensible framework to easily train and evaluate Graph Neural Networks
First, install the required PyTorch packages. You will need to know the compatible CUDA version for the PyTorch version you want to use. Replace ${TORCH} and ${CUDA} with these versions in the following commands:
```sh
# TORCH=2.0.1 for the latest stable PyTorch release
# CUDA=cpu if CUDA is not available
python -m pip install torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
python -m pip install torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
python -m pip install torch-geometric==2.2.0
```

Once these dependencies are installed, you can install this package with one of the following:
```sh
pip install hybrid-graph
# or
pip install git+https://github.com/Zehui127/hybrid-graph-benchmark.git
```

or install from source:

```sh
git clone https://github.com/Zehui127/hybrid-graph-benchmark.git
cd hybrid-graph-benchmark
pip install -e .
```

HGB provides both the hybrid graph datasets and efficient training/evaluation capabilities:
We use torch_geometric.data.Data to wrap each graph, with an additional incidence structure (hyperedge_index) for representing hyperedges.
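To make the layout concrete: hyperedge_index has shape [2, N], like edge_index, and (assuming the usual PyG hypergraph convention, not something HGB's docs spell out here) its first row holds node indices while its second row holds the index of the hyperedge each node belongs to. A minimal pure-Python sketch of that encoding, with hypothetical helper names:

```python
def to_hyperedge_index(hyperedges):
    """Flatten a list of hyperedges (each a list of node ids) into two
    parallel rows: [node ids], [hyperedge ids]. Illustrative only."""
    nodes, edges = [], []
    for he_id, members in enumerate(hyperedges):
        for node in members:
            nodes.append(node)
            edges.append(he_id)
    return [nodes, edges]

def to_hyperedges(hyperedge_index, num_hyperedges):
    """Invert the encoding back into a list of hyperedges."""
    hyperedges = [[] for _ in range(num_hyperedges)]
    for node, he_id in zip(*hyperedge_index):
        hyperedges[he_id].append(node)
    return hyperedges

# Three hyperedges over five nodes (0..4)
hyperedges = [[0, 1, 2], [1, 3], [2, 3, 4]]
hyperedge_index = to_hyperedge_index(hyperedges)
print(hyperedge_index)
# [[0, 1, 2, 1, 3, 2, 3, 4], [0, 0, 0, 1, 1, 2, 2, 2]]
assert to_hyperedges(hyperedge_index, 3) == hyperedges
```

The real datasets expose the same shape as a tensor, as the printed Data objects below show.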
```python
from hg.datasets import Facebook, HypergraphSAINTNodeSampler

# Download the data to the path 'data/facebook'
data = Facebook('data/facebook')
print(data[0])
# Data(x=[22470, 128], edge_index=[2, 342004], y=[22470], hyperedge_index=[2, 2344151], num_hyperedges=236663)

# Create a sampler that samples 1000 nodes from the graph 5 times
sampler = HypergraphSAINTNodeSampler(data[0], batch_size=1000, num_steps=5)
batch = next(iter(sampler))
print(batch)
# Data(num_nodes=918, edge_index=[2, 7964], hyperedge_index=[2, 957528], num_hyperedges=210718, x=[918, 128], y=[918])
```

Data loaders can also be obtained using hg.hybrid_graph.io.get_dataset:
```python
from hg.hybrid_graph.io import get_dataset

name = 'musae_Facebook'
train_loader, valid_loader, test_loader, data_info = get_dataset(name)
```

If you have installed HGB via pip, you can use the hybrid-graph command to train and evaluate the GNNs predefined in HGB directly.
Training can be triggered with the following command. It takes only a few minutes to train a GCN even on a CPU.
```sh
# -a=gpu,cpu,tpu
hybrid-graph train grand_Lung gcn -a=cpu
```

Evaluation can be triggered with:
```sh
# Load the saved checkpoint from the path 'lightning_logs/version_0/checkpoints/best.ckpt'
hybrid-graph eval grand_lung gcn -load='lightning_logs/version_0/checkpoints/best.ckpt' -a=cpu
```

In order to add customized models, you need to install HGB from source.
```sh
cd hybrid-graph-benchmark/hg/hybrid_graph/models/gnn
touch customize_model.py
```

Within customize_model.py, define your customized GNN so that it correctly handles the input feature size, prediction size, and task type.
Below is an example using the definition of vanilla Graph Convolutional Networks (GCNs):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv


class CustomizeGNN(torch.nn.Module):
    def __init__(self, info, *args, **kwargs):
        super().__init__()
        dim = 32
        self.conv1 = GCNConv(info["num_node_features"], dim)
        self.is_regression = info["is_regression"]
        if info["is_regression"]:
            self.conv2 = GCNConv(dim, dim)
            self.head = nn.Linear(dim, 1)
        else:
            self.conv2 = GCNConv(dim, info["num_classes"])

    def forward(self, data, *args, **kwargs):
        x, edge_index = data.x, data.edge_index
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)
        if self.is_regression:
            x = self.head(x).squeeze()
        else:
            x = F.log_softmax(x, dim=1)
        return x
```

Finally, register your model in the factory dictionary in hybrid-graph-benchmark/hg/hybrid_graph/models/__init__.py:
```python
...
from .gnn.customize_model import CustomizeGNN

factory = {
    ...
    'gcn': CustomizeGNN,  # abbreviation: ClassName
}
```

```bibtex
@article{Li2023HybridGraph,
  title={Hybrid Graph: A Unified Graph Representation with Datasets and Benchmarks for Complex Graphs},
  author={Zehui Li and
          Xiangyu Zhao and
          Mingzhu Shen and
          Guy-Bart Stan and
          Pietro Li{\`o} and
          Yiren Zhao},
  journal={arXiv preprint arXiv:2306.05108},
  year={2023}
}
```
