This is a benchmark for evaluating learning algorithms on hybrid graphs (a unified definition of higher-order graphs that covers both hypergraphs and hierarchical graphs). It contains:
- 23 real-world hybrid graph datasets from the domains of biology, social media, and e-commerce
- Built-in functionalities for preprocessing hybrid graphs
- An extensible framework to easily train and evaluate Graph Neural Networks
First, install the required PyTorch packages. You will need to know the compatible CUDA version for the PyTorch version you want to use. Replace ${TORCH} and ${CUDA} with these versions in the following commands:
```sh
# TORCH=2.0.1 for the latest stable PyTorch release
# CUDA=cpu if CUDA is not available
python -m pip install torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
python -m pip install torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
python -m pip install torch-geometric==2.2.0
```

Once these dependencies are installed, you can install this package with one of the following:
```sh
pip install hybrid-graph
# or
pip install git+https://github.com/Zehui127/hybrid-graph-benchmark.git
```

or install from source:

```sh
git clone https://github.com/Zehui127/hybrid-graph-benchmark.git
cd hybrid-graph-benchmark
pip install -e .
```

HGB provides both the hybrid graph datasets and efficient training/evaluation capabilities:
We use torch_geometric.data.Data to wrap each graph, with an additional incidence structure (hyperedge_index) for representing hyperedges.
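To make the layout concrete: hyperedge_index has shape [2, N], like edge_index, and (assuming the usual PyG hypergraph convention, not something HGB's docs spell out here) its first row holds node indices while its second row holds the index of the hyperedge each node belongs to. A minimal pure-Python sketch of that encoding, with hypothetical helper names:

```python
def to_hyperedge_index(hyperedges):
    """Flatten a list of hyperedges (each a list of node ids) into two
    parallel rows: [node ids], [hyperedge ids]. Illustrative only."""
    nodes, edges = [], []
    for he_id, members in enumerate(hyperedges):
        for node in members:
            nodes.append(node)
            edges.append(he_id)
    return [nodes, edges]

def to_hyperedges(hyperedge_index, num_hyperedges):
    """Invert the encoding back into a list of hyperedges."""
    hyperedges = [[] for _ in range(num_hyperedges)]
    for node, he_id in zip(*hyperedge_index):
        hyperedges[he_id].append(node)
    return hyperedges

# Three hyperedges over five nodes (0..4)
hyperedges = [[0, 1, 2], [1, 3], [2, 3, 4]]
hyperedge_index = to_hyperedge_index(hyperedges)
print(hyperedge_index)
# [[0, 1, 2, 1, 3, 2, 3, 4], [0, 0, 0, 1, 1, 2, 2, 2]]
assert to_hyperedges(hyperedge_index, 3) == hyperedges
```

The real datasets expose the same shape as a tensor, as the printed Data objects below show.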
```python
from hg.datasets import Facebook, HypergraphSAINTNodeSampler

# Download the data to the path 'data/facebook'
data = Facebook('data/facebook')
print(data[0])
# Data(x=[22470, 128], edge_index=[2, 342004], y=[22470], hyperedge_index=[2, 2344151], num_hyperedges=236663)

# Create a sampler that samples 1000 nodes from the graph 5 times
sampler = HypergraphSAINTNodeSampler(data[0], batch_size=1000, num_steps=5)
batch = next(iter(sampler))
print(batch)
# Data(num_nodes=918, edge_index=[2, 7964], hyperedge_index=[2, 957528], num_hyperedges=210718, x=[918, 128], y=[918])
```

Data loaders can also be obtained using hg.hybrid_graph.io.get_dataset:
```python
from hg.hybrid_graph.io import get_dataset

name = 'musae_Facebook'
train_loader, valid_loader, test_loader, data_info = get_dataset(name)
```

If you have installed HGB via pip, you can use the hybrid-graph command to train and evaluate the GNNs predefined in HGB directly.
Training can be triggered with the following command. It takes only a few minutes to train a GCN even on a CPU.
```sh
# -a=gpu,cpu,tpu
hybrid-graph train grand_Lung gcn -a=cpu
```

Evaluation can be triggered with:
```sh
# Load the saved checkpoint from the path 'lightning_logs/version_0/checkpoints/best.ckpt'
hybrid-graph eval grand_lung gcn -load='lightning_logs/version_0/checkpoints/best.ckpt' -a=cpu
```

In order to add customized models, you need to install HGB from source.
```sh
cd hybrid-graph-benchmark/hg/hybrid_graph/models/gnn
touch customize_model.py
```

Within customize_model.py, define your customized GNN so that it correctly handles the input feature size, prediction size, and task type.
Below is an example using the definition of vanilla Graph Convolutional Networks (GCNs):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv


class CustomizeGNN(torch.nn.Module):
    def __init__(self, info, *args, **kwargs):
        super().__init__()
        dim = 32
        self.conv1 = GCNConv(info["num_node_features"], dim)
        self.is_regression = info["is_regression"]
        if info["is_regression"]:
            self.conv2 = GCNConv(dim, dim)
            self.head = nn.Linear(dim, 1)
        else:
            self.conv2 = GCNConv(dim, info["num_classes"])

    def forward(self, data, *args, **kwargs):
        x, edge_index = data.x, data.edge_index
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)
        if self.is_regression:
            x = self.head(x).squeeze()
        else:
            x = F.log_softmax(x, dim=1)
        return x
```

Finally, register your model in the factory dictionary in hybrid-graph-benchmark/hg/hybrid_graph/models/__init__.py:
```python
...
from .gnn.customize_model import CustomizeGNN

factory = {
    ...
    'gcn': CustomizeGNN,  # abbreviation: ClassName
}
```

```bibtex
@article{Li2023HybridGraph,
  title={Hybrid Graph: A Unified Graph Representation with Datasets and Benchmarks for Complex Graphs},
  author={Zehui Li and
          Xiangyu Zhao and
          Mingzhu Shen and
          Guy-Bart Stan and
          Pietro Li{\`o} and
          Yiren Zhao},
  journal={arXiv preprint arXiv:2306.05108},
  year={2023}
}
```
