DGNN-MPI: Distributed Graph Neural Network Training Framework

A scalable distributed-memory Graph Neural Network (GNN) training framework using MPI for inter-process communication. Implements GraphSAGE-style message passing with vertex partitioning and halo exchange.

Features

  • Distributed Training: Scales across multiple MPI ranks (1, 2, 5+ nodes)
  • GraphSAGE Implementation: Graph convolution with mean aggregation
  • Vertex Partitioning: The graph is partitioned across workers, with halo exchange for boundary vertices
  • Gradient Synchronization: Gradients are synchronized across ranks via PyTorch distributed (gloo/nccl backends)
  • Flexible Backend: Supports both single-node and multi-node execution
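The mean-aggregation convolution mentioned above can be sketched in a few lines of PyTorch. This is illustrative only: the class name and the `[2, E]` `edge_index` layout are assumptions, not the actual API of the repo's `graphsage.py`.

```python
import torch
import torch.nn as nn

class SAGEMeanConv(nn.Module):
    """Illustrative GraphSAGE layer: concat(self, mean of neighbors) -> linear -> ReLU."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x, edge_index):
        # edge_index: [2, E] with rows (src, dst); mean-aggregate src features into dst.
        src, dst = edge_index
        agg = torch.zeros_like(x)
        agg.index_add_(0, dst, x[src])                      # sum neighbor features
        deg = torch.zeros(x.size(0), dtype=x.dtype)
        deg.index_add_(0, dst, torch.ones_like(dst, dtype=x.dtype))
        agg = agg / deg.clamp(min=1).unsqueeze(-1)          # mean; isolated nodes stay zero
        return torch.relu(self.lin(torch.cat([x, agg], dim=-1)))
```

Clamping the degree to 1 keeps isolated vertices well-defined: their neighbor term is simply zero.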

Requirements

  • Python 3.8+
  • PyTorch 2.0+
  • NumPy
  • Open Graph Benchmark (OGB)

Installation

pip install -r requirements.txt

Usage

Unified Benchmark (Recommended)

Run training with full performance metrics and accuracy tracking:

# Single node with demo graph (for testing)
USE_DEMO=true NUM_EPOCHS=10 python -u run_benchmark.py

# Single node with real dataset
NUM_EPOCHS=20 python -u run_benchmark.py

# Distributed training (2 nodes)
torchrun --nproc_per_node=2 run_benchmark.py

Output includes:

  • Training/validation/test accuracy
  • Epoch timing (forward/backward pass breakdown)
  • Memory usage (peak RSS)
  • Throughput (samples/second)
  • Scaling metrics for distributed runs

Results are saved to ./results/benchmark_TIMESTAMP.json and ./results/benchmark_summary.txt.
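The saved JSON can be consumed programmatically, for instance to pick up the most recent run. A minimal sketch, assuming only the `benchmark_*.json` naming above; the JSON schema itself depends on what `run_benchmark.py` actually writes.

```python
import glob
import json
import os

def latest_benchmark(results_dir="./results"):
    """Return the most recently written benchmark_*.json as a dict, or None if absent."""
    paths = sorted(glob.glob(os.path.join(results_dir, "benchmark_*.json")),
                   key=os.path.getmtime)
    if not paths:
        return None
    with open(paths[-1]) as f:
        return json.load(f)
```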

Training Only

For training with accuracy metrics only:

python -u run.py

Performance Benchmark Only

For timing/memory metrics without accuracy:

python -u benchmark.py

Distributed Training

The number of processes per node is set with --nproc_per_node; torchrun exports RANK and WORLD_SIZE to each worker automatically. Override the rendezvous address and port if the defaults do not suit your cluster:

export MASTER_ADDR=127.0.0.1
export MASTER_PORT=29500

torchrun --nproc_per_node=2 run_benchmark.py

Environment Variables

Variable          Description                            Default
----------------  -------------------------------------  -------
NUM_EPOCHS        Number of training epochs              10
PARTITION_METHOD  Graph partitioning method              hash
BATCH_SIZE        Training batch size                    1024
LEARNING_RATE     Learning rate                          0.01
HIDDEN_DIM        Hidden layer dimension                 256
NUM_LAYERS        Number of GNN layers                   2
MPI_BACKEND       PyTorch distributed backend            gloo
USE_DEMO          Use demo graph instead of Papers100M   false
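A configuration loader in the spirit of the table might look as follows. This is a sketch: the repo's actual `config.py` may use different names, types, and parsing.

```python
import os

def load_config():
    """Read training settings from the environment, with the documented defaults."""
    env = os.environ
    return {
        "num_epochs": int(env.get("NUM_EPOCHS", "10")),
        "partition_method": env.get("PARTITION_METHOD", "hash"),
        "batch_size": int(env.get("BATCH_SIZE", "1024")),
        "learning_rate": float(env.get("LEARNING_RATE", "0.01")),
        "hidden_dim": int(env.get("HIDDEN_DIM", "256")),
        "num_layers": int(env.get("NUM_LAYERS", "2")),
        "mpi_backend": env.get("MPI_BACKEND", "gloo"),
        # Treat common truthy strings as "use the demo graph".
        "use_demo": env.get("USE_DEMO", "false").lower() in ("1", "true", "yes"),
    }
```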

Project Structure

DGNN-MPI/
├── comm/                  # MPI communication utilities
│   ├── mpi_utils.py       # Distributed primitives
│   ├── gradient_sync.py
│   └── halo_exchange.py
├── data/                  # Data loading
│   ├── dataset_loader.py
│   └── batch_sampler.py
├── model/                 # GNN models
│   ├── gnn_model.py       # GraphSAGENet
│   ├── graphsage.py       # GraphSAGE convolution
│   └── distributed_gnn.py
├── partition/             # Graph partitioning
│   ├── graph_partitioner.py
│   ├── partition_loader.py
│   └── halo_builder.py
├── train/                 # Training utilities
│   ├── trainer.py
│   ├── evaluator.py
│   └── logger.py
├── results/               # Output directory
│   ├── benchmark_*.json
│   └── benchmark_summary.txt
├── config.py              # Configuration
├── run_benchmark.py       # Unified benchmark (recommended)
├── run.py                 # Training entry point
└── benchmark.py           # Performance benchmark

Architecture

The framework uses a vertex partitioning approach:

  1. Each MPI rank holds a graph partition (subset of vertices + edges)
  2. Local node features are stored per rank
  3. Forward pass is computed locally
  4. Boundary node embeddings are exchanged via halo exchange
  5. Backward pass computes local gradients
  6. Gradients are synchronized across ranks using MPI all-reduce
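Steps 4 and 6 above can be sketched with torch.distributed collectives. This is illustrative only: the function names, and the assumption that every rank contributes an equally shaped boundary tensor, are not taken from the repo's `comm/` modules, which may use point-to-point sends instead.

```python
import torch
import torch.distributed as dist

def halo_exchange(boundary_feats):
    """Step 4 (sketch): gather boundary-node embeddings from every rank.
    Assumes equal-sized boundary tensors on all ranks."""
    world = dist.get_world_size()
    gathered = [torch.empty_like(boundary_feats) for _ in range(world)]
    dist.all_gather(gathered, boundary_feats)
    return gathered

def allreduce_gradients(model):
    """Step 6 (sketch): sum gradients across ranks, then average by world size."""
    world = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world
```

Averaging after the all-reduce keeps the effective learning rate independent of the number of ranks.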

Dataset

Default dataset: [OGBN-Papers100M](https://ogb.stanford.edu/docs/nodeprop/)

  • ~111M nodes
  • ~1.6B edges
  • 172 classes
  • 128-dimensional features

A smaller demo graph is available for testing.
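For experiments that should not download Papers100M, a stand-in graph with the same feature and label dimensions can be generated randomly. A hypothetical sketch: the repo's actual demo graph (enabled via USE_DEMO) is likely built differently.

```python
import torch

def make_demo_graph(num_nodes=1000, num_edges=5000,
                    num_classes=172, feat_dim=128, seed=0):
    """Small random graph mirroring Papers100M's 172 classes / 128-dim features."""
    g = torch.Generator().manual_seed(seed)
    edge_index = torch.randint(0, num_nodes, (2, num_edges), generator=g)
    x = torch.randn(num_nodes, feat_dim, generator=g)
    y = torch.randint(0, num_classes, (num_nodes,), generator=g)
    return x, edge_index, y
```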

License

MIT
