Add RelSC benchmark datasets and tests by MarcusVukojevic · Pull Request #10630 · pyg-team/pytorch_geometric

MarcusVukojevic · 2026-03-06T14:13:48Z

This PR introduces the RelSC-H (homogeneous) and RelSC-M (multi-relational) datasets, a new benchmark for graph-level regression. Most existing benchmarks focus on molecules or citation networks. RelSC instead provides large, directed program graphs extracted from Java source code, and the task is to predict each program's execution-time cost.
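The following minimal sketch shows one way source code can become a directed graph. It rests on several assumptions. It uses Python's standard `ast` module as a stand-in for a Java parser; RelSC's graphs are built from Java by the project's own tooling, which this does not reproduce. It adds only parent-to-child AST edges plus sequential "next statement" edges as a simple form of flow augmentation. The name `program_graph` is hypothetical. Edges come back as a pair of source and target lists, the COO layout PyG uses for `edge_index`.

```python
import ast


def program_graph(source):
    """Build a toy directed program graph from Python source (hypothetical helper).

    Illustration only: RelSC builds its graphs from Java with its own tooling.
    Returns (node_types, edge_index, edge_kinds), edge_index = [sources, targets].
    """
    node_types = []
    sources, targets, kinds = [], [], []

    def add_edge(src, dst, kind):
        sources.append(src)
        targets.append(dst)
        kinds.append(kind)

    def visit(node):
        # Each visit creates a fresh graph node, so AST objects that CPython
        # shares between parents (e.g. Load/Store contexts) stay distinct.
        node_id = len(node_types)
        node_types.append(type(node).__name__)
        for field, value in ast.iter_fields(node):
            children = value if isinstance(value, list) else [value]
            child_ids = [visit(c) for c in children if isinstance(c, ast.AST)]
            for child_id in child_ids:
                add_edge(node_id, child_id, "child")  # syntactic AST edge
            if field in ("body", "orelse", "finalbody"):
                # Assumed flow augmentation: link consecutive statements.
                for a, b in zip(child_ids, child_ids[1:]):
                    add_edge(a, b, "next")
        return node_id

    visit(ast.parse(source))
    return node_types, [sources, targets], kinds
```

If PyTorch is installed, `torch.tensor(edge_index, dtype=torch.long)` turns the returned lists into a tensor of shape `[2, num_edges]`.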

Key Features:

RelSC-H: A homogeneous variant providing rich node features on flow-augmented Abstract Syntax Trees (ASTs).

RelSC-M: A multi-relational variant that preserves semantic relationships by categorizing nodes into 7 semantic groups with up to 49 unique relation types.

Domain: Software Engineering / Performance Prediction.
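RelSC-M's numbers are consistent with relations being keyed by ordered pairs of node groups (7 × 7 = 49), but the PR does not say so. The sketch below assumes that encoding purely for illustration. It maps each directed edge to an integer relation id. PyG relational layers such as `torch_geometric.nn.RGCNConv` take this kind of per-edge vector as `edge_type` alongside `edge_index`. `relation_type` and `edge_types` are hypothetical helpers, not part of relsc.py.

```python
NUM_GROUPS = 7  # number of semantic node groups stated for RelSC-M


def relation_type(src_group, dst_group, num_groups=NUM_GROUPS):
    """Map an ordered (source group, target group) pair to a relation id.

    ASSUMPTION: relations are keyed by ordered group pairs, giving at most
    num_groups ** 2 = 49 ids. The actual RelSC-M encoding may differ.
    """
    if not (0 <= src_group < num_groups and 0 <= dst_group < num_groups):
        raise ValueError(f"group ids must be in [0, {num_groups})")
    return src_group * num_groups + dst_group


def edge_types(node_groups, edge_index):
    """Return one relation id per directed edge (hypothetical helper)."""
    sources, targets = edge_index
    return [relation_type(node_groups[s], node_groups[t])
            for s, t in zip(sources, targets)]
```

Because the pair is ordered, an edge from group 1 to group 0 gets a different id than an edge from group 0 to group 1. This preserves edge direction in the relation type.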

Academic Context:
The associated paper, "A Benchmark Dataset for Graph Regression with Homogeneous and Multi-Relational Variants", is currently in the final stage of review at the Journal of Data-centric Machine Learning Research (DMLR).

Preprint: arXiv:2505.23875

Resources & Reproducibility:

Official Project Page: https://github.com/MarcusVukojevic/graph_regression_datasets

The project page includes comprehensive tutorials, scripts to reproduce paper results, and tools to build custom versions of the dataset from source code.

Implementation Details:

Both variants are implemented in a single relsc.py file to share data loading and download logic.

Unit tests are included in test/datasets/test_relsc.py, using generated dummy data to ensure CI passes without requiring large Zenodo downloads.

CHANGELOG.md and torch_geometric/datasets/__init__.py have been updated.
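Two variants can share download and loading logic through a common base class whose subclasses differ only in a class attribute. That logic can be tested offline by writing dummy raw data into a temporary directory. The stdlib-only sketch below is hypothetical throughout. The class names, the JSON raw format, and the injected `fetch` callable are assumptions, not the contents of relsc.py or test_relsc.py. In an actual PyG dataset, the base would typically subclass `torch_geometric.data.InMemoryDataset` and download with `torch_geometric.data.download_url`. A pytest test could use the `tmp_path` fixture instead of `tempfile`.

```python
import json
import os
import tempfile
from typing import Callable, List, Optional


class RelSCBase:
    """Hypothetical shared base for both variants (names are illustrative)."""

    variant = ""  # set by subclasses

    def __init__(self, root: str,
                 fetch: Optional[Callable[[str, str], None]] = None):
        self.root = root
        # Injected downloader (url, dest_dir) -> None, so tests stay offline.
        self.fetch = fetch

    @property
    def raw_path(self) -> str:
        name = f"relsc_{self.variant.lower()}"
        return os.path.join(self.root, name, "raw", f"{name}.json")

    def download(self, base_url: str) -> None:
        # Same logic for both variants; only `variant` changes the file name.
        if self.fetch is None:
            raise RuntimeError("no downloader configured")
        url = f"{base_url.rstrip('/')}/{os.path.basename(self.raw_path)}"
        self.fetch(url, os.path.dirname(self.raw_path))

    def load(self) -> List[dict]:
        # Shared parsing and validation of the (hypothetical) JSON raw format.
        with open(self.raw_path) as f:
            graphs = json.load(f)
        for g in graphs:
            for src, dst in g["edges"]:
                if not (0 <= src < g["num_nodes"] and 0 <= dst < g["num_nodes"]):
                    raise ValueError(f"edge ({src}, {dst}) out of range")
        return graphs


class RelSCH(RelSCBase):
    variant = "H"  # homogeneous


class RelSCM(RelSCBase):
    variant = "M"  # multi-relational


def write_dummy_raw(dataset: RelSCBase, num_graphs: int = 3,
                    num_nodes: int = 4) -> None:
    """Write a tiny fake raw file so tests never hit the network."""
    os.makedirs(os.path.dirname(dataset.raw_path), exist_ok=True)
    graphs = [
        {"num_nodes": num_nodes,
         "edges": [[j, j + 1] for j in range(num_nodes - 1)],  # directed chain
         "y": float(i)}  # dummy regression target
        for i in range(num_graphs)
    ]
    with open(dataset.raw_path, "w") as f:
        json.dump(graphs, f)


def test_both_variants_offline():
    with tempfile.TemporaryDirectory() as root:
        for cls in (RelSCH, RelSCM):
            ds = cls(root)
            write_dummy_raw(ds)
            graphs = ds.load()
            assert len(graphs) == 3
            assert graphs[2]["y"] == 2.0
```

Injecting the downloader keeps `download()` identical for both variants while letting tests replace it with a stub.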

Checklist:

[x] I have updated the CHANGELOG.md

[x] I have added unit tests

[x] I have updated torch_geometric/datasets/__init__.py

[x] Documentation follows the PyG style guide and includes dataset statistics.

MarcusVukojevic added 2 commits March 6, 2026 15:09

Add RelSC benchmark datasets and tests (6e7f73c)

Add RelSC benchmark datasets and tests (a854209)

MarcusVukojevic requested review from akihironitta, rusty1s and wsad1 as code owners March 6, 2026 14:13

MarcusVukojevic added 2 commits March 6, 2026 15:14

Add RelSC benchmark datasets and tests (d1ffa8b)

Fix RTD build: lazy import pandas and sklearn (4420b8d)
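The last commit defers the pandas and scikit-learn imports so that the Read the Docs build does not need them. A common way to do this in Python is to import inside the function that uses the dependency. The sketch below wraps that pattern in a small helper that gives a clear error message. `require` and `process_raw_csv` are hypothetical names. The PR does not state which functions in relsc.py actually use pandas or sklearn.

```python
import importlib
from types import ModuleType


def require(module_name: str, purpose: str) -> ModuleType:
    """Import an optional dependency lazily, with a clear error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for {purpose}. Please install it."
        ) from exc


def process_raw_csv(path: str):
    """Hypothetical processing step that needs pandas only when it runs.

    pandas is imported inside the function body. Importing the module that
    defines this function (e.g. during a docs build) therefore does not
    require pandas to be installed.
    """
    pd = require("pandas", "processing the raw RelSC files")
    return pd.read_csv(path)
```

The same helper works for scikit-learn, whose import name is `sklearn`.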

MarcusVukojevic wants to merge 4 commits into pyg-team:master from MarcusVukojevic:add-relsc-dataset
