
Add RelSC benchmark datasets and tests #10630

Open
MarcusVukojevic wants to merge 4 commits into pyg-team:master from MarcusVukojevic:add-relsc-dataset


@MarcusVukojevic

This PR introduces the RelSC-H (homogeneous) and RelSC-M (multi-relational) datasets, which together form RelSC, a new benchmark for graph-level regression. Unlike most existing benchmarks, which focus on molecules or citation networks, RelSC provides large, directed program graphs extracted from Java source code; the task is to predict each program's execution-time cost.

Key Features:

RelSC-H: A homogeneous variant providing rich node features on flow-augmented Abstract Syntax Trees (ASTs).

RelSC-M: A multi-relational variant that preserves semantic relationships by categorizing nodes into 7 semantic groups with up to 49 unique relation types.

Domain: Software Engineering / Performance Prediction.
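
The figure of up to 49 relation types in RelSC-M is consistent with one relation per ordered pair of the 7 node groups on directed edges (7 × 7 = 49). The sketch below illustrates that reading. It is an assumption, not something this PR description confirms, and the group names are placeholders rather than the dataset's real labels.

```python
from itertools import product

# Placeholder names: the PR only states that there are 7 semantic groups.
NODE_GROUPS = [f"group_{i}" for i in range(7)]

# Assumption: a relation type is the ordered (source group, destination group)
# pair of a directed edge, which yields at most 7 * 7 = 49 relation types.
RELATION_TYPES = {
    pair: idx for idx, pair in enumerate(product(NODE_GROUPS, repeat=2))
}


def relation_id(src_group: str, dst_group: str) -> int:
    """Return the integer id of the relation between two node groups."""
    return RELATION_TYPES[(src_group, dst_group)]


def edge_relations(node_group, edges):
    """Map each directed edge (u, v) to a relation id via its endpoint groups."""
    return [relation_id(node_group[u], node_group[v]) for u, v in edges]
```

Because edges are directed, `("group_0", "group_1")` and `("group_1", "group_0")` get different ids under this scheme. A given graph may use fewer than 49 relations, which fits the "up to" wording.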

Academic Context:
The associated paper, "A Benchmark Dataset for Graph Regression with Homogeneous and Multi-Relational Variants", is currently in the final stage of review at the Journal of Data-centric Machine Learning Research (DMLR).

Preprint: arXiv:2505.23875

Resources & Reproducibility:

Official Project Page: https://github.com/MarcusVukojevic/graph_regression_datasets

The project page includes comprehensive tutorials, scripts to reproduce paper results, and tools to build custom versions of the dataset from source code.

Implementation Details:

Both variants are implemented in a single relsc.py file to share data loading and download logic.

Unit tests are included in test/datasets/test_relsc.py, using generated dummy data to ensure CI passes without requiring large Zenodo downloads.

CHANGELOG.md and torch_geometric/datasets/__init__.py have been updated.
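
The dummy-data approach used for the unit tests can be illustrated with a standard-library sketch. It is not the code in test/datasets/test_relsc.py. The JSON layout, the file name `dummy_graphs.json`, and the helpers `write_dummy_graphs` and `load_graphs` are all hypothetical, chosen only to show how pre-populating the raw directory lets a loader skip its download step.

```python
import json
import random
from pathlib import Path

RAW_FILE = "dummy_graphs.json"  # hypothetical file name, not the real raw file


def write_dummy_graphs(raw_dir, num_graphs=4, num_nodes=5, seed=0):
    """Write small random directed graphs with a scalar target into raw_dir."""
    rng = random.Random(seed)
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)
    graphs = []
    for _ in range(num_graphs):
        # Each edge is a [source, destination] pair of node indices.
        edges = [
            [rng.randrange(num_nodes), rng.randrange(num_nodes)]
            for _ in range(2 * num_nodes)
        ]
        # "y" stands in for the graph-level regression target.
        graphs.append({"num_nodes": num_nodes, "edges": edges, "y": rng.random()})
    path = raw_dir / RAW_FILE
    path.write_text(json.dumps(graphs))
    return path


def load_graphs(raw_dir, download=None):
    """Load graphs from raw_dir, calling download only if the file is missing."""
    path = Path(raw_dir) / RAW_FILE
    if not path.exists():
        if download is None:
            raise FileNotFoundError(path)
        download(raw_dir)
    return json.loads(path.read_text())
```

A test can call `write_dummy_graphs` on a temporary directory and then `load_graphs` with a `download` callback that fails if invoked, confirming that no network access is needed.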

Checklist:

[x] I have updated the CHANGELOG.md

[x] I have added unit tests

[x] I have updated torch_geometric/datasets/__init__.py

[x] Documentation follows the PyG style guide and includes dataset statistics.
