Add from_relbench utility to convert RelBench databases to HeteroData by AJamal27891 · Pull Request #10628 · pyg-team/pytorch_geometric

AJamal27891 · 2026-03-04T15:44:35Z

Description

This PR is Part 1 of 4 in splitting the monolithic Warehouse Intelligence system (#10353) into modular, composable pieces, as requested by the core maintainers.

This PR introduces the from_relbench utility to torch_geometric.utils.relbench. It allows users to convert complex, multi-table databases from the RelBench (Relational Deep Learning Benchmark) environment directly into PyG's native HeteroData format.
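For readers new to the idea, the following library-free sketch shows one way a multi-table database can map onto a heterogeneous graph: each table becomes a node type and each foreign-key column becomes an edge type. It is not the from_relbench implementation. The toy tables, the `tables_to_hetero` helper, and the dictionary layout are all hypothetical and chosen only for illustration.

```python
# Illustrative only: NOT the from_relbench implementation.
# Hypothetical toy database: table name -> list of rows (dicts).
tables = {
    "drivers": [{"id": 10}, {"id": 11}],
    "races": [{"id": 7}],
    "results": [
        {"id": 100, "driver_id": 10, "race_id": 7},
        {"id": 101, "driver_id": 11, "race_id": 7},
    ],
}
# Hypothetical foreign-key schema: (table, column) -> referenced table.
foreign_keys = {
    ("results", "driver_id"): "drivers",
    ("results", "race_id"): "races",
}


def tables_to_hetero(tables, foreign_keys, pk="id"):
    """Map each table to a node type and each foreign key to an edge type."""
    # Primary keys are re-indexed to contiguous row positions 0..N-1.
    index = {
        name: {row[pk]: i for i, row in enumerate(rows)}
        for name, rows in tables.items()
    }
    num_nodes = {name: len(rows) for name, rows in tables.items()}
    edges = {}
    for (src_table, column), dst_table in foreign_keys.items():
        src, dst = [], []
        for i, row in enumerate(tables[src_table]):
            key = row.get(column)
            if key in index[dst_table]:  # skip null or dangling keys
                src.append(i)
                dst.append(index[dst_table][key])
        # Edge type as a (source, relation, destination) triple.
        edges[(src_table, column, dst_table)] = [src, dst]
    return num_nodes, edges


num_nodes, edges = tables_to_hetero(tables, foreign_keys)
```

Each value in `edges` is a pair of index lists, which mirrors the two-row `edge_index` layout that PyG stores per edge type in HeteroData.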

Addressing Assessor Feedback:
This specifically addresses @wsad1's feedback from #10353: "why do we need a RelBenchDataset?"
Based on that guidance, I have completely dropped the custom RelBenchDataset class wrapper. Instead, to align with PyG's stateless data processing philosophy and avoid reinventing the wheel, this PR only introduces a pure utility function. It decouples PyG's dataset classes from RelBench's internal state.

Proposed Changes

Added torch_geometric/utils/relbench.py housing the from_relbench conversion utility.
Added exhaustive unit tests in test/utils/test_relbench.py utilizing dummy fallback flags to ensure rapid CI execution without massive database downloads.
Updated CHANGELOG.md.

(Note: Parts 2, 3, and 4—covering the Warehouse Transforms, SAGEConv multi-task models, and LLM G-Retriever integrations—will follow in subsequent linked PRs once this foundational data layer is approved.)

Ref: #10353
Partially Closes #9839

codecov · 2026-03-04T15:51:39Z

Codecov Report

❌ Patch coverage is 97.77778% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 84.21%. Comparing base (c211214) to head (9aa7a26).
⚠️ Report is 185 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| torch_geometric/utils/relbench.py | 97.72% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #10628      +/-   ##
==========================================
- Coverage   86.11%   84.21%   -1.91%     
==========================================
  Files         496      511      +15     
  Lines       33655    36058    +2403     
==========================================
+ Hits        28981    30365    +1384     
- Misses       4674     5693    +1019


puririshi98

can you add an example of training a GNN+LLM system on this based on the code from examples/llm/txt2kg_rag.py

AJamal27891 · 2026-03-09T06:23:52Z

> can you add an example of training a GNN+LLM system on this based on the code from examples/llm/txt2kg_rag.py

This PR includes examples/relbench_example.py, a lightweight heterogeneous GNN example that demonstrates from_relbench end to end:
from_relbench → HeteroData → SAGEConv + to_hetero() → node-level regression (championship points, MAE 9.2 → 2.1 over 30 epochs, <30s on CPU).
Adding GRetriever end to end here would require bridging heterogeneous graphs to homogeneous ones: from_relbench produces HeteroData with multiple node and edge types, whereas GRetriever expects homogeneous input (x, edge_index, batch). That bridging involves projecting all node types to a common dimension, concatenating them, and remapping edges. This is model-level code that belongs in a dedicated PR, not in the data utility PR.
The full GNN+LLM integration with GRetriever (using to_homogeneous) will land in a follow-up PR as part of the split from #10353:

| PR | Scope | Status |
|---|---|---|
| #10628 (this) | `from_relbench` utility + tests + hetero GNN example | ✅ Ready |
| PR 2 | `to_hetero_edges` bridging utility | Planned |
| PR 3 | GNN+LLM models + `GRetriever` example (w/ `to_homogeneous`) | Planned |
| PR 4 | End-to-end RAG pipeline | Planned |

The end-to-end code already exists in #10353; this split, as suggested by @wsad1, keeps each PR focused and independently reviewable.
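The index bookkeeping behind the bridging step described in this comment (concatenating node types and remapping edges) can be sketched without any library. This is a simplified illustration on assumed toy inputs: `to_global_edges` and the data layout are hypothetical, the feature projection step is omitted, and this is not the code from #10353.

```python
# Illustrative sketch of heterogeneous -> homogeneous index remapping.
# Hypothetical toy graph: node counts per type and per-type edge lists.
num_nodes = {"drivers": 2, "races": 1, "results": 3}
edges = {
    ("results", "driver_id", "drivers"): [[0, 1, 2], [0, 1, 1]],
    ("results", "race_id", "races"): [[0, 1, 2], [0, 0, 0]],
}


def to_global_edges(num_nodes, edges):
    """Concatenate node types into one index space and remap all edges."""
    # Each node type gets a starting offset in the shared index space.
    offsets, total = {}, 0
    for node_type, count in num_nodes.items():
        offsets[node_type] = total
        total += count
    src_all, dst_all, edge_type_ids = [], [], []
    for type_id, ((src_t, _, dst_t), (src, dst)) in enumerate(edges.items()):
        # Shift local indices by the offset of their node type.
        src_all += [i + offsets[src_t] for i in src]
        dst_all += [j + offsets[dst_t] for j in dst]
        # Remember which edge type each remapped edge came from.
        edge_type_ids += [type_id] * len(src)
    return offsets, total, [src_all, dst_all], edge_type_ids


offsets, total, edge_index, edge_type_ids = to_global_edges(num_nodes, edges)
```

Keeping `edge_type_ids` alongside the merged `edge_index` preserves the relation information that would otherwise be lost when all node types share one index space.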

AJamal27891 · 2026-03-09T06:25:21Z

CI update: The pytest failures were caused by tabulate 0.10.0 (unrelated to this PR). Resolved in #10634 by @rusty1s. Branch rebased on latest master.


puririshi98

lgtm can you just share a log of running the example to convergence

AJamal27891 · 2026-03-11T03:25:50Z

> lgtm can you just share a log of running the example to convergence

Loading RelBench rel-f1 dataset...
Done in 0.26 seconds.
Graph: 9 node types, 26 edge types

Training 30 epochs on "standings" point prediction...
Target stats (train): mean=6.25, std=13.49

Epoch: 001, Loss: 0.8340, Train MAE: 7.29, Val MAE: 7.36, Test MAE: 7.31 points
Epoch: 005, Loss: 0.2883, Train MAE: 5.02, Val MAE: 4.90, Test MAE: 4.89 points
Epoch: 010, Loss: 0.1753, Train MAE: 2.94, Val MAE: 2.91, Test MAE: 2.90 points
Epoch: 015, Loss: 0.1473, Train MAE: 2.78, Val MAE: 2.76, Test MAE: 2.72 points
Epoch: 020, Loss: 0.1134, Train MAE: 2.56, Val MAE: 2.50, Test MAE: 2.49 points
Epoch: 025, Loss: 0.1037, Train MAE: 2.44, Val MAE: 2.37, Test MAE: 2.37 points
Epoch: 030, Loss: 0.0890, Train MAE: 2.22, Val MAE: 2.19, Test MAE: 2.18 points

Final — Train MAE: 2.22, Val MAE: 2.19, Test MAE: 2.18 points
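The log reports a loss well below 1 next to MAE values in points, together with the training-set mean and standard deviation of the target. One plausible reading, stated here as an assumption because the example script is not shown in this thread, is that the model trains on standardized targets and converts errors back to points for reporting. A minimal sketch of that bookkeeping, with hypothetical helper names and inputs:

```python
# Assumed setup (not confirmed by the thread): train on z-scored targets,
# report MAE in the original units ("points").
train_mean, train_std = 6.25, 13.49  # values printed in the log


def standardize(values):
    """Convert raw point totals to z-scores using training statistics."""
    return [(v - train_mean) / train_std for v in values]


def mae_in_points(pred_standardized, target_points):
    """De-normalize predictions, then average absolute errors in points."""
    pred_points = [p * train_std + train_mean for p in pred_standardized]
    errors = [abs(p - t) for p, t in zip(pred_points, target_points)]
    return sum(errors) / len(errors)


targets = [0.0, 6.25, 25.0]             # hypothetical point totals
preds = standardize([1.0, 6.25, 22.0])  # hypothetical model outputs
mae = mae_in_points(preds, targets)     # mean of |1-0|, |0|, |22-25|
```

Under this reading, an error measured on the standardized scale is multiplied by the training standard deviation to obtain the error in points.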

puririshi98

lgtm, @akihironitta @wsad1 to merge

AJamal27891 requested review from akihironitta, rusty1s and wsad1 as code owners March 4, 2026 15:44

AJamal27891 force-pushed the pr-10353-part1-relbench-base branch 2 times, most recently from e1cb0cb to f8c76ea Compare March 4, 2026 16:19

puririshi98 requested changes Mar 8, 2026


AJamal27891 force-pushed the pr-10353-part1-relbench-base branch from 7b68c2b to 4148f88 Compare March 8, 2026 15:35

AJamal27891 mentioned this pull request Mar 9, 2026

Fix brittle test_summary_with_to_hetero_model assertion broken by tabulate 0.10.0 #10633

Closed

AJamal27891 requested a review from puririshi98 March 9, 2026 06:26

AJamal27891 added 2 commits March 10, 2026 10:54

Add from_relbench utility to convert RelBench databases to HeteroData

d65b73c

Add relbench_example.py to demonstrate from_relbench with heterogeneous GNN training

fe3e67c

AJamal27891 force-pushed the pr-10353-part1-relbench-base branch from 4148f88 to fe3e67c Compare March 10, 2026 08:54

puririshi98 requested changes Mar 10, 2026


AJamal27891 requested a review from puririshi98 March 11, 2026 03:29

puririshi98 mentioned this pull request Mar 11, 2026

Add RelBench integration for PyTorch Geometric GNN+LLM applications #10353

Closed

puririshi98 approved these changes Mar 11, 2026


Merge branch 'master' into pr-10353-part1-relbench-base

9aa7a26

AJamal27891 wants to merge 3 commits into pyg-team:master from AJamal27891:pr-10353-part1-relbench-base
