Add from_relbench utility to convert RelBench databases to HeteroData #10628
AJamal27891 wants to merge 3 commits into pyg-team:master
Codecov Report: ❌ Patch coverage is
@@            Coverage Diff             @@
##           master   #10628      +/-   ##
==========================================
- Coverage   86.11%   84.21%   -1.91%
==========================================
  Files         496      511      +15
  Lines       33655    36058    +2403
==========================================
+ Hits        28981    30365    +1384
- Misses       4674     5693    +1019
Branch updated from e1cb0cb to f8c76ea.
puririshi98 left a comment:
Can you add an example of training a GNN+LLM system on this, based on the code from examples/llm/txt2kg_rag.py?
Branch updated from 7b68c2b to 4148f88.
This PR includes
The end-to-end code already exists in #10353; this split, suggested by @wsad1, keeps each PR focused and independently reviewable.
Branch updated from 4148f88 to fe3e67c.
puririshi98 left a comment:
LGTM. Can you just share a log of running the example to convergence?
Loading RelBench rel-f1 dataset...
Done in 0.26 seconds.
Graph: 9 node types, 26 edge types
Training 30 epochs on "standings" point prediction...
Target stats (train): mean=6.25, std=13.49
Epoch: 001, Loss: 0.8340, Train MAE: 7.29, Val MAE: 7.36, Test MAE: 7.31 points
Epoch: 005, Loss: 0.2883, Train MAE: 5.02, Val MAE: 4.90, Test MAE: 4.89 points
Epoch: 010, Loss: 0.1753, Train MAE: 2.94, Val MAE: 2.91, Test MAE: 2.90 points
Epoch: 015, Loss: 0.1473, Train MAE: 2.78, Val MAE: 2.76, Test MAE: 2.72 points
Epoch: 020, Loss: 0.1134, Train MAE: 2.56, Val MAE: 2.50, Test MAE: 2.49 points
Epoch: 025, Loss: 0.1037, Train MAE: 2.44, Val MAE: 2.37, Test MAE: 2.37 points
Epoch: 030, Loss: 0.0890, Train MAE: 2.22, Val MAE: 2.19, Test MAE: 2.18 points
Final — Train MAE: 2.22, Val MAE: 2.19, Test MAE: 2.18 points
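The log reports target statistics on the train split and MAE in points. That suggests, as an assumption since the example script is not shown here, that the model trains on standardized targets and predictions are mapped back to points for evaluation. A minimal plain-Python sketch of that pattern follows. The helper names fit_target_scaler, standardize, destandardize and mae are hypothetical, and the numbers are toy values rather than rel-f1 data.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def fit_target_scaler(train_targets: List[float]) -> Tuple[float, float]:
    """Compute (mean, std) on the training split only, to avoid leakage."""
    mu = mean(train_targets)
    sigma = pstdev(train_targets)
    # Fall back to 1.0 so a constant target column does not divide by zero.
    return mu, (sigma if sigma > 0 else 1.0)


def standardize(values: List[float], mu: float, sigma: float) -> List[float]:
    """Map raw targets (e.g. points) to zero-mean, unit-variance values."""
    return [(v - mu) / sigma for v in values]


def destandardize(values: List[float], mu: float, sigma: float) -> List[float]:
    """Map model outputs back to the original unit before computing MAE."""
    return [v * sigma + mu for v in values]


def mae(preds: List[float], targets: List[float]) -> float:
    """Mean absolute error, expressed in the unit of ``targets``."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


if __name__ == "__main__":
    train_y = [0.0, 2.0, 4.0, 10.0]  # toy values, not RelBench data
    mu, sigma = fit_target_scaler(train_y)
    z = standardize(train_y, mu, sigma)
    # Stand-in for model output: standardized targets plus a constant error.
    z_pred = [v + 0.1 for v in z]
    print(f"mean={mu:.2f}, std={sigma:.2f}, "
          f"MAE={mae(destandardize(z_pred, mu, sigma), train_y):.2f} points")
```

Fitting the scaler on the training split only keeps validation and test statistics out of training, and de-standardizing before computing MAE keeps the metric in interpretable units, like the "points" reported in the log.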
puririshi98 left a comment:
LGTM. @akihironitta @wsad1 to merge.
Description
This PR is Part 1 of 4 in splitting the monolithic Warehouse Intelligence system (#10353) into modular, composable pieces, as requested by the core maintainers.
This PR introduces the from_relbench utility in torch_geometric.utils.relbench. It converts multi-table databases from RelBench (Relational Deep Learning Benchmark) directly into PyG's native HeteroData format.
Addressing Assessor Feedback:
This specifically addresses @wsad1's feedback from #10353: "why do we need a RelBenchDataset?"
Based on that guidance, I have dropped the custom RelBenchDataset class wrapper entirely. Instead, to align with PyG's stateless data-processing philosophy and avoid reinventing the wheel, this PR introduces only a pure utility function, which decouples PyG's dataset classes from RelBench's internal state.
Proposed Changes
- torch_geometric/utils/relbench.py, housing the from_relbench conversion utility.
- test/utils/test_relbench.py, using dummy fallback flags so that CI runs quickly without downloading large databases.
- CHANGELOG.md.
(Note: Parts 2, 3, and 4, covering the Warehouse Transforms, SAGEConv multi-task models, and LLM G-Retriever integrations, will follow in subsequent linked PRs once this foundational data layer is approved.)
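As a rough illustration of the kind of relational-to-heterogeneous conversion this utility performs, the sketch below maps each table to a node type (one node per row) and each foreign-key column to a pair of edge types, one per direction. It uses plain Python dicts instead of HeteroData so it runs without PyG or a RelBench download. It is not the from_relbench implementation from this PR: the input schema, the function name tables_to_hetero, and the f2p_/rev_f2p_ edge-type names are assumptions made for illustration. In PyG terms, each (src, dst) pair would become a 2 x E edge_index tensor stored under the matching HeteroData edge type.

```python
from typing import Dict, List, Tuple

EdgeType = Tuple[str, str, str]
EdgeIndex = Tuple[List[int], List[int]]


def tables_to_hetero(
    db: Dict[str, dict],
) -> Tuple[Dict[str, int], Dict[EdgeType, EdgeIndex]]:
    """Hypothetical sketch: tables -> node types, foreign keys -> edge types.

    Each entry of ``db`` maps a table name to a dict with ``pkey`` (primary-key
    column or None), ``fkeys`` (foreign-key column -> referenced table) and
    ``rows`` (list of dicts). Returns per-type node counts and COO edge lists.
    """
    # One node per row; the row position is the node index.
    num_nodes = {name: len(table["rows"]) for name, table in db.items()}

    # Map primary-key values to node indices for every table that has one.
    pkey_to_idx: Dict[str, dict] = {}
    for name, table in db.items():
        if table["pkey"] is not None:
            pkey_to_idx[name] = {
                row[table["pkey"]]: i for i, row in enumerate(table["rows"])
            }

    edge_index: Dict[EdgeType, EdgeIndex] = {}
    for name, table in db.items():
        for fkey, dst_table in table["fkeys"].items():
            lookup = pkey_to_idx.get(dst_table, {})
            src: List[int] = []
            dst: List[int] = []
            for i, row in enumerate(table["rows"]):
                value = row.get(fkey)
                # Skip missing (None) or dangling foreign-key values.
                if value is None or value not in lookup:
                    continue
                src.append(i)
                dst.append(lookup[value])
            # Store both directions so information can flow child <-> parent.
            edge_index[(name, f"f2p_{fkey}", dst_table)] = (src, dst)
            edge_index[(dst_table, f"rev_f2p_{fkey}", name)] = (dst, src)
    return num_nodes, edge_index


if __name__ == "__main__":
    # Tiny synthetic tables (toy rows, not real rel-f1 data); no download needed.
    db = {
        "drivers": {"pkey": "driverId", "fkeys": {},
                    "rows": [{"driverId": 10}, {"driverId": 20}]},
        "races": {"pkey": "raceId", "fkeys": {}, "rows": [{"raceId": 1}]},
        "results": {"pkey": "resultId",
                    "fkeys": {"driverId": "drivers", "raceId": "races"},
                    "rows": [{"resultId": 0, "driverId": 20, "raceId": 1},
                             {"resultId": 1, "driverId": 10, "raceId": 1},
                             {"resultId": 2, "driverId": None, "raceId": 1}]},
    }
    num_nodes, edge_index = tables_to_hetero(db)
    print(num_nodes)
    for edge_type, (src, dst) in edge_index.items():
        print(edge_type, src, dst)
```

Small in-memory tables like these are also one way to exercise such a converter in CI without downloading a real database.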
Ref: #10353
Partially Closes #9839