[FEA] Add New Unsupervised Learning Example #371

Merged
rapids-bot[bot] merged 16 commits into rapidsai:main from alexbarghi-nv:add-mag-examples
Jan 27, 2026

Conversation

@alexbarghi-nv
Member

@alexbarghi-nv alexbarghi-nv commented Dec 12, 2025

Adds a new unsupervised learning example that can learn embeddings. Closes #364

@alexbarghi-nv alexbarghi-nv self-assigned this Dec 12, 2025
@alexbarghi-nv alexbarghi-nv added the "feature request" (New feature or request) and "non-breaking" (Introduces a non-breaking change) labels Dec 12, 2025
@copy-pr-bot

copy-pr-bot bot commented Dec 12, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@alexbarghi-nv alexbarghi-nv marked this pull request as ready for review January 5, 2026 16:17
@alexbarghi-nv alexbarghi-nv requested a review from a team as a code owner January 5, 2026 16:17
@greptile-apps
Contributor

greptile-apps bot commented Jan 5, 2026

Greptile Overview

Greptile Summary

Adds a comprehensive unsupervised learning example for the MAG dataset, implementing link prediction with GNN embeddings that can be used downstream for node classification.

Key Changes

  • mag_lp_mnmg.py: Multi-node multi-GPU implementation that trains a heterogeneous GNN (TransformerConv + SAGEConv) on the OGBN-MAG dataset for link prediction, computes betweenness centrality as edge features, and exports learned node embeddings to parquet files
  • xgb.py: Companion script that loads the generated embeddings and trains an XGBoost classifier for node classification, demonstrating the downstream use of unsupervised embeddings

Implementation Details

The main script supports two modes:

  1. Learned embeddings mode (--learn_embeddings): Uses WholeMemory distributed embeddings
  2. Message passing mode (default): Derives embeddings through neighborhood aggregation

The pipeline enriches the graph with betweenness centrality as edge attributes, trains an encoder-decoder architecture for link prediction, and exports concatenated original features + learned embeddings for downstream tasks.

Minor Issues

  • Some hard-coded device="cuda" references could use x_paper.device for better flexibility (non-critical style suggestion)

Confidence Score: 4/5

  • This PR is safe to merge with minimal risk
  • The implementation follows established patterns from similar examples in the codebase (movielens_mnmg.py, rgcn_link_class_mnmg.py), uses appropriate distributed training infrastructure, and includes proper initialization/cleanup. Only minor style improvements suggested around hard-coded device references that don't affect functionality.
  • No files require special attention

Important Files Changed

Filename | Overview
python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py | Implements multi-node multi-GPU link prediction with graph neural networks on the MAG dataset, including model training and embedding generation; contains minor hard-coded device references
python/cugraph-pyg/cugraph_pyg/examples/xgb.py | Loads pre-generated embeddings and trains an XGBoost classifier; the implementation is clean and straightforward

Sequence Diagram

sequenceDiagram
    participant User
    participant mag_lp_mnmg.py
    participant Model
    participant GraphStore
    participant FeatureStore
    participant xgb.py
    participant XGBoost

    User->>mag_lp_mnmg.py: Run training script
    mag_lp_mnmg.py->>mag_lp_mnmg.py: Initialize distributed env (torchrun)
    mag_lp_mnmg.py->>GraphStore: Load MAG dataset
    mag_lp_mnmg.py->>FeatureStore: Add node/edge features
    mag_lp_mnmg.py->>mag_lp_mnmg.py: Compute betweenness centrality
    mag_lp_mnmg.py->>FeatureStore: Store centrality as edge features
    mag_lp_mnmg.py->>Model: Create Classifier (Encoder+Decoder)
    loop Training Epochs
        mag_lp_mnmg.py->>Model: Train on link prediction task
        mag_lp_mnmg.py->>Model: Evaluate on test set
    end
    mag_lp_mnmg.py->>Model: Extract learned embeddings
    mag_lp_mnmg.py->>mag_lp_mnmg.py: Write embeddings to parquet (x)
    mag_lp_mnmg.py->>mag_lp_mnmg.py: Write labels to parquet (y)
    
    User->>xgb.py: Run XGBoost script
    xgb.py->>xgb.py: Read embeddings (x) and labels (y)
    xgb.py->>xgb.py: Join data and split train/test
    xgb.py->>XGBoost: Train classifier on embeddings
    xgb.py->>XGBoost: Evaluate on test set
    xgb.py->>User: Report accuracy

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (4)

  1. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 96 (link)

    logic: global_rank is not defined in the scope of the Classifier.__init__ method

    To fix this, you'll need to pass global_rank as a parameter to the Classifier constructor.

  2. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 111 (link)

    logic: global_rank is not defined in the scope of the Classifier.__init__ method

    To fix this, you'll need to pass global_rank as a parameter to the Classifier constructor.

  3. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 278 (link)

    logic: global_rank is not defined in the scope of the train function

    To fix this, you'll need to pass global_rank as a parameter to the train function.

  4. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 687-690 (link)

    logic: doubled the embedding instead of adding x_paper residual
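
For the first three comments, one way to avoid depending on a module-level `global_rank` is to pass it in explicitly. The sketch below is only an illustration of that pattern, not the code in `mag_lp_mnmg.py`: the body of `Classifier`, the `log` helper, and the `train` signature are hypothetical stand-ins.

```python
from typing import List, Optional


class Classifier:
    """Hypothetical stand-in for the model class in the example script."""

    def __init__(self, hidden_channels: int, global_rank: int) -> None:
        # The rank is injected by the caller rather than read from a global.
        self.hidden_channels = hidden_channels
        self.global_rank = global_rank

    def log(self, message: str) -> Optional[str]:
        # Only rank 0 reports, so multi-GPU runs do not emit duplicate lines.
        if self.global_rank != 0:
            return None
        return f"[rank {self.global_rank}] {message}"


def train(num_epochs: int, global_rank: int) -> List[str]:
    """Hypothetical training loop that receives the rank as an argument."""
    progress = []
    for epoch in range(num_epochs):
        # Real training work would go here; only the rank check is shown.
        if global_rank == 0:
            progress.append(f"epoch {epoch} done")
    return progress
```

With this shape, both the class and the function can be exercised in isolation, without first setting a global in the enclosing module.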

2 files reviewed, 4 comments

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (1)

  1. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 681-684 (link)

    logic: duplicates x_dict["paper"] addition - should add x_paper residual instead
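
The difference being flagged can be shown with plain Python lists standing in for tensors. This is a hedged sketch: `add` is a hypothetical helper, the values are made up, and `x_paper` / `x_out` only mirror the names used in the comment. The actual script operates on `torch` tensors.

```python
from typing import List


def add(a: List[float], b: List[float]) -> List[float]:
    """Element-wise sum of two equally sized vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x + y for x, y in zip(a, b)]


x_paper = [1.0, 2.0, 3.0]   # input features for one "paper" node (made-up values)
x_out = [0.5, 0.25, -1.0]   # encoder output for the same node (made-up values)

# What the comment describes: the output is added to itself, which only scales it.
doubled = add(x_out, x_out)

# The suggested alternative: a residual connection back to the input features.
residual = add(x_out, x_paper)
```

Doubling discards the original features entirely, whereas the residual form keeps them in the exported embedding.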

2 files reviewed, 1 comment

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (3)

  1. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 201 (link)

    style: missing @torch.no_grad() decorator - inference should disable gradient computation for performance and memory efficiency

  2. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 643 (link)

    logic: drop_last=True causes embeddings for the last batch to not be computed, leaving uninitialized values in the pre-allocated tensor at line 646-648

  3. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 272 (link)

    style: global_rank accessed from global scope - consider passing it as a parameter for better function isolation and testability

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
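
The `drop_last=True` concern in comment 2 can be reproduced without any GPU code. The sketch below assumes the loader follows the usual convention of discarding a final batch that is smaller than `batch_size`. `batch_indices` and `fill_embeddings` are hypothetical helpers, and `None` stands in for an uninitialized slot in the pre-allocated tensor.

```python
from typing import List, Optional


def batch_indices(num_items: int, batch_size: int, drop_last: bool) -> List[List[int]]:
    """Split range(num_items) into consecutive batches."""
    batches = [
        list(range(start, min(start + batch_size, num_items)))
        for start in range(0, num_items, batch_size)
    ]
    # Conventional drop_last behaviour: discard a trailing partial batch.
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


def fill_embeddings(num_items: int, batch_size: int, drop_last: bool) -> List[Optional[float]]:
    """Write one value per visited index into a pre-allocated buffer."""
    out: List[Optional[float]] = [None] * num_items  # None marks "never written"
    for batch in batch_indices(num_items, batch_size, drop_last):
        for i in batch:
            out[i] = float(i)  # placeholder for the computed embedding
    return out


with_drop = fill_embeddings(10, 4, drop_last=True)
without_drop = fill_embeddings(10, 4, drop_last=False)
```

With 10 items and a batch size of 4, the last two slots are never written when `drop_last=True`, which is the situation the comment warns about for the inference pass.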

2 files reviewed, 3 comments

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (2)

  1. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 300 (link)

    syntax: Typo: 'torchrun should be `torchrun` (missing opening backtick)

  2. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 547-548 (link)

    logic: train_sz and test_sz are tensors, need conversion to int for slicing
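
Whether a size-like object can be used directly in a slice depends on whether it implements Python's `__index__` protocol. The sketch below uses a hypothetical `Count` class as a stand-in for a scalar integer value and does not use `torch`. How `train_sz` and `test_sz` actually behave in the script is not shown in this thread, so treat this as an illustration of the language rule only.

```python
import operator


class Count:
    """Hypothetical scalar wrapper standing in for a size held in a non-int type."""

    def __init__(self, value: int) -> None:
        self.value = value

    def __index__(self) -> int:
        # Lets Python accept the object wherever an integer index is expected.
        return self.value

    def __int__(self) -> int:
        return self.value


data = list(range(10))
train_sz = Count(7)

implicit = data[:train_sz]        # works because Count defines __index__
explicit = data[: int(train_sz)]  # explicit conversion states the intent outright
as_index = operator.index(train_sz)
```

Both slices give the same result here. The explicit `int(...)` form is simply more robust if the object's type changes later.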

2 files reviewed, 2 comments

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (3)

  1. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 548-549 (link)

    logic: Tensor used as index instead of scalar value. train_sz and test_sz are tensors but need to be converted to integers for slicing.

  2. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 492 (link)

    syntax: "betweeness" is misspelled

  3. python/cugraph-pyg/cugraph_pyg/examples/xgb.py, line 92 (link)

    logic: Type consistency issue: predictions_computed is a cupy array but dfy_test_computed is likely still a cudf Series/DataFrame

2 files reviewed, 3 comments

@alexbarghi-nv
Member Author

Additional Comments (3)

  1. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 548-549 (link)
    logic: Tensor used as index instead of scalar value. train_sz and test_sz are tensors but need to be converted to integers for slicing.
  2. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 492 (link)
    syntax: "betweeness" is misspelled
  3. python/cugraph-pyg/cugraph_pyg/examples/xgb.py, line 92 (link)
    logic: Type consistency issue: predictions_computed is a cupy array but dfy_test_computed is likely still a cudf Series/DataFrame

2 files reviewed, 3 comments

These are not issues - I've tested both files.

@greptile-apps
Contributor

greptile-apps bot commented Jan 5, 2026

Greptile's behavior is changing!

From now on, if a review finishes with no comments, we will not post an additional "statistics" comment to confirm that our review found nothing to comment on. However, you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

Member

@tingyu66 tingyu66 left a comment

LGTM. Just left some nitpicks.

@alexbarghi-nv
Member Author

/merge

Contributor

@greptile-apps greptile-apps bot left a comment

2 files reviewed, 2 comments

Comment on lines +150 to +162
"author": torch.zeros(
    batch["author"].n_id.numel(), self.hidden_channels, device="cuda"
),
"institution": torch.zeros(
    batch["institution"].n_id.numel(),
    self.hidden_channels,
    device="cuda",
),
"field_of_study": torch.zeros(
    batch["field_of_study"].n_id.numel(),
    self.hidden_channels,
    device="cuda",
),
Contributor

Hard-coded device="cuda" makes code less flexible

Suggested change
"author": torch.zeros(
    batch["author"].n_id.numel(), self.hidden_channels, device=x_paper.device
),
"institution": torch.zeros(
    batch["institution"].n_id.numel(),
    self.hidden_channels,
    device=x_paper.device
),
"field_of_study": torch.zeros(
    batch["field_of_study"].n_id.numel(),
    self.hidden_channels,
    device=x_paper.device
),

Comment on lines +649 to +664
"paper": x_paper,
"author": torch.zeros(
    batch["author"].n_id.numel(),
    model.module.hidden_channels,
    device="cuda",
),
"institution": torch.zeros(
    batch["institution"].n_id.numel(),
    model.module.hidden_channels,
    device="cuda",
),
"field_of_study": torch.zeros(
    batch["field_of_study"].n_id.numel(),
    model.module.hidden_channels,
    device="cuda",
),
Contributor

Hard-coded device="cuda" makes code less flexible (use x_paper.device instead)

@rapids-bot rapids-bot bot merged commit 75cd001 into rapidsai:main Jan 27, 2026
141 of 144 checks passed

Labels

feature request New feature or request non-breaking Introduces a non-breaking change

Development

Successfully merging this pull request may close these issues.

[FEA] Single-GPU Link Prediction Examples

3 participants