@alexbarghi-nv
Member

@alexbarghi-nv alexbarghi-nv commented Dec 12, 2025

Adds a new unsupervised learning example that can learn embeddings. Closes #364

@alexbarghi-nv alexbarghi-nv self-assigned this Dec 12, 2025
@alexbarghi-nv alexbarghi-nv added the "feature request" and "non-breaking" labels Dec 12, 2025
@copy-pr-bot

copy-pr-bot bot commented Dec 12, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@alexbarghi-nv alexbarghi-nv marked this pull request as ready for review January 5, 2026 16:17
@alexbarghi-nv alexbarghi-nv requested a review from a team as a code owner January 5, 2026 16:17
@greptile-apps
Contributor

greptile-apps bot commented Jan 5, 2026

Greptile Summary

Added two new example scripts for unsupervised learning on smaller datasets like ogbn-mag:

  • mag_lp_mnmg.py: Multi-GPU link prediction example that trains a heterogeneous GNN with encoder/decoder architecture using betweenness centrality as edge features, then exports learned embeddings and labels to parquet files
  • xgb.py: XGBoost classifier example that loads the exported embeddings and trains a multi-class classifier, demonstrating how to use GNN embeddings for downstream tasks

The examples provide a complete workflow from unsupervised embedding learning to supervised classification, addressing issue #364's request for single-GPU examples on smaller datasets.

Confidence Score: 5/5

  • This PR is safe to merge with no critical issues found
  • The code follows established patterns from existing examples in the repository, implements a well-structured ML pipeline with proper distributed training setup, includes comprehensive error handling, and correctly uses distributed feature stores with global indexing
  • No files require special attention

Important Files Changed

| Filename | Overview |
|---|---|
| python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py | Adds comprehensive multi-GPU link prediction example with encoder/decoder architecture, betweenness centrality features, and embedding export to parquet |
| python/cugraph-pyg/cugraph_pyg/examples/xgb.py | Adds XGBoost classifier example that loads embeddings from parquet files and trains a multi-class classifier |

Sequence Diagram

sequenceDiagram
    participant User
    participant mag_lp_mnmg as mag_lp_mnmg.py
    participant GraphStore
    participant Model
    participant Output as Parquet Files
    participant xgb as xgb.py
    participant XGBoost

    User->>mag_lp_mnmg: Run with torchrun
    mag_lp_mnmg->>mag_lp_mnmg: Initialize distributed workers (NCCL, cuGraph, WholeGraph)
    mag_lp_mnmg->>GraphStore: Load ogbn-mag dataset
    mag_lp_mnmg->>GraphStore: Add nodes and edges (with reverse edges)
    mag_lp_mnmg->>GraphStore: Calculate betweenness centrality
    mag_lp_mnmg->>GraphStore: Add betweenness features to edges
    mag_lp_mnmg->>Model: Create Classifier with Encoder/Decoder
    mag_lp_mnmg->>Model: Train with LinkNeighborLoader
    mag_lp_mnmg->>Model: Evaluate on test set
    mag_lp_mnmg->>Model: Generate paper embeddings
    mag_lp_mnmg->>Output: Export embeddings (x) and labels (y) to parquet
    mag_lp_mnmg->>mag_lp_mnmg: Shutdown workers
    
    User->>xgb: Run with --data_dir
    xgb->>xgb: Create LocalCUDACluster
    xgb->>Output: Read embeddings (x) and labels (y)
    xgb->>xgb: Join data and split train/test
    xgb->>XGBoost: Train multi-class classifier
    xgb->>XGBoost: Evaluate on test set
    xgb->>User: Display accuracy results
Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (4)

  1. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 96 (link)

    logic: global_rank is not defined in the scope of the Classifier.__init__ method

    To fix this, you'll need to pass global_rank as a parameter to the Classifier constructor.

  2. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 111 (link)

    logic: global_rank is not defined in the scope of the Classifier.__init__ method

    To fix this, you'll need to pass global_rank as a parameter to the Classifier constructor.

  3. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 278 (link)

    logic: global_rank is not defined in the scope of the train function

    To fix this, you'll need to pass global_rank as a parameter to the train function.

  4. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 687-690 (link)

    logic: doubled the embedding instead of adding x_paper residual
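
Comments 1 to 3 recommend the same remedy: pass the rank in explicitly. A minimal sketch of that pattern follows. `Classifier` and `train` here are hypothetical stand-ins, since the real signatures in `mag_lp_mnmg.py` are not shown in this thread, and the rank is hard-coded rather than obtained from the launcher.

```python
class Classifier:
    """Hypothetical stand-in for the example's model class."""

    def __init__(self, hidden_channels: int, global_rank: int):
        # The rank arrives as an argument instead of being read from module scope.
        self.hidden_channels = hidden_channels
        self.global_rank = global_rank

    def log(self, message: str) -> str:
        # Only rank 0 reports, a common pattern in multi-GPU scripts.
        return f"[rank {self.global_rank}] {message}" if self.global_rank == 0 else ""


def train(model: Classifier, num_epochs: int, global_rank: int) -> list[str]:
    """Hypothetical training loop that receives the rank explicitly."""
    lines = []
    for epoch in range(num_epochs):
        if global_rank == 0:
            lines.append(model.log(f"epoch {epoch} done"))
    return lines


if __name__ == "__main__":
    # Under torchrun the rank would typically come from the RANK environment
    # variable or torch.distributed.get_rank(); it is hard-coded here.
    rank = 0
    model = Classifier(hidden_channels=64, global_rank=rank)
    print(train(model, num_epochs=2, global_rank=rank))
```

With the rank as a parameter, both pieces can be exercised in a unit test without initializing a process group.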

2 files reviewed, 4 comments

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (1)

  1. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 681-684 (link)

    logic: duplicates x_dict["paper"] addition - should add x_paper residual instead
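
The distinction this comment draws can be sketched as below. The function names and tensor shapes are hypothetical, and the actual code at lines 681-684 is not shown in this thread, so this only illustrates the difference between doubling the encoder output and adding a residual from the input features.

```python
import torch


def combine_doubled(x_dict: dict, x_paper: torch.Tensor) -> torch.Tensor:
    # Adds the encoder output to itself: the result is just 2 * x_dict["paper"],
    # and x_paper is never used.
    return x_dict["paper"] + x_dict["paper"]


def combine_residual(x_dict: dict, x_paper: torch.Tensor) -> torch.Tensor:
    # Residual connection: encoder output plus the original paper features.
    # Assumes both tensors have the same shape.
    return x_dict["paper"] + x_paper


x_dict = {"paper": torch.ones(2, 3)}      # hypothetical encoder output
x_paper = torch.full((2, 3), 0.5)         # hypothetical input features

doubled = combine_doubled(x_dict, x_paper)
residual = combine_residual(x_dict, x_paper)
```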

2 files reviewed, 1 comment

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (3)

  1. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 201 (link)

    style: missing @torch.no_grad() decorator - inference should disable gradient computation for performance and memory efficiency

  2. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 643 (link)

    logic: drop_last=True causes embeddings for the last batch to not be computed, leaving uninitialized values in the pre-allocated tensor at line 646-648

  3. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 272 (link)

    style: global_rank accessed from global scope - consider passing it as a parameter for better function isolation and testability

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
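
Comments 1 and 2 can be illustrated with a small sketch. It uses a plain `torch.utils.data.DataLoader` and a placeholder computation rather than the example's actual loader and model, which are assumptions here. The output buffer is filled with NaN so that rows skipped by `drop_last=True` are easy to count; a buffer from `torch.empty` would hold arbitrary values instead.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()  # inference only: no autograd graph is built
def embed_all(features: torch.Tensor, batch_size: int, drop_last: bool) -> torch.Tensor:
    n = features.shape[0]
    # Pre-allocated output; NaN marks rows that are never written.
    out = torch.full((n, features.shape[1]), float("nan"))
    loader = DataLoader(
        TensorDataset(torch.arange(n)), batch_size=batch_size, drop_last=drop_last
    )
    for (batch_idx,) in loader:
        out[batch_idx] = features[batch_idx] * 2.0  # placeholder for model(batch)
    return out


features = torch.ones(10, 3, requires_grad=True)

# Count rows that were never filled in.
skipped = torch.isnan(embed_all(features, 4, drop_last=True)).any(dim=1).sum().item()
complete = torch.isnan(embed_all(features, 4, drop_last=False)).any(dim=1).sum().item()
print(skipped, complete)
```

With 10 rows and a batch size of 4, `drop_last=True` yields two full batches and leaves the final two rows unwritten, while `drop_last=False` fills every row.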

2 files reviewed, 3 comments

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (2)

  1. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 300 (link)

    syntax: Typo: 'torchrun should be `torchrun` (missing opening backtick)

  2. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 547-548 (link)

    logic: train_sz and test_sz are tensors, need conversion to int for slicing
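
A minimal sketch of the slicing concern in comment 2, assuming `train_sz` and `test_sz` are 0-dim integer tensors and `perm` is a hypothetical index tensor (the code at lines 547-548 is not shown in this thread). It shows the explicit conversion alongside the implicit form, which PyTorch also accepts because 0-dim integer tensors implement `__index__`.

```python
import torch

perm = torch.arange(10)       # hypothetical permutation of edge indices
train_sz = torch.tensor(6)    # 0-dim integer tensor, e.g. the result of a reduction
test_sz = torch.tensor(4)

# Explicit conversion: plain Python ints make the slice bounds unambiguous.
train_idx = perm[: int(train_sz)]
test_idx = perm[int(train_sz) : int(train_sz) + int(test_sz)]

# Implicit form: a 0-dim integer tensor is accepted as a slice bound.
train_idx_implicit = perm[:train_sz]
```

Both forms select the same elements here; the explicit `int()` mainly documents intent and makes the conversion visible when the size lives on a GPU.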

2 files reviewed, 2 comments

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (3)

  1. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 548-549 (link)

    logic: Tensor used as index instead of scalar value. train_sz and test_sz are tensors but need to be converted to integers for slicing.

  2. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 492 (link)

    syntax: "betweeness" is misspelled

  3. python/cugraph-pyg/cugraph_pyg/examples/xgb.py, line 92 (link)

    logic: Type consistency issue: predictions_computed is a cupy array but dfy_test_computed is likely still a cudf Series/DataFrame

2 files reviewed, 3 comments

@alexbarghi-nv
Member Author

Additional Comments (3)

  1. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 548-549 (link)
    logic: Tensor used as index instead of scalar value. train_sz and test_sz are tensors but need to be converted to integers for slicing.
  2. python/cugraph-pyg/cugraph_pyg/examples/mag_lp_mnmg.py, line 492 (link)
    syntax: "betweeness" is misspelled
  3. python/cugraph-pyg/cugraph_pyg/examples/xgb.py, line 92 (link)
    logic: Type consistency issue: predictions_computed is a cupy array but dfy_test_computed is likely still a cudf Series/DataFrame

2 files reviewed, 3 comments

These are not issues - I've tested both files.

@greptile-apps
Contributor

greptile-apps bot commented Jan 5, 2026

Greptile's behavior is changing!

From now on, if a review finishes with no comments, we will not post an additional "statistics" comment to confirm that our review found nothing to comment on. However, you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

Member

@tingyu66 tingyu66 left a comment

LGTM. Just left some nitpicks.

Comment on lines +220 to +231
pred_true_pos += (
((y_pred > 0.5).float() == 1.0) & (y_true.float() == 1.0)
).sum()
pred_false_pos += (
((y_pred > 0.5).float() == 1.0) & (y_true.float() == 0.0)
).sum()
pred_true_neg += (
((y_pred <= 0.5).float() == 1.0) & (y_true.float() == 0.0)
).sum()
pred_false_neg += (
((y_pred <= 0.5).float() == 1.0) & (y_true.float() == 1.0)
).sum()
Member

Do we need .float() here? Can it be simplified as ((y_pred > 0.5) & (y_true == 1)).sum()
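
The suggested simplification can be checked with a short sketch. The sample tensors are made up for illustration. The simplified form negates the boolean masks, which matches the original comparisons only under the assumptions that labels are strictly 0 or 1 and predictions contain no NaN.

```python
import torch


def counts_verbose(y_pred: torch.Tensor, y_true: torch.Tensor):
    # Mirrors the expressions in the diff under review.
    tp = (((y_pred > 0.5).float() == 1.0) & (y_true.float() == 1.0)).sum()
    fp = (((y_pred > 0.5).float() == 1.0) & (y_true.float() == 0.0)).sum()
    tn = (((y_pred <= 0.5).float() == 1.0) & (y_true.float() == 0.0)).sum()
    fn = (((y_pred <= 0.5).float() == 1.0) & (y_true.float() == 1.0)).sum()
    return tp.item(), fp.item(), tn.item(), fn.item()


def counts_simple(y_pred: torch.Tensor, y_true: torch.Tensor):
    # Boolean masks can be combined directly; no float round trip is needed.
    pred_pos = y_pred > 0.5
    is_pos = y_true == 1
    tp = (pred_pos & is_pos).sum()
    fp = (pred_pos & ~is_pos).sum()
    tn = (~pred_pos & ~is_pos).sum()
    fn = (~pred_pos & is_pos).sum()
    return tp.item(), fp.item(), tn.item(), fn.item()


y_pred = torch.tensor([0.9, 0.2, 0.7, 0.4, 0.5])  # illustrative scores
y_true = torch.tensor([1, 0, 0, 1, 1])            # illustrative binary labels

print(counts_verbose(y_pred, y_true), counts_simple(y_pred, y_true))
```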

model,
optimizer,
wm_optimizer,
neg_ratio,
Member

Do we need the negative ratio inside train() or test()?

Labels

feature request (New feature or request), non-breaking (Introduces a non-breaking change)

Development

Successfully merging this pull request may close these issues.

[FEA] Single-GPU Link Prediction Examples

2 participants