Hi, thank you for releasing the code.
While trying to run the official KGC code in kg/, I ran into a few points where I am not sure whether I am using the code as intended. I would really appreciate some clarification.
1. bf16 PNA aggregation in kg/src/model/pna.py
In kg/script/lp_pretrain.py, training enables mixed precision:
- kg/script/lp_pretrain.py:65-70
accelerator = Accelerator(
...
mixed_precision="bf16",
...
)
At the same time, the PNA text aggregation in kg/src/model/pna.py:22-29 computes:
mean = token_embs.sum(axis=1) / token_lengths
sq_mean = (token_embs**2).sum(axis=1) / token_lengths
std = (sq_mean - mean**2).clamp(min=1e-6).sqrt()
I am asking because, in our reproduction attempts, this computation produced NaN / Inf values in the aggregated text features on some FB datasets, which then led to clearly abnormal evaluation results. Since the code squares the text embeddings directly in the current dtype, the sum of squares can overflow or lose precision under bf16. I wanted to check whether this is the intended behavior under bf16 training, or whether these sums are expected to be accumulated in float32 and then cast back.
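For reference, this is the float32-accumulation variant we tried on our side as a workaround. It is only a sketch of what we assume the intended numerics might be (the function name and the `(batch, 1)` shape of `token_lengths` are our assumptions, not the released API):

```python
import torch

def pna_text_stats_fp32(token_embs: torch.Tensor, token_lengths: torch.Tensor):
    """Per-sequence mean/std of token embeddings, accumulated in float32.

    token_embs:    (batch, seq_len, dim), possibly bf16 under mixed precision
    token_lengths: (batch,) or (batch, 1), count of valid tokens per sequence
    """
    orig_dtype = token_embs.dtype
    x = token_embs.float()  # upcast BEFORE squaring to avoid bf16 overflow
    lengths = token_lengths.float().clamp(min=1).view(-1, 1)
    mean = x.sum(dim=1) / lengths
    sq_mean = (x ** 2).sum(dim=1) / lengths
    std = (sq_mean - mean ** 2).clamp(min=1e-6).sqrt()
    # cast back so downstream layers still see the training dtype
    return mean.to(orig_dtype), std.to(orig_dtype)
```

With this change the NaN / Inf values disappeared in our runs, but we are not sure it matches the statistics the pretrained checkpoints were trained with.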
2. Expected PyTorch version for the public release
I could not find an explicit PyTorch version in the README. The environment script seems to suggest something close to torch 2.2.0 + cu121, based on the PyG wheel source it points at.
I am asking because several parts of the released code load processed datasets / cached blocks / checkpoints with plain torch.load(...), for example:
- kg/script/lp_pretrain.py:170
- kg/script/lp_pretrain.py:269
- kg/src/ultra/datasets.py:23
- kg/src/ultra/datasets.py:1072
- kg/src/data/duckdb.py:22
- kg/src/data/duckdb.py:140
- kg/src/data/duckdb.py:302
In our environment, these loading paths were difficult to run directly under newer PyTorch releases, especially when the processed files contained cached Python objects rather than only plain weight tensors (newer releases default torch.load to weights_only=True, which rejects arbitrary pickled objects). So I wanted to confirm which PyTorch version the public KGC release is actually expected to work with.
3. MTDEA text-data handling
In kg/src/data/datasets.py, I noticed several places where MTDEA-style dataset builders use the pattern:
- kg/src/data/datasets.py:1179-1191
For example:
train_text_data = None
test_text_data = self.text_store.desc_from_mapping(...)
train_data = Data(..., text_data=train_text_data)
Later, evaluation in kg/script/lp_eval.py:46-52 assumes data.text_data is ready and calls:
data.text_data.load_emb_db(ent_path, rel_path, stage="test")
I am asking because, when extending evaluation to MTDEA-style datasets, this path looked unclear on our side: the dataset builder seems to initialize some graph objects with text_data=None, while the evaluation path later assumes text_data is available and ready to load embedding DBs. I wanted to check whether an additional preprocessing / initialization step is required for MTDEA datasets before evaluation, or whether this path is already expected to work directly from the released code.
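To make the gap concrete, this is the kind of guard we ended up inserting before the lp_eval.py call. Everything here is hypothetical glue based on the snippets above (the helper name, the desc_from_mapping arguments, and the attachment point are all our assumptions), shown only to illustrate where the None surfaces:

```python
def ensure_text_data(data, text_store, ent_path, rel_path, stage="test"):
    """Hypothetical guard before evaluation touches data.text_data.

    Some MTDEA builders leave text_data=None, so we attach it from the
    text store first; the `...` stands for whatever mapping arguments
    the builder would normally pass (left elided on purpose).
    """
    if getattr(data, "text_data", None) is None:
        data.text_data = text_store.desc_from_mapping(...)  # placeholder args
    data.text_data.load_emb_db(ent_path, rel_path, stage=stage)
    return data
```

We are not sure this matches the intended initialization order, which is why we are asking whether a preprocessing step is supposed to populate text_data earlier.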
If there is a recommended environment version or a minimal public KGC reproduction path, that would be very helpful.
Thanks a lot.