
HeteroData silently accepts numpy arrays. Then fails during training with cryptic message #10597

@Gabriel-Kissin

Description

πŸ› Describe the bug

Hi PyG team, thanks for the great package! The other day I spent a good while debugging an issue that I think could be caught much earlier with better error messages.

When creating a `HeteroData` object from networkx/pandas (I'm guessing this is a fairly common workflow), numpy arrays can accidentally end up as node features instead of torch tensors. PyG silently accepts these numpy arrays, and the error only surfaces deep in model execution with an unhelpful message.
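For context, here's a minimal sketch of how the arrays sneak in (the DataFrame here is made up, not my actual data): pandas' `.to_numpy()` hands back an ndarray, which then gets assigned as node features as-is.

```python
# Sketch: pandas feature extraction yields numpy ndarrays, not torch tensors.
import numpy as np
import pandas as pd

# Hypothetical node feature table
users = pd.DataFrame(np.random.randn(5, 3), columns=['age', 'karma', 'score'])

features = users[['age', 'karma', 'score']].to_numpy()  # ndarray, not a tensor
print(type(features))  # <class 'numpy.ndarray'>
# Assigning `features` to data['user'].x at this point reproduces the bug below.
```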

MRE

```python
import numpy as np
import torch
import torch_geometric
from torch_geometric.data import HeteroData
from torch_geometric.nn import SAGEConv, to_hetero
from torch_geometric.transforms import ToUndirected

# Create HeteroData with numpy arrays, forgetting to convert to tensors
data = HeteroData()
data['user'].x = np.random.randn(100, 64)  # numpy array, not a tensor
data['item'].x = np.random.randn(100, 64)  # numpy array, not a tensor
data['user', 'likes', 'item'].edge_index = torch.randint(0, 100, (2, 200))

print("validate():", data.validate())  # Returns True although the dtypes are invalid

device = torch_geometric.device('auto')
print(device)  # e.g. mps on my machine
data = data.to(device)  # silently fails to move the numpy arrays

# Check what actually happened
print("x device after .to():", data['user'].x.device)  # Still shows 'cpu'
print("x type:", type(data['user'].x))  # numpy.ndarray!

# Try to use it in a model and it fails
data = ToUndirected()(data)

class Encoder(torch.nn.Module):
    def __init__(self, hidden_channels):
        super().__init__()
        self.conv = SAGEConv((-1, -1), hidden_channels)

    def forward(self, x, edge_index):
        return self.conv(x, edge_index).relu()

encoder = Encoder(32)
encoder = to_hetero(encoder, data.metadata(), aggr='sum')
encoder = encoder.to(device)

out = encoder(data.x_dict, data.edge_index_dict)
```

The encoder gets as far as the aggregation stage and then fails with a cryptic `AttributeError: 'NoneType' object has no attribute 'dim'` 😕.

There are several places where this could be caught or handled earlier:

  1. On assignment: `data['user'].x = numpy_array` should either auto-convert to a torch tensor or raise a clear error.
  2. In `.validate()`: should check that all `x`, `edge_index`, etc. are actually torch tensors - i.e. check dtypes as well as structure.
  3. In `.to(device)`: if it encounters nested numpy arrays, it should either auto-convert them to tensors or raise a clear error instead of silently skipping them.
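To illustrate suggestion 2, here's a rough sketch of what such a check could look like. The helper name (`assert_all_tensors`) is mine, not part of the PyG API, and to keep it self-contained it operates on plain dicts shaped like `x_dict` / `edge_index_dict` rather than on `HeteroData` itself:

```python
# Sketch of a type check like the one .validate() could perform.
# `assert_all_tensors` is a hypothetical name, not an existing PyG function.
import numpy as np
import torch

def assert_all_tensors(store_dicts):
    """Raise TypeError naming every attribute that is not a torch.Tensor."""
    problems = []
    for dict_name, d in store_dicts.items():
        for key, value in d.items():
            if not isinstance(value, torch.Tensor):
                problems.append(
                    f"{dict_name}[{key!r}] is {type(value).__name__}, "
                    f"expected torch.Tensor"
                )
    if problems:
        raise TypeError("Non-tensor attributes found:\n  " + "\n  ".join(problems))

# One numpy array slipped in alongside a proper tensor:
x_dict = {'user': np.random.randn(4, 8), 'item': torch.randn(4, 8)}
try:
    assert_all_tensors({'x_dict': x_dict})
except TypeError as e:
    print(e)  # names the offending 'user' ndarray explicitly
```

An error like this, raised at validation time, would have pointed straight at the problem instead of surfacing deep inside message passing.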

Converting to a torch tensor is trivial once a clear error message points at the problem, but it's much harder to debug from a message like the one I got...
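For anyone hitting this in the meantime, the workaround really is a one-liner per attribute (a sketch, with one gotcha worth knowing about dtypes):

```python
# Workaround sketch: coerce numpy arrays to torch tensors before building the model.
import numpy as np
import torch

def to_tensor(value):
    """Convert numpy arrays to torch tensors; pass tensors through unchanged."""
    if isinstance(value, np.ndarray):
        return torch.from_numpy(value)
    return value

x = to_tensor(np.random.randn(100, 64))
print(type(x))   # <class 'torch.Tensor'>
print(x.dtype)   # torch.float64 -- from_numpy preserves numpy's float64;
                 # call .float() if the model expects float32
```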

Thought this was worth sharing to save other people some time 😃

Versions

PyTorch version: 2.9.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 26.2 (arm64)
GCC version: Could not collect
Clang version: 17.0.0 (clang-1700.0.13.3)
CMake version: Could not collect
Libc version: N/A

Python version: 3.13.1 (main, Dec  3 2024, 17:59:52) [Clang 16.0.0 (clang-1600.0.26.4)] (64-bit runtime)
Python platform: macOS-26.2-arm64-arm-64bit-Mach-O
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Apple M3 Pro

Versions of relevant libraries:
[pip3] flake8==7.3.0
[pip3] mypy==1.18.2
[pip3] mypy_extensions==1.1.0
[pip3] numpy==2.3.4
[pip3] torch==2.9.0
[pip3] torch-geometric==2.7.0
[conda] Could not collect
