🐛 Describe the bug
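When an OnDiskDataset is split with index_select(), calling get(idx) on the resulting subset ignores the subset's indices and returns item idx of the full dataset. Indexing the same subset with [idx] returns the correct item: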
import torch
from torch_geometric.data import Data, OnDiskDataset
odds = OnDiskDataset("root/folder")
idxs = list(range(10))
for i in idxs:
    odds.append(Data(x=torch.tensor(i)))
print(f"Dataset length: {len(odds)}")
train_ds = odds.index_select(idxs[:5])
valid_ds = odds.index_select(idxs[5:])
print(f"Train dataset length: {len(train_ds)}, Valid dataset length: {len(valid_ds)}")
# dataset.get()
print(f"Train dataset get(0): {train_ds.get(0)}")
print(f"Valid dataset get(0): {valid_ds.get(0)}")
# dataset[idx]
print(f"Train dataset [0]: {train_ds[0]}")
print(f"Valid dataset [0]: {valid_ds[0]}")
odds.close()
Output
Dataset length: 10
Train dataset length: 5, Valid dataset length: 5
Train dataset get(0): Data(x=0)
Valid dataset get(0): Data(x=0) <- This is wrong (expected Data(x=5))
Train dataset [0]: Data(x=0)
Valid dataset [0]: Data(x=5) <- This is correct
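The two outputs fit a simple model. `get(idx)` reads row `idx` of the full on-disk database. `dataset[idx]` first maps `idx` through the subset's `indices()`, which is what the base `Dataset.__getitem__` does, and only then calls `get()`. The sketch below is a pure-Python toy model of that behavior, not PyG's implementation. `ToyOnDiskDataset` and `subset_get` are hypothetical names. It assumes that a subset shares the parent's storage and records only which rows it exposes.

```python
from typing import Any, List, Optional, Sequence


class ToyOnDiskDataset:
    """Hypothetical, simplified stand-in for OnDiskDataset (not PyG code).

    `_storage` plays the role of the on-disk database: it always holds every
    row, keyed by its original position. A subset created by index_select()
    shares the same storage and only records which rows it exposes.
    """

    def __init__(self, storage: List[Any], indices: Optional[List[int]] = None):
        self._storage = storage
        self._indices = indices

    def indices(self) -> Sequence[int]:
        # All rows unless a subset was taken, otherwise the subset's rows.
        return range(len(self._storage)) if self._indices is None else self._indices

    def __len__(self) -> int:
        return len(self.indices())

    def index_select(self, idx: List[int]) -> "ToyOnDiskDataset":
        # Compose with the current indices so nested subsets also resolve
        # to positions in the shared storage.
        base = self.indices()
        return ToyOnDiskDataset(self._storage, [base[i] for i in idx])

    def get(self, idx: int) -> Any:
        # Reads the raw row directly: the subset's indices are NOT applied.
        return self._storage[idx]

    def __getitem__(self, idx: int) -> Any:
        # Maps the subset-relative position through indices() before reading.
        return self.get(self.indices()[idx])


def subset_get(ds: ToyOnDiskDataset, idx: int) -> Any:
    """Hypothetical workaround: translate the position before calling get()."""
    return ds.get(ds.indices()[idx])


odds = ToyOnDiskDataset(list(range(10)))
train_ds = odds.index_select(list(range(5)))
valid_ds = odds.index_select(list(range(5, 10)))

print(valid_ds.get(0))          # 0: row 0 of the full dataset
print(valid_ds[0])              # 5: first row of the subset
print(subset_get(valid_ds, 0))  # 5
```

Until `get()` respects the subset's indices, the same translation should work on a real subset as `valid_ds.get(valid_ds.indices()[0])`. This assumes `indices()` returns the subset's stored indices, as it does in the base `Dataset` class. Plain indexing with `valid_ds[0]` already performs this mapping.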
Versions
Collecting environment information...
PyTorch version: 2.8.0+cu129
Is debug build: False
CUDA used to build PyTorch: 12.9
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 11 Enterprise (10.0.26200 64-bit)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A
Python version: 3.12.12 (main, Feb 12 2026, 00:40:26) [MSC v.1944 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-11-10.0.26200-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4090
Nvidia driver version: 581.29
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A
CPU:
Name: Intel(R) Core(TM) i9-14900K
Manufacturer: GenuineIntel
Family: 207
Architecture: 9
ProcessorType: 3
DeviceID: CPU0
CurrentClockSpeed: 3200
MaxClockSpeed: 3200
L2CacheSize: 32768
L2CacheSpeed: None
Revision: None
Versions of relevant libraries:
[pip3] ml-torchserve-client==1.1.3
[pip3] numpy==2.2.6
[pip3] onnxruntime==1.19.2
[pip3] pytorch-lightning==2.6.1
[pip3] tbb==2021.7.1
[pip3] torch==2.8.0+cu129
[pip3] torch_cluster==1.6.3+pt28cu129
[pip3] torch-geometric==2.7.0
[pip3] torch_scatter==2.1.2+pt28cu129
[pip3] torch_sparse==0.6.18+pt28cu129
[pip3] torch_spline_conv==1.2.2+pt28cu129
[pip3] torchmetrics==1.8.2
[conda] Could not collect