OnDiskDataset.get() not working as expected after using index_select() #10621

@Anjum48

🐛 Describe the bug

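After splitting an OnDiskDataset with index_select(), calling get(idx) on a split does not respect the split's indices: it returns the item at position idx of the full dataset, while split[idx] returns the correct item: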

import torch
from torch_geometric.data import Data, OnDiskDataset

odds = OnDiskDataset("root/folder")

idxs = list(range(10))

for i in idxs:
    odds.append(Data(x=torch.tensor(i)))

print(f"Dataset length: {len(odds)}")

train_ds = odds.index_select(idxs[:5])
valid_ds = odds.index_select(idxs[5:])

print(f"Train dataset length: {len(train_ds)}, Valid dataset length: {len(valid_ds)}")

# Access through get(): both calls print Data(x=0)
print(f"Train dataset get(0): {train_ds.get(0)}")
print(f"Valid dataset get(0): {valid_ds.get(0)}")

# Access through indexing: each call prints the first item of its own split
print(f"Train dataset [0]: {train_ds[0]}")
print(f"Valid dataset [0]: {valid_ds[0]}")

odds.close()

Output

Dataset length: 10
Train dataset length: 5, Valid dataset length: 5
Train dataset get(0): Data(x=0)
Valid dataset get(0): Data(x=0)  <- This is wrong, expected Data(x=5)
Train dataset [0]: Data(x=0)
Valid dataset [0]: Data(x=5) <- This is correct
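
A possible explanation, stated as an assumption and not confirmed against the OnDiskDataset source: a split created by index_select() shares the parent's storage and only records a list of indices, indexing with [] translates through that list, and get() addresses the storage directly. The sketch below reproduces the same output pattern with a toy class. `ToyDataset` and `get_in_subset` are hypothetical names made up for illustration; only the `indices()`/`get()`/`index_select()` method names are borrowed from the torch_geometric Dataset interface.

```python
from typing import List, Optional, Sequence


class ToyDataset:
    """Hypothetical stand-in for a dataset whose splits share one backing store."""

    def __init__(self, store: List[int], indices: Optional[Sequence[int]] = None):
        self._store = store      # shared storage (stands in for the on-disk database)
        self._indices = indices  # None means "the full dataset"

    def indices(self) -> Sequence[int]:
        # Positions in the backing store that belong to this (sub)dataset.
        return range(len(self._store)) if self._indices is None else self._indices

    def __len__(self) -> int:
        return len(self.indices())

    def get(self, idx: int) -> int:
        # Raw access: idx addresses the backing store, not the split.
        return self._store[idx]

    def __getitem__(self, idx: int) -> int:
        # Split-aware access: translate idx through the split's index list first.
        return self.get(self.indices()[idx])

    def index_select(self, idx: Sequence[int]) -> "ToyDataset":
        # A split keeps the same store and remembers which positions it owns.
        chosen = [self.indices()[i] for i in idx]
        return ToyDataset(self._store, chosen)


def get_in_subset(dataset, idx: int):
    """Hypothetical helper: split-relative lookup built from indices() and get()."""
    return dataset.get(dataset.indices()[idx])


full = ToyDataset(list(range(10)))
valid = full.index_select(range(5, 10))

print(valid.get(0))             # raw position 0 of the backing store
print(valid[0])                 # first element of the split
print(get_in_subset(valid, 0))  # same element as valid[0]
```

If the assumption holds for the real class, using valid_ds[0] (or translating through valid_ds.indices() before calling get()) would be a workaround until get() is made split-aware or the behavior is documented.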

Versions

Collecting environment information...
PyTorch version: 2.8.0+cu129
Is debug build: False
CUDA used to build PyTorch: 12.9
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Enterprise (10.0.26200 64-bit)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.12.12 (main, Feb 12 2026, 00:40:26) [MSC v.1944 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-11-10.0.26200-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4090
Nvidia driver version: 581.29
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Name: Intel(R) Core(TM) i9-14900K
Manufacturer: GenuineIntel
Family: 207
Architecture: 9
ProcessorType: 3
DeviceID: CPU0
CurrentClockSpeed: 3200
MaxClockSpeed: 3200
L2CacheSize: 32768
L2CacheSpeed: None
Revision: None

Versions of relevant libraries:
[pip3] ml-torchserve-client==1.1.3
[pip3] numpy==2.2.6
[pip3] onnxruntime==1.19.2
[pip3] pytorch-lightning==2.6.1
[pip3] tbb==2021.7.1
[pip3] torch==2.8.0+cu129
[pip3] torch_cluster==1.6.3+pt28cu129
[pip3] torch-geometric==2.7.0
[pip3] torch_scatter==2.1.2+pt28cu129
[pip3] torch_sparse==0.6.18+pt28cu129
[pip3] torch_spline_conv==1.2.2+pt28cu129
[pip3] torchmetrics==1.8.2
[conda] Could not collect
