Why does batch_size in dataloader only create batches across graphs, and not across data samples? #2634

sidhantls · 2021-05-25T18:06:09Z


Is there a reason why the DataLoader creates batches only across graphs and not across the data samples within a graph?

I'm using the WikiCS dataset with a DataLoader. This dataset contains only one graph, so the DataLoader creates just one batch, which is the same size as the whole dataset. If a dataset contains just one large graph, wouldn't it be desirable to create batches from it?

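from torch_geometric.datasets import WikiCS
from torch_geometric.loader import DataLoader  # torch_geometric.data.DataLoader in older releases

# data_dir is the directory where the dataset is stored
dataset = WikiCS(data_dir)
print(dataset)  # WikiCS()

loader = DataLoader(dataset, batch_size=16, shuffle=True)
print(len(loader))  # 1

# Fetch the first (and only) batch
batch = next(iter(loader))
print(batch)  # Batch(batch=[11701], edge_index=[2, 297110], ptr=[2], stopping_mask=[11701, 20], test_mask=[11701], train_mask=[11701, 20], val_mask=[11701, 20], x=[11701, 300], y=[11701])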
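
The `len(loader)` of 1 above follows from what the loader counts: dataset entries (graphs), not nodes. A minimal sketch of that arithmetic in plain Python, assuming the usual PyTorch `DataLoader` behaviour with `drop_last=False`; the helper name `num_batches` is hypothetical and not part of any library:

```python
import math

def num_batches(num_items: int, batch_size: int) -> int:
    """Number of mini-batches when batching over dataset items (drop_last=False)."""
    return math.ceil(num_items / batch_size)

# WikiCS holds a single graph, so the loader sees one item,
# regardless of how many nodes that graph contains.
print(num_batches(num_items=1, batch_size=16))      # 1

# If batching were done over the 11701 nodes instead, there would be many batches.
print(num_batches(num_items=11701, batch_size=16))  # 732
```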

Answered by ChenYizhu97


ChenYizhu97 · 2021-05-26T07:18:08Z


For a dataset that consists of one large graph, I think you might want to divide the graph into several partitions first, then use a suitable data loader such as ClusterLoader so that each batch contains several partitions of the graph. That way, you train the model on a subset of the graph at each step.

This notebook from the torch_geometric documentation should be helpful.
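
To make the partition-then-batch idea concrete, here is a rough, dependency-free sketch. It is not ClusterLoader itself: the real loader partitions the graph with METIS clustering, whereas this sketch uses a random split as a stand-in, and all function names (`partition_nodes`, `induced_subgraph`, `cluster_batches`) are hypothetical.

```python
import random

def partition_nodes(num_nodes, num_parts, seed=0):
    """Randomly split node ids into num_parts disjoint, near-equal parts.
    Assumption: random split stands in for METIS clustering."""
    rng = random.Random(seed)
    nodes = list(range(num_nodes))
    rng.shuffle(nodes)
    return [nodes[i::num_parts] for i in range(num_parts)]

def induced_subgraph(edges, nodes):
    """Keep edges with both endpoints in `nodes`, relabelled to 0..len(nodes)-1."""
    index = {n: i for i, n in enumerate(nodes)}
    return [(index[u], index[v]) for u, v in edges if u in index and v in index]

def cluster_batches(num_nodes, edges, num_parts, batch_size, seed=0):
    """Yield (node_ids, sub_edges); each batch merges `batch_size` partitions."""
    parts = partition_nodes(num_nodes, num_parts, seed)
    for start in range(0, num_parts, batch_size):
        nodes = sorted(n for part in parts[start:start + batch_size] for n in part)
        yield nodes, induced_subgraph(edges, nodes)

# Toy graph: a ring of 12 nodes split into 4 partitions, 2 partitions per batch.
edges = [(i, (i + 1) % 12) for i in range(12)]
batches = list(cluster_batches(12, edges, num_parts=4, batch_size=2))
for nodes, sub_edges in batches:
    print(len(nodes), "nodes,", len(sub_edges), "edges")
```

In PyG, the corresponding steps are roughly `ClusterData(data, num_parts=...)` followed by `ClusterLoader(cluster_data, batch_size=..., shuffle=True)`; the import path differs between releases, so check the version you have installed.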


rusty1s May 26, 2021
Maintainer

Yes, this is correct. Note that there exist multiple techniques for creating mini-batches from a single graph. As such, DataLoader refers to creating mini-batches from a set of graphs, while other data loaders tackle the task of creating mini-batches from a single graph, e.g., NeighborSampler, ClusterLoader, GraphSAINTSampler, or ShaDowKHopSampler.
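
As an illustration of one of these single-graph techniques, here is a simplified sketch of neighbor sampling in plain Python. It only shows the general idea (batch over seed nodes, then sample a bounded number of neighbours per hop) and is not the NeighborSampler implementation; the function names and the `fanouts` parameter are assumptions made for this example.

```python
import random

def build_adjacency(num_nodes, edges):
    """Undirected adjacency lists from an edge list."""
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def sample_neighborhood(adj, seeds, fanouts, rng):
    """From `seeds`, sample at most fanouts[h] neighbours per node at hop h.
    Returns the set of nodes making up the sampled subgraph."""
    visited = set(seeds)
    frontier = list(seeds)
    for fanout in fanouts:
        next_frontier = []
        for node in frontier:
            neighbours = adj[node]
            for n in rng.sample(neighbours, min(fanout, len(neighbours))):
                if n not in visited:
                    visited.add(n)
                    next_frontier.append(n)
        frontier = next_frontier
    return visited

def neighbor_batches(adj, train_nodes, batch_size, fanouts, seed=0):
    """Yield (seed_nodes, sampled_nodes): batches are formed over nodes, not graphs."""
    rng = random.Random(seed)
    order = list(train_nodes)
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        seeds = order[start:start + batch_size]
        yield seeds, sample_neighborhood(adj, seeds, fanouts, rng)

# Toy graph: a ring of 20 nodes, 8 seed nodes per batch, two hops.
edges = [(i, (i + 1) % 20) for i in range(20)]
adj = build_adjacency(20, edges)
batches = list(neighbor_batches(adj, range(20), batch_size=8, fanouts=[2, 2]))
for seeds, nodes in batches:
    print(len(seeds), "seed nodes ->", len(nodes), "nodes in sampled subgraph")
```

Here `batch_size` counts seed nodes, which is the node-level batching the original question asked about.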
