Hey,

I am currently using the framework to try out some GNN ideas, and since I don't want to write everything myself, I am thankful that the framework exists.

Unfortunately, I am running into issues when running multiple models in parallel: the CPU utilization is extremely high and becomes a bottleneck for the other processes running alongside, so my GPU is barely used.

I was able to track the "leakage" down to the RandomNodeSampler. Below is a small code example that illustrates the problem:
```python
from torch_geometric.loader import RandomNodeSampler
from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import NormalizeFeatures
from torchvision import datasets
from torchvision.transforms import ToTensor
import torch
import tqdm
import psutil
import numpy as np

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Graph dataset for the PyG loader
dataset = Planetoid(root='data/Planetoid', name='Cora', transform=NormalizeFeatures())
data = dataset[0]

# Image dataset for the plain PyTorch baseline
data_2 = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

# Measure CPU utilization while iterating the PyG loader
loader = RandomNodeSampler(data, num_parts=5, shuffle=False)
cpu = []
for epoch in tqdm.tqdm(range(1, 50)):
    for batch in loader:
        batch = batch.to(device)
        cpu += [psutil.cpu_percent()]
avg_cpu = np.mean(cpu)
print(f"Avg CPU utilization for PyG: {avg_cpu:.2f}")

# Measure CPU utilization while iterating a plain PyTorch DataLoader
loader_2 = torch.utils.data.DataLoader(data_2, batch_size=10, shuffle=False)
cpu = []  # reset so the second average is not mixed with the PyG measurements
for epoch in tqdm.tqdm(range(1, 3)):
    for batch in loader_2:
        cpu += [psutil.cpu_percent()]
        x, y = batch
avg_cpu = np.mean(cpu)
print(f"Avg CPU utilization for PyTorch: {avg_cpu:.2f}")
```
I don't use either of these datasets in my actual scenario; they only serve as an example.

When I execute the code above, the CPU utilization for the PyG loader is at ~25% (in my scenario with a custom Data object it is above 50%), whereas the PyTorch loader is at ~6%, so only the former creates a bottleneck.
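One experiment that might help narrow this down is capping PyTorch's intra-op thread pool before iterating. torch.set_num_threads is a standard PyTorch call, but that thread oversubscription contributes here is only my guess, not something I have verified:

```python
import torch

# Cap the intra-op thread pool (the default is usually the number of
# physical cores, which can oversubscribe the CPU when several models
# run in parallel). Reuses `data` and `device` from the snippet above.
torch.set_num_threads(1)

loader = RandomNodeSampler(data, num_parts=5, shuffle=False)
for batch in loader:
    batch = batch.to(device)
```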
I am aware that a graph requires more computation than independent data points, since a subgraph has to be extracted to serve as a batch. I am just surprised that the computational cost is so high that it leads to such problematic behavior when parallelized. Am I doing something wrong, or is this behavior unavoidable?
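To make my mental model concrete, this is roughly the per-batch work I imagine the sampler doing, sketched with plain torch ops; this is my reading of what extracting an induced subgraph costs, not the actual RandomNodeSampler implementation:

```python
import torch

def induced_subgraph(edge_index, node_idx, num_nodes):
    # Mark the sampled nodes, then keep only the edges whose two
    # endpoints are both in the sample -- a full scan over edge_index
    # for every batch.
    mask = torch.zeros(num_nodes, dtype=torch.bool)
    mask[node_idx] = True
    edge_mask = mask[edge_index[0]] & mask[edge_index[1]]
    sub_edges = edge_index[:, edge_mask]

    # Relabel the surviving endpoints to a compact 0..k-1 range.
    relabel = torch.full((num_nodes,), -1, dtype=torch.long)
    relabel[node_idx] = torch.arange(node_idx.numel())
    return relabel[sub_edges]
```

If something like this runs on the full edge list for every batch, CPU cost that grows with graph size would at least be consistent with what I observe.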
Please let me know!