How to create and save my own graph data? #2129

hkim716 · 2021-02-17T03:36:33Z

I'm trying to create my own graph datasets for a GNN model using PyG.
When I load the CiteSeer dataset, it is shown as Data(edge_index=[2, 9104], test_mask=[3327], train_mask=[3327], val_mask=[3327], x=[3327, 3703], y=[3327]). There are also processed and raw folders that contain several files, e.g. ind.citeseer.allx, ind.citeseer.graph, and so on.

Suppose I would like to make a graph dataset with 100 nodes and 3 features per node. I have the feature and node connectivity information as NumPy arrays stored in an .npz file.
How can I efficiently make my own datasets that are compatible with PyTorch Geometric? Is it possible to build the graph datasets using networkx, and do you recommend it? And how can I save the graph datasets, similar to how .npz files are used for image data?

rusty1s · 2021-02-17T06:53:42Z

rusty1s
Feb 17, 2021
Maintainer

I do not recommend using networkx if your data is already present in the form of numpy arrays.
In dataset.process, you will need to take care of converting those numpy files to PyG data objects though. Your dataset may then look something like this:

import torch
from torch_geometric.data import InMemoryDataset

class MyDataset(InMemoryDataset):
    def __init__(self, root, raw_filename, transform=None):
        # Set before super().__init__(), which may trigger process().
        self.raw_filename = raw_filename
        super().__init__(root, transform)
        self.data, self.slices = torch.load(self.processed_paths[0])

    @property
    def processed_file_names(self):
        return 'data.pt'

    def process(self):
        data_list = []  # Read raw files and create data list out of them.
        torch.save(self.collate(data_list), self.processed_paths[0])
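
As a rough sketch of what the "read raw files" step in process could involve for the 100-node, 3-feature case from the question: the snippet below writes a small .npz file and reads it back into tensors with the shapes PyG uses (x as [num_nodes, num_features], edge_index as [2, num_edges]). The file name graph.npz, the array keys features and edges, and the helper load_graph_arrays are assumptions made for illustration, not part of PyG.

```python
import os
import tempfile

import numpy as np
import torch


def load_graph_arrays(path):
    """Read node features and an edge list from a .npz file (hypothetical layout)."""
    with np.load(path) as raw:
        features = raw["features"]   # assumed shape: [num_nodes, num_features]
        edges = raw["edges"]         # assumed shape: [num_edges, 2]
    x = torch.from_numpy(features).to(torch.float32)
    # PyG stores connectivity as a [2, num_edges] tensor of dtype long.
    edge_index = torch.from_numpy(edges).to(torch.long).t().contiguous()
    return x, edge_index


# Toy data: 100 nodes with 3 features each, connected in a ring.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 3))
src = np.arange(100)
edges = np.stack([src, (src + 1) % 100], axis=1)

path = os.path.join(tempfile.mkdtemp(), "graph.npz")
np.savez(path, features=features, edges=edges)

x, edge_index = load_graph_arrays(path)
print(x.shape, edge_index.shape)
# Inside process(), these tensors could then be wrapped as
# torch_geometric.data.Data(x=x, edge_index=edge_index) and appended to data_list.
```

The ring above lists each edge in one direction only; for an undirected graph, each edge would need to appear in both directions in edge_index.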

14 replies

rusty1s Feb 26, 2021
Maintainer

In general, global readout functions perform mostly just as well as sophisticated pooling layers, which coarsen the graph after each iteration. The majority of GNN implementations just perform global pooling rather than iterative pooling. You currently already do this via torch.mean(...), but in order to work on mini-batches of batch_size > 1, you will need to replace that call with torch_geometric.nn.global_mean_pool.
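
To illustrate the difference in plain PyTorch: in a mini-batch, PyG stacks the nodes of all graphs into one tensor and records each node's graph in a batch vector, so a plain torch.mean over the node dimension would mix graphs together. The sketch below computes a per-graph mean instead, which is the quantity global_mean_pool is meant to return. The helper per_graph_mean is a hypothetical stand-in written for illustration, not PyG's implementation.

```python
import torch


def per_graph_mean(x, batch):
    """Average node features separately for each graph in a mini-batch."""
    num_graphs = int(batch.max()) + 1
    sums = torch.zeros(num_graphs, x.size(1), dtype=x.dtype)
    sums.index_add_(0, batch, x)                  # sum features per graph
    counts = torch.bincount(batch, minlength=num_graphs).clamp(min=1)
    return sums / counts.unsqueeze(1).to(x.dtype)


# Two graphs in one mini-batch: 3 nodes in graph 0, 2 nodes in graph 1.
x = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [10.0, 20.0], [30.0, 40.0]])
batch = torch.tensor([0, 0, 0, 1, 1])

mixed = x.mean(dim=0, keepdim=True)   # shape [1, 2]: all graphs averaged together
pooled = per_graph_mean(x, batch)     # shape [2, 2]: one row per graph
print(pooled)  # rows: [3., 4.] and [20., 30.]
```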

hkim716 Feb 28, 2021
Author

Hey Matt,
I would like to reduce the dimensionality to 10 nodes using a pooling layer.
I added global_mean_pool, but I cannot get it to work.
Why do I get the error NameError: name 'batch' is not defined?
Can you fix my code?

class MyModel(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(MyModel, self).__init__()
        self.conv1 = pyg_nn.GCNConv(in_channels, 3)
        self.conv2 = pyg_nn.GCNConv(3, out_channels)
        self.conv3 = pyg_nn.GCNConv(out_channels, 3)
        self.pool3 = pyg_nn.global_mean_pool(3, batch, 10)
        self.conv4 = pyg_nn.GCNConv(3, in_channels)
        
  
    def forward(self, data):
        
        x, edge_index, batch = data.pos, data.edge_index, data.batch
        x = x.float()
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = self.conv2(x, edge_index)
        emb = x
        x = F.relu(x)
        x = self.conv3(x, edge_index)
        x = F.relu(x)
        x = pool3(x)
        x = self.conv4(x, edge_index)

        return emb, x

    def loss(self, pred, label):

        return F.mse_loss(pred, label.to(torch.float32))

rusty1s Feb 28, 2021
Maintainer

The global_mean_pool method is not a module, but a function. You can simply call it inside your forward function:

import torch.nn as nn
import torch.nn.functional as F
import torch_geometric.nn as pyg_nn

class MyModel(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(MyModel, self).__init__()
        self.conv1 = pyg_nn.GCNConv(in_channels, 3)
        self.conv2 = pyg_nn.GCNConv(3, out_channels)
        self.conv3 = pyg_nn.GCNConv(out_channels, in_channels)

    def forward(self, data):
        x, edge_index, batch = data.pos, data.edge_index, data.batch
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = self.conv2(x, edge_index)
        emb = x
        x = F.relu(x)
        x = self.conv3(x, edge_index)
        # global_mean_pool is a plain function: one output row per graph.
        x = pyg_nn.global_mean_pool(x, batch)

        return emb, x

hkim716 Mar 1, 2021
Author

What is the shape of the output of global_mean_pool? If I want to reduce my 100 nodes to 30 nodes, how should I use global_mean_pool?

rusty1s Mar 1, 2021
Maintainer

global_mean_pool will aggregate all node features into a single vector representation. In case you work on a single graph, it will output a [1, num_features] matrix. In case you work on multiple graphs, it will output a [batch_size, num_features] matrix.

If you want to slowly coarsen your graph, e.g., going from 100 to 30 nodes, I suggest looking into the top-k pooling approaches provided by PyG.
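
The idea behind top-k pooling can be sketched in plain PyTorch as follows: score every node, keep the k highest-scoring nodes, and keep only the edges between them. This is a simplified illustration under assumptions (a random, untrained scoring vector and a hypothetical helper topk_coarsen), not PyG's implementation; in practice the scoring vector is learned and the pooling layers shipped with PyG would be used.

```python
import torch


def topk_coarsen(x, edge_index, k, weight):
    """Keep the k highest-scoring nodes and the edges among them (simplified)."""
    score = (x @ weight) / weight.norm()          # one scalar score per node
    perm = score.topk(k).indices                  # indices of the kept nodes
    x_kept = x[perm] * torch.tanh(score[perm]).unsqueeze(1)  # gate by score

    # Map old node ids to new ids in [0, k); dropped nodes get -1.
    new_id = torch.full((x.size(0),), -1, dtype=torch.long)
    new_id[perm] = torch.arange(k)
    src, dst = new_id[edge_index[0]], new_id[edge_index[1]]
    keep = (src >= 0) & (dst >= 0)                # both endpoints survived
    return x_kept, torch.stack([src[keep], dst[keep]]), perm


torch.manual_seed(0)
x = torch.randn(100, 3)                           # 100 nodes, 3 features
ring = torch.arange(100)
edge_index = torch.stack([ring, (ring + 1) % 100])
weight = torch.randn(3)                           # untrained scoring vector (assumption)

x_small, edge_small, perm = topk_coarsen(x, edge_index, k=30, weight=weight)
print(x_small.shape, edge_small.shape)
```

Unlike global_mean_pool, which collapses each graph to a single row, this keeps a smaller graph (here 30 nodes) that further convolution layers can operate on.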
