Problem about deal with new dataset #2721

jianchaoji · 2021-06-10T12:37:53Z


I have a new dataset that consists of two files: relation.txt and feature.txt.

The relation file lists the connections between nodes, one pair per line, like this:
1 0
2 0
3 0
2 1
3 1

And the feature file holds the one-hot features of each node, like this:
1 0 0 1 1 0 0 0
2 0 1 0 0 0 1 0

Can I get some advice on how to transform them into a format like "planetoid" (https://github.com/kimiyoung/planetoid/tree/master/data)?

Thank you so much!

Answered by saiden89


saiden89 · 2021-06-10T15:17:05Z


Hi,
it would probably be best to take a look here for a detailed example of how to structure your data for PyTorch Geometric.
In a nutshell, your features are just a torch.tensor with shape (num_nodes, num_features). In your case:

x = torch.tensor([[1, 0, 0, 1, 1, 0, 0, 0], 
                  [2, 0, 1, 0, 0, 0, 1, 0]], dtype=torch.long)
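
One thing worth checking before building x: the leading 1 and 2 in the sample rows may be node IDs rather than feature values. The question does not say, so this is an assumption. If it holds, the ID column should be split off first. A minimal sketch in plain Python, where the helper name split_ids_and_features is made up for illustration:

```python
def split_ids_and_features(rows):
    """Treat column 0 as the node ID and the remaining columns as features.

    Assumption: the first value on each line of feature.txt is a node ID.
    """
    by_id = {row[0]: row[1:] for row in rows}
    node_ids = sorted(by_id)
    # Order the feature rows by node ID so they line up with the edge indices.
    features = [by_id[node_id] for node_id in node_ids]
    return node_ids, features


# The two sample rows from the question, deliberately given out of order.
rows = [[2, 0, 1, 0, 0, 0, 1, 0],
        [1, 0, 0, 1, 1, 0, 0, 0]]
node_ids, features = split_ids_and_features(rows)
print(node_ids)
print(features)
```

The resulting features list can be passed to torch.tensor as before. Row i of x has to describe the node that the edge tensor refers to as i, so the full feature file needs a row for every node index that appears in relation.txt (0 to 3 in the sample). Most PyTorch Geometric layers also expect floating-point features, so dtype=torch.float is probably a better fit than torch.long here.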

As for your edge list (what you call the connection relationship), that's another torch.tensor, called edge_index in PyTorch Geometric, of shape (2, num_edges). If you're working with an undirected graph, the index should contain both directions of every edge. It is easiest to write it as (source, target) pairs and then transpose, like so:

edge_index = torch.tensor([[1, 0],
                           [0, 1],
                           [2, 0],
                           [0, 2],
                           [3, 0],
                           [0, 3],
                           [2, 1],
                           [1, 2],
                           [3, 1],
                           [1, 3]], dtype=torch.long).t().contiguous()
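
The both-directions listing above does not have to be typed by hand; it can be generated from the pairs in relation.txt. A minimal sketch in plain Python, assuming each line of the file is one "source target" pair. The helper name to_undirected_pairs is made up for illustration:

```python
def to_undirected_pairs(pairs):
    """Return every edge in both directions, without duplicates, in input order."""
    seen = set()
    result = []
    for src, dst in pairs:
        for edge in ((src, dst), (dst, src)):
            if edge not in seen:
                seen.add(edge)
                result.append(edge)
    return result


# The five pairs from the question's relation.txt.
pairs = [(1, 0), (2, 0), (3, 0), (2, 1), (3, 1)]
both = to_undirected_pairs(pairs)

# Transpose the pair list into two rows (sources, targets), which is the
# (2, num_edges) layout described above.
rows = [list(row) for row in zip(*both)]
print(rows)
```

Passing rows to torch.tensor(rows, dtype=torch.long) should then give a tensor of shape (2, 10) for this sample. If the edges are already in a tensor, torch_geometric.utils.to_undirected does the same job.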

That's just the basics, but you will find that PyTorch Geometric is very consistent when it comes to data structures. For a more detailed explanation, feel free to browse the linked documentation as well as the worked examples in this GitHub repository. There are also some Colab notebooks if you prefer.

2 replies

rusty1s Jun 11, 2021
Maintainer

To follow up on @saiden89: you can read your txt files via standard Python file reading, or with more advanced tools such as pandas.read_csv. This will give you the aforementioned tensors as NumPy arrays.
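
As a rough illustration of the "standard Python file reading" route, here is a minimal sketch that parses whitespace-separated integers line by line. The helper name read_int_rows is made up, and io.StringIO stands in for the real files so that the snippet runs on its own:

```python
import io


def read_int_rows(handle):
    """Parse whitespace-separated integers, one list per non-empty line."""
    return [[int(tok) for tok in line.split()] for line in handle if line.strip()]


# Stand-ins for open("relation.txt") and open("feature.txt"),
# filled with the sample lines from the question.
relation_file = io.StringIO("1 0\n2 0\n3 0\n2 1\n3 1\n")
feature_file = io.StringIO("1 0 0 1 1 0 0 0\n2 0 1 0 0 0 1 0\n")

edges = read_int_rows(relation_file)
features = read_int_rows(feature_file)
print(edges)
print(features)
```

With the real files, the same helper can be used as `with open("relation.txt") as f: edges = read_int_rows(f)`. The nested lists can go straight into torch.tensor. For larger files, numpy.loadtxt or pandas.read_csv with sep=r"\s+" and header=None is likely the more convenient choice.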

jianchaoji Jun 11, 2021
Author

I will have a try. Thank you!
