Creating Dataset #2328
-
In your example, the node features would have shape `[10, 5]`. If you want to do graph classification on subgraphs, you need to specify which nodes belong to a given cluster/label. You can think of this as a (sparse) assignment matrix, where each node is mapped to a given cluster. If you have 4 nodes in one cluster and 3 nodes in another, this can be represented as a sparse matrix of shape `[7, 2]`. The node features can then be pooled into one vector per cluster:

```python
from torch_scatter import scatter_add

x = ...  # GNN output node features, shape [num_nodes, num_features]
row, col = cluster_assignment_index  # row: node index, col: cluster index
z = scatter_add(x[row], col, dim=0)  # z has shape [num_clusters, num_features]
```
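To make the shapes concrete for the 4 + 3 node example, here is a small sketch of the same sum pooling. It swaps torch_scatter's `scatter_add` for NumPy's `np.add.at` (an unbuffered scatter-add) so it runs without PyTorch; the random node features are placeholders standing in for real GNN outputs.

```python
import numpy as np

# Toy example: 7 nodes, nodes 0-3 form cluster 0 and nodes 4-6 form cluster 1.
num_nodes, num_features, num_clusters = 7, 5, 2

# Placeholder for the GNN output node features (random values, illustration only).
rng = np.random.default_rng(0)
x = rng.standard_normal((num_nodes, num_features))

# Sparse assignment in COO form: row = node index, col = cluster index.
# The dense equivalent is a [num_nodes, num_clusters] = [7, 2] matrix of 0/1 entries.
cluster_assignment_index = np.array([
    [0, 1, 2, 3, 4, 5, 6],  # row: node
    [0, 0, 0, 0, 1, 1, 1],  # col: cluster
])

row, col = cluster_assignment_index
z = np.zeros((num_clusters, num_features))
# Add each node's feature row into the row of z for its cluster.
np.add.at(z, col, x[row])

print(z.shape)  # (2, 5)
```

The result is the same as multiplying the transposed dense `[7, 2]` assignment matrix with `x`; the sparse index form just avoids materializing that matrix when there are many nodes and clusters.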
-
The shapes of the numpy arrays make sense. I would also be adding edge features, but I guess those will also be a numpy array of shape (no. of edges, no. of features). Do I need to save all these arrays in a single file (.npz)?

Following the discussion, I was able to create the `npz` object; I have attached it here: transaction_data_100.npz.zip

Now I am working on the MyDataset class and getting an error in the `process` function:

Traceback (most recent call last):

I actually don't want to do any processing, because the `npz` object already contains the edge_index, train_mask, test_mask, y and edge_attr arrays, and I do see a processed folder with a data.pt file. Below is the MyDataset class:
For the graph classification dataset, you are saying that I should have a label for each node. Then how is it different from the dataset for node classification (in terms of the shapes of the numpy arrays)? I thought there would be a label for each subgraph. For a subgraph with 3 nodes (node1, node2, node3) belonging to class 1 and a subgraph with 4 nodes (node4, node5, node6, node7) belonging to class 2, will I have a matrix of shape [2, 2] (2 subgraphs and their corresponding classes) or [7, 2] (the class of each node)?

P.S. If I generate a dataset for node classification, what changes do I need to make to use the same dataset for graph classification? Currently, the dataset I am trying to convert has a label for each node: if a subgraph belongs to, say, class 1, then all the nodes in that subgraph have label 1. As you can see, node classification can easily be applied to this. I also want to apply graph classification; for that, I am first separating all the subgraphs and assigning a label to each subgraph rather than to each node. Can you suggest how I should transform the node classification dataset into a graph classification dataset?
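One way to picture the difference for the 3 + 4 node example: node classification keeps one label per node (shape `[7]`), while a graph-level target is commonly stored as one class index per graph (shape `[2]`), not as a `[2, 2]` or `[7, 2]` matrix. The sketch below derives the subgraph labels from the node labels, assuming a hypothetical `batch` vector that records which subgraph each node belongs to and assuming every node in a subgraph shares the same label.

```python
import numpy as np

# Hypothetical toy data: 7 nodes split into 2 subgraphs.
# node_labels[i] is the class of node i (node-classification target, shape [7]).
node_labels = np.array([1, 1, 1, 2, 2, 2, 2])
# batch[i] is the index of the subgraph that node i belongs to (shape [7]).
batch = np.array([0, 0, 0, 1, 1, 1, 1])

num_graphs = int(batch.max()) + 1
graph_labels = np.empty(num_graphs, dtype=node_labels.dtype)
for g in range(num_graphs):
    labels_in_g = np.unique(node_labels[batch == g])
    # Assumption: all nodes of a subgraph carry the same label; fail loudly otherwise.
    assert len(labels_in_g) == 1, f"subgraph {g} has mixed labels {labels_in_g}"
    graph_labels[g] = labels_in_g[0]

print(node_labels.shape, graph_labels.shape)  # (7,) (2,)
print(graph_labels)  # [1 2]
```

The node features and edges stay the same for both tasks; what graph classification adds is the node-to-subgraph assignment and the per-subgraph label vector.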
-
Hello Matt,
I am working on anomaly detection in financial transactions using graph-based methods. I am currently using AMLSim from the IBM repository to create the dataset. The idea is to represent transactions in a graph-like structure and use node classification or graph classification methods from PyG to classify fraudulent accounts (node classification) or series of fraudulent transactions (graph classification). The dataset is currently in CSV format, which I was thinking of converting into numpy arrays. Can you guide me on what the numpy arrays should look like?
The dataset has information for each account in accounts.csv and all transactions between accounts in another file, transaction.csv. One row in accounts.csv holds the information of one unique account, and one row in transaction.csv holds one transaction between two accounts. accounts.csv will form my node features and transaction.csv will form my edge index, but I would like to know what the shapes of the numpy arrays would be.
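A minimal sketch of turning account and transaction rows into node features, `edge_index` and edge features. The in-memory rows, the column layout and the account IDs are all hypothetical stand-ins for accounts.csv and transaction.csv (the real AMLSim columns are not assumed); the key step is remapping arbitrary account IDs to contiguous node indices 0..N-1, because `edge_index` refers to rows of the node feature matrix.

```python
import numpy as np

# Hypothetical rows standing in for accounts.csv: (account_id, feature_1, feature_2).
accounts = [
    (1001, 0.5, 3.0),
    (1007, 1.2, 0.0),
    (1042, 0.3, 7.5),
]
# Hypothetical rows standing in for transaction.csv: (sender_id, receiver_id, amount).
transactions = [
    (1001, 1007, 250.0),
    (1007, 1042, 80.0),
    (1042, 1001, 12.5),
    (1001, 1042, 40.0),
]

# Account IDs are arbitrary, so map each one to a contiguous node index 0..N-1.
id_to_idx = {acc_id: i for i, (acc_id, *_) in enumerate(accounts)}

# Node feature matrix: one row per account -> shape [num_nodes, num_node_features].
x = np.array([feats for _, *feats in accounts], dtype=np.float32)

# edge_index in COO format: row 0 = source nodes, row 1 = target nodes -> shape [2, num_edges].
edge_index = np.array(
    [[id_to_idx[s] for s, _, _ in transactions],
     [id_to_idx[r] for _, r, _ in transactions]],
    dtype=np.int64,
)

# Edge features: one row per transaction -> shape [num_edges, num_edge_features].
edge_attr = np.array([[amt] for _, _, amt in transactions], dtype=np.float32)

print(x.shape, edge_index.shape, edge_attr.shape)  # (3, 2) (2, 4) (4, 1)
```

With real files, the same mapping would be applied to the columns read from the CSVs; the row order of `x` then defines the node indices.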
E.g., what will the numpy array shapes be if there are 10 accounts, each with 5 features, and say 25 transactions among those 10 accounts, for the node classification task as well as the graph classification task?
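A sketch of the array layout for the 10-account / 25-transaction example, filled with random placeholder values since only the shapes matter. The split into two subgraphs, the label values and the array names `y_node`, `batch` and `y_graph` are hypothetical; edges are drawn only within each subgraph so that none crosses between them. Keeping everything in one `.npz` archive (an in-memory buffer here instead of a file path) is just one convenient option, not something PyG requires.

```python
import io

import numpy as np

rng = np.random.default_rng(0)
num_nodes, num_node_features = 10, 5

# Hypothetical split: accounts 0-3 form subgraph 0, accounts 4-9 form subgraph 1.
batch = np.repeat([0, 1], [4, 6])  # [10]: node -> subgraph index

# Node features (random placeholders): one row per account.
x = rng.standard_normal((num_nodes, num_node_features)).astype(np.float32)  # [10, 5]

# 10 transactions inside subgraph 0 and 15 inside subgraph 1, so no edge crosses subgraphs.
edge_index = np.concatenate([
    rng.integers(0, 4, size=(2, 10)),
    rng.integers(4, 10, size=(2, 15)),
], axis=1)  # [2, 25]

# Node classification target: one label per account.
y_node = rng.integers(0, 2, size=num_nodes)  # [10]
# Graph classification target: one label per subgraph.
y_graph = np.array([1, 0])  # [2]

# Store all arrays in a single .npz archive and read them back.
buf = io.BytesIO()
np.savez(buf, x=x, edge_index=edge_index, y_node=y_node, batch=batch, y_graph=y_graph)
buf.seek(0)
loaded = np.load(buf)
print({k: loaded[k].shape for k in loaded.files})
```

For node classification, `x`, `edge_index` and `y_node` are enough; graph classification additionally needs `batch` (which nodes form which subgraph) and `y_graph`.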
Also, there will be a lot of disconnected subgraphs: for example, there might be 4 nodes (accounts) in one cluster that are connected to each other because transactions happen between them, and 3 nodes (accounts) in another cluster. Will I need to specify all these subgraphs if I want to apply graph classification?
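If each subgraph is simply a connected group of accounts, the node-to-subgraph assignment does not have to be written by hand; it can be computed from `edge_index`. The sketch below uses a plain union-find and ignores edge direction (weak connectivity); the function name `connected_components` is hypothetical. SciPy's `scipy.sparse.csgraph.connected_components` is an existing alternative. Stacking `np.arange(num_nodes)` with the returned vector gives an index of the `cluster_assignment_index` form used in the reply above.

```python
import numpy as np


def connected_components(edge_index: np.ndarray, num_nodes: int) -> np.ndarray:
    """Map each node to a component id 0..C-1 (edge direction ignored)."""
    parent = list(range(num_nodes))

    def find(i: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the components of the two endpoints of every edge.
    for s, t in edge_index.T:
        rs, rt = find(int(s)), find(int(t))
        if rs != rt:
            parent[rs] = rt

    roots = np.array([find(i) for i in range(num_nodes)])
    # Relabel root ids to contiguous component ids 0..C-1.
    _, component = np.unique(roots, return_inverse=True)
    return component


# Toy graph: accounts 0-3 transact with each other, accounts 4-6 with each other.
edge_index = np.array([
    [0, 1, 2, 4, 5],
    [1, 2, 3, 5, 6],
])
batch = connected_components(edge_index, num_nodes=7)
print(batch)  # [0 0 0 0 1 1 1]
```

Accounts without any transaction end up as single-node components, so they may need to be filtered out before building a graph classification dataset.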