To use your own data, create a dictionary in the following format and save it as a compressed joblib dump file.
- a list of adjacency matrices.
- Format: a list of sparse adjacency matrices.
- A sparse matrix is represented as a tuple ('ind', 'value', 'shape'), where 'ind' gives the indices of the matrix entries as a pair of row and column vectors (rows, cols), 'value' is a vector of the corresponding entries, and 'shape' is the shape of the matrix, that is, a pair (number of rows, number of columns).
- Format: a scalar value of the maximum number of nodes in a graph (= M).
- Format: a list of M by D feature matrices (D is the number of features per node).
- Format: a list of E binary label matrices (E is the number of classes).
- Format: a scalar value of the total number of nodes across all graphs (= N).
- Format: a list of vectors, each holding the indices of the nodes in one graph (0 <= node index < N).
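
As a rough illustration of the format above, the following sketch builds such a dictionary for two small graphs. The key names (`adj`, `max_node_num`, `feature`, `label`, `node_num`, `node`), the helper functions, and the file name are placeholders chosen for this example, since this section does not name them. It also assumes that each graph gets one binary label vector of length E and that node indices are numbered consecutively across graphs. Plain Python lists stand in for vectors and matrices.

```python
def dense_to_sparse(dense):
    """Convert a dense 2-D list into the ('ind', 'value', 'shape') tuple."""
    rows, cols, values = [], [], []
    for r, row in enumerate(dense):
        for c, v in enumerate(row):
            if v != 0:
                rows.append(r)
                cols.append(c)
                values.append(v)
    shape = (len(dense), len(dense[0]) if dense else 0)
    return ((rows, cols), values, shape)


def build_dataset(dense_adjs, features, labels):
    """Assemble the dictionary; key names are hypothetical placeholders."""
    node_counts = [len(a) for a in dense_adjs]
    # Assumption: nodes are numbered consecutively over all graphs,
    # so every index satisfies 0 <= index < N.
    nodes, offset = [], 0
    for n in node_counts:
        nodes.append(list(range(offset, offset + n)))
        offset += n
    return {
        "adj": [dense_to_sparse(a) for a in dense_adjs],
        "max_node_num": max(node_counts),
        "feature": features,
        "label": labels,
        "node_num": offset,
        "node": nodes,
    }


def save_dataset(dataset, path):
    """Write the dictionary as a compressed joblib dump."""
    import joblib  # third-party; imported here so the rest runs without it
    joblib.dump(dataset, path, compress=3)


# Two 3-node graphs: a path and a triangle (M = 3, D = 2, E = 2).
path_graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
features = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]],
]
labels = [[1, 0], [0, 1]]

dataset = build_dataset([path_graph, triangle], features, labels)
# save_dataset(dataset, "my_dataset.jbl")  # hypothetical file name
```

If the graphs differ in size, the feature matrices would presumably need padding to M rows. That step is left out here because both example graphs have three nodes.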
The following entries are optional and are used in multimodal mode (e.g. GCN and DNN):
- Format: a list of symbolic sequences as an integer matrix (the number of graphs x the maximum length of sequences).
- Each element is an integer encoding a symbol (1 <= element <= S).
- Format: a list of sequence lengths. The length of this list should equal the number of graphs.
- Format: a scalar value of the number of symbols in sequences (= S).
- Format: a list of symbolic sequences as an integer matrix (the number of graphs x the maximum length of sequences).
- Each element is an integer encoding a symbol (1 <= element <= S).
- Format: a list of vectors as a floating-point matrix (the number of graphs x the dimension of features).
- "profeat", "dragon", and "ecfp" are processed in the same way.
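
The sequence entries can be prepared along these lines. This is a minimal sketch: `encode_sequences` is a hypothetical helper, the key names in `extra` are placeholders, and using 0 as the padding value is an assumption (it is chosen only because valid symbol codes start at 1).

```python
def encode_sequences(sequences):
    """Encode symbol sequences as a zero-padded integer matrix.

    Returns (matrix, lengths, S), where S is the number of distinct symbols.
    """
    # Assign codes 1..S in sorted symbol order.
    symbols = sorted({s for seq in sequences for s in seq})
    code = {s: i + 1 for i, s in enumerate(symbols)}
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths, default=0)
    # Assumption: shorter sequences are right-padded with 0.
    matrix = [
        [code[s] for s in seq] + [0] * (max_len - len(seq))
        for seq in sequences
    ]
    return matrix, lengths, len(symbols)


# One sequence per graph.
matrix, lengths, symbol_num = encode_sequences(["ACD", "CA"])

# Hypothetical key names for the optional entries.
extra = {
    "sequence": matrix,
    "sequence_length": lengths,
    "sequence_symbol_num": symbol_num,
}
```

The resulting matrix has one row per graph and as many columns as the longest sequence, and the stored lengths let a model ignore the padded positions.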