# OpenGNN

OpenGNN is a machine learning library for learning over graph-structured data. It was built with generality in mind and supports tasks such as:

* graph regression
* graph-to-sequence mapping

It supports various graph encoders including GGNNs, GCNs, SequenceGNNs, and other variations of [neural graph message passing](https://arxiv.org/pdf/1704.01212.pdf).

This library's design and usage patterns are inspired by [OpenNMT](https://github.com/OpenNMT/OpenNMT-tf), and it uses TensorFlow's [Dataset](https://www.tensorflow.org/programmers_guide/datasets) and [Estimator](https://www.tensorflow.org/programmers_guide/estimators) APIs.

## Installation

OpenGNN requires

* Python (>= 3.5)
* TensorFlow (>= 1.10, < 2.0)

To install the library as well as the command-line entry points, run

```bash
pip install -e .
```
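
To verify the installation, you can try importing the package:

```bash
python -c "import opengnn"
```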

## Getting Started

To experiment with the library, you can use one of the datasets provided in the [data](/data) folder.
For example, to experiment with the chemical dataset, first install the `rdkit` library,
which can be obtained by running `conda install -c rdkit rdkit`.
Then, in the [data/chem](/data/chem) folder, run `python get_data.py` to download the dataset.
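
Each line of the downloaded `.jsonl` files holds one graph as a JSON object. The sample below is only an illustrative guess at the schema, inferred from the vocabulary commands that follow (a `node_labels` field, and `edges` entries whose first element is the edge-type string); check the downloaded files for the authoritative format, including the name of the target field:

```json
{"node_labels": ["C", "C", "O"], "edges": [["single", 0, 1], ["double", 1, 2]], "targets": [1.24]}
```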

After getting the data, generate node and edge vocabularies from the training set using

```bash
ognn-build-vocab --field_name node_labels --save_vocab node.vocab \
    molecules_graphs_train.jsonl
ognn-build-vocab --no_pad_token --field_name edges --string_index 0 --save_vocab edge.vocab \
    molecules_graphs_train.jsonl
```

### Command Line

The main entry point to the library is the `ognn-main` command

```bash
ognn-main <run_type> --model_type <model> --config <config_file.yml>
```

Currently there are two run types: `train_and_eval` and `infer`.

For example, to train a model on the previously extracted chemical data
(again inside [data/chem](/data/chem)) using a predefined model from the
catalog, run

```bash
ognn-main train_and_eval --model_type chemModel --config config.yml
```

You can also define your own model in a custom Python script with a `model` function.
For example, we can train using a custom model in `model.py` with

```bash
ognn-main train_and_eval --model model.py --config config.yml
```
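
The contents of such a script are sketched below. This is a hypothetical example, not confirmed OpenGNN API: the `GraphRegressor` class name and its arguments are assumptions, so mirror one of the catalog models in the OpenGNN source rather than copying this verbatim.

```python
# model.py -- hypothetical sketch; `ognn.models.GraphRegressor` and its
# arguments are assumptions, not confirmed API.
import opengnn as ognn

def model():
    # ognn-main is expected to call this function to obtain the model to train
    return ognn.models.GraphRegressor(
        encoder=ognn.encoders.GGNNEncoder(1, 256))
```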

While the training script doesn't log training progress to the standard output,
we can monitor training by running TensorBoard on the model directory defined in
[data/chem/config.yml](data/chem/config.yml).
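
For example, if the configured model directory is `chem_model` (a hypothetical path; substitute whatever your config sets):

```bash
tensorboard --logdir chem_model
```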

After training, we can perform inference on the validation file by running

```bash
ognn-main infer --model_type chemModel --config config.yml \
    --features_file molecules_graphs_valid.jsonl \
    --prediction_file molecules_predicted_valid.jsonl
```

Examples of other config files can be found in the [data](/data) folder.
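
As a rough orientation, a config file pairs a model directory with the data and vocabulary files built above. The YAML below is only a hedged sketch whose key names are assumptions; treat the actual files in [data](/data) as the authoritative reference:

```yaml
# Hypothetical sketch -- key names are assumptions; see data/chem/config.yml
# for the real schema.
model_dir: chem_model

data:
  train_features_file: molecules_graphs_train.jsonl
  node_vocabulary: node.vocab
  edge_vocabulary: edge.vocab
```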

### Library

The library can also be easily integrated into your own code.
The following example shows how to create a GGNN encoder to encode a batch of random graphs.

```python
import tensorflow as tf
import opengnn as ognn

tf.enable_eager_execution()

# build a batch of 3 graphs with random initial features.
# edges form a sparse adjacency tensor with shape
# [batch_size, num_edge_types, max_num_nodes, max_num_nodes]
edges = tf.SparseTensor(
    indices=[
        [0, 0, 0, 1], [0, 0, 1, 2],
        [1, 0, 0, 0],
        [2, 0, 1, 0], [2, 0, 2, 1], [2, 0, 3, 2], [2, 0, 4, 3]],
    values=[1, 1, 1, 1, 1, 1, 1],
    dense_shape=[3, 1, 5, 5])
# initial node representations, padded to the largest graph in the batch
node_features = tf.random_uniform((3, 5, 256))
# the true number of nodes in each graph
graph_sizes = [3, 1, 5]

encoder = ognn.encoders.GGNNEncoder(1, 256)
outputs, state = encoder(
    edges,
    node_features,
    graph_sizes)

print(outputs)
```

Graphs are represented by a sparse adjacency matrix with dimensionality
`num_edge_types x num_nodes x num_nodes` and an initial distributed representation for each node.

As with sequences, when batching we need to pad each graph to the maximum number of nodes in the batch.
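
A minimal sketch of that padding, using plain TensorFlow rather than any OpenGNN helper: each graph's node-feature matrix is zero-padded up to the largest node count in the batch, and the true sizes are kept so the encoder can ignore the padding.

```python
import tensorflow as tf

tf.enable_eager_execution()

# two graphs with 3 and 5 nodes, 256 features each
graphs = [tf.random_uniform((3, 256)), tf.random_uniform((5, 256))]
graph_sizes = [3, 5]

# zero-pad every graph's node dimension up to the batch maximum, then stack
max_nodes = max(graph_sizes)
node_features = tf.stack([
    tf.pad(g, [[0, max_nodes - n], [0, 0]])
    for g, n in zip(graphs, graph_sizes)])

print(node_features.shape)  # (2, 5, 256)
```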

## Acknowledgments

The design of the library and implementations are based on

* [OpenNMT-tf](https://github.com/OpenNMT/OpenNMT-tf)
* [Gated Graph Neural Networks](https://github.com/Microsoft/gated-graph-neural-network-samples)

Since most of the code adapted from OpenNMT-tf is spread across multiple files, the license for the
library is located in the [base folder](/OPENNMT.LICENSE) rather than in the headers of the files.

## Reference

If you use this library in your own research, please cite

```
@inproceedings{pfernandes2018structsumm,
  title={Structured Neural Summarization},
  author={Patrick Fernandes and Miltiadis Allamanis and Marc Brockschmidt},
  booktitle={Proceedings of the 7th International Conference on Learning Representations (ICLR)},
  year={2019},
  url={https://arxiv.org/abs/1811.01824},
}
```
| 138 | + |
| 139 | + |
| 140 | + |
| 141 | + |
| 142 | + |
0 commit comments