diff --git a/research/README.md b/research/README.md
Lines changed: 22 additions & 8 deletions
diff --git a/research/kg_hyp_emb/README.md b/research/kg_hyp_emb/README.md
Lines changed: 97 additions & 0 deletions
diff --git a/research/kg_hyp_emb/__init__.py b/research/kg_hyp_emb/__init__.py
diff --git a/research/kg_hyp_emb/config.py b/research/kg_hyp_emb/config.py
Lines changed: 60 additions & 0 deletions
diff --git a/research/kg_hyp_emb/datasets/__init__.py b/research/kg_hyp_emb/datasets/__init__.py
diff --git a/research/kg_hyp_emb/datasets/datasets.py b/research/kg_hyp_emb/datasets/datasets.py
Lines changed: 81 additions & 0 deletions
diff --git a/research/kg_hyp_emb/datasets/download.sh b/research/kg_hyp_emb/datasets/download.sh
Lines changed: 19 additions & 0 deletions
@@ -3,20 +3,34 @@
 Note that these research projects are not included in the prebuilt NSL pip
 package.
 
+## [Low-Dimensional Hyperbolic Knowledge Graph Embeddings](kg_hyp_emb)
+
+The implementations of Low-Dimensional Hyperbolic Knowledge Graph Embeddings [3]
+are provided in the `kg_hyp_emb` folder on a strict "as is" basis, without
+warranties or conditions of any kind. Also, these implementations may not be
+compatible with certain TensorFlow versions or Python versions.
+
+[3] Chami, Ines, et al. "Low-Dimensional Hyperbolic Knowledge Graph Embeddings."
+ACL 2020.
+
 ## [A2N](a2n): Attending to Neighbors for Knowledge Graph Inference
 
-The implementations of A2N [1] are provided in the `a2n` folder on a strict "as
+The implementations of A2N [2] are provided in the `a2n` folder on a strict "as
 is" basis, without warranties or conditions of any kind. Also, these
-implementations may not be compatible with certain TensorFlow versions (such as
-2.0 or above) or Python versions.
+implementations may not be compatible with certain TensorFlow versions or Python
+versions.
 
-[[1] T. Bansal, D. Juan, S. Ravi and A. McCallum. "A2N: Attending to Neighbors
+[[2] T. Bansal, D. Juan, S. Ravi and A. McCallum. "A2N: Attending to Neighbors
 for Knowledge Graph Inference." ACL
 2019](https://www.aclweb.org/anthology/P19-1431)
 
 ## [GAM](gam): Graph Agreement Models for Semi-Supervised Learning
 
-The implementations of Graph Agreement Models (GAMs) are provided in the `gam`
-folder on a strict "as is" basis, without warranties or conditions of any kind.
-Also, these implementations may not be compatible with certain TensorFlow
-versions (such as 2.0 or above) or Python versions.
+The implementations of Graph Agreement Models (GAMs) [1] are provided in the
+`gam` folder on a strict "as is" basis, without warranties or conditions of any
+kind. Also, these implementations may not be compatible with certain TensorFlow
+versions or Python versions.
+
+[[1] O. Stretcu, K. Viswanathan, D. Movshovitz-Attias, E.A. Platanios, S. Ravi,
+A. Tomkins. "Graph Agreement Models for Semi-Supervised Learning." NeurIPS
+2019](https://papers.nips.cc/paper/9076-graph-agreement-models-for-semi-supervised-learning)
@@ -0,0 +1,97 @@
+# Knowledge Graph (KG) Embedding Library
+
+This project is a TensorFlow 2.0 implementation of Hyperbolic KG embeddings [6]
+as well as multiple state-of-the-art KG embedding models which can be trained
+for the link prediction task.
+
+## Library Overview
+
+This implementation includes the following models:
+
+Complex embeddings:
+
+*   Complex [1]
+*   Complex-N3 [2]
+*   RotatE [3]
+
+Euclidean embeddings:
+
+*   CTDecomp [2]
+*   TransE [4]
+*   MurE [5]
+*   RotE (new)
+*   RefE (new)
+*   AttE (new)
+
+Hyperbolic embeddings:
+
+*   TransH (new)
+*   RotH (new)
+*   RefH (new)
+*   AttH (new)
+
+## Usage
+
+First, create a Python 3.7 environment and, from the `kg_hyp_emb/` directory, install the dependencies:
+
+```bash
+virtualenv -p python3.7 kgenv
+```
+
+```bash
+source kgenv/bin/activate
+```
+
+```bash
+pip install -r requirements.txt
+```
+
+Then, download and pre-process the datasets:
+
+```bash
+source datasets/download.sh
+```
+
+```bash
+python datasets/process.py
+```
+
+Add the parent directory (`research/`) to your `PYTHONPATH` so that the `kg_hyp_emb` package can be imported:
+
+```bash
+KG_DIR=$(pwd)/..
+```
+
+```bash
+export PYTHONPATH="$KG_DIR:$PYTHONPATH"
+```
+
+Then, train a model using the `train.py` script. We provide an example to train
+RefE on FB15k-237:
+
+```bash
+python train.py --max_epochs 100 --dataset FB237 --model RefE --loss_fn SigmoidCrossEntropy --neg_sample_size -1 --data_dir data --optimizer Adagrad --lr 5e-2 --save_dir logs --rank 500 --entity_reg 1e-5 --rel_reg 1e-5 --patience 10 --valid 5 --save_model=false --save_logs=true --regularizer L3 --initializer GlorotNormal
+```
+
+This model should achieve around 54% Hits@10 on the FB237 test set.
+
+## References
+
+[1] Trouillon, Théo, et al. "Complex embeddings for simple link prediction."
+International Conference on Machine Learning. 2016.
+
+[2] Lacroix, Timothee, et al. "Canonical Tensor Decomposition for Knowledge Base
+Completion." International Conference on Machine Learning. 2018.
+
+[3] Sun, Zhiqing, et al. "RotatE: Knowledge graph embedding by relational
+rotation in complex space." International Conference on Learning
+Representations. 2019.
+
+[4] Bordes, Antoine, et al. "Translating embeddings for modeling
+multi-relational data." Advances in Neural Information Processing Systems. 2013.
+
+[5] Balažević, Ivana, et al. "Multi-relational Poincaré Graph Embeddings."
+Advances in Neural Information Processing Systems. 2019.
+
+[6] Chami, Ines, et al. "Low-Dimensional Hyperbolic Knowledge Graph Embeddings."
+Annual Meeting of the Association for Computational Linguistics. 2020.
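The README above quotes Hits@10, and `DatasetFn.get_filters` in `datasets.py` returns a filter dict "to compute ranking metrics in the filtered setting". The evaluation code is not part of this diff, so what follows is only a minimal NumPy sketch of filtered ranking under stated assumptions: the filter is assumed to map a `(head, relation)` pair to the set of all known true tails (the real `to_skip` layout comes from `process.py`, which is not shown), scores are assumed to be "higher is more plausible", and ties are resolved optimistically. `filtered_ranks` and `hits_at_k` are hypothetical helper names.

```python
import numpy as np


def filtered_ranks(scores, queries, targets, filters):
    """Filtered rank of each target tail among all candidate entities.

    scores:  float array [n_queries, n_entities], higher = more plausible.
    queries: int array [n_queries, 2] of (head, relation) pairs.
    targets: int array [n_queries] of true tail ids.
    filters: ASSUMED layout, a dict mapping (head, relation) to the set of
             all known true tails that should not count as ranking errors.
    """
    ranks = np.empty(len(targets), dtype=np.int64)
    for i, ((head, rel), tail) in enumerate(zip(queries, targets)):
        row = scores[i].astype(np.float64)  # astype returns a copy
        target_score = row[tail]
        # Push every other known true tail out of the competition.
        for other in filters.get((int(head), int(rel)), ()):
            if other != tail:
                row[other] = -np.inf
        # Rank 1 means no remaining candidate scores strictly higher.
        ranks[i] = 1 + int(np.sum(row > target_score))
    return ranks


def hits_at_k(ranks, k=10):
    """Fraction of queries whose filtered rank is at most k."""
    return float(np.mean(ranks <= k))


# Toy example: 2 queries over 4 entities.
scores = np.array([[0.1, 0.9, 0.5, 0.3],
                   [0.8, 0.2, 0.7, 0.4]])
queries = np.array([[0, 0], [1, 0]])
targets = np.array([2, 2])
filters = {(0, 0): {1, 2}, (1, 0): {2}}
ranks = filtered_ranks(scores, queries, targets, filters)
print(ranks.tolist(), hits_at_k(ranks, k=1))  # [1, 2] 0.5
```

In the toy example, entity 1 outscores the target of the first query but is itself a known true tail, so filtering lifts the target from rank 2 to rank 1. Other tie-breaking conventions (for example, averaging over tied candidates) can give slightly different numbers when scores collide.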
@@ -0,0 +1,60 @@
+# Copyright 2020 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     https://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Default configuration parameters."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+CONFIG = {
+    'string': {
+        'dataset': ('Dataset', 'WN18RR'),
+        'model': ('Model', 'RotE'),
+        'data_dir': ('Path to data directory', 'data/'),
+        'save_dir': ('Path to logs directory', 'logs/'),
+        'loss_fn': ('Loss function to use', 'SigmoidCrossEntropy'),
+        'initializer': ('Which initializer to use', 'GlorotNormal'),
+        'regularizer': ('Regularizer', 'N3'),
+        'optimizer': ('Optimizer', 'Adam'),
+        'bias': ('Bias term', 'learn'),
+        'dtype': ('Precision to use', 'float32'),
+    },
+    'float': {
+        'lr': ('Learning rate', 1e-3),
+        'lr_decay': ('Learning rate decay', 0.96),
+        'min_lr': ('Minimum learning rate', 1e-5),
+        'gamma': ('Margin for distance-based losses', 0),
+        'entity_reg': ('Regularization weight for entity embeddings', 0),
+        'rel_reg': ('Regularization weight for relation embeddings', 0),
+    },
+    'integer': {
+        'patience': ('Number of validation steps before early stopping', 20),
+        'valid': ('Number of epochs before computing validation metrics', 5),
+        'checkpoint': ('Number of epochs before checkpointing the model', 5),
+        'max_epochs': ('Maximum number of epochs to train for', 400),
+        'rank': ('Embeddings dimension', 500),
+        'batch_size': ('Batch size', 500),
+        'neg_sample_size':
+            ('Negative sample size, -1 to use loss without negative sampling',
+             50),
+    },
+    'boolean': {
+        'train_c': ('Whether to train the hyperbolic curvature or not', True),
+        'debug': ('If debug is true, only use 1000 examples for'
+                  ' debugging purposes', False),
+        'save_logs':
+            ('Whether to save the training logs or print to stdout', True),
+        'save_model': ('Whether to save the model weights', False)
+    }
+}
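`config.py` only declares flag names, help strings, and defaults grouped by type; how `train.py` registers them is not part of this diff (the `--flag value` syntax in the README suggests a command-line flags library). As a standard-library sketch only, assuming one command-line option per entry, the `{type: {name: (help, default)}}` layout maps onto `argparse` as shown here. The trimmed `CONFIG` copy and the helpers `str2bool`, `TYPES`, and `build_parser` are hypothetical.

```python
import argparse

# Trimmed, hypothetical copy of the CONFIG layout: {type: {name: (help, default)}}.
CONFIG = {
    'string': {'model': ('Model', 'RotE'), 'regularizer': ('Regularizer', 'N3')},
    'float': {'lr': ('Learning rate', 1e-3)},
    'integer': {'rank': ('Embeddings dimension', 500)},
    'boolean': {'save_model': ('Whether to save the model weights', False)},
}


def str2bool(value):
    """Parses true/false spellings such as --save_model=false."""
    lowered = value.lower()
    if lowered in ('true', '1', 'yes'):
        return True
    if lowered in ('false', '0', 'no'):
        return False
    raise argparse.ArgumentTypeError('expected a boolean, got %r' % value)


# Maps each CONFIG type group to the converter applied to the flag value.
TYPES = {'string': str, 'float': float, 'integer': int, 'boolean': str2bool}


def build_parser(config):
    """Registers one --name option per CONFIG entry, keeping help and default."""
    parser = argparse.ArgumentParser()
    for type_name, entries in config.items():
        for name, (help_text, default) in entries.items():
            parser.add_argument('--' + name, type=TYPES[type_name],
                                default=default, help=help_text)
    return parser


args = build_parser(CONFIG).parse_args(
    ['--model', 'RefE', '--lr', '5e-2', '--save_model=false'])
print(args.model, args.lr, args.rank, args.save_model)  # RefE 0.05 500 False
```

The `str2bool` converter matters because the README passes booleans as `--save_model=false`; with a plain `type=bool`, any non-empty string (including `"false"`) would parse as `True`.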
@@ -0,0 +1,81 @@
+# Copyright 2020 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     https://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Dataset class for loading and processing KG datasets."""
+
+import os
+import pickle as pkl
+
+import numpy as np
+import tensorflow as tf
+
+
+class DatasetFn(object):
+  """Knowledge Graph dataset class."""
+
+  def __init__(self, data_path, debug):
+    """Creates KG dataset object for data loading.
+
+    Args:
+      data_path: Path to directory containing train/valid/test pickle files
+        produced by process.py.
+      debug: boolean indicating whether to use debug mode or not. If true, the
+        dataset will only contain 1000 examples for debugging.
+    """
+    self.data_path = data_path
+    self.debug = debug
+    self.data = {}
+    for split in ['train', 'test', 'valid']:
+      file_path = os.path.join(self.data_path, split + '.pickle')
+      with open(file_path, 'rb') as in_file:
+        self.data[split] = pkl.load(in_file)
+    filters_path = os.path.join(self.data_path, 'to_skip.pickle')
+    with open(filters_path, 'rb') as filters_file:
+      self.to_skip = pkl.load(filters_file)
+    max_axis = np.max(self.data['train'], axis=0)
+    self.n_entities = int(max(max_axis[0], max_axis[2]) + 1)
+    self.n_predicates = int(max_axis[1] + 1) * 2
+
+  def get_filters(self):
+    """Return filter dict to compute ranking metrics in the filtered setting."""
+    return self.to_skip
+
+  def get_examples(self, split):
+    """Get examples in a split.
+
+    Args:
+      split: String indicating the split to use (train/valid/test).
+
+    Returns:
+      examples: tf.data.Dataset containing KG triples in a split.
+    """
+    examples = self.data[split]
+    if split == 'train':
+      copy = np.copy(examples)
+      tmp = np.copy(copy[:, 0])
+      copy[:, 0] = copy[:, 2]
+      copy[:, 2] = tmp
+      copy[:, 1] += self.n_predicates // 2
+      examples = np.vstack((examples, copy))
+    if self.debug:
+      examples = examples[:1000]
+    examples = examples.astype(np.int64)
+    tf_dataset = tf.data.Dataset.from_tensor_slices(examples)
+    if split == 'train':
+      tf_dataset = tf_dataset.shuffle(
+          buffer_size=examples.shape[0], reshuffle_each_iteration=True)
+    return tf_dataset
+
+  def get_shape(self):
+    """Returns KG dataset shape."""
+    return self.n_entities, self.n_predicates, self.n_entities
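For the train split, `get_examples` appends one reciprocal triple (t, r + R, h) for every training triple (h, r, t), where R is the number of original relations; this is why the constructor doubles `n_predicates`. A common motivation for this trick is that head queries (?, r, t) can then be answered as tail queries (t, r + R, ?). The NumPy-only sketch below reproduces the same arithmetic on a toy array so the index shifts are easy to check; `add_reciprocals` and `kg_shape` are hypothetical helper names, not part of the library.

```python
import numpy as np


def add_reciprocals(triples, n_predicates):
    """Same augmentation as DatasetFn.get_examples for the train split.

    triples:      int array [n, 3] of (head, relation, tail) ids.
    n_predicates: relation count INCLUDING reciprocals, i.e. twice the
                  number of original relations, as computed in DatasetFn.
    """
    # Fancy indexing returns a new array with head and tail swapped.
    reversed_triples = triples[:, [2, 1, 0]]
    # Relation r becomes its reciprocal r + R.
    reversed_triples[:, 1] += n_predicates // 2
    return np.vstack((triples, reversed_triples))


def kg_shape(train):
    """Entity and doubled relation counts, computed as in DatasetFn.__init__."""
    max_axis = np.max(train, axis=0)
    n_entities = int(max(max_axis[0], max_axis[2]) + 1)
    n_predicates = int(max_axis[1] + 1) * 2
    return n_entities, n_predicates


train = np.array([[0, 0, 1], [1, 1, 2]], dtype=np.int64)
n_entities, n_predicates = kg_shape(train)
print(n_entities, n_predicates)  # 3 4
print(add_reciprocals(train, n_predicates).tolist())
# [[0, 0, 1], [1, 1, 2], [1, 2, 0], [2, 3, 1]]
```

Note that, as in `DatasetFn`, both counts come from the train split alone, so entity or relation ids that appear only in valid/test would fall outside these ranges.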
@@ -0,0 +1,19 @@
+#!/bin/bash
+# Copyright 2020 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     https://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Dataset download script using open source datasets from the kbc repository.
+wget https://dl.fbaipublicfiles.com/kbc/data.tar.gz
+tar -xvzf data.tar.gz
+rm data.tar.gz