
Commit 9950642

Merge pull request #59 from chamii22:master
PiperOrigin-RevId: 309308335
2 parents 5c2c2af + a7e065e commit 9950642

File tree

2 files changed (+33 −11 lines)


research/kg_hyp_emb/README.md

Lines changed: 29 additions & 5 deletions
@@ -2,19 +2,20 @@
 
 This project is a Tensorflow 2.0 implementation of Hyperbolic KG embeddings [6]
 as well as multiple state-of-the-art KG embedding models which can be trained
-for the link prediction task.
+for the link prediction task. A PyTorch implementation is also available at:
+[https://github.com/HazyResearch/KGEmb](https://github.com/HazyResearch/KGEmb)
 
 ## Library Overview
 
 This implementation includes the following models:
 
-Complex embeddings:
+#### Complex embeddings:
 
 * Complex [1]
 * Complex-N3 [2]
 * RotatE [3]
 
-Euclidean embeddings:
+#### Euclidean embeddings:
 
 * CTDecomp [2]
 * TransE [4]
@@ -23,14 +24,14 @@ Euclidean embeddings:
 * RefE [6]
 * AttE [6]
 
-Hyperbolic embeddings:
+#### Hyperbolic embeddings:
 
 * TransH [6]
 * RotH [6]
 * RefH [6]
 * AttH [6]
 
-## Usage
+## Installation
 
 First, create a python 3.7 environment and install dependencies: From kgemb/
 
@@ -66,6 +67,8 @@ KG_DIR=$(pwd)/..
 export PYTHONPATH="$KG_DIR:$PYTHONPATH"
 ```
 
+## Example usage
+
 Then, train a model using the `train.py` script. We provide an example to train
 RefE on FB15k-237:
 
@@ -75,6 +78,27 @@ python train.py --max_epochs 100 --dataset FB237 --model RefE --loss_fn SigmoidC
 
 This model achieves 54% Hits@10 on the FB237 test set.
 
+## New models
+
+To add a new (complex/hyperbolic/Euclidean) Knowledge Graph embedding model,
+implement the corresponding query embedding under models/, e.g.:
+
+```
+def get_queries(self, input_tensor):
+  entity = self.entity(input_tensor[:, 0])
+  rel = self.rel(input_tensor[:, 1])
+  result = ...  # Do something here
+  return result
+```
+
+## Citation
+
+If you use this code, please cite the following paper [6]:
+
+```
+TODO: add bibtex
+```
+
 ## References
 
 [1] Trouillon, Théo, et al. "Complex embeddings for simple link prediction."
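The `get_queries` template added to the README above leaves the combination step open. As an editor's sketch only, not the repository's actual API, a TransE-style model could fill it in as follows; the `TransEQueries` class name and the plain NumPy lookup tables are assumptions standing in for the library's real embedding layers:

```python
import numpy as np

class TransEQueries:
    """Toy TransE-style model illustrating the get_queries template.

    The NumPy tables below are assumptions; the repository's models
    use trainable embedding layers instead.
    """

    def __init__(self, n_entities, n_relations, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.entity_table = rng.normal(size=(n_entities, dim))
        self.rel_table = rng.normal(size=(n_relations, dim))

    def entity(self, idx):
        # Look up entity embeddings by index.
        return self.entity_table[idx]

    def rel(self, idx):
        # Look up relation embeddings by index.
        return self.rel_table[idx]

    def get_queries(self, input_tensor):
        # input_tensor rows are (head, relation, tail) index triples.
        entity = self.entity(input_tensor[:, 0])
        rel = self.rel(input_tensor[:, 1])
        result = entity + rel  # TransE: translate the head by the relation.
        return result
```

Here the "do something" step is vector addition; a rotation- or reflection-based model would combine `entity` and `rel` differently.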

research/kg_hyp_emb/datasets/process.py

Lines changed: 4 additions & 6 deletions
@@ -108,16 +108,14 @@ def process_dataset(path):
       corresponding KG triples.
     filters: Dictionary containing filters for lhs and rhs predictions.
   """
-  lhs_skip = collections.defaultdict(set)
-  rhs_skip = collections.defaultdict(set)
   ent2idx, rel2idx = get_idx(dataset_path)
   examples = {}
-  for split in ['train', 'valid', 'test']:
+  splits = ['train', 'valid', 'test']
+  for split in splits:
     dataset_file = os.path.join(path, split)
     examples[split] = to_np_array(dataset_file, ent2idx, rel2idx)
-    lhs_filters, rhs_filters = get_filters(examples[split], len(rel2idx))
-    lhs_skip.update(lhs_filters)
-    rhs_skip.update(rhs_filters)
+  all_examples = np.concatenate([examples[split] for split in splits], axis=0)
+  lhs_skip, rhs_skip = get_filters(all_examples, len(rel2idx))
   filters = {'lhs': lhs_skip, 'rhs': rhs_skip}
   return examples, filters
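The process.py change builds the filters once from the concatenated splits instead of merging per-split results, which matters because `dict.update` replaces the value set stored under a repeated (entity, relation) key rather than unioning it. A minimal sketch of the idea, with a hypothetical `get_filters` of the same shape as the repository's (the reciprocal-relation offset is an assumption, not confirmed by this diff):

```python
import collections
import numpy as np

def get_filters(examples, n_relations):
    """Hypothetical stand-in for the repository's get_filters: maps
    (entity, relation) keys to the set of observed completions."""
    lhs_filters = collections.defaultdict(set)
    rhs_filters = collections.defaultdict(set)
    for lhs, rel, rhs in examples:
        rhs_filters[(lhs, rel)].add(rhs)
        # Index the inverse direction with an offset relation id
        # (a common convention; an assumption here).
        lhs_filters[(rhs, rel + n_relations)].add(lhs)
    return lhs_filters, rhs_filters

# Filtering over the concatenated splits sees every completion for a key,
# so (0, 0) below collects completions from both train and valid.
splits = {
    'train': np.array([[0, 0, 1]]),
    'valid': np.array([[0, 0, 2]]),
    'test': np.array([[3, 0, 1]]),
}
all_examples = np.concatenate([splits[s] for s in splits], axis=0)
lhs_skip, rhs_skip = get_filters(all_examples, n_relations=1)
```

Running `get_filters` per split and calling `lhs_skip.update(...)` would have overwritten the train-set completions for `(0, 0)` with the valid-set ones; concatenating first yields the union.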
