|
#!/usr/bin/env python3
# Copyright 2018-present, HKUST-KnowComp.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
"""Model architecture/optimization options for WRMCQA document reader."""
| 8 | + |
import argparse
import logging

# Module-level logger, named after this module per the logging convention.
logger = logging.getLogger(__name__)
| 13 | + |
# Names of the arguments that define the core model architecture.
# Used by get_model_args()/override_model_args() to separate architecture
# options (which must match a saved checkpoint) from training options.
MODEL_ARCHITECTURE = {
    'model_type',
    'embedding_dim',
    'char_embedding_dim',
    'hidden_size',
    'char_hidden_size',
    'doc_layers',
    'question_layers',
    'rnn_type',
    'concat_rnn_layers',
    'question_merge',
    'use_qemb',
    'use_exact_match',
    'use_pos',
    'use_ner',
    'use_lemma',
    'use_tf',
    'hop',
}

# Names of the arguments that control the optimizer / training procedure.
# These may safely be overridden when resuming from a saved model.
MODEL_OPTIMIZER = {
    'fix_embeddings',
    'optimizer',
    'learning_rate',
    'momentum',
    'weight_decay',
    'rho',
    'eps',
    'max_len',
    'grad_clipping',
    'tune_partial',
    'rnn_padding',
    'dropout_rnn',
    'dropout_rnn_output',
    'dropout_emb',
}
| 27 | + |
| 28 | + |
def str2bool(v):
    """Interpret common affirmative strings as True; anything else is False.

    Case-insensitive; accepts 'yes', 'true', 't', '1', and 'y'.
    """
    return v.lower() in {'yes', 'true', 't', '1', 'y'}
| 31 | + |
| 32 | + |
def add_model_args(parser):
    """Attach all WRMCQA reader model and optimization flags to *parser*.

    Registers a 'bool' argparse type (backed by str2bool) so boolean flags
    accept yes/no style values, then adds three argument groups:
    architecture, model details, and optimization.
    """
    parser.register('type', 'bool', str2bool)

    def _register(group, specs):
        # Each spec is a (flag, kwargs) pair for one add_argument call.
        for flag, kwargs in specs:
            group.add_argument(flag, **kwargs)

    # Model architecture
    model = parser.add_argument_group('WRMCQA Reader Model Architecture')
    _register(model, [
        ('--model-type', dict(type=str, default='rnn',
                              help='Model architecture type: rnn, r_net, mnemonic')),
        ('--embedding-dim', dict(type=int, default=300,
                                 help='Embedding size if embedding_file is not given')),
        ('--char-embedding-dim', dict(type=int, default=50,
                                      help='Embedding size if char_embedding_file is not given')),
        ('--hidden-size', dict(type=int, default=100,
                               help='Hidden size of RNN units')),
        ('--char-hidden-size', dict(type=int, default=50,
                                    help='Hidden size of char RNN units')),
        ('--doc-layers', dict(type=int, default=3,
                              help='Number of encoding layers for document')),
        ('--question-layers', dict(type=int, default=3,
                                   help='Number of encoding layers for question')),
        ('--rnn-type', dict(type=str, default='lstm',
                            help='RNN type: LSTM, GRU, or RNN')),
    ])

    # Model specific details
    detail = parser.add_argument_group('WRMCQA Reader Model Details')
    _register(detail, [
        ('--concat-rnn-layers', dict(type='bool', default=True,
                                     help='Combine hidden states from each encoding layer')),
        ('--question-merge', dict(type=str, default='self_attn',
                                  help='The way of computing the question representation')),
        ('--use-qemb', dict(type='bool', default=True,
                            help='Whether to use weighted question embeddings')),
        ('--use-exact-match', dict(type='bool', default=True,
                                   help='Whether to use in_question_* features')),
        ('--use-pos', dict(type='bool', default=True,
                           help='Whether to use pos features')),
        ('--use-ner', dict(type='bool', default=True,
                           help='Whether to use ner features')),
        ('--use-lemma', dict(type='bool', default=True,
                             help='Whether to use lemma features')),
        ('--use-tf', dict(type='bool', default=True,
                          help='Whether to use term frequency features')),
        ('--hop', dict(type=int, default=2,
                       help='The number of hops for both aligner and the answer pointer in m-reader')),
    ])

    # Optimization details
    optim = parser.add_argument_group('WRMCQA Reader Optimization')
    _register(optim, [
        ('--dropout-emb', dict(type=float, default=0.2,
                               help='Dropout rate for word embeddings')),
        ('--dropout-rnn', dict(type=float, default=0.2,
                               help='Dropout rate for RNN states')),
        ('--dropout-rnn-output', dict(type='bool', default=True,
                                      help='Whether to dropout the RNN output')),
        ('--optimizer', dict(type=str, default='adamax',
                             help='Optimizer: sgd, adamax, adadelta')),
        ('--learning-rate', dict(type=float, default=1.0,
                                 help='Learning rate for sgd, adadelta')),
        ('--grad-clipping', dict(type=float, default=10,
                                 help='Gradient clipping')),
        ('--weight-decay', dict(type=float, default=0,
                                help='Weight decay factor')),
        ('--momentum', dict(type=float, default=0,
                            help='Momentum factor')),
        ('--rho', dict(type=float, default=0.95,
                       help='Rho for adadelta')),
        ('--eps', dict(type=float, default=1e-6,
                       help='Eps for adadelta')),
        ('--fix-embeddings', dict(type='bool', default=True,
                                  help='Keep word embeddings fixed (use pretrained)')),
        ('--tune-partial', dict(type=int, default=0,
                                help='Backprop through only the top N question words')),
        ('--rnn-padding', dict(type='bool', default=False,
                               help='Explicitly account for padding in RNN encoding')),
        ('--max-len', dict(type=int, default=15,
                           help='The max span allowed during decoding')),
    ])
| 106 | + |
| 107 | + |
def get_model_args(args):
    """Filter args for model ones.

    Given an args Namespace, build a new Namespace that carries *only* the
    entries named in MODEL_ARCHITECTURE or MODEL_OPTIMIZER (i.e. the ones
    defined in this module).
    """
    allowed = MODEL_ARCHITECTURE | MODEL_OPTIMIZER
    kept = {name: value for name, value in vars(args).items() if name in allowed}
    return argparse.Namespace(**kept)
| 118 | + |
| 119 | + |
def override_model_args(old_args, new_args):
    """Set args to new parameters.

    Decide which model args to keep and which to override when resolving a set
    of saved args and new args.

    We keep the new optimization settings, but leave the model architecture
    alone: architecture options must match the saved checkpoint's weights.

    Returns a new argparse.Namespace with the resolved values.
    """
    old_args, new_args = vars(old_args), vars(new_args)
    for k in old_args.keys():
        if k in new_args and old_args[k] != new_args[k]:
            if k in MODEL_OPTIMIZER:
                # Training/optimizer settings are safe to take from the new run.
                # Lazy %-style args: formatting is deferred until a handler
                # actually emits the record.
                logger.info('Overriding saved %s: %s --> %s',
                            k, old_args[k], new_args[k])
                old_args[k] = new_args[k]
            else:
                # Architecture settings stay pinned to the saved model.
                logger.info('Keeping saved %s: %s', k, old_args[k])
    return argparse.Namespace(**old_args)