
Commit 3553379

[Update] First commit
0 parents  commit 3553379

20 files changed, +3947 -0 lines changed

.gitignore

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
*.pyc
*.DS_Store
*~
data/
*.tar.gz
*.egg-info

LICENSE

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
BSD License

For MnemonicReader software

Copyright (c) 2018-present, HKUST-KnowComp.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name HKUST-KnowComp nor the names of its contributors may be used to
  endorse or promote products derived from this software without specific
  prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

README.md

Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
# Mnemonic Reader

The Mnemonic Reader is a deep learning model for the machine comprehension task; details are in this [paper](https://arxiv.org/pdf/1705.02798.pdf). It combines the advantages of [match-LSTM](https://arxiv.org/pdf/1608.07905), [R-Net](https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf), and [Document Reader](https://arxiv.org/abs/1704.00051), and introduces a new unit, the Semantic Fusion Unit (SFU), to achieve state-of-the-art results (at that time).
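
For intuition, the SFU is essentially a gated composition of an input vector with auxiliary "fusion" evidence. The snippet below is a minimal, illustrative PyTorch sketch of that idea based on the paper's description; the class name, shapes, and single fusion input are assumptions for illustration, not code from this repository.

```python
import torch
import torch.nn as nn


class SFU(nn.Module):
    """Semantic Fusion Unit (sketch): gated composition of an input vector r
    with auxiliary evidence f, following the paper's description."""

    def __init__(self, input_size, fusion_size):
        super().__init__()
        self.compose = nn.Linear(input_size + fusion_size, input_size)
        self.gate = nn.Linear(input_size + fusion_size, input_size)

    def forward(self, r, f):
        rf = torch.cat([r, f], dim=-1)
        x = torch.tanh(self.compose(rf))   # candidate fused representation
        g = torch.sigmoid(self.gate(rf))   # element-wise keep/forget gate
        return g * x + (1 - g) * r


# e.g. fuse a 100-dim state with 100-dim attended evidence:
# out = SFU(100, 100)(torch.randn(2, 100), torch.randn(2, 100))
```

The gate decides, per dimension, how much of the newly composed representation to keep and how much of the original input to pass through unchanged.
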

This model is a [PyTorch](http://pytorch.org/) implementation of the Mnemonic Reader. PyTorch implementations of R-Net and Document Reader are also included for comparison with the Mnemonic Reader. Pretrained models are available in the [release](https://github.com/HKUST-KnowComp/MnemonicReader/releases) section.

This repo belongs to [HKUST-KnowComp](https://github.com/HKUST-KnowComp) and is under the [BSD LICENSE](LICENSE).

Some of the code is based on [DrQA](https://github.com/facebookresearch/DrQA).

Please feel free to contact Xin Liu ([email protected]) if you have any questions about this repo.

### Evaluation on SQuAD

| Model                                  | DEV_EM | DEV_F1 |
| -------------------------------------- | ------ | ------ |
| Document Reader (original paper)       | 69.5   | 78.8   |
| Document Reader (trained model)        | 69.4   | 78.6   |
| R-Net (original paper 1)               | 71.1   | 79.5   |
| R-Net (original paper 2)               | 72.3   | 80.6   |
| R-Net (trained model)                  | 70.2   | 79.2   |
| Mnemonic Reader (original paper)       | 71.8   | 81.2   |
| Mnemonic Reader + RL (original paper)  | 72.1   | 81.6   |
| Mnemonic Reader (trained model)        | 72.3   | 81.4   |

![EM_F1](img/EM_F1.png)

### Requirements

* Python >= 3.4
* PyTorch >= 0.3.1
* spaCy >= 2.0.0
* tqdm
* ujson
* numpy
* prettytable

### Prepare

First of all, you need to download the dataset and the pre-trained word vectors.

```bash
mkdir -p data/datasets
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json -O data/datasets/SQuAD-train-v1.1.json
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json -O data/datasets/SQuAD-dev-v1.1.json
```

```bash
mkdir -p data/embeddings
wget http://nlp.stanford.edu/data/glove.840B.300d.zip -O data/embeddings/glove.840B.300d.zip
cd data/embeddings
unzip glove.840B.300d.zip
```

Then, you need to preprocess the data.

```bash
python script/preprocess.py data/datasets data/datasets --split SQuAD-train-v1.1
python script/preprocess.py data/datasets data/datasets --split SQuAD-dev-v1.1
```

If you want to use multiple cores to speed up preprocessing, add `--num-workers 4` to the commands above.

### Train

There are a number of parameters to set, but sensible defaults are provided. If you are not interested in tuning parameters, just run:

```bash
python script/train.py
```

After several hours, you will find the trained model in `data/models/`, e.g. `20180416-acc9d06d.mdl`, along with its log file in the same directory, e.g. `20180416-acc9d06d.txt`.

### Predict

To evaluate a trained model, first generate predictions on the dev set:

```bash
python script/predict.py --model data/models/20180416-acc9d06d.mdl
```

Replace the model name in the command above with the name of your own model.

This does not print final scores directly; it writes a prediction file, which you then score with the official `evaluate-v1.1.py` in `script/`.

```bash
python script/evaluate-v1.1.py data/predict/SQuAD-dev-v1.1-20180416-acc9d06d.preds data/datasets/SQuAD-dev-v1.1.json
```

### Interactive

For those interested in QA systems, `script/interactivate.py` provides a simple interactive demo.

```bash
python script/interactivate.py --model data/models/20180416-acc9d06d.mdl
```

This drops you into an interactive session, which looks like:

```python
* Interactive Module *

* Repo: Mnemonic Reader (https://github.com/HKUST-KnowComp/MnemonicReader)

* Implement based on Facebook's DrQA

>>> process(document, question, candidates=None, top_n=1)
>>> usage()

>>> text = "Mary had a little lamb, whose fleece was white as snow. And everywhere that Mary went the lamb was sure to go."
>>> question = "What color is Mary's lamb?"
>>> process(text, question)

+------+-------+---------+
| Rank | Span  | Score   |
+------+-------+---------+
|  1   | white | 0.78002 |
+------+-------+---------+
```

### More parameters

If you want to tune parameters to achieve a higher score, you can see the available options for each script via:

```bash
python script/preprocess.py --help
```

```bash
python script/train.py --help
```

```bash
python script/predict.py --help
```

```bash
python script/interactivate.py --help
```

## License

All code in **Mnemonic Reader** is released under the [BSD LICENSE](LICENSE).

config.py

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
#!/usr/bin/env python3
# Copyright 2018-present, HKUST-KnowComp.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
"""Model architecture/optimization options for WRMCQA document reader."""

import argparse
import logging

logger = logging.getLogger(__name__)

# Index of arguments concerning the core model architecture
MODEL_ARCHITECTURE = {
    'model_type', 'embedding_dim', 'char_embedding_dim', 'hidden_size', 'char_hidden_size',
    'doc_layers', 'question_layers', 'rnn_type', 'concat_rnn_layers', 'question_merge',
    'use_qemb', 'use_exact_match', 'use_pos', 'use_ner', 'use_lemma', 'use_tf', 'hop'
}

# Index of arguments concerning the model optimizer/training
MODEL_OPTIMIZER = {
    'fix_embeddings', 'optimizer', 'learning_rate', 'momentum', 'weight_decay',
    'rho', 'eps', 'max_len', 'grad_clipping', 'tune_partial',
    'rnn_padding', 'dropout_rnn', 'dropout_rnn_output', 'dropout_emb'
}


def str2bool(v):
    return v.lower() in ('yes', 'true', 't', '1', 'y')


def add_model_args(parser):
    parser.register('type', 'bool', str2bool)

    # Model architecture
    model = parser.add_argument_group('WRMCQA Reader Model Architecture')
    model.add_argument('--model-type', type=str, default='rnn',
                       help='Model architecture type: rnn, r_net, mnemonic')
    model.add_argument('--embedding-dim', type=int, default=300,
                       help='Embedding size if embedding_file is not given')
    model.add_argument('--char-embedding-dim', type=int, default=50,
                       help='Embedding size if char_embedding_file is not given')
    model.add_argument('--hidden-size', type=int, default=100,
                       help='Hidden size of RNN units')
    model.add_argument('--char-hidden-size', type=int, default=50,
                       help='Hidden size of char RNN units')
    model.add_argument('--doc-layers', type=int, default=3,
                       help='Number of encoding layers for document')
    model.add_argument('--question-layers', type=int, default=3,
                       help='Number of encoding layers for question')
    model.add_argument('--rnn-type', type=str, default='lstm',
                       help='RNN type: LSTM, GRU, or RNN')

    # Model specific details
    detail = parser.add_argument_group('WRMCQA Reader Model Details')
    detail.add_argument('--concat-rnn-layers', type='bool', default=True,
                        help='Combine hidden states from each encoding layer')
    detail.add_argument('--question-merge', type=str, default='self_attn',
                        help='The way of computing the question representation')
    detail.add_argument('--use-qemb', type='bool', default=True,
                        help='Whether to use weighted question embeddings')
    detail.add_argument('--use-exact-match', type='bool', default=True,
                        help='Whether to use in_question_* features')
    detail.add_argument('--use-pos', type='bool', default=True,
                        help='Whether to use pos features')
    detail.add_argument('--use-ner', type='bool', default=True,
                        help='Whether to use ner features')
    detail.add_argument('--use-lemma', type='bool', default=True,
                        help='Whether to use lemma features')
    detail.add_argument('--use-tf', type='bool', default=True,
                        help='Whether to use term frequency features')
    detail.add_argument('--hop', type=int, default=2,
                        help='The number of hops for both aligner and the answer pointer in m-reader')

    # Optimization details
    optim = parser.add_argument_group('WRMCQA Reader Optimization')
    optim.add_argument('--dropout-emb', type=float, default=0.2,
                       help='Dropout rate for word embeddings')
    optim.add_argument('--dropout-rnn', type=float, default=0.2,
                       help='Dropout rate for RNN states')
    optim.add_argument('--dropout-rnn-output', type='bool', default=True,
                       help='Whether to dropout the RNN output')
    optim.add_argument('--optimizer', type=str, default='adamax',
                       help='Optimizer: sgd, adamax, adadelta')
    optim.add_argument('--learning-rate', type=float, default=1.0,
                       help='Learning rate for sgd, adadelta')
    optim.add_argument('--grad-clipping', type=float, default=10,
                       help='Gradient clipping')
    optim.add_argument('--weight-decay', type=float, default=0,
                       help='Weight decay factor')
    optim.add_argument('--momentum', type=float, default=0,
                       help='Momentum factor')
    optim.add_argument('--rho', type=float, default=0.95,
                       help='Rho for adadelta')
    optim.add_argument('--eps', type=float, default=1e-6,
                       help='Eps for adadelta')
    optim.add_argument('--fix-embeddings', type='bool', default=True,
                       help='Keep word embeddings fixed (use pretrained)')
    optim.add_argument('--tune-partial', type=int, default=0,
                       help='Backprop through only the top N question words')
    optim.add_argument('--rnn-padding', type='bool', default=False,
                       help='Explicitly account for padding in RNN encoding')
    optim.add_argument('--max-len', type=int, default=15,
                       help='The max span allowed during decoding')


def get_model_args(args):
    """Filter args for model ones.

    From an args Namespace, return a new Namespace with *only* the args specific
    to the model architecture or optimization (i.e. the ones defined here).
    """
    global MODEL_ARCHITECTURE, MODEL_OPTIMIZER
    required_args = MODEL_ARCHITECTURE | MODEL_OPTIMIZER
    arg_values = {k: v for k, v in vars(args).items() if k in required_args}
    return argparse.Namespace(**arg_values)


def override_model_args(old_args, new_args):
    """Set args to new parameters.

    Decide which model args to keep and which to override when resolving a set
    of saved args and new args.

    We keep the new optimization settings, but leave the model architecture alone.
    """
    global MODEL_OPTIMIZER
    old_args, new_args = vars(old_args), vars(new_args)
    for k in old_args.keys():
        if k in new_args and old_args[k] != new_args[k]:
            if k in MODEL_OPTIMIZER:
                logger.info('Overriding saved %s: %s --> %s' %
                            (k, old_args[k], new_args[k]))
                old_args[k] = new_args[k]
            else:
                logger.info('Keeping saved %s: %s' % (k, old_args[k]))
    return argparse.Namespace(**old_args)
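
As a usage sketch (not part of this commit), the helpers above are typically wired together in a training or prediction script roughly as follows. The parser description and example flag values are illustrative assumptions, and `config.py` is assumed to be importable from the working directory.

```python
import argparse

import config  # the config.py shown above, assumed importable

# Register all model/optimizer flags on a fresh parser and parse some flags.
parser = argparse.ArgumentParser('WRMCQA document reader')
config.add_model_args(parser)
args = parser.parse_args(['--model-type', 'mnemonic', '--hidden-size', '128'])

# Keep only architecture/optimization args, e.g. to store alongside a checkpoint.
model_args = config.get_model_args(args)

# When resuming from a checkpoint, take the new optimizer settings but keep the
# saved architecture; differing architecture args are logged and preserved.
resolved = config.override_model_args(model_args, args)
print(resolved.model_type, resolved.hidden_size)
```
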
