This repository was archived by the owner on Nov 8, 2022. It is now read-only.

Commit c07fc0a

Merge branch 'master' into set_expansion_PR
2 parents: 72ac75e + 0a3e3a2


49 files changed: +22527 / -565 lines

.gitignore

Lines changed: 3 additions & 1 deletion
@@ -18,8 +18,9 @@ generated
 *.hdf5
 *.h5
 *.html
-!server/web_service/visualizer/displacy/*.html
 !solutions/set_expansion/ui/templates/*.html
+.vscode
+!server/web_service/static/*.html
 !tests/fixtures/data/server/*.gz
 *.log
 .idea/
@@ -31,3 +32,4 @@ pylint.txt
 flake8.txt
 nlp_architect/pipelines/bist-pretrained/*
 venv
+nlp_architect/api/ner-pretrained/*

Makefile

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ test_prepare: test_requirements.txt $(ACTIVATE)

 test: test_prepare $(ACTIVATE) dev
 	@. $(ACTIVATE); spacy download en
+	@. $(ACTIVATE); python -c 'from nlp_architect.api.ner_api import NerApi; NerApi(prompt=False)'
 	@. $(ACTIVATE); py.test -rs -vv tests

 flake: test_prepare
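The new recipe line pre-fetches the pre-trained NER model before the test run; `prompt=False` skips the interactive license prompt so CI does not block on input. The equivalent call from a Python session:

# Non-interactive download of the pre-trained NER model (the same call the
# Makefile one-liner above makes).
from nlp_architect.api.ner_api import NerApi

NerApi(prompt=False)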

doc/source/api.rst

Lines changed: 15 additions & 0 deletions
@@ -89,6 +89,8 @@ these will be placed into a central repository.
    nlp_architect.data.babi_dialog.BABI_Dialog
    nlp_architect.data.wikimovies.WIKIMOVIES

+
+
 ``nlp_architect.pipelines``
 ---------------------------
 .. py:module:: nlp_architect.pipelines
@@ -103,3 +105,16 @@ NLP pipelines modules using models implemented from ``nlp_architect.models``.
    nlp_architect.pipelines.spacy_np_annotator.NPAnnotator
    nlp_architect.pipelines.spacy_np_annotator.SpacyNPAnnotator

+
+
+``nlp_architect.server``
+------------------------
+.. py:module:: server
+
+.. autosummary::
+   :toctree: generated/
+   :nosignatures:
+
+   server.serve
+   server.service
+
doc/source/assets/bist_service.png

Mode changed 100755 → 100644; binary image modified (-258 KB).

doc/source/assets/ner_service.png

Binary image modified: 77.1 KB → 43.1 KB.

doc/source/service.rst

Lines changed: 42 additions & 31 deletions
@@ -28,30 +28,37 @@ Running NLP Architect server
 ============================
 Some of the components for which we provide pre-trained models are exposed through this server. In order to run the server, a user needs to specify which service to run, so the NLP Architect server will only load the needed model.

-Currently we provide 2 services:
+Currently we provide 3 services:

 1. `bist` service, which provides BIST dependency parsing
 2. `spacy_ner` service, which provides Spacy NER annotations.
+3. `ner` service, which provides NER annotations without Spacy.

-To run the server, simply run `serve.py` with the Parameter `--name` as the name of the service you wish to serve.
-Once the model is loaded, the server will run on `http://localhost:8080/{service_name}`.
+The server code is split into two pieces:

-If you wish to use the server's visualization - enter `http://localhost:8080/{service_name}/demo.html`
+1. :py:class:`Service <server.service>`, which is a representation of each model's API
+2. :py:mod:`Server <server.serve>`, which handles the processing of HTTP requests
+
+To run the server, simply run ``hug -p 8080 -f server/serve.py`` from the root directory; the server will run on `http://localhost:8080`.
+
+If you wish to use the server's visualization, go to `http://localhost:8080`.

 Otherwise, the expected request to the server is the following:

 .. code:: json

-    {"docs":
-        [
-            {"id": 1,
-             "doc": "Time flies like an arrow. fruit flies like a banana."},
-            {"id": 2,
-             "doc": "the horse passed the barn fell"},
-            {"id": 3,
-             "doc": "the old man the boat"}
-        ]
-    }
+    {
+        "model_name": "ner" | "spacy_ner" | "bist",
+        "docs":
+        [
+            {"id": 1,
+             "doc": "Time flies like an arrow. fruit flies like a banana."},
+            {"id": 2,
+             "doc": "the horse passed the barn fell"},
+            {"id": 3,
+             "doc": "the old man the boat"}
+        ]
+    }

 Request Headers
 ---------------
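As a quick sanity check, a minimal Python client for this request format might look like the sketch below (assuming the server is running locally on port 8080 and the `requests` package is installed):

import requests

# Pick one of the three supported services: "ner", "spacy_ner" or "bist".
payload = {
    "model_name": "ner",
    "docs": [
        {"id": 1, "doc": "Time flies like an arrow. fruit flies like a banana."},
    ],
}

# "Response-Format: json" asks the server for a plain JSON response.
headers = {"Response-Format": "json", "Content-Type": "application/json"}

response = requests.post("http://localhost:8080/inference",
                         json=payload, headers=headers)
print(response.json())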
@@ -64,49 +71,53 @@ The server supports 2 types of Responses (see `Annotation Structure Types - Server Responses`)

 Examples for running NLP Architect server
 =========================================
-We currently support only 2 services:
+We currently support 3 services:

 - BIST parser - Core NLP models annotation structure

-.. code:: python
-
-    python server/serve.py --name bist
-
-Once the server is up and running you can go to `http://localhost:8080/bist/demo.html`
+Once the server is up and running you can go to `http://localhost:8080`
 and check out a few test sentences, or you can send a POST request (as described above)
-to `http://localhost:8080/bist`, and receive a `CoreNLPDoc` annotation structure response.
+to `http://localhost:8080/inference`, and receive a `CoreNLPDoc` annotation structure response.

 .. image :: assets/bist_service.png

-- Spacy NER - High-level models annotation structure
-
-.. code:: python
-
-    python server/serve.py --name spacy_ner
+- Spacy NER, NER - High-level models annotation structure

-Once the server is up and running you can go to `http://localhost:8080/spacy_ner/demo.html`
+Once the server is up and running you can go to `http://localhost:8080`
 and check out a few test sentences, or you can send a POST request (as described above)
-to `http://localhost:8080/spacy_ner`, and receive a `HighLevelDoc` annotation structure response.
+to `http://localhost:8080/inference`, and receive a `HighLevelDoc` annotation structure response.
+
+Spacy NER:

 .. image :: assets/spacy_ner_service.png

+NER:
+
+.. image :: assets/ner_service.png
+
 You can also take a look at the tests (tests/nlp_architect_server) to see more examples.

 Example CURL request
 --------------------

+Running the `ner` model:
+
+.. code:: bash
+
+    curl -i -H "Response-Format:json" -H "Content-Type:application/json" -d '{"model_name": "ner", "docs": [{"id": 1,"doc": "Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California, in the Silicon Valley."}]}' http://{localhost_ip}:8080/inference
+
 Running the `spacy_ner` model:

 .. code:: bash

-    curl -i -H "Response-Format:json" -H "Content-Type:application/json" -d '{"docs": [{"id": 1,"doc": "Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California, in the Silicon Valley."}]}' http://{localhost_ip}:8080/spacy_ner
+    curl -i -H "Response-Format:json" -H "Content-Type:application/json" -d '{"model_name": "spacy_ner", "docs": [{"id": 1,"doc": "Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California, in the Silicon Valley."}]}' http://{localhost_ip}:8080/inference

 Running the `bist` model:

 .. code:: bash

-    curl -i -H "Response-Format:json" -H "Content-Type:application/json" -d '{"docs":[{"id": 1,"doc": "Time flies like an arrow. fruit flies like a banana."},{"id": 2,"doc": "the horse passed the barn fell"},{"id": 3,"doc": "the old man the boat"}]}' http://10.13.133.120:8080/bist
+    curl -i -H "Response-Format:json" -H "Content-Type:application/json" -d '{"model_name": "bist", "docs":[{"id": 1,"doc": "Time flies like an arrow. fruit flies like a banana."},{"id": 2,"doc": "the horse passed the barn fell"},{"id": 3,"doc": "the old man the boat"}]}' http://{localhost_ip}:8080/inference

 Annotation Structure Types - Server Responses
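For the `ner` and `spacy_ner` services the response is a High-level annotation structure. As an illustration (not actual server output), judging from `NerApi.pretty_print` in the new `nlp_architect/api/ner_api.py` below, a `high_level` response body has roughly this shape:

# Hypothetical example of a "high_level" response body, inferred from
# NerApi.pretty_print (nlp_architect/api/ner_api.py, below); field values
# are illustrative only.
example_response = {
    "doc": {
        "doc_text": "Intel Corporation is headquartered in Santa Clara",
        "annotation_set": ["org"],                   # unique entity types, lower-cased
        "spans": [
            {"start": 0, "end": 17, "type": "ORG"},  # character offsets into doc_text
        ],
        "title": "None",
    },
    "type": "high_level",
}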
@@ -173,7 +184,7 @@ In order to add a new service to the server you need to go over 3 steps:

 1. Choose the type of your service: Core NLP models or High-level models

-2. Create API for your service. Create the file under `nlp_architect/api/abstract_api` folder. Make sure your class inherits from `AbstractApi` (`from nlp_architect.api.abstract_api import AbstractApi`) and implements all its methods. Notice that your `inference` class_method must return either "CoreNLPDoc" or "HighLevelDoc".
+2. Create an API for your service. Create the file under the `nlp_architect/api` folder. Make sure your class inherits from :py:class:`AbstractApi <nlp_architect.api.abstract_api>` and implements all its methods. Notice that your `inference` class method must return either "CoreNLPDoc" or "HighLevelDoc".

 3. Add new service to `services.json` in the following template:
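As a sketch of step 2, a hypothetical service API could look like the skeleton below (the class name and annotation logic are placeholders; the exact set of methods to implement is defined by `AbstractApi` — compare the new `NerApi` in `nlp_architect/api/ner_api.py` below):

from nlp_architect.api.abstract_api import AbstractApi


class MyServiceApi(AbstractApi):
    """Hypothetical skeleton for a new High-level service API."""

    def __init__(self, prompt=True):
        self.model = None  # placeholder: fetch/locate model artifacts here

    def load_model(self):
        # Build the network and load pre-trained weights, as NerApi does.
        pass

    def inference(self, doc):
        # Must return either a "CoreNLPDoc" or (as here) a "HighLevelDoc"-style dict.
        return {
            "doc": {
                "doc_text": doc,
                "annotation_set": [],
                "spans": [],
                "title": "None",
            },
            "type": "high_level",
        }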

licenses/ipdb-license.txt

Lines changed: 35 additions & 0 deletions
https://github.com/gotcha/ipdb/blob/master/COPYING.txt

Copyright (c) 2007-2016 ipdb development team
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

Neither the name of the ipdb Development Team nor the names of its
contributors may be used to endorse or promote products derived from this
software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

nlp_architect/api/ner_api.py

Lines changed: 153 additions & 0 deletions
# ******************************************************************************
# Copyright 2017-2018 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ******************************************************************************
import pickle
import sys
from os import path, makedirs

import numpy as np
from keras.preprocessing.sequence import pad_sequences

from nlp_architect.api.abstract_api import AbstractApi
from nlp_architect.models.ner_crf import NERCRF
from nlp_architect.utils.io import download_unlicensed_file
from nlp_architect.utils.text import SpacyInstance

nlp = SpacyInstance(disable=['tagger', 'ner', 'parser', 'vectors', 'textcat'])


class NerApi(AbstractApi):
    """
    NER model API
    """
    dir = path.dirname(path.realpath(__file__))
    pretrained_model = path.join(dir, 'ner-pretrained', 'model.h5')
    pretrained_model_info = path.join(dir, 'ner-pretrained', 'model_info.dat')

    def __init__(self, ner_model=None, prompt=True):
        self.model = None
        self.model_info = None
        self.model_path = NerApi.pretrained_model
        self.model_info_path = NerApi.pretrained_model_info
        self._download_pretrained_model(prompt)

    def encode_word(self, word):
        # Unknown words map to index 1 (OOV).
        return self.model_info['word_vocab'].get(word, 1.0)

    def encode_word_chars(self, word):
        return [self.model_info['char_vocab'].get(c, 1.0) for c in word]

    def encode_input(self, text_arr):
        sentence = []
        sentence_chars = []
        for word in text_arr:
            sentence.append(self.encode_word(word))
            sentence_chars.append(self.encode_word_chars(word))
        encoded_sentence = pad_sequences(
            [np.asarray(sentence)], maxlen=self.model_info['sentence_len'])
        chars_padded = pad_sequences(
            sentence_chars, maxlen=self.model_info['word_len'])
        if self.model_info['sentence_len'] - chars_padded.shape[0] > 0:
            # Left-pad the character matrix with all-zero rows up to sentence_len.
            chars_padded = np.concatenate((np.zeros(
                (self.model_info['sentence_len'] - chars_padded.shape[0],
                 self.model_info['word_len'])), chars_padded))
        encoded_chars = chars_padded.reshape(1, self.model_info['sentence_len'],
                                             self.model_info['word_len'])
        return encoded_sentence, encoded_chars

    def _prompt(self):
        response = input('\nTo download \'{}\', please enter YES: '.format('ner'))
        res = response.lower().strip()
        if res == "yes" or (len(res) == 1 and res == 'y'):
            print('Downloading {}...'.format('ner'))
            responded_yes = True
        else:
            print('Download declined. Response received {} != YES|Y. '.format(res))
            responded_yes = False
        return responded_yes

    def _download_pretrained_model(self, prompt=True):
        """Downloads the pre-trained NER model if non-existent."""
        dir_path = path.join(self.dir, 'ner-pretrained')
        if not path.isfile(path.join(dir_path, 'model.h5')):
            print('The pre-trained models to be downloaded for the NER dataset '
                  'are licensed under Apache 2.0. By downloading, you accept the terms '
                  'and conditions provided by the license')
            makedirs(dir_path, exist_ok=True)
            if prompt is True:
                agreed = self._prompt()
                if agreed is False:
                    sys.exit(0)
            download_unlicensed_file('http://nervana-modelzoo.s3.amazonaws.com/NLP/NER/',
                                     'model.h5', self.model_path)
            download_unlicensed_file('http://nervana-modelzoo.s3.amazonaws.com/NLP/NER/',
                                     'model_info.dat', self.model_info_path)
            print('Done.')

    def load_model(self):
        with open(self.model_info_path, 'rb') as fp:
            self.model_info = pickle.load(fp)
        self.model = NERCRF()
        self.model.build(
            self.model_info['sentence_len'],
            self.model_info['word_len'],
            self.model_info['num_of_labels'],
            self.model_info['word_vocab'],
            self.model_info['vocab_size'],
            self.model_info['char_vocab_size'],
            word_embedding_dims=self.model_info['word_embedding_dims'],
            char_embedding_dims=self.model_info['char_embedding_dims'],
            word_lstm_dims=self.model_info['word_lstm_dims'],
            tagger_lstm_dims=self.model_info['tagger_lstm_dims'],
            dropout=self.model_info['dropout'],
            external_embedding_model=self.model_info['external_embedding_model'])
        self.model.load(self.model_path)

    def pretty_print(self, text, tags):
        # Map predicted tag ids back to label strings, keeping only the last
        # len(text) positions (the rest of the sequence is padding).
        tags_str = [self.model_info['labels_id_to_word']
                    .get(t, None) for t in tags[0]][-len(text):]
        mapped = [
            {'index': idx, 'word': el, 'label': tags_str[idx]} for idx, el in enumerate(text)
        ]
        counter = 0
        spans = []
        for obj in mapped:
            if obj['label'] != 'O':
                spans.append({
                    'start': counter,
                    'end': counter + len(obj['word']),
                    'type': obj['label']
                })
            counter += len(obj['word']) + 1
        # Unique entity types, lower-cased.
        ents = dict((obj['type'].lower(), obj) for obj in spans).keys()
        ret = {}
        ret['doc_text'] = ' '.join(text)
        ret['annotation_set'] = list(ents)
        ret['spans'] = spans
        ret['title'] = 'None'
        return {'doc': ret, 'type': 'high_level'}

    def process_text(self, text):
        input_text = ' '.join(text.strip().split())
        return nlp.tokenize(input_text)

    def inference(self, doc):
        text_arr = self.process_text(doc)
        words, chars = self.encode_input(text_arr)
        tags = self.model.predict([words, chars])
        tags = tags.argmax(2)
        return self.pretty_print(text_arr, tags)
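For reference, a minimal end-to-end use of this API might look like the sketch below (it assumes the pre-trained model is present locally or can be downloaded):

from nlp_architect.api.ner_api import NerApi

api = NerApi(prompt=False)  # downloads the pre-trained model on first use
api.load_model()            # builds the NERCRF network and loads the weights

result = api.inference('Intel Corporation is headquartered in Santa Clara.')
print(result['type'])          # 'high_level'
print(result['doc']['spans'])  # entity spans with start/end offsets and label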
