This repository was archived by the owner on Nov 8, 2022. It is now read-only.

Commit c8c2955
Author: Steven Robertson
Merge branch 'master' into steven/fix_sysinstall
2 parents: e178d68 + 8784c11

34 files changed: +3272 −12 lines

.gitignore

Lines changed: 3 additions & 2 deletions

@@ -12,13 +12,13 @@
 .styleenv
 .coverage
 build
-*.gz
 generated
 *.ropeproject
 *.cubin
 *.hdf5
 *.h5
 *.html
+!solutions/set_expansion/ui/templates/*.html
 .vscode
 !server/web_service/static/*.html
 !tests/fixtures/data/server/*.gz
@@ -32,4 +32,5 @@ pylint.html
 pylint.txt
 flake8.txt
 nlp_architect/pipelines/bist-pretrained/*
-nlp_architect/api/ner-pretrained/*
+venv
+nlp_architect/api/ner-pretrained/*

Makefile

Lines changed: 2 additions & 2 deletions

@@ -14,8 +14,8 @@
 # limitations under the License.
 # ******************************************************************************

-FLAKE8_CHECK_DIRS := examples nlp_architect/* server tests
-PYLINT_CHECK_DIRS := examples nlp_architect server tests setup
+FLAKE8_CHECK_DIRS := examples nlp_architect/* server tests solutions
+PYLINT_CHECK_DIRS := examples nlp_architect server tests setup solutions
 DOC_DIR := doc
 DOC_PUB_RELEASE_PATH := $(DOC_PUB_PATH)/$(RELEASE)

Binary file (44.9 MB) not shown.
Lines changed: 14 additions & 0 deletions

@@ -0,0 +1,14 @@
+
+Data:
+==========
+enwiki-20171201_subset.txt is a subset of the Wikimedia English data dumps:
+
+https://meta.wikimedia.org/wiki/Data_dumps
+https://dumps.wikimedia.org/enwiki/
+
+
+
+License:
+==========
+Creative Commons Attribution-Share-Alike 3.0 License
+https://creativecommons.org/licenses/by-sa/3.0/

doc/source/api.rst

Lines changed: 3 additions & 2 deletions

@@ -49,6 +49,8 @@ to train the model weights, perform inference, and save/load the model.
 nlp_architect.models.bist_parser.BISTModel
 nlp_architect.models.memn2n_dialogue.MemN2N_Dialog
 nlp_architect.models.kvmemn2n.KVMemN2N
+nlp_architect.models.supervised_sentiment.simple_lstm
+nlp_architect.models.supervised_sentiment.one_hot_cnn


 ``nlp_architect.layers``
@@ -88,7 +90,7 @@ these will be placed into a central repository.
 nlp_architect.data.sequential_tagging.SequentialTaggingDataset
 nlp_architect.data.babi_dialog.BABI_Dialog
 nlp_architect.data.wikimovies.WIKIMOVIES
-
+
 nlp_architect.data.amazon_reviews.Amazon_Reviews


 ``nlp_architect.pipelines``
@@ -117,4 +119,3 @@ NLP pipelines modules using models implemented from ``nlp_architect.models``.

 server.serve
 server.service
-
Two files (120 KB and 29.8 KB) not rendered.

doc/source/index.rst

Lines changed: 9 additions & 0 deletions

@@ -57,6 +57,7 @@ The library contains state-of-art and novel NLP and NLU models in a varity of to
 - NER and NE expansion
 - Text chunking
 - Reading comprehension
+- Supervised sentiment analysis


 Deep Learning frameworks
@@ -115,6 +116,7 @@ on this project, please see the :doc:`developer guide <developer_guide>`.
 bist_parser.rst
 word_sense.rst
 np2vec.rst
+supervised_sentiment.rst
 tcn.rst

 .. toctree::
@@ -126,6 +128,13 @@ on this project, please see the :doc:`developer guide <developer_guide>`.
 memn2n.rst
 kvmemn2n.rst

+.. toctree::
+   :hidden:
+   :maxdepth: 1
+   :caption: Solutions
+
+   term_set_expansion.rst
+
 .. toctree::
    :hidden:
    :maxdepth: 1

doc/source/overview.rst

Lines changed: 1 addition & 0 deletions

@@ -53,6 +53,7 @@ The library contains state-of-art and novel NLP and NLU models in a varity of to
 - NER and NE expansion
 - Text chunking
 - Reading comprehension
+- Supervised sentiment analysis

 Deep Learning frameworks
 ````````````````````````
Lines changed: 87 additions & 0 deletions

@@ -0,0 +1,87 @@
.. ---------------------------------------------------------------------------
.. Copyright 2017-2018 Intel Corporation
..
.. Licensed under the Apache License, Version 2.0 (the "License");
.. you may not use this file except in compliance with the License.
.. You may obtain a copy of the License at
..
.. http://www.apache.org/licenses/LICENSE-2.0
..
.. Unless required by applicable law or agreed to in writing, software
.. distributed under the License is distributed on an "AS IS" BASIS,
.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. See the License for the specific language governing permissions and
.. limitations under the License.
.. ---------------------------------------------------------------------------

Supervised Sentiment
####################

Overview
========

This is a set of models which serve as examples of supervised implementations for sentiment analysis.
The larger idea behind these models is to allow ensembling with other supervised or unsupervised models.

Files
=====

- **nlp_architect/models/supervised_sentiment.py**: Sentiment analysis models, currently an LSTM and a one-hot CNN
- **nlp_architect/data/amazon_reviews.py**: Code which downloads and processes the Amazon review datasets described below
- **nlp_architect/utils/ensembler.py**: Contains the ensembling algorithm(s)
- **example_ensemble.py**: An example of how the sentiment models can be trained and ensembled
- **optimize_example.py**: An example of using a hyperparameter optimizer with the simple LSTM model

Models
======

Two models are shown as classification examples. Additional models can be added as desired.

Bi-directional LSTM
-------------------

A simple bidirectional LSTM with one fully connected layer. The vocabulary size, dense output size, and document input length should be determined during data preprocessing. The user can then change the size of the LSTM hidden layer and the recurrent dropout rate.
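As a rough illustration, such a model could be sketched in Keras as follows. The function and parameter names here are illustrative, not the actual API of ``nlp_architect.models.supervised_sentiment``:

```python
# A minimal sketch of a bidirectional LSTM classifier with one fully
# connected layer. Names are illustrative, not those of the real module.
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import LSTM, Bidirectional, Dense, Embedding

def simple_lstm_sketch(max_features, dense_out, input_length,
                       embed_dim=256, hidden_size=512, recurrent_dropout=0.3):
    # max_features, dense_out and input_length come from preprocessing;
    # hidden_size and recurrent_dropout are the user-tunable knobs.
    inp = Input(shape=(input_length,))
    x = Embedding(max_features, embed_dim)(inp)
    x = Bidirectional(LSTM(hidden_size, recurrent_dropout=recurrent_dropout))(x)
    out = Dense(dense_out, activation='softmax')(x)
    model = Model(inp, out)
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model
```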
Temporal CNN
------------

As defined in "Text Understanding from Scratch" (Zhang and LeCun, 2015; https://arxiv.org/pdf/1502.01710v4.pdf), this model is a series of 1D convolutions with max-pooling and fully connected layers. The frame sizes may be either large or small.

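A sketch of that architecture, following the paper's "small frame" configuration (256 filters, kernel sizes 7/7/3/3/3/3, pooling after the first, second, and sixth convolutions). Again the names are illustrative, not the real module's API:

```python
# Rough sketch of the character-level temporal CNN of Zhang & LeCun (2015).
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import (Conv1D, Dense, Dropout, Flatten,
                                     MaxPooling1D)

def one_hot_cnn_sketch(num_classes, alphabet_size=69, input_length=1014,
                       frame_size=256):
    inp = Input(shape=(input_length, alphabet_size))  # one-hot characters
    x = inp
    # Six 1D convolutions; max-pooling after layers 1, 2 and 6, as in the paper
    for kernel, pool in [(7, True), (7, True), (3, False),
                         (3, False), (3, False), (3, True)]:
        x = Conv1D(frame_size, kernel, activation='relu')(x)
        if pool:
            x = MaxPooling1D(3)(x)
    x = Flatten()(x)
    # Two fully connected layers with dropout, then the classifier
    x = Dropout(0.5)(Dense(1024, activation='relu')(x))
    x = Dropout(0.5)(Dense(1024, activation='relu')(x))
    out = Dense(num_classes, activation='softmax')(x)
    return Model(inp, out)
```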
Datasets
========

The dataset in this example is the Amazon Reviews dataset, though other datasets can easily be substituted. The Amazon review dataset(s) should be downloaded from http://jmcauley.ucsd.edu/data/amazon/. These are ``*.json.gzip`` files which should be unzipped. The terms and conditions of the dataset license apply; Intel does not grant any rights to the data files.

For best results, a medium-sized dataset should be chosen, though the algorithms will work on larger and smaller datasets as well. The Movie and TV reviews were chosen for experimentation.

Only the "overall", "reviewText", and "summary" columns of the review dataset are retained. "overall" is the star rating; it is transformed into a sentiment label where 4-5 stars is positive, 3 stars is neutral, and 1-2 stars is negative. The "summary" (the review's title) is concatenated with the review text and the result is then cleaned.
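The transformation described above can be sketched with pandas. Column names are those of the Amazon data; the cleaning step is reduced to lowercasing for brevity, and the helper names are hypothetical:

```python
# Sketch of the preprocessing: keep three columns, map stars to sentiment
# labels, and join the title ("summary") with the review body.
import pandas as pd

def label_sentiment(overall):
    # 4-5 stars -> positive, 3 -> neutral, 1-2 -> negative
    if overall >= 4:
        return 'positive'
    if overall == 3:
        return 'neutral'
    return 'negative'

def prepare_reviews(df):
    df = df[['overall', 'reviewText', 'summary']].dropna().copy()
    df['sentiment'] = df['overall'].apply(label_sentiment)
    # Concatenate the title with the review text, then "clean" (lowercase here)
    df['text'] = (df['summary'] + ' ' + df['reviewText']).str.lower()
    return df[['text', 'sentiment']]
```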
The Amazon Review Dataset was published in the following papers:

| Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering
| R. He, J. McAuley
| WWW, 2016
| http://cseweb.ucsd.edu/~jmcauley/pdfs/www16a.pdf

| Image-based recommendations on styles and substitutes
| J. McAuley, C. Targett, J. Shi, A. van den Hengel
| SIGIR, 2015
| http://cseweb.ucsd.edu/~jmcauley/pdfs/sigir15.pdf

Running Modalities
==================

Ensemble Train/Test
-------------------

Currently, the pipeline shows a full train/test/ensemble cycle. The main pipeline can be run with the following command:

.. code:: bash

    python example_ensemble.py --file_path ./reviews_Movies_and_TV.json/

At the conclusion of training, a final confusion matrix will be displayed.

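One simple way to combine the trained models, shown here purely as an illustration (it is not necessarily the algorithm implemented in ``nlp_architect/utils/ensembler.py``), is to average their predicted class probabilities:

```python
# Toy ensembling: average per-model class probabilities, take the argmax.
import numpy as np

def ensemble_predictions(prob_list):
    """prob_list: list of (num_samples, num_classes) probability arrays,
    one per model. Returns the ensembled class index for each sample."""
    avg = np.mean(np.stack(prob_list), axis=0)
    return np.argmax(avg, axis=1)
```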
Hyperparameter optimization
---------------------------

An example of hyperparameter optimization is given using the Python package ``hyperopt``, which uses a Tree of Parzen Estimators to optimize the simple bi-LSTM algorithm. To run this example, use the following command:

.. code:: bash

    python optimize_example.py --file_path ./reviews_Movies_and_TV.json/ --new_trials 50 --output_file ./data/optimize_output.pkl

The script outputs the result of each trial attempt to the specified pickle file.
