.. ---------------------------------------------------------------------------
.. Copyright 2017-2018 Intel Corporation
..
.. Licensed under the Apache License, Version 2.0 (the "License");
.. you may not use this file except in compliance with the License.
.. You may obtain a copy of the License at
..
..      http://www.apache.org/licenses/LICENSE-2.0
..
.. Unless required by applicable law or agreed to in writing, software
.. distributed under the License is distributed on an "AS IS" BASIS,
.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. See the License for the specific language governing permissions and
.. limitations under the License.
.. ---------------------------------------------------------------------------

Supervised Sentiment
####################

Overview
========

This is a set of example supervised models for sentiment analysis.
The broader goal of these models is to allow ensembling with other supervised or unsupervised models.

Files
=====

- **nlp_architect/models/supervised_sentiment.py**: Sentiment analysis models, currently an LSTM and a one-hot CNN
- **nlp_architect/data/amazon_reviews.py**: Code which downloads and processes the Amazon review datasets described below
- **nlp_architect/utils/ensembler.py**: Contains the ensembling algorithm(s)
- **example_ensemble.py**: An example of how the sentiment models can be trained and ensembled
- **optimize_example.py**: An example of using a hyperparameter optimizer with the simple LSTM model


Models
======
Two models are shown as classification examples. Additional models can be added as desired.

Bi-directional LSTM
-------------------
A simple bidirectional LSTM with one fully connected layer. The number of vocabulary features, the dense output size, and the document input length should be determined during the data preprocessing steps. The user can then set the size of the LSTM hidden layer and the recurrent dropout rate.

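As a rough illustration (the function and parameter names below are placeholders, not the repo's actual API), such a model might be sketched in Keras along these lines:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Bidirectional, Dense, Embedding, LSTM

def build_bilstm(vocab_size=10000, max_len=300, embed_dim=128,
                 lstm_units=64, recurrent_dropout=0.2, n_classes=3):
    # vocab_size and max_len come from the preprocessing step;
    # lstm_units and recurrent_dropout are the user-tunable knobs.
    inputs = Input(shape=(max_len,))
    x = Embedding(vocab_size, embed_dim)(inputs)
    x = Bidirectional(LSTM(lstm_units, recurrent_dropout=recurrent_dropout))(x)
    outputs = Dense(n_classes, activation="softmax")(x)  # the fully connected layer
    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The softmax output width matches the three sentiment classes produced by the data preprocessing described below.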
Temporal CNN
------------
As defined in "Text Understanding from Scratch" (Zhang and LeCun, 2015, https://arxiv.org/pdf/1502.01710v4.pdf), this model is a series of 1D CNNs with max-pooling and fully connected layers. The frame sizes may be either large or small.


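A hedged sketch of that architecture in Keras follows; the kernel sizes and pooling pattern follow the paper, but the exact defaults here (alphabet size, input length, class count) are illustrative assumptions, not this repo's configuration:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv1D, Dense, Dropout, Flatten, MaxPooling1D

def build_temporal_cnn(max_len=1014, alphabet_size=69, frame=256, n_classes=3):
    # Input is a one-hot character encoding, as in Zhang & LeCun (2015).
    # frame=256 is the "small" frame size; 1024 would be the "large" one.
    inputs = Input(shape=(max_len, alphabet_size))
    x = inputs
    # Six 1D convolutions; max-pooling after the first, second, and last.
    for kernel, pool in [(7, 3), (7, 3), (3, None), (3, None), (3, None), (3, 3)]:
        x = Conv1D(frame, kernel, activation="relu")(x)
        if pool:
            x = MaxPooling1D(pool)(x)
    x = Flatten()(x)
    x = Dense(1024, activation="relu")(x)
    x = Dropout(0.5)(x)
    outputs = Dense(n_classes, activation="softmax")(x)
    return Model(inputs, outputs)
```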
Datasets
========
The dataset in this example is the Amazon Reviews dataset, though other datasets can easily be substituted.
The Amazon review dataset(s) should be downloaded from http://jmcauley.ucsd.edu/data/amazon/. These are ``*.json.gzip`` files which should be unzipped. The terms and conditions of the dataset license apply; Intel does not grant any rights to the data files.
For best results, a medium-sized dataset should be chosen, though the algorithms will work on larger and smaller datasets as well. For experimentation we chose the Movies and TV reviews.
Only the "overall", "reviewText", and "summary" columns of the review dataset are retained. "overall" is the overall rating in stars; it is transformed into a sentiment label where 4-5 stars is a positive review, 3 is neutral, and 1-2 stars is a negative review.
The "summary" (title) of the review is concatenated with the review text and subsequently cleaned.

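The transformation described above amounts to a simple mapping; the sketch below is illustrative (the actual column handling lives in ``amazon_reviews.py``, and the cleaning step is omitted):

```python
def rating_to_sentiment(overall):
    """Map the "overall" star rating to a coarse sentiment label:
    4-5 stars -> positive, 3 -> neutral, 1-2 -> negative."""
    if overall >= 4:
        return "positive"
    if overall == 3:
        return "neutral"
    return "negative"

def build_review_text(summary, review_text):
    # The review title ("summary") is concatenated with the body
    # before cleaning; the cleaning itself is not shown here.
    return "{} {}".format(summary, review_text)
```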
The Amazon Review Dataset was published in the following papers:

- "Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering", R. He and J. McAuley, WWW, 2016. http://cseweb.ucsd.edu/~jmcauley/pdfs/www16a.pdf
- "Image-Based Recommendations on Styles and Substitutes", J. McAuley, C. Targett, J. Shi, and A. van den Hengel, SIGIR, 2015. http://cseweb.ucsd.edu/~jmcauley/pdfs/sigir15.pdf


Running Modalities
==================

Ensemble Train/Test
-------------------
Currently, the pipeline shows a full train/test/ensemble cycle. The main pipeline can be run with the following command:

```
python example_ensemble.py --file_path ./reviews_Movies_and_TV.json/
```

At the conclusion of training, a final confusion matrix will be displayed.

Hyperparameter optimization
---------------------------
An example of hyperparameter optimization is given using the Python package ``hyperopt``, which uses a Tree-structured Parzen Estimator (TPE) to optimize the simple bi-LSTM algorithm. To run this example, use the following command:

```
python optimize_example.py --file_path ./reviews_Movies_and_TV.json/ --new_trials 50 --output_file ./data/optimize_output.pkl
```

The script will output the result of each trial attempt to the specified pickle file.