Using NLP Operations
====================

This example shows how to use DFFML operations to clean text data and train a model using the DFFML CLI.

DFFML offers several :ref:`plugin_models`. For this example
we will be using the TensorFlow DNNClassifier model
(:ref:`plugin_model_dffml_model_tensorflow_tfdnnc`), which is in the ``dffml-model-tensorflow`` package.

We will use two operations, :ref:`plugin_operation_dffml_operations_nlp_remove_stopwords` and :ref:`plugin_operation_dffml_operations_nlp_get_embedding`.
Internally, both of these operations use `spacy <https://spacy.io/usage/spacy-101>`_ functions.

To install the DNNClassifier model and the above mentioned operations, run:

.. code-block:: console

    $ pip install -U dffml-model-tensorflow dffml-operations-nlp
The ``remove_stopwords`` operation cleans the text by removing the most commonly used words, which give the text little or no information, e.g. but, or, yet, it, is, am, etc.
These words are called stopwords.
The ``get_embedding`` operation maps the tokens in the text to their corresponding word vectors. Here we will use embeddings from the ``en_core_web_sm`` spacy model.
You can use other models like ``en_core_web_md`` or ``en_core_web_lg`` for better results, but these are bigger in size and may take a while to download.

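
As a rough illustration, stopword removal can be sketched in plain Python. The
stopword list below is a tiny hypothetical sample; the actual operation relies on
spacy's much larger built-in list:

.. code-block:: python

    # Tiny hypothetical stopword list; spacy's real list is much larger.
    STOPWORDS = {"but", "or", "yet", "it", "is", "am", "a", "the"}

    def remove_stopwords(text: str) -> str:
        """Drop words that carry little or no information."""
        return " ".join(
            word for word in text.split() if word.lower() not in STOPWORDS
        )

    print(remove_stopwords("it is a cat but the dog barks"))  # cat dog barks
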
Let's first download the ``en_core_web_sm`` model:

.. code-block:: console

    $ python -m spacy download en_core_web_sm

Create the training data:

.. literalinclude:: /../examples/nlp/train_data.sh

Now we will create a dataflow to describe how the text feature (``sentence``) will be processed.

.. literalinclude:: /../examples/nlp/create_dataflow.sh

The ``get_embedding`` operation takes a ``pad_token`` input (here ``<PAD>``), which is appended to sentences shorter
than ``max_len`` (here 10). A sentence longer than ``max_len`` is truncated to exactly ``max_len`` tokens.

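
The padding and truncation behaviour can be sketched as follows (a minimal
illustration of the idea, not the operation's actual implementation):

.. code-block:: python

    def pad_or_truncate(tokens, max_len=10, pad_token="<PAD>"):
        """Truncate to max_len, then pad shorter token lists with pad_token."""
        tokens = tokens[:max_len]
        return tokens + [pad_token] * (max_len - len(tokens))

    print(pad_or_truncate(["cats", "play", "a", "lot"]))
    # ['cats', 'play', 'a', 'lot', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
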
To visualize the dataflow, run:

.. literalinclude:: /../examples/nlp/dataflow_diagram.sh

Copying and pasting the output of the above command into the
`mermaidjs live editor <https://mermaidjs.github.io/mermaid-live-editor>`_
produces the following graph.

.. image:: /../examples/nlp/dataflow_diagram.svg

We can now use this dataflow to preprocess the data and make it ready to be fed into the model:

.. literalinclude:: /../examples/nlp/train.sh

As shown in the above command, a single input feature to the model (here ``embedding``) has shape ``(1, max_len, size_of_embedding)``.
Here we have taken ``max_len`` as 10, and the embedding size of ``en_core_web_sm`` is 96, so the resulting shape of one input feature
is ``(1, 10, 96)``.

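
To make that shape concrete, here is a minimal sketch of one such input feature
built from nested lists, with zeros standing in for the real word-vector values:

.. code-block:: python

    max_len, embed_size = 10, 96  # en_core_web_sm vectors have 96 dimensions

    # One record's feature: shape (1, max_len, embed_size), zeros as placeholders
    embedding = [[[0.0] * embed_size for _ in range(max_len)]]

    shape = (len(embedding), len(embedding[0]), len(embedding[0][0]))
    print(shape)  # (1, 10, 96)
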
Assess the accuracy:

.. literalinclude:: /../examples/nlp/accuracy.sh

The output is:

.. code-block:: console

    0.5

Create the test data:

.. literalinclude:: /../examples/nlp/test_data.sh

Make predictions on the test data:

.. literalinclude:: /../examples/nlp/predict.sh

The output is:

.. code-block:: console

    Key: 0
                                                            Record Features
    +------------------------------------------------------------------------------------------------------------------------------+
    |  sentence   |                                            Cats play a lot                                                    |
    +------------------------------------------------------------------------------------------------------------------------------+
    |  embedding  |                      (0.32292864, 4.358501, 3.2268033, 1.87990 ... (length:10)                                 |
    +------------------------------------------------------------------------------------------------------------------------------+

                                                              Prediction
    +------------------------------------------------------------------------------------------------------------------------------+
    |                                                          sentiment                                                           |
    +------------------------------------------------------------------------------------------------------------------------------+
    |                  Value:  1                  |                    Confidence:   0.5122595429420471                            |
    +------------------------------------------------------------------------------------------------------------------------------+