diff --git a/ML/PII/PII Demo- Fine-Tune.ipynb b/ML/PII/PII Demo- Fine-Tune.ipynb new file mode 100644 index 0000000..c11ff2d --- /dev/null +++ b/ML/PII/PII Demo- Fine-Tune.ipynb @@ -0,0 +1 @@ +{"cells": [{"metadata": {}, "cell_type": "code", "source": "# @hidden_cell\n# The project token is an authorization token that is used to access project resources like data sources and connections, and is used by platform APIs.\nfrom project_lib import Project\nproject = Project(project_id='ae1a755d-e162-4f07-9f5a-130d2280e78e', project_access_token='p-aa90b9b21de435c3f4c94494a24b5c5e69d030f8')\npc = project.project_context\n", "execution_count": 8, "outputs": []}, {"metadata": {}, "cell_type": "markdown", "source": "# Extract Personally Identifiable Information (PII) using Watson NLP"}, {"metadata": {}, "cell_type": "markdown", "source": "

Use Case

\n\nThis notebook demonstrates how to extract PII entities using Watson NLP pretrained models, and also demonstrates how to prepare custom-trained models. PII (Personally Identifiable Information) extraction is the process of identifying and extracting personal information from a document or dataset. This information can include names, addresses, phone numbers, email addresses, Social Security numbers, credit card numbers, and other types of information that can be used to identify an individual.\n\n\n

What you'll learn in this notebook

\n\nWatson NLP offers pretrained models for various NLP tasks and also provides fine-tuning functionality for custom training. This notebook shows:\n\n* RBR: A Rule-Based Reasoner (RBR) in NLP works by using a set of predefined rules to process and understand natural language input. These rules are used to identify specific patterns or structures in the input text and determine the meaning of the text based on those patterns.\n\n\n* BiLSTM: The BiLSTM network takes the preprocessed text as input and learns to identify patterns and relationships between words that are indicative of PII data. The network then outputs a probability score for each word in the text, indicating the likelihood that the word is part of a PII entity. It can also be trained to recognize specific entities such as names, addresses, phone numbers, and email addresses.\n\n\n* SIRE: Statistical Information and Relation Extraction (SIRE) is a technique used in natural language processing (NLP) to extract specific information and relationships from text. It involves using machine learning algorithms to identify and extract structured data such as entities, attributes, and relations from unstructured text. SIRE is used in a variety of applications, including information extraction, knowledge graph construction, and question answering. SIRE typically uses a supervised learning approach, where a model is trained using annotated examples of text and the corresponding structured data. The model can then be used to extract the same information from new, unseen text.\n\n\n* BERT: IBM Watson NLP BERT uses a pre-trained version of BERT that was trained on a large corpus of text data. The pre-trained model can be fine-tuned on a specific task such as text classification, named entity recognition, and more. The BERT architecture consists of an encoder network that is made up of multiple layers of transformer blocks. Each transformer block includes a self-attention mechanism and a feed-forward neural network.\n\n\n* Transformer: This model is a neural network architecture that is used in natural language processing tasks such as language translation, text summarization, and language generation. It is based on the self-attention mechanism and can be used to extract information such as named entities, relationships, and sentiment from the text."}, {"metadata": {}, "cell_type": "markdown", "source": "## Table of Contents\n\n\n1. [Before you start](#beforeYouStart)\n1.	[Load Entity PII Models](#LoadModel)\n1. [Load PII XLSX Dataset from Data Assets](#Loaddata)\n1. [Preparing Training Data](#TrainingData)\n1. [Watson NLP Models](#NLPModels) \n 1. [BiLSTM Fine-tuned](#BILSTMFINE)\n 1. [SIRE Fine-tuned](#SIRETune)\n 1. [Transformer Fine-tuned](#TransTUne)\n \n1. [Testing With Hanzo's Test Dataset](#Testing) \n1. [Summary](#summary)"}, {"metadata": {}, "cell_type": "markdown", "source": "\n### 1. Before you start\n"}, {"metadata": {}, "cell_type": "markdown", "source": "
\nStop kernel of other notebooks.
\n\n**Note:** If you have other notebooks currently running with the _Default Python 3.8 + Watson NLP XS_ environment, **stop their kernels** before running this notebook. All these notebooks share the same runtime environment, and if they are running in parallel, you may encounter memory issues. To stop the kernel of another notebook, open that notebook, and select _File > Stop Kernel_.\n\n
\nSet Project token.
\n\nBefore you can begin working on this notebook in Watson Studio in Cloud Pak for Data as a Service, you need to ensure that the project token is set so that you can access the project assets via the notebook.\n\nWhen this notebook is added to the project, a project access token should be inserted at the top of the notebook in a code cell. If you do not see the cell above, add the token to the notebook by clicking **More > Insert project token** from the notebook action bar. By running the inserted hidden code cell, a project object is created that you can use to access project resources.\n\n![ws-project.mov](https://media.giphy.com/media/jSVxX2spqwWF9unYrs/giphy.gif)\n\n
\nTip: Cell execution
\n\nNote that you can step through the notebook execution cell by cell, by selecting Shift-Enter. Or you can execute the entire notebook by selecting **Cell -> Run All** from the menu."}, {"metadata": {}, "cell_type": "code", "source": "import json\nimport pandas as pd\nimport watson_nlp\nfrom watson_nlp import data_model as dm\nfrom watson_nlp.toolkit.entity_mentions_utils import prepare_train_from_json", "execution_count": 9, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "# Silence Tensorflow warnings\nimport tensorflow as tf\ntf.get_logger().setLevel('ERROR')\ntf.autograph.set_verbosity(0)", "execution_count": 10, "outputs": []}, {"metadata": {}, "cell_type": "markdown", "source": "\n### 2. Load Entity PII Models"}, {"metadata": {}, "cell_type": "code", "source": "# Load a syntax model to split the text into sentences and tokens\nsyntax_model = watson_nlp.load(watson_nlp.download('syntax_izumo_en_stock'))\n# Load bilstm model in WatsonNLP\nbilstm_model = watson_nlp.load(watson_nlp.download('entity-mentions_bilstm_en_pii'))\n# Load rbr model in WatsonNLP\nrbr_model = watson_nlp.load(watson_nlp.download('entity-mentions_rbr_multi_pii'))\n# Download the GloVe model to be used as embeddings in the BiLSTM\nglove_model = watson_nlp.load(watson_nlp.download('embedding_glove_en_stock'))\n# Download the algorithm template\nmentions_train_template = watson_nlp.load(watson_nlp.download('file_path_entity-mentions_sire_multi_template-crf'))\n# Download the feature extractor\ndefault_feature_extractor = watson_nlp.load(watson_nlp.download('feature-extractor_rbr_entity-mentions_sire_en_stock'))\n# Download and load the pretrained model resource\n#pretrained_model_resource = watson_nlp.load(watson_nlp.download('pretrained-model_roberta-base_v2-8-0_llm_transformer_lang_en_cased_2022-05-06-052653'))", "execution_count": 11, "outputs": []}, {"metadata": {}, "cell_type": "markdown", "source": "\n### 3. Load PII XLSX Dataset from Data Assets"}, {"metadata": {}, "cell_type": "code", "source": "import os, types\nfrom botocore.client import Config\nimport ibm_boto3\n\ndef __iter__(self): return 0\n\n# @hidden_cell\n# The following code accesses a file in your IBM Cloud Object Storage. 
It includes your credentials.\n# You might want to remove those credentials before you share the notebook.\ncos_client = ibm_boto3.client(service_name='s3',\n ibm_api_key_id='o0avUc3SDky2d6pNzjuewCSTPPX7tQNz6BKKvL37nBL3',\n ibm_auth_endpoint=\"https://iam.cloud.ibm.com/oidc/token\",\n config=Config(signature_version='oauth'),\n endpoint_url='https://s3.private.us.cloud-object-storage.appdomain.cloud')\n\nbucket = 'watsoncore-donotdelete-pr-olkxvfa8bk0pb1'\nobject_key = '10-MB-Test.xlsx'\n\nbody = cos_client.get_object(Bucket=bucket,Key=object_key)['Body']\n\ndf = pd.read_excel(body.read())\ndf = df.dropna()\ndf.head()", "execution_count": 12, "outputs": [{"output_type": "execute_result", "execution_count": 12, "data": {"text/plain": " First and Last Name SSN Credit Card Number First and Last Name.1 \\\n1 Robert\u00a0Aragon 489-36-8350 4929-3813-3266-4295 Robert\u00a0Aragon \n2 Ashley\u00a0Borden 514-14-8905 5370-4638-8881-3020 Ashley\u00a0Borden \n3 Thomas\u00a0Conley 690-05-5315 4916-4811-5814-8111 Thomas\u00a0Conley \n4 Susan\u00a0Davis 421-37-1396 4916-4034-9269-8783 Susan\u00a0Davis \n5 Christopher\u00a0Diaz 458-02-6124 5299-1561-5689-1938 Christopher\u00a0Diaz \n\n SSN.1 Credit Card Number.1 First and Last Name.2 SSN.2 \\\n1 489-36-8351 4929-3813-3266-4296 Robert\u00a0Aragon 489-36-8352 \n2 514-14-8906 5370-4638-8881-3021 Ashley\u00a0Borden 514-14-8907 \n3 690-05-5316 4916-4811-5814-8112 Thomas\u00a0Conley 690-05-5317 \n4 421-37-1397 4916-4034-9269-8784 Susan\u00a0Davis 421-37-1398 \n5 458-02-6125 5299-1561-5689-1939 Christopher\u00a0Diaz 458-02-6126 \n\n Credit Card Number.2 First and Last Name.3 ... Credit Card Number.3 \\\n1 4929-3813-3266-4297 Robert\u00a0Aragon ... 4929-3813-3266-4298 \n2 5370-4638-8881-3022 Ashley\u00a0Borden ... 5370-4638-8881-3023 \n3 4916-4811-5814-8113 Thomas\u00a0Conley ... 4916-4811-5814-8114 \n4 4916-4034-9269-8785 Susan\u00a0Davis ... 4916-4034-9269-8786 \n5 5299-1561-5689-1940 Christopher\u00a0Diaz ... 5299-1561-5689-1941 \n\n First and Last Name.4 SSN.4 Credit Card Number.4 \\\n1 Robert\u00a0Aragon 489-36-8354 4929-3813-3266-4299 \n2 Ashley\u00a0Borden 514-14-8909 5370-4638-8881-3024 \n3 Thomas\u00a0Conley 690-05-5319 4916-4811-5814-8115 \n4 Susan\u00a0Davis 421-37-1400 4916-4034-9269-8787 \n5 Christopher\u00a0Diaz 458-02-6128 5299-1561-5689-1942 \n\n First and Last Name.5 SSN.5 Credit Card Number.5 \\\n1 Robert\u00a0Aragon 489-36-8355 4929-3813-3266-4300 \n2 Ashley\u00a0Borden 514-14-8910 5370-4638-8881-3025 \n3 Thomas\u00a0Conley 690-05-5320 4916-4811-5814-8116 \n4 Susan\u00a0Davis 421-37-1401 4916-4034-9269-8788 \n5 Christopher\u00a0Diaz 458-02-6129 5299-1561-5689-1943 \n\n First and Last Name.6 SSN.6 Credit Card Number.6 \n1 Robert\u00a0Aragon 489-36-8355 4929-3813-3266-4300 \n2 Ashley\u00a0Borden 514-14-8910 5370-4638-8881-3025 \n3 Thomas\u00a0Conley 690-05-5320 4916-4811-5814-8116 \n4 Susan\u00a0Davis 421-37-1401 4916-4034-9269-8788 \n5 Christopher\u00a0Diaz 458-02-6129 5299-1561-5689-1943 \n\n[5 rows x 21 columns]", "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
First and Last NameSSNCredit Card NumberFirst and Last Name.1SSN.1Credit Card Number.1First and Last Name.2SSN.2Credit Card Number.2First and Last Name.3...Credit Card Number.3First and Last Name.4SSN.4Credit Card Number.4First and Last Name.5SSN.5Credit Card Number.5First and Last Name.6SSN.6Credit Card Number.6
1Robert\u00a0Aragon489-36-83504929-3813-3266-4295Robert\u00a0Aragon489-36-83514929-3813-3266-4296Robert\u00a0Aragon489-36-83524929-3813-3266-4297Robert\u00a0Aragon...4929-3813-3266-4298Robert\u00a0Aragon489-36-83544929-3813-3266-4299Robert\u00a0Aragon489-36-83554929-3813-3266-4300Robert\u00a0Aragon489-36-83554929-3813-3266-4300
2Ashley\u00a0Borden514-14-89055370-4638-8881-3020Ashley\u00a0Borden514-14-89065370-4638-8881-3021Ashley\u00a0Borden514-14-89075370-4638-8881-3022Ashley\u00a0Borden...5370-4638-8881-3023Ashley\u00a0Borden514-14-89095370-4638-8881-3024Ashley\u00a0Borden514-14-89105370-4638-8881-3025Ashley\u00a0Borden514-14-89105370-4638-8881-3025
3Thomas\u00a0Conley690-05-53154916-4811-5814-8111Thomas\u00a0Conley690-05-53164916-4811-5814-8112Thomas\u00a0Conley690-05-53174916-4811-5814-8113Thomas\u00a0Conley...4916-4811-5814-8114Thomas\u00a0Conley690-05-53194916-4811-5814-8115Thomas\u00a0Conley690-05-53204916-4811-5814-8116Thomas\u00a0Conley690-05-53204916-4811-5814-8116
4Susan\u00a0Davis421-37-13964916-4034-9269-8783Susan\u00a0Davis421-37-13974916-4034-9269-8784Susan\u00a0Davis421-37-13984916-4034-9269-8785Susan\u00a0Davis...4916-4034-9269-8786Susan\u00a0Davis421-37-14004916-4034-9269-8787Susan\u00a0Davis421-37-14014916-4034-9269-8788Susan\u00a0Davis421-37-14014916-4034-9269-8788
5Christopher\u00a0Diaz458-02-61245299-1561-5689-1938Christopher\u00a0Diaz458-02-61255299-1561-5689-1939Christopher\u00a0Diaz458-02-61265299-1561-5689-1940Christopher\u00a0Diaz...5299-1561-5689-1941Christopher\u00a0Diaz458-02-61285299-1561-5689-1942Christopher\u00a0Diaz458-02-61295299-1561-5689-1943Christopher\u00a0Diaz458-02-61295299-1561-5689-1943
\n

5 rows \u00d7 21 columns

\n
"}, "metadata": {}}]}, {"metadata": {}, "cell_type": "markdown", "source": "\n### 4. Preparing Training Data"}, {"metadata": {}, "cell_type": "markdown", "source": "Let's generate sentences using the columns of PII information. Ideally, the sentences would include name, SSN, and credit card number in context."}, {"metadata": {}, "cell_type": "code", "source": "def format_data(df, name_col, ssn_col, ccn_col): \n import random\n \n train_list = []\n for i in range(1, len(df)):\n name = df[name_col][i] \n ssn = str(df[ssn_col][i])\n ccn = str(df[ccn_col][i])\n \n text1 = \"My name is %s, and my social security number is %s. Here's the number to my Visa credit card, %s\" % (name, ssn, ccn)\n text2 = \"%s is my social security number. The name on my American Express card %s is %s.\" % (ssn, ccn, name)\n text3 = \"\"\n text = random.choice([text1, text2])\n\n name_begin = text.find(name)\n name_end = text.find(name) + len(name)\n ssn_begin = text.find(ssn)\n ssn_end = text.find(ssn) + len(ssn)\n ccn_begin = text.find(ccn)\n ccn_end = text.find(ccn) + len(ccn)\n\n data = {\n \"text\": text,\n \"mentions\": [\n {\n \"location\": {\n \"begin\": name_begin,\n \"end\": name_end\n },\n \"text\": name,\n \"type\": \"Name\"\n },\n {\n \"location\": {\n \"begin\": ssn_begin,\n \"end\": ssn_end\n },\n \"text\": ssn,\n \"type\": \"SocialSecurityNumber\"\n },\n {\n \"location\": {\n \"begin\": ccn_begin,\n \"end\": ccn_end\n },\n \"text\": ccn,\n \"type\": \"CreditCardNumber\"\n }\n ] \n }\n\n train_list.append(data)\n return train_list", "execution_count": 13, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "train_list = format_data(df=df, name_col='First and Last Name', ssn_col='SSN', ccn_col='Credit Card Number')", "execution_count": 14, "outputs": []}, {"metadata": {}, "cell_type": "markdown", "source": "Save the sentences into a json training file and a json dev file. This will save the file to the runtime local as well as the project data assets."}, {"metadata": {}, "cell_type": "code", "source": "with open('PII_text_train.json', 'w') as f:\n json.dump(train_list, f)\nproject.save_data('PII_text_train.json', data=json.dumps(train_list), overwrite=True)", "execution_count": 15, "outputs": [{"output_type": "execute_result", "execution_count": 15, "data": {"text/plain": "{'file_name': 'PII_text_train.json',\n 'message': 'File saved to project storage.',\n 'bucket_name': 'watsoncore-donotdelete-pr-olkxvfa8bk0pb1',\n 'asset_id': '216b85be-aabe-4ff6-b264-acd101222fbc'}"}, "metadata": {}}]}, {"metadata": {}, "cell_type": "code", "source": "dev_list = format_data(df=df, name_col='First and Last Name.1', ssn_col='SSN.1', ccn_col='Credit Card Number.1')", "execution_count": 16, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "with open('PII_text_dev.json', 'w') as f:\n json.dump(dev_list, f)\nproject.save_data('PII_text_dev.json', data=json.dumps(dev_list), overwrite=True)", "execution_count": 17, "outputs": [{"output_type": "execute_result", "execution_count": 17, "data": {"text/plain": "{'file_name': 'PII_text_dev.json',\n 'message': 'File saved to project storage.',\n 'bucket_name': 'watsoncore-donotdelete-pr-olkxvfa8bk0pb1',\n 'asset_id': '76834e31-ab93-4aca-b86b-ce6e71476478'}"}, "metadata": {}}]}, {"metadata": {}, "cell_type": "code", "source": "text = \"My name is %s, and my social security number is %s. 
Here's the number to my Visa credit card, %s\" % (df['First and Last Name'][1], df['SSN'][1], df['Credit Card Number'][1])", "execution_count": 18, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "train_data = dm.DataStream.from_json_array(\"PII_text_train.json\")\ntrain_iob_stream = prepare_train_from_json(train_data, syntax_model)\ndev_data = dm.DataStream.from_json_array(\"PII_text_dev.json\")\ndev_iob_stream = prepare_train_from_json(dev_data, syntax_model)", "execution_count": 19, "outputs": []}, {"metadata": {}, "cell_type": "markdown", "source": "\n### 5. Watson NLP Models"}, {"metadata": {}, "cell_type": "markdown", "source": "\n\n### BiLSTM Fine-tuned"}, {"metadata": {}, "cell_type": "code", "source": "bilstm_custom = bilstm_model.train(train_iob_stream, \n dev_iob_stream, \n embedding=glove_model.embedding,\n #vocab_tags=None, \n #char_embed_dim=32, \n #dropout=0.2, \n #num_oov_buckets=1, \n num_train_epochs=5,\n num_conf_epochs=5, \n checkpoint_interval=5, \n learning_rate=0.005, \n #shuffle_buffer=2000, \n #char_lstm_size=64, \n #char_bidir=False, \n lstm_size=16, \n #train_batch_size=32, \n #lower_case=False, \n #embedding_lowercase=True, \n #keep_model_artifacts=False)\n )", "execution_count": null, "outputs": [{"output_type": "stream", "text": "2066/2138 [===========================>..] - ETA: 6s - loss: 1.7969e-04", "name": "stdout"}]}, {"metadata": {}, "cell_type": "code", "source": "project.save_data('bilstm_pii_custom', data=bilstm_custom.as_file_like_object(), overwrite=True)", "execution_count": null, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "syntax_result = syntax_model.run(text)\nbilstm_result = bilstm_custom.run(syntax_result)\nbilstm_result", "execution_count": null, "outputs": []}, {"metadata": {}, "cell_type": "markdown", "source": "\n\n### SIRE Fine-tuned\n"}, {"metadata": {}, "cell_type": "code", "source": "help(watson_nlp.blocks.entity_mentions.SIRE)", "execution_count": null, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "sire_custom = watson_nlp.blocks.entity_mentions.SIRE.train(train_iob_stream, \n 'en', \n mentions_train_template,\n feature_extractors=[default_feature_extractor])", "execution_count": null, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "project.save_data('sire_pii_custom', data=sire_custom.as_file_like_object(), overwrite=True)", "execution_count": null, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "syntax_result = syntax_model.run(text)\nsire_result = sire_custom.run(syntax_result)\nsire_result", "execution_count": null, "outputs": []}, {"metadata": {}, "cell_type": "markdown", "source": "\n### Transformer Fine-tuned\n"}, {"metadata": {"scrolled": false}, "cell_type": "code", "source": "#help(watson_nlp.blocks.entity_mentions.Transformer)", "execution_count": null, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "# Download and load the pretrained model resource\npretrained_model_resource = watson_nlp.load(watson_nlp.download('pretrained-model_watbert_multi_transformer_multi_uncased'))", "execution_count": null, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "'''\ntransformer_custom = watson_nlp.blocks.entity_mentions.Transformer.train(train_iob_stream,\n dev_iob_stream,\n pretrained_model_resource,\n lr=0.005,\n num_train_epochs=5,\n #per_device_train_batch_size=32,\n #per_device_eval_batch_size=32,\n max_seq_length=205,\n seed=1,\n keep_model_artifacts=True)\n'''\ntransformer_custom = 
watson_nlp.blocks.entity_mentions.Transformer.train(train_iob_stream,\n                                                    dev_iob_stream,\n                                                    pretrained_model_resource,\n                                                    num_train_epochs=8,\n                                                    learning_rate=3e-5,\n                                                    per_device_train_batch_size=1,\n                                                    per_device_eval_batch_size=32)", "execution_count": null, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "project.save_data('transformer_pii_custom', data=transformer_custom.as_file_like_object(), overwrite=True)", "execution_count": null, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "# Test the custom-trained transformer model on the sample text defined earlier\nsyntax_result = syntax_model.run(text)\ntransformer_result = transformer_custom.run(syntax_result)\ntransformer_result", "execution_count": null, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "", "execution_count": null, "outputs": []}, {"metadata": {}, "cell_type": "markdown", "source": "\n## 6. Summary\n\nThis notebook shows you how to use the Watson NLP library to:\n1. Extract PII using custom fine-tuned models"}, {"metadata": {}, "cell_type": "markdown", "source": "Please note that this content is made available by IBM Build Lab to foster Embedded AI technology adoption. The content may include systems & methods pending patent with USPTO and protected under US Patent Laws. For redistribution of this content, IBM will use a release process. For any questions please log an issue in GitHub.\n\nDeveloped by IBM Build Lab\n\nCopyright - 2022 IBM Corporation"}], "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3.10", "language": "python"}, "language_info": {"name": "python", "version": "3.10.6", "mimetype": "text/x-python", "codemirror_mode": {"name": "ipython", "version": 3}, "pygments_lexer": "ipython3", "nbconvert_exporter": "python", "file_extension": ".py"}}, "nbformat": 4, "nbformat_minor": 1} \ No newline at end of file diff --git a/ML/PII/PII Demo- Pertained.ipynb b/ML/PII/PII Demo- Pertained.ipynb new file mode 100644 index 0000000..0120a9b --- /dev/null +++ b/ML/PII/PII Demo- Pertained.ipynb @@ -0,0 +1 @@ +{"cells": [{"metadata": {}, "cell_type": "code", "source": "# @hidden_cell\n# The project token is an authorization token that is used to access project resources like data sources and connections, and is used by platform APIs.\nfrom project_lib import Project\nproject = Project(project_id='ae1a755d-e162-4f07-9f5a-130d2280e78e', project_access_token='p-aa90b9b21de435c3f4c94494a24b5c5e69d030f8')\npc = project.project_context\n", "execution_count": 1, "outputs": []}, {"metadata": {}, "cell_type": "markdown", "source": "# Extract Personally Identifiable Information (PII) using Watson NLP"}, {"metadata": {}, "cell_type": "markdown", "source": "

Use Case

\n\nThis notebook demonstrates how to extract PII entities using Watson NLP pretrained models, and also demonstrates how to prepare custom-trained models. PII (Personally Identifiable Information) extraction is the process of identifying and extracting personal information from a document or dataset. This information can include names, addresses, phone numbers, email addresses, Social Security numbers, credit card numbers, and other types of information that can be used to identify an individual.\n\n\n

What you'll learn in this notebook

\n\nWatson NLP offers pretrained models for various NLP tasks and also provides fine-tuning functionality for custom training. This notebook shows:\n\n* RBR: A Rule-Based Reasoner (RBR) in NLP works by using a set of predefined rules to process and understand natural language input. These rules are used to identify specific patterns or structures in the input text and determine the meaning of the text based on those patterns.\n\n\n* BiLSTM: The BiLSTM network takes the preprocessed text as input and learns to identify patterns and relationships between words that are indicative of PII data. The network then outputs a probability score for each word in the text, indicating the likelihood that the word is part of a PII entity. It can also be trained to recognize specific entities such as names, addresses, phone numbers, and email addresses.\n\n\n* SIRE: Statistical Information and Relation Extraction (SIRE) is a technique used in natural language processing (NLP) to extract specific information and relationships from text. It involves using machine learning algorithms to identify and extract structured data such as entities, attributes, and relations from unstructured text. SIRE is used in a variety of applications, including information extraction, knowledge graph construction, and question answering. SIRE typically uses a supervised learning approach, where a model is trained using annotated examples of text and the corresponding structured data. The model can then be used to extract the same information from new, unseen text.\n\n\n* BERT: IBM Watson NLP BERT uses a pre-trained version of BERT that was trained on a large corpus of text data. The pre-trained model can be fine-tuned on a specific task such as text classification, named entity recognition, and more. The BERT architecture consists of an encoder network that is made up of multiple layers of transformer blocks. Each transformer block includes a self-attention mechanism and a feed-forward neural network.\n\n\n* Transformer: This model is a neural network architecture that is used in natural language processing tasks such as language translation, text summarization, and language generation. It is based on the self-attention mechanism and can be used to extract information such as named entities, relationships, and sentiment from the text."}, {"metadata": {}, "cell_type": "markdown", "source": "## Table of Contents\n\n\n1. [Before you start](#beforeYouStart)\n1.	[Load Entity PII Models](#LoadModel)\n1. [Watson NLP Models](#NLPModels) \n 1. [BiLSTM Pretrained](#BILSTMPre)\n 1. [RBR Pretrained](#RBRPre)\n \n1. [Testing With Hanzo's Test Dataset](#Testing) \n1. [Summary](#summary)"}, {"metadata": {}, "cell_type": "markdown", "source": "\n### 1. Before you start\n"}, {"metadata": {}, "cell_type": "markdown", "source": "
\nStop kernel of other notebooks.
\n\n**Note:** If you have other notebooks currently running with the _Default Python 3.8 + Watson NLP XS_ environment, **stop their kernels** before running this notebook. All these notebooks share the same runtime environment, and if they are running in parallel, you may encounter memory issues. To stop the kernel of another notebook, open that notebook, and select _File > Stop Kernel_.\n\n
\nSet Project token.
\n\nBefore you can begin working on this notebook in Watson Studio in Cloud Pak for Data as a Service, you need to ensure that the project token is set so that you can access the project assets via the notebook.\n\nWhen this notebook is added to the project, a project access token should be inserted at the top of the notebook in a code cell. If you do not see the cell above, add the token to the notebook by clicking **More > Insert project token** from the notebook action bar. By running the inserted hidden code cell, a project object is created that you can use to access project resources.\n\n![ws-project.mov](https://media.giphy.com/media/jSVxX2spqwWF9unYrs/giphy.gif)\n\n
\nTip: Cell execution
\n\nNote that you can step through the notebook execution cell by cell, by selecting Shift-Enter. Or you can execute the entire notebook by selecting **Cell -> Run All** from the menu."}, {"metadata": {}, "cell_type": "code", "source": "import json\nimport pandas as pd\nimport watson_nlp\nfrom watson_nlp import data_model as dm\nfrom watson_nlp.toolkit.entity_mentions_utils import prepare_train_from_json", "execution_count": 2, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "# Silence Tensorflow warnings\nimport tensorflow as tf\ntf.get_logger().setLevel('ERROR')\ntf.autograph.set_verbosity(0)", "execution_count": 3, "outputs": []}, {"metadata": {}, "cell_type": "markdown", "source": "\n### 2. Load Entity PII Models"}, {"metadata": {}, "cell_type": "code", "source": "# Load a syntax model to split the text into sentences and tokens\nsyntax_model = watson_nlp.load(watson_nlp.download('syntax_izumo_en_stock'))\n# Load bilstm model in WatsonNLP\nbilstm_model = watson_nlp.load(watson_nlp.download('entity-mentions_bilstm_en_pii'))\n# Load rbr model in WatsonNLP\nrbr_model = watson_nlp.load(watson_nlp.download('entity-mentions_rbr_multi_pii'))\n# Download the GloVe model to be used as embeddings in the BiLSTM\nglove_model = watson_nlp.load(watson_nlp.download('embedding_glove_en_stock'))\n# Download the algorithm template\nmentions_train_template = watson_nlp.load(watson_nlp.download('file_path_entity-mentions_sire_multi_template-crf'))\n# Download the feature extractor\ndefault_feature_extractor = watson_nlp.load(watson_nlp.download('feature-extractor_rbr_entity-mentions_sire_en_stock'))\n# Download and load the pretrained model resource\n#pretrained_model_resource = watson_nlp.load(watson_nlp.download('pretrained-model_roberta-base_v2-8-0_llm_transformer_lang_en_cased_2022-05-06-052653'))", "execution_count": 4, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "text = \"My name is %s, and my social security number is %s. Here's the number to my Visa credit card, %s\" % (df['First and Last Name'][1], df['SSN'][1], df['Credit Card Number'][1])", "execution_count": 11, "outputs": []}, {"metadata": {}, "cell_type": "markdown", "source": "\n### 3. Watson NLP Models"}, {"metadata": {}, "cell_type": "markdown", "source": "\n### BiLSTM Pretrained"}, {"metadata": {"scrolled": true}, "cell_type": "code", "source": "#help(bilstm_model)", "execution_count": 50, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "syntax_result = syntax_model.run(text)\nbilstm_result = bilstm_model.run(syntax_result)\nbilstm_result", "execution_count": 14, "outputs": [{"output_type": "execute_result", "execution_count": 14, "data": {"text/plain": "{\n \"mentions\": [\n {\n \"span\": {\n \"begin\": 11,\n \"end\": 24,\n \"text\": \"Robert\u00a0Aragon\"\n },\n \"type\": \"Person\",\n \"producer_id\": {\n \"name\": \"BiLSTM Entity Mentions\",\n \"version\": \"1.0.0\"\n },\n \"confidence\": 0.8697594404220581,\n \"mention_type\": \"MENTT_UNSET\",\n \"mention_class\": \"MENTC_UNSET\",\n \"role\": \"\"\n }\n ],\n \"producer_id\": {\n \"name\": \"BiLSTM Entity Mentions\",\n \"version\": \"1.0.0\"\n }\n}"}, "metadata": {}}]}, {"metadata": {}, "cell_type": "markdown", "source": "\n\n### RBR Pretrained\n"}, {"metadata": {}, "cell_type": "code", "source": "", "execution_count": null, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "text1 = \"My name is %s, and my social security number is %s. 
Here's the number to my Visa credit card, %s\" % (df['First and Last Name'][1], df['SSN'][1], \"` 378282246310005 `\")\ntext2 = \"My name is %s, and my social security number is %s. Here's the number to my Visa credit card, %s\" % (df['First and Last Name'][1], df['SSN'][2], \"`378282246310005 `\")\ntext3 = \"My name is %s, and my social security number is %s. Here's the number to my Visa credit card, %s\" % (df['First and Last Name'][1], df['SSN'][3], \"` 378282246310005 `\")\ntext4 = \"My name is %s, and my social security number is %s. Here's the number to my Visa credit card, %s\" % (df['First and Last Name'][1], df['SSN'][4], \"` 5555555555554444 `\")\ntext5 = \"My name is %s, and my social security number is %s. Here's the number to my Visa credit card, %s\" % (df['First and Last Name'][1], df['SSN'][5], \"`5555555555554444 `\")\ntext6= \"My name is %s, and my social security number is %s. Here's the number to my Visa credit card, %s\" % (df['First and Last Name'][1], df['SSN'][6], \"` 5555-5555-5555-4444 `\")\ntext7 = \"My name is %s, and my social security number is %s. Here's the number to my Visa credit card, %s\" % (df['First and Last Name'][1], df['SSN'][7], \"`5555-5555-5555-4444 `\")\ntext8 = \"My name is %s, and my social security number is %s. Here's the number to my Visa credit card, %s\" % (df['First and Last Name'][1], df['SSN'][8], \"` 5555-5555-5555-4444`\")", "execution_count": 52, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "text1", "execution_count": 53, "outputs": [{"output_type": "execute_result", "execution_count": 53, "data": {"text/plain": "\"My name is Robert\\xa0Aragon, and my social security number is 489-36-8350. Here's the number to my Visa credit card, ` 378282246310005 `\""}, "metadata": {}}]}, {"metadata": {}, "cell_type": "code", "source": "all_test=[text1,text2,text3,text4,text5,text6,text7,text8]", "execution_count": 54, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "#help(rbr_model)", "execution_count": null, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "# Test the pretrain\nfor test in all_test:\n rbr_result = rbr_model.run(test, language_code='en')\n\n for i in rbr_result.mentions:\n print(\"Text: \", i.span.text.ljust(15, \" \"), \"Type: \", i.type)", "execution_count": 55, "outputs": [{"output_type": "stream", "text": "Text: 489-36-8350 Type: NationalNumber.SocialSecurityNumber.US\nText: 378282246310005 Type: BankAccountNumber.CreditCardNumber.Amex\nText: 514-14-8905 Type: NationalNumber.SocialSecurityNumber.US\nText: 378282246310005 Type: BankAccountNumber.CreditCardNumber.Amex\nText: 690-05-5315 Type: NationalNumber.SocialSecurityNumber.US\nText: 378282246310005 Type: BankAccountNumber.CreditCardNumber.Amex\nText: 421-37-1396 Type: NationalNumber.SocialSecurityNumber.US\nText: 5555555555554444 Type: BankAccountNumber.CreditCardNumber.Master\nText: 458-02-6124 Type: NationalNumber.SocialSecurityNumber.US\nText: 5555555555554444 Type: BankAccountNumber.CreditCardNumber.Master\nText: 612-20-6832 Type: NationalNumber.SocialSecurityNumber.US\nText: 5555-5555-5555-4444 Type: BankAccountNumber.CreditCardNumber.Master\nText: 300-62-3266 Type: NationalNumber.SocialSecurityNumber.US\nText: 5555-5555-5555-4444 Type: BankAccountNumber.CreditCardNumber.Master\nText: 660-03-8360 Type: NationalNumber.SocialSecurityNumber.US\nText: 5555-5555-5555-4444 Type: BankAccountNumber.CreditCardNumber.Master\n", "name": "stdout"}]}, {"metadata": {}, "cell_type": "code", "source": "#test dataset for URL extraction 
\n\ntest1 = \"My name is Robert\\xa0Aragon, and my social security number is 489-36-8350. Here's the number to my Visa credit card, http://www.example.com/page_id=5555555555554444\"", "execution_count": 29, "outputs": []}, {"metadata": {}, "cell_type": "markdown", "source": "\n\n## 4. Testing With Hanzo's Test Dataset"}, {"metadata": {}, "cell_type": "markdown", "source": "### RBR Stock (URL Extraction)"}, {"metadata": {}, "cell_type": "code", "source": "#Test Pretrained rbr stock model in WatsonNLP\nrbr_ent_model = watson_nlp.load(watson_nlp.download('entity-mentions_rbr_en_stock'))\nrbr_ent_result = rbr_ent_model.run(test1)\n\nfor i in rbr_ent_result.mentions:\n print(\"Text: \", i.span.text.ljust(15, \" \"), \"Type: \", i.type)", "execution_count": 30, "outputs": [{"output_type": "stream", "text": "Text: 489-36-8350 Type: PhoneNumber\nText: http://www.example.com/page_id=5555555555554444 Type: URL\n", "name": "stdout"}]}, {"metadata": {}, "cell_type": "code", "source": "#Test Pretrained rbr PII model in WatsonNLP\ntest2 = \"http://www.example.com/page_id=5555555555554444\"\n\nrbr_result_pii = rbr_model.run(test2, language_code='en')\nrbr_result_pii\n\nfor i in rbr_result_pii.mentions:\n print(\"Text: \", i.span.text.ljust(15, \" \"), \"Type: \", i.type)", "execution_count": 31, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "#Test Pretrained bilstm_model model in WatsonNLP\nsyntax_result = syntax_model.run(test2)\nbilstm_result = bilstm_model.run(syntax_result)\nbilstm_result\n\nfor i in bilstm_result.mentions:\n print(\"Text: \", i.span.text.ljust(15, \" \"), \"Type: \", i.type)", "execution_count": 30, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "# Testing dataset\ntext1 = \"Here's the number to my Visa credit card, %s\" % ( \"378282246310005\")\ntext2 = \"Here's the number to my Visa credit card, %s\" % ( \" 378282246310005\")\ntext3 = \"Here's the number to my Visa credit card, %s\" % ( \"378282246310005 \")\ntext4 = \"Here's the number to my Visa credit card, %s\" % ( \" 378282246310005 \")\ntext5 = \"Here's the number to my Visa credit card, %s\" % ( \"5555555555554444\")\ntext6 = \"Here's the number to my Visa credit card, %s\" % ( \" 5555555555554444\")\ntext7 = \"Here's the number to my Visa credit card, %s\" % ( \"5555555555554444 \")\ntext8 = \"Here's the number to my Visa credit card, %s\" % ( \" 5555555555554444 \")\ntext9= \"Here's the number to my Visa credit card, %s\" % ( \"5555-5555-5555-4444\")\ntext10 = \"Here's the number to my Visa credit card, %s\" % ( \" 5555-5555-5555-4444\")\ntext11= \"Here's the number to my Visa credit card, %s\" % ( \"5555-5555-5555-4444 \")\ntext12 = \"Here's the number to my Visa credit card, %s\" % ( \" 5555-5555-5555-4444 \")\n\nall_test=[text1,text2,text3,text4,text5,text6,text7,text8,text8,text9,text10,text11,text12]", "execution_count": 18, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "# Test the pretrain\nt=0\nfor test in all_test:\n rbr_result = rbr_model.run(test, language_code='en')\n \n for i in rbr_result.mentions:\n print(\"Text\"+str(t), i.span.text.ljust(15, \" \"), \"Type: \", i.type)\n t+=1", "execution_count": 25, "outputs": [{"output_type": "stream", "text": "Text2 378282246310005 Type: BankAccountNumber.CreditCardNumber.Amex\nText3 378282246310005 Type: BankAccountNumber.CreditCardNumber.Amex\nText6 5555555555554444 Type: BankAccountNumber.CreditCardNumber.Master\nText7 5555555555554444 Type: BankAccountNumber.CreditCardNumber.Master\nText8 5555555555554444 Type: 
BankAccountNumber.CreditCardNumber.Master\nText9 5555-5555-5555  Type:  PhoneNumber\nText10 5555-5555-5555  Type:  PhoneNumber\nText11 5555-5555-5555-4444 Type:  BankAccountNumber.CreditCardNumber.Master\nText12 5555-5555-5555-4444 Type:  BankAccountNumber.CreditCardNumber.Master\n", "name": "stdout"}]}, {"metadata": {}, "cell_type": "markdown", "source": "\n## 5. Summary\n\nThis notebook shows you how to use the Watson NLP library to:\n1. Extract PII using pretrained models\n"}, {"metadata": {}, "cell_type": "markdown", "source": "Please note that this content is made available by IBM Build Lab to foster Embedded AI technology adoption. The content may include systems & methods pending patent with USPTO and protected under US Patent Laws. For redistribution of this content, IBM will use a release process. For any questions please log an issue in GitHub.\n\nDeveloped by IBM Build Lab\n\nCopyright - 2022 IBM Corporation"}], "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3.10", "language": "python"}, "language_info": {"name": "python", "version": "3.10.6", "mimetype": "text/x-python", "codemirror_mode": {"name": "ipython", "version": 3}, "pygments_lexer": "ipython3", "nbconvert_exporter": "python", "file_extension": ".py"}}, "nbformat": 4, "nbformat_minor": 1} \ No newline at end of file diff --git a/MLOps/custom-model-k8s/README.md b/MLOps/custom-model-k8s/README.md index 9dcd359..848e221 100644 --- a/MLOps/custom-model-k8s/README.md +++ b/MLOps/custom-model-k8s/README.md @@ -1,223 +1,7 @@ # Serving a Custom Model on a Kubernetes or OpenShift Cluster -In this tutorial you will take a Watson NLP model that you have trained in Watson Studio and serve it on a Kubernetes or OpenShift cluster. The model will be packaged as a container image using the [model builder](https://github.com/IBM/ibm-watson-embed-model-builder). The container images can be used in the same way as the pretrained Watson NLP models, i.e. specified as init containers of Watson NLP Runtime Pods. +With IBM Watson NLP, IBM introduced a common library for natural language processing, document understanding, translation, and trust. IBM Watson NLP brings everything under one umbrella for consistency and ease of development and deployment. -To complete this tutorial, you need to have first completed the [Consumer Complaints Classification](https://techzone.ibm.com/collection/watson-nlp-text-classification#tab-1) tutorial, which includes steps on training a custom ensemble model and saving it to the Cloud Object Storage (COS) bucket associated with the project. +Follow the [tutorial](https://developer.ibm.com/tutorials/serve-custom-models-on-kubernetes-or-openshift/) to learn how to take a Watson NLP model that you trained in IBM Watson Studio and serve it on a Kubernetes or Red Hat OpenShift cluster. -## Reference Architecture - -![Reference architecure](Images/ref-arch-custom-models.png) - -## Prerequisites - -- [Python 3.9](https://www.python.org/downloads/) or later is installed - -[Docker Desktop](https://docs.docker.com/get-docker/) is installed -- Docker has access to the [Watson NLP Runtime and pretrained models](https://github.com/ibm-build-lab/Watson-NLP/blob/main/MLOps/access/README.md#docker) -- You have a Kubernetes or OpenShift cluster on which you can deploy an application -- You have either the Kubernetes (`kubectl`) or OpenShift (`oc`) CLI installed, and logged into your cluster. 
The current namespace should be set to the namespace in which you will deploy the model service -- Your Kubernetes or OpenShift cluster has access to the [Watson NLP Runtime and pretrained models](https://github.com/ibm-build-lab/Watson-NLP/blob/main/MLOps/access/README.md#kubernetes-and-openshift) -- You have completed the [Consumer Complaints Classification](https://techzone.ibm.com/collection/watson-nlp-text-classification#tab-1) tutorial, and have saved the custom trained model named `ensemble_model` to the COS bucket associated with the project. The tutorial uses this [notebook](https://github.com/ibm-build-lab/Watson-NLP/blob/main/ML/Text-Classification/Consumer%20complaints%20Classification.ipynb). - -## Steps - -### 1. Save your model - -First, you will export your Watson NLP model from Watson Studio on IBM Cloud. In the IBM Cloud Pak for Data GUI, navigate to the page for your Consumer Complaints Classification project. Click on the **Assets** tab. There you should find a model named `ensemble_mode` stored as a ZIP file. - -If the model is not there, go back to the notebook and ensure that you have followed the steps in the notebook: - -- Insert a project token into the notebook, and -- Run the cell that saves the model. - -```python -project.save_data('ensemble_model', data=ensemble_model.as_file_like_object(), overwrite=True) -``` - -Use the vertical ellipsis to the right of the model name to open a menu with the download option. Download the model to your local machine. - -Next, we will unzip the file. Create a directory to unzip the file into. - -```sh -mkdir models -``` - -```sh -mkdir models/ensemble_model -``` - -Unzip the file into the newly created directory. You may need to specify the path to the ZIP file if it is not in the current directory. - -```sh -unzip ensemble_model -d models/ensemble_model -``` - -### 2. Build the model image - -Prepare your Python environment. - -```sh -python3 -m venv client-env -``` - -```sh -source client-env/bin/activate -``` - -Install the [model builder](https://github.com/IBM/ibm-watson-embed-model-builder) package. - -```sh -pip install watson-embed-model-packager -``` - -Run the setup for the model builder package. - -```sh -python -m watson_embed_model_packager setup \ - --library-version watson_nlp:3.2.0 \ - --local-model-dir /path/to/models \ - --output-csv model-manifest.csv -``` - -Ensure that you replace `/path/to/models` in the above command with the path to your `models` directory. This command will generate the file `model-manifest.csv` that will be used during the build. - -Run the build command. - -```sh -python -m watson_embed_model_packager build --config model-manifest.csv -``` - -This will create a Docker image with the name `watson-nlp_ensemble_model`. - -Verify the existence of this image: - -```sh -docker images -``` - -### 3. Copy the model to a container registry - -To deploy this image in Kubernetes or OpenShift cluster, you must first provision the image to a container repository. Tag your image with proper repository and namespace/project name. Replace `` and `` in the following commands based on your configuration. - -```sh -docker tag watson-nlp_ensemble_model:latest //watson-nlp_ensemble_model:latest -``` - -Push the image to the registry. - -```sh -docker push //watson-nlp_ensemble_model:latest -``` - -### 4. Serve the models - -Clone the GitHub repository containing sample code for this tutorial. - -```sh -git clone https://github.com/ibm-build-lab/Watson-NLP -``` - -Go to the directory for this tutorial. 
- -```sh -cd Watson-NLP/MLOps/custom-model-k8s -``` - -Open the Kubernetes manifest for editing. - -```sh -vim deployment/deployment.yaml -``` - -Update the init container line in the file to point to your custom model image. - -```yaml - spec: - initContainers: - - name: ensemble-model - image: //watson-nlp_ensemble_model:latest -``` - -Create a [secret](https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/#registry-secret-existing-credentials) in the namespace to give credentials to the registry used, and [add this secret](https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/#create-a-pod-that-uses-your-secret) to the `imagePullSecrets` section, so that your Pod can pull the image from the registry. - -Deploy the model service. - -If using Kubernetes: - -```sh -kubectl apply -f deployment/deployment.yaml -``` - -If using OpenShift: - -```sh -oc apply -f deployment/deployment.yaml -``` - -The model service is now deployed. - -### 5. Test the service - -Run a simple Python client program to test that the model is being served. Note that the client code is specific to the model. If you serve a different model you will need to update the client program. - -Install the Python client library on your machine. - -```sh -pip install watson_nlp_runtime_client -``` - -Enable port forwarding from your local machine. - -If running the service in a Kubernetes cluster: - -```sh -kubectl port-forward svc/watson-nlp-runtime-service 8085 -``` - -For OpenShift: - -```sh -oc port-forward svc/watson-nlp-runtime-service 8085 -``` - -Go to the directory with the client program and run it. - -```sh -cd Client -``` - -Run the program with a single string argument. - -```sh -python client.py "Watson NLP is awesome" -``` - -The program will return output similar to the following. - -```sh -###### Calling GRPC endpoint = localhost:8085 -###### Calling remote GRPC model = ensemble_model -classes { - class_name: "Credit reporting, credit repair services, or other personal consumer reports" - confidence: 0.328219473 -} -classes { - class_name: "Debt collection" - confidence: 0.262635 -} -classes { - class_name: "Credit card or prepaid card" - confidence: 0.16425848 -} -classes { - class_name: "Checking or savings account" - confidence: 0.102090739 -} -classes { - class_name: "Mortgage" - confidence: 0.0733666793 -} -producer_id { - name: "Voting based Ensemble" - version: "0.0.1" -} -``` +The model is packaged as a container image using the model builder. The container images can be used in the same way as the pretrained Watson NLP models, that is, specified as init containers of Watson NLP Runtime Pods.
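For quick reference, the packaging flow that the tutorial walks through is sketched below. This is a minimal outline based on the steps formerly in this README; the model directory path and the `watson_nlp` library version are illustrative and should be adjusted to your setup.

```sh
# Install the model builder package
pip install watson-embed-model-packager

# Generate a build manifest from a local directory of exported models
python -m watson_embed_model_packager setup \
  --library-version watson_nlp:3.2.0 \
  --local-model-dir /path/to/models \
  --output-csv model-manifest.csv

# Build the model container image (for example, watson-nlp_ensemble_model)
python -m watson_embed_model_packager build --config model-manifest.csv
```

Once the resulting image is pushed to a container registry, it is referenced as an init container of the Watson NLP Runtime Deployment, along these lines (`<registry>` and `<namespace>` are placeholders for your own registry and namespace):

```yaml
spec:
  initContainers:
    - name: ensemble-model
      # Placeholder registry and namespace; replace with your own values
      image: <registry>/<namespace>/watson-nlp_ensemble_model:latest
```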