OCBL215 - TFX Standard Components Walkthrough

akshaykumarpatil-tudip · akshaykumarpatil-tudip · commit 31d682f8744e · 2022-08-19T17:00:13.000+05:30
- Updated the target and solution notebook instructions as necessary.
- Added the note to restart the kernel.
- Updated the content formatting of the instructions.
diff --git a/workshops/tfx-caip-tf23/lab-01-tfx-walkthrough/labs/lab-01.ipynb b/workshops/tfx-caip-tf23/lab-01-tfx-walkthrough/labs/lab-01.ipynb
@@ -11,20 +11,23 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Learning Objectives\n",
+    "## Learning objectives\n",
     "\n",
     "1.  Develop a high-level understanding of TFX pipeline components.\n",
     "2.  Learn how to use a TFX Interactive Context for prototype development of TFX pipelines.\n",
     "3.  Work with the TensorFlow Data Validation (TFDV) library to check and analyze input data.\n",
     "4.  Utilize the TensorFlow Transform (TFT) library for scalable data preprocessing and feature transformations.\n",
     "5.  Employ the TensorFlow Model Analysis (TFMA) library for model evaluation.\n",
     "\n",
-    "In this lab, you will work with the [Covertype Data Set](https://github.com/jarokaz/mlops-labs/blob/master/datasets/covertype/README.md) and use TFX to analyze, understand, and pre-process the dataset and train, analyze, validate, and deploy a multi-class classification model to predict the type of forest cover from cartographic features.\n",
+    "## Introduction\n",
+    "\n",
+    "In this notebook, you will work with the [Covertype Data Set](https://github.com/jarokaz/mlops-labs/blob/master/datasets/covertype/README.md) and use TFX to analyze, understand, and pre-process the dataset and train, analyze, validate, and deploy a multi-class classification model to predict the type of forest cover from cartographic features.\n",
     "\n",
     "You will utilize **TFX Interactive Context** to work with the TFX components interactively in a Jupyter notebook environment. Working in an interactive notebook is useful when doing initial data exploration, experimenting with models, and designing ML pipelines. You should be aware that there are differences in the way interactive notebooks are orchestrated, and how they access metadata artifacts. In a production deployment of TFX on GCP, you will use an orchestrator such as Kubeflow Pipelines or Cloud Composer. In an interactive mode, the notebook itself is the orchestrator, running each TFX component as you execute the notebook cells. In a production deployment, ML Metadata will be managed in a scalable database like MySQL, and artifacts in a persistent store such as Google Cloud Storage. In an interactive mode, both properties and payloads are stored in a local file system of the Jupyter host.\n",
     "\n",
-    "**Setup Note:**\n",
-    "Currently, TFMA visualizations do not render properly in JupyterLab. It is recommended to run this notebook in Jupyter Classic Notebook. To switch to Classic Notebook select *Launch Classic Notebook* from the *Help* menu."
+    "**Setup Note**:\n",
+    "\n",
+    "Currently, TFMA visualizations do not render properly in JupyterLab. It is recommended to run this notebook in Jupyter Classic Notebook. To switch to Classic Notebook, select **Launch Classic Notebook** from the **Help** menu."
    ]
   },
   {
@@ -142,6 +145,13 @@
     "%pip install --upgrade --user tensorflow_model_analysis==0.25.0"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Restart the kernel by using Kernel > Restart kernel > Restart.**"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -151,6 +161,58 @@
     "Set constants, location paths and other environment settings. "
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import absl\n",
+    "import os\n",
+    "import tempfile\n",
+    "import time\n",
+    "\n",
+    "import tensorflow as tf\n",
+    "import tensorflow_data_validation as tfdv\n",
+    "import tensorflow_model_analysis as tfma\n",
+    "import tensorflow_transform as tft\n",
+    "import tfx\n",
+    "\n",
+    "from pprint import pprint\n",
+    "from tensorflow_metadata.proto.v0 import schema_pb2, statistics_pb2, anomalies_pb2\n",
+    "from tensorflow_transform.tf_metadata import schema_utils\n",
+    "from tfx.components import CsvExampleGen\n",
+    "from tfx.components import Evaluator\n",
+    "from tfx.components import ExampleValidator\n",
+    "from tfx.components import InfraValidator\n",
+    "from tfx.components import Pusher\n",
+    "from tfx.components import ResolverNode\n",
+    "from tfx.components import SchemaGen\n",
+    "from tfx.components import StatisticsGen\n",
+    "from tfx.components import Trainer\n",
+    "from tfx.components import Transform\n",
+    "from tfx.components import Tuner\n",
+    "from tfx.dsl.components.base import executor_spec\n",
+    "from tfx.components.common_nodes.importer_node import ImporterNode\n",
+    "from tfx.components.trainer import executor as trainer_executor\n",
+    "from tfx.dsl.experimental import latest_blessed_model_resolver\n",
+    "from tfx.orchestration import metadata\n",
+    "from tfx.orchestration import pipeline\n",
+    "from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext\n",
+    "from tfx.proto import evaluator_pb2\n",
+    "from tfx.proto import example_gen_pb2\n",
+    "from tfx.proto import infra_validator_pb2\n",
+    "from tfx.proto import pusher_pb2\n",
+    "from tfx.proto import trainer_pb2\n",
+    "from tfx.proto.evaluator_pb2 import SingleSlicingSpec\n",
+    "\n",
+    "from tfx.types import Channel\n",
+    "from tfx.types.standard_artifacts import Model\n",
+    "from tfx.types.standard_artifacts import HyperParameters\n",
+    "from tfx.types.standard_artifacts import ModelBlessing\n",
+    "from tfx.types.standard_artifacts import InfraBlessing"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -1075,7 +1137,7 @@
     "### Visualize evaluation results\n",
     "You can visualize the evaluation results using the `tfma.view.render_slicing_metrics()` function from the TensorFlow Model Analysis library.\n",
     "\n",
-    "**Setup Note:** *Currently, TFMA visualizations don't render in  JupyterLab. Make sure that you run this notebook in Classic Notebook.*"
+    "**Setup Note**: **Currently, TFMA visualizations don't render in JupyterLab. Make sure that you run this notebook in Classic Notebook.**"
    ]
   },
   {
diff --git a/workshops/tfx-caip-tf23/lab-01-tfx-walkthrough/solutions/lab-01.ipynb b/workshops/tfx-caip-tf23/lab-01-tfx-walkthrough/solutions/lab-01.ipynb
@@ -11,20 +11,23 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Learning Objectives\n",
+    "## Learning objectives\n",
     "\n",
     "1.  Develop a high-level understanding of TFX pipeline components.\n",
     "2.  Learn how to use a TFX Interactive Context for prototype development of TFX pipelines.\n",
     "3.  Work with the TensorFlow Data Validation (TFDV) library to check and analyze input data.\n",
     "4.  Utilize the TensorFlow Transform (TFT) library for scalable data preprocessing and feature transformations.\n",
     "5.  Employ the TensorFlow Model Analysis (TFMA) library for model evaluation.\n",
     "\n",
-    "In this lab, you will work with the [Covertype Data Set](https://github.com/jarokaz/mlops-labs/blob/master/datasets/covertype/README.md) and use TFX to analyze, understand, and pre-process the dataset and train, analyze, validate, and deploy a multi-class classification model to predict the type of forest cover from cartographic features.\n",
+    "## Introduction\n",
+    "\n",
+    "In this notebook, you will work with the [Covertype Data Set](https://github.com/jarokaz/mlops-labs/blob/master/datasets/covertype/README.md) and use TFX to analyze, understand, and pre-process the dataset and train, analyze, validate, and deploy a multi-class classification model to predict the type of forest cover from cartographic features.\n",
     "\n",
     "You will utilize **TFX Interactive Context** to work with the TFX components interactively in a Jupyter notebook environment. Working in an interactive notebook is useful when doing initial data exploration, experimenting with models, and designing ML pipelines. You should be aware that there are differences in the way interactive notebooks are orchestrated, and how they access metadata artifacts. In a production deployment of TFX on GCP, you will use an orchestrator such as Kubeflow Pipelines or Cloud Composer. In an interactive mode, the notebook itself is the orchestrator, running each TFX component as you execute the notebook cells. In a production deployment, ML Metadata will be managed in a scalable database like MySQL, and artifacts in a persistent store such as Google Cloud Storage. In an interactive mode, both properties and payloads are stored in a local file system of the Jupyter host.\n",
     "\n",
-    "**Setup Note:**\n",
-    "Currently, TFMA visualizations do not render properly in JupyterLab. It is recommended to run this notebook in Jupyter Classic Notebook. To switch to Classic Notebook select *Launch Classic Notebook* from the *Help* menu."
+    "**Setup Note**:\n",
+    "\n",
+    "Currently, TFMA visualizations do not render properly in JupyterLab. It is recommended to run this notebook in Jupyter Classic Notebook. To switch to Classic Notebook, select **Launch Classic Notebook** from the **Help** menu."
    ]
   },
   {
@@ -142,6 +145,13 @@
     "%pip install --upgrade --user tensorflow_model_analysis==0.25.0"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Restart the kernel by using Kernel > Restart kernel > Restart.**"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -151,6 +161,58 @@
     "Set constants, location paths and other environment settings. "
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import absl\n",
+    "import os\n",
+    "import tempfile\n",
+    "import time\n",
+    "\n",
+    "import tensorflow as tf\n",
+    "import tensorflow_data_validation as tfdv\n",
+    "import tensorflow_model_analysis as tfma\n",
+    "import tensorflow_transform as tft\n",
+    "import tfx\n",
+    "\n",
+    "from pprint import pprint\n",
+    "from tensorflow_metadata.proto.v0 import schema_pb2, statistics_pb2, anomalies_pb2\n",
+    "from tensorflow_transform.tf_metadata import schema_utils\n",
+    "from tfx.components import CsvExampleGen\n",
+    "from tfx.components import Evaluator\n",
+    "from tfx.components import ExampleValidator\n",
+    "from tfx.components import InfraValidator\n",
+    "from tfx.components import Pusher\n",
+    "from tfx.components import ResolverNode\n",
+    "from tfx.components import SchemaGen\n",
+    "from tfx.components import StatisticsGen\n",
+    "from tfx.components import Trainer\n",
+    "from tfx.components import Transform\n",
+    "from tfx.components import Tuner\n",
+    "from tfx.dsl.components.base import executor_spec\n",
+    "from tfx.components.common_nodes.importer_node import ImporterNode\n",
+    "from tfx.components.trainer import executor as trainer_executor\n",
+    "from tfx.dsl.experimental import latest_blessed_model_resolver\n",
+    "from tfx.orchestration import metadata\n",
+    "from tfx.orchestration import pipeline\n",
+    "from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext\n",
+    "from tfx.proto import evaluator_pb2\n",
+    "from tfx.proto import example_gen_pb2\n",
+    "from tfx.proto import infra_validator_pb2\n",
+    "from tfx.proto import pusher_pb2\n",
+    "from tfx.proto import trainer_pb2\n",
+    "from tfx.proto.evaluator_pb2 import SingleSlicingSpec\n",
+    "\n",
+    "from tfx.types import Channel\n",
+    "from tfx.types.standard_artifacts import Model\n",
+    "from tfx.types.standard_artifacts import HyperParameters\n",
+    "from tfx.types.standard_artifacts import ModelBlessing\n",
+    "from tfx.types.standard_artifacts import InfraBlessing"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -1094,7 +1156,7 @@
     "### Visualize evaluation results\n",
     "You can visualize the evaluation results using the `tfma.view.render_slicing_metrics()` function from the TensorFlow Model Analysis library.\n",
     "\n",
-    "**Setup Note:** *Currently, TFMA visualizations don't render in  JupyterLab. Make sure that you run this notebook in Classic Notebook.*"
+    "**Setup Note**: **Currently, TFMA visualizations don't render in JupyterLab. Make sure that you run this notebook in Classic Notebook.**"
    ]
   },
   {