
Commit 77bdc8f

Added subsections for object detection/pixel classification
1 parent 17cf1c3 commit 77bdc8f

File tree

1 file changed

+95
-25
lines changed


guide/14-deep-learning/model_extension_guide.ipynb

Lines changed: 95 additions & 25 deletions
@@ -1,5 +1,12 @@
11
{
22
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Add a new model using Model Extension"
8+
]
9+
},
310
{
411
"cell_type": "markdown",
512
"metadata": {},
@@ -11,9 +18,9 @@
1118
"cell_type": "markdown",
1219
"metadata": {},
1320
"source": [
14-
"With `arcgis.learn`, there are a multitude of machine learning models available for different tasks. There are models for object detection, pixel classification, image translation, natural language processing, point-cloud data, etc. and the list keeps on growing. But, what if you came across a deep learning model that is not yet a part of the `learn` module and you want to use it from its library or its open-source code on github? What if you created your own deep learning model for a specific task you are working on? What if you want to use these new models with all the capabilities of the ArcGIS ecosystem?\n",
21+
"With `arcgis.learn`, there are a multitude of machine learning models available for different tasks. There are models for object detection, pixel classification, image translation, natural language processing, point-cloud data, etc. and the list keeps on growing. But what if you come across a deep learning model that is not yet a part of the `learn` module and you want to use it from its library or its open-source code on GitHub? What if you created your own deep learning model for a specific task you are working on? What if you want to use these new models with all the capabilities of the ArcGIS ecosystem?\n",
1522
"\n",
16-
"There is a solution - **Model Extension**, a general purpose wrapper for any **object detection** and **pixel classification** model on top of our existing framework. It wraps all the details of our stack of Pytorch, Fastai, and the learn module and provides an easy to implement structure for the integration of a third-party deep learning model."
23+
"There is a solution - **Model Extension**, a general-purpose wrapper for any **object detection** and **pixel classification** model on top of our existing framework. It wraps all the details of our stack of PyTorch, Fastai, and the learn module and provides an easy to implement structure for the integration of a third-party deep learning model."
1724
]
1825
},
1926
{
@@ -84,22 +91,43 @@
8491
"cell_type": "markdown",
8592
"metadata": {},
8693
"source": [
87-
"1. `on_batch_begin`: This function is required to transform the input data and the target (the ground truth) used for training the model. The transformation of inputs is in accordance to the model input requirements. This function is equivalent to the fastai on_batch_begin function, but is called in succession of it. Therefore, transformation of inputs is needed only if the format required by the model is different from what fastai transforms it into.\n",
94+
"1. `on_batch_begin`: This function is required to transform the input data and the target (the ground truth) used for training the model. The transformation of inputs is in accordance to the model input requirements. This function is equivalent to the fastai on_batch_begin function, but is called in succession of it. Therefore, transformation of inputs is needed only if the format required by the model is different from what fastai transforms it into."
95+
]
96+
},
97+
{
98+
"cell_type": "markdown",
99+
"metadata": {},
100+
"source": [
101+
"The function receives the following arguments:\n",
102+
"+ `learn` - a fastai learner object\n",
88103
"\n",
89-
" The function receives the following arguments:\n",
90-
" + `learn` - a fastai learner object\n",
91-
" + `model_input_batch` - fastai transformed batch of input images: tensor of shape [N,C,H,W] where\n",
104+
"+ `model_input_batch` - fastai transformed batch of input images: tensor of shape [N, C, H, W] with values in the range -1 and 1, where\n",
105+
" N - batch size\n",
106+
" C - number of channels (bands) in the image\n",
107+
" H - height of the image\n",
108+
" W - width of the image \n",
109+
"\n",
110+
"+ `model_target_batch` - fastai transformed batch of targets. The targets will be of different type and shape for object detection and pixel classification.\n",
111+
" \n",
112+
" **Object Detection**\n",
113+
" \n",
114+
" list of tensors [bboxes, classes]\n",
115+
" \n",
116+
" bboxes: tensor of shape [N, B, 4], where \n",
92117
" N - batch size\n",
93-
" C - number of channels (bands) in the image\n",
94-
" H - height of the image\n",
95-
" W - width of the image\n",
118+
" B - the maximum number of boxes present in any image of the batch\n",
119+
" 4 - the bounding box coordinates in the order y1, x1, y2, x2 and values in the range -1 to 1\n",
120+
" \n",
121+
" classes: tensor of shape [N, B] representing class of each bounding box\n",
122+
"\n",
123+
" **Pixel Classification**\n",
124+
"\n",
125+
" tensor of shape [N, K, H, W] representing a binary raster, where\n",
96126
"\n",
97-
" + `model_target_batch` - fastai transformed batch of targets: list of tensors [bboxes, classes]\n",
98-
" bboxes is a tensor of shape [N, B, 4] where \n",
99-
" N is the batch size\n",
100-
" B is the max number of boxes present in any image of the batch\n",
101-
" 4 is for the bounding box coordinates in the order y1, x1, y2, x2 with values in range -1 to 1\n",
102-
" classes is a tensor of shape [N, B] representing class of each bounding box"
127+
" N - batch size\n",
128+
" K - number of classes in the dataset\n",
129+
" H - height of the image\n",
130+
" W - width of the image"
103131
]
104132
},
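The shapes above can be made concrete with a sketch. This is an illustration only, not the library's implementation: it assumes a hypothetical third-party detector that expects images in the 0 to 1 range and one target dict per image with absolute-pixel `x1, y1, x2, y2` boxes (the convention torchvision-style detectors use). An `on_batch_begin` might then look like:

```python
import torch

def on_batch_begin(self, learn, model_input_batch, model_target_batch):
    """
    Illustrative sketch: convert the fastai batch into the format a
    hypothetical torchvision-style detector expects.
    """
    chip_size = model_input_batch.shape[-1]

    # rescale images from the fastai range [-1, 1] to [0, 1]
    model_input = (model_input_batch + 1.0) / 2.0

    bboxes, classes = model_target_batch
    targets = []
    for boxes, labels in zip(bboxes, classes):
        # drop padded entries (class 0 marks padding in this sketch)
        mask = labels > 0
        boxes, labels = boxes[mask], labels[mask]
        # normalized [-1, 1] y1,x1,y2,x2 -> absolute-pixel x1,y1,x2,y2
        boxes = (boxes + 1.0) * chip_size / 2.0
        boxes = boxes[:, [1, 0, 3, 2]]
        targets.append({"boxes": boxes, "labels": labels})

    return model_input, targets
```

The return value is whatever pair of (input, target) the wrapped model's forward and loss functions consume; the exact dict keys and box order here are assumptions about that model.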
105133
{
@@ -126,11 +154,11 @@
126154
"2. `transform_input`: This function is required to transform the input images during inferencing.\n",
127155
" \n",
128156
" The function receives the following arguments:\n",
129-
" + `xb` - fastai transformed batch of input images: tensor of shape [N,C,H,W] where\n",
130-
" N - batch size\n",
131-
" C - number of channels (bands) in the image\n",
132-
" H - height of the image\n",
133-
" W - width of the image"
157+
" + `xb` - fastai transformed batch of input images: tensor of shape [N, C, H, W], where\n",
158+
" N - batch size\n",
159+
" C - number of channels (bands) in the image\n",
160+
" H - height of the image\n",
161+
" W - width of the image"
134162
]
135163
},
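As a hedged illustration of `transform_input`, suppose the wrapped model expects inputs normalized with ImageNet statistics rather than fastai's -1 to 1 range (the statistics below are an assumption about that model, not something prescribed by `arcgis.learn`):

```python
import torch

# Hypothetical normalization statistics; replace with whatever the
# wrapped third-party model was actually trained with.
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def transform_input(self, xb):
    """Rescale the fastai batch from [-1, 1] to [0, 1], then normalize."""
    xb = (xb + 1.0) / 2.0
    return (xb - MEAN.to(xb.device)) / STD.to(xb.device)
```

If the wrapped model happens to accept the fastai range directly, this function can simply return `xb` unchanged.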
136164
{
@@ -252,12 +280,19 @@
252280
"cell_type": "markdown",
253281
"metadata": {},
254282
"source": [
255-
"The raw output of the model is usually not in the best descernible form. It needs to be post-processed to be understood by the user. The `post_process` function is used to transform the raw outputs of the model to a specific format for the final results and visualization pipeline to ingest.\n",
283+
"The raw output of the model is usually not in the best descernible form. It needs to be post-processed to be understood by the user. The `post_process` function is used to transform the raw outputs of the model to a specific format for the final results and visualization pipeline to ingest."
284+
]
285+
},
286+
{
287+
"cell_type": "markdown",
288+
"metadata": {},
289+
"source": [
290+
"**Object Detection**\n",
256291
"\n",
257292
"The function receives the following arguments:\n",
258-
"+ `pred` - Raw output of the model for a batch if images\n",
293+
"+ `pred` - Raw output of the model for a batch of images\n",
259294
"+ `nms_overlap` - Non-maxima suppression value used to select from overlapping bounding boxes\n",
260-
"+ `thresh` - Confidence threshold to be used to filter the predictions\n",
295+
"+ `thres` - Confidence threshold to be used to filter the predictions\n",
261296
"+ `chip_size` - Size of the image chips on which predictions are made\n",
262297
"+ `device` - Device (CPU or GPU) on which the output needs to be put after post-processing\n",
263298
"\n",
@@ -291,6 +326,41 @@
291326
" return post_processed_pred"
292327
]
293328
},
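To make the object detection case concrete, here is an illustrative sketch only. It assumes the wrapped detector returns one dict per image with `boxes` (absolute-pixel `x1, y1, x2, y2`), `labels`, and `scores` keys; a small greedy non-maximum suppression is included inline so the example stays self-contained:

```python
import torch

def _nms(boxes, scores, iou_thres):
    """Minimal greedy non-maximum suppression for x1,y1,x2,y2 boxes."""
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0]
        keep.append(i.item())
        if order.numel() == 1:
            break
        rest = boxes[order[1:]]
        # IoU of the top-scoring box against the remaining candidates
        lt = torch.max(boxes[i, :2], rest[:, :2])
        rb = torch.min(boxes[i, 2:], rest[:, 2:])
        inter = (rb - lt).clamp(min=0).prod(dim=1)
        area_i = (boxes[i, 2:] - boxes[i, :2]).prod()
        area_r = (rest[:, 2:] - rest[:, :2]).prod(dim=1)
        iou = inter / (area_i + area_r - inter)
        order = order[1:][iou <= iou_thres]
    return torch.tensor(keep, dtype=torch.long)

def post_process(self, pred, nms_overlap, thres, chip_size, device):
    """
    Illustrative sketch: filter by confidence, apply NMS, and rescale
    boxes back to normalized [-1, 1] y1,x1,y2,x2 order per image.
    """
    post_processed_pred = []
    for p in pred:
        keep = p["scores"] > thres
        boxes, labels, scores = p["boxes"][keep], p["labels"][keep], p["scores"][keep]
        if boxes.numel():
            keep = _nms(boxes, scores, nms_overlap)
            boxes, labels, scores = boxes[keep], labels[keep], scores[keep]
            # absolute x1,y1,x2,y2 -> normalized y1,x1,y2,x2 in [-1, 1]
            boxes = boxes[:, [1, 0, 3, 2]] * 2.0 / chip_size - 1.0
        post_processed_pred.append(
            [boxes.to(device), labels.to(device), scores.to(device)]
        )
    return post_processed_pred
```

In practice, `torchvision.ops.nms` can replace the hand-rolled `_nms` helper; it is written out here only to keep the sketch dependency-free.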
329+
{
330+
"cell_type": "markdown",
331+
"metadata": {},
332+
"source": [
333+
"**Pixel Classification**\n",
334+
"\n",
335+
"The function receives the following arguments:\n",
336+
"+ `pred` - Raw output of the model for a batch of images\n",
337+
"+ `thres` - Confidence threshold to be used to filter the predictions\n",
338+
"\n",
339+
"Returns:\n",
340+
"+ `post_processed_pred`: tensor of shape [N, 1, H, W] or a List/Tuple of N tensors of shape [1, H, W], where\n",
341+
" N - batch size\n",
342+
" H - height of the image\n",
343+
" W - width of the image\n",
344+
" \n",
345+
" The values (type: LongTensor) of the tensor denote the predicted class of each pixel.\n",
346+
" "
347+
]
348+
},
349+
{
350+
"cell_type": "code",
351+
"execution_count": 2,
352+
"metadata": {},
353+
"outputs": [],
354+
"source": [
355+
"def post_process(self, pred, thres):\n",
356+
" \"\"\"\n",
357+
" Fuction to post process the output of the model in validation/infrencing mode.\n",
358+
" \"\"\"\n",
359+
" post_processed_pred = ...\n",
360+
" \n",
361+
" return post_processed_pred"
362+
]
363+
},
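Filling in the stub above for the pixel classification case, a minimal sketch (assuming the wrapped model emits raw per-class scores of shape [N, K, H, W], and treating class 0 as background for pixels below the confidence threshold) might be:

```python
import torch

def post_process(self, pred, thres):
    """
    Illustrative sketch: convert raw per-class scores [N, K, H, W]
    into a predicted class raster [N, 1, H, W] of LongTensor values.
    """
    probs = torch.softmax(pred, dim=1)
    # per-pixel winning class and its confidence
    conf, classes = probs.max(dim=1, keepdim=True)
    # low-confidence pixels fall back to class 0 (assumed background)
    classes[conf < thres] = 0
    return classes.long()
```

The returned LongTensor matches the [N, 1, H, W] contract described earlier, with each value denoting the predicted class of a pixel.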
294364
{
295365
"cell_type": "markdown",
296366
"metadata": {},
@@ -532,7 +602,7 @@
532602
"cell_type": "markdown",
533603
"metadata": {},
534604
"source": [
535-
"From here onwards, we can continue with the usual workflow of using an `arcgis.learn` deep learning model. Checkout [Detecting Swimming Pools using Deep Learning sample](https://developers.arcgis.com/python/sample-notebooks/detecting-swimming-pools-using-satellite-image-and-deep-learning/) sample notebook to see the workflow for an **object detection** model and the [Land Conver Classification using Satellite Imagery and Deep Learning](https://developers.arcgis.com/python/sample-notebooks/land-cover-classification-using-unet/) sample notebook for the workflow for a **pixel classification** model."
605+
"From here onwards, we can continue with the usual workflow of using an `arcgis.learn` deep learning model. Refer to [Detecting Swimming Pools using Deep Learning sample](https://developers.arcgis.com/python/sample-notebooks/detecting-swimming-pools-using-satellite-image-and-deep-learning/) sample notebook to see the workflow for an **object detection** model and the [Land Conver Classification using Satellite Imagery and Deep Learning](https://developers.arcgis.com/python/sample-notebooks/land-cover-classification-using-unet/) sample notebook for the workflow for a **pixel classification** model."
536606
]
537607
},
538608
{
@@ -559,7 +629,7 @@
559629
"name": "python",
560630
"nbconvert_exporter": "python",
561631
"pygments_lexer": "ipython3",
562-
"version": "3.6.12"
632+
"version": "3.7.11"
563633
}
564634
},
565635
"nbformat": 4,
