
Commit 48f5a1f

committed
karthik_review_update_1
1 parent 5aee634 commit 48f5a1f

File tree

1 file changed (+23, -28 lines)


guide/14-deep-learning/how_PSETAE_works.ipynb

Lines changed: 23 additions & 28 deletions
Original file line number · Diff line number · Diff line change
@@ -28,7 +28,7 @@
2828
"cell_type": "markdown",
2929
"metadata": {},
3030
"source": [
31-
"Time-series of earth observation data is referred to us collection of satellite images of a location from different time-periods, stacked along the time axis resulting in a 3 dimensional structure. The collection have a common projection and a consistent timeline. Each location in the space-time is a vector of values across a timeline as shown in figure 1."
31+
"Earth observation data cube or time-series is referred to as collection of satellite images of a location from different time-periods, stacked vertically resulting in a 3-dimensional structure. The collection have a common projection and a consistent timeline. Each location in the space-time is a vector of values across a timeline as shown in figure 1."
3232
]
3333
},
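To make this structure concrete, here is a minimal sketch using NumPy (the dimensions and values are made up for illustration; the actual imagery comes from the ArcGIS tools described below):

```python
import numpy as np

# Hypothetical stack: T single-band images of size H x W, stacked along the time axis.
T, H, W = 6, 256, 256
time_series = np.random.rand(T, H, W)

# The temporal profile of one location (row 100, column 120) is a vector of length T.
pixel_profile = time_series[:, 100, 120]
print(pixel_profile.shape)  # (6,)
```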
3434
{
@@ -72,7 +72,7 @@
7272
"source": [
7373
"The [Export Training Data for Deep Learning](https://pro.arcgis.com/en/pro-app/latest/tool-reference/image-analyst/export-training-data-for-deep-learning.htm) is used to export training data for the model. The input satellite time-series is a [composite](https://pro.arcgis.com/en/pro-app/latest/tool-reference/data-management/composite-bands.htm) of rasters or [multi-dimensional raster](https://pro.arcgis.com/en/pro-app/latest/help/data/imagery/an-overview-of-multidimensional-raster-data.htm) from the required time periods or time steps. Here are the [steps](https://www.youtube.com/watch?v=HFbTFTnsMWM), to create multi-dimensional raster from collection of images. \n",
7474
"\n",
75-
"Training labels can be created using the [Label objects for deep learning](https://pro.arcgis.com/en/pro-app/latest/help/analysis/image-analyst/label-objects-for-deep-learning.htm#:~:text=The%20Label%20Objects%20for%20Deep,is%20divided%20into%20two%20parts.) tool available inside `Classification Tools`. Pixels are labelled with class based on the available information. As shown in figure are labelling of different crop types. "
75+
"Training labels can be created using the [Label objects for deep learning](https://pro.arcgis.com/en/pro-app/latest/help/analysis/image-analyst/label-objects-for-deep-learning.htm#:~:text=The%20Label%20Objects%20for%20Deep,is%20divided%20into%20two%20parts.) tool available inside `Classification Tools`. Pixels are labelled into different classes, based on the available information. Labelling of different crop types are shown in the figure. "
7676
]
7777
},
7878
{
@@ -102,7 +102,7 @@
102102
"source": [
103103
"PSETAE architecure is based on transfomers, originally developed for sequence-to-sequence modeling. The proposed architecture encodes time-series of multi-spectral images. The pixels under each class label is given by spectro-temporal tensor of size T x C x N, where T the number of temporal observation, C the number of spectral channels, and N the number of pixels. \n",
104104
"\n",
105-
"The architecture of pse-tae consists of a pixel-set encoder, temporal attention encoder and, classifier. The components are briefly described in following sections."
105+
"The architecture of PSETAE consists of a pixel-set encoder, temporal attention encoder and, classifier. The components are briefly described in following sections."
106106
]
107107
},
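As a rough illustration of the T x C x N layout (PyTorch, with sizes chosen arbitrarily rather than taken from the paper):

```python
import torch

T, C, N = 12, 4, 64           # temporal observations, spectral channels, sampled pixels
parcel = torch.rand(T, C, N)  # spectro-temporal tensor for the pixels under one class label

# At a single time step t, the input is a C x N set of pixel spectra.
t = 0
print(parcel[t].shape)  # torch.Size([4, 64])
```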
108108
{
@@ -123,9 +123,9 @@
123123
"cell_type": "markdown",
124124
"metadata": {},
125125
"source": [
126-
"The study suggests that, convolution operations may not be suitable for detection the various classes from images with high spectral variations across time. CNNs were also observed to be memory intensive.\n",
126+
"The study suggests that, convolution operations may not be suitable for the detection of the various classes from images with high spectral variations across time. CNNs were also observed to be memory intensive.\n",
127127
"\n",
128-
"To overcome this issue, the authors proposed pixel-set encoder (PSE). PSE uses the samples pixels from time-series raster, which is processed through a series of shared MLP(Fully Connected, Batch Norms, Rectified Linear Units) layers. This results in the architecture learning about the statistical descriptors of the particular class pixel's spectral distribution. The output is a spectral embedding for set a pixels of a class at time t."
128+
"To overcome this issue, the authors proposed pixel-set encoder (PSE). PSE uses a sample set of pixels from time-series raster, which is processed through a series of shared MLP(Fully Connected, Batch Norms, Rectified Linear Units) layers. This allows the architecture to learn about the statistical descriptors of a particular class's pixel's spectral distribution. The output is a spectral embedding for set a pixels of a class at time t."
129129
]
130130
},
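A minimal sketch of this idea is shown below (PyTorch; the layer sizes follow the `mlp1`/`mlp2` defaults listed later in this guide, but this is a simplified stand-in, not the arcgis.learn implementation):

```python
import torch
import torch.nn as nn

class PixelSetEncoderSketch(nn.Module):
    """Simplified PSE: a shared MLP over sampled pixels, then pooling across the set."""
    def __init__(self, in_channels=4, mlp1=(32, 64), mlp2=(128, 128)):
        super().__init__()
        layers, prev = [], in_channels
        for d in mlp1:  # shared MLP applied to every pixel independently
            layers += [nn.Linear(prev, d), nn.BatchNorm1d(d), nn.ReLU()]
            prev = d
        self.mlp1 = nn.Sequential(*layers)
        self.mlp2 = nn.Sequential(nn.Linear(2 * prev, mlp2[0]), nn.ReLU(),
                                  nn.Linear(mlp2[0], mlp2[1]))

    def forward(self, x):                    # x: (N pixels, C channels) at one time step
        feats = self.mlp1(x)                 # per-pixel features, shape (N, 64)
        pooled = torch.cat([feats.mean(0), feats.std(0)])  # set-level statistics
        return self.mlp2(pooled)             # spectral embedding for this time step

emb = PixelSetEncoderSketch()(torch.rand(64, 4))
print(emb.shape)  # torch.Size([128])
```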
131131
{
@@ -153,11 +153,11 @@
153153
"cell_type": "markdown",
154154
"metadata": {},
155155
"source": [
156-
"This component is based on the state-of-the-art transformer used originally for dealing with sequential data from [Vaswani et al.](https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf). After multi-channel spectral embedding at time t by pse, the temporal attention encoder (tae) tries to find embedding for each class time-series. The authors made the following changes to original transformers:\n",
156+
"This component is based on the state-of-the-art transformer used originally for dealing with sequential data from [Vaswani et al.](https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf). After multi-channel spectral embedding at time t by pse, the temporal attention encoder (tae) tries to find embedding for each parcel's time-series. The authors made the following changes to original transformers:\n",
157157
"\n",
158158
"- The pre-trained word embedding model used in the original mode is replaced by PSE's spectral embedding.\n",
159159
"- The positional encoder uses the first obervation or date to calculate number of days to other dates. This helps the model to take in account the variance in temporal observations.\n",
160-
"- As the goal was to encode the whole time-series into single embedding, rather then output for each element of sequence. Query tensors generated by the each attention heads are pooled into single master query."
160+
"- As the goal was to encode the whole time-series into single embedding, rather then output for each element of sequence. Hence, Query tensors generated by the each attention heads are pooled into single master query."
161161
]
162162
},
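For example, the day-offset positional encoding could be sketched like this (a simplified sinusoidal encoding of days since the first observation, using the period T = 1000 from the defaults listed later; not the exact arcgis.learn code):

```python
import numpy as np
from datetime import date

def day_offset_positional_encoding(dates, d_model=128, period=1000):
    """Sinusoidal encoding of the number of days since the first observation."""
    days = np.array([(d - dates[0]).days for d in dates], dtype=float)  # e.g. [0., 10., 35.]
    i = np.arange(d_model // 2)
    angles = days[:, None] / period ** (2 * i[None, :] / d_model)
    enc = np.zeros((len(dates), d_model))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc  # shape (T, d_model); added to the spectral embeddings of each date

dates = [date(2020, 5, 1), date(2020, 5, 11), date(2020, 6, 5)]
print(day_offset_positional_encoding(dates).shape)  # (3, 128)
```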
163163
{
@@ -185,9 +185,9 @@
185185
"cell_type": "markdown",
186186
"metadata": {},
187187
"source": [
188-
"Shared PSE embeds all input time-series rasters in parallel and resulting embedded sequence is processed by temporal encoder. The resulting embedding is processed by an mlp to produice class logits. Spectro-temporal classifier combines PSE and TAE with a final MLP layer to produce class logits. \n",
188+
"Shared PSE embeds all input time-series rasters in parallel and the resulting embedded sequence is processed by temporal encoder. The resulting embedding is processed by an mlp to produice class logits. Spectro-temporal classifier combines PSE and TAE with a final MLP layer to produce class logits. \n",
189189
"\n",
190-
"For further information on the model's architecture, refer to paper."
190+
"For further information on the model's architecture, refer to [paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Garnot_Satellite_Image_Time_Series_Classification_With_Pixel-Set_Encoders_and_Temporal_CVPR_2020_paper.pdf)."
191191
]
192192
},
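At the shape level, the end-to-end flow can be sketched as follows (PyTorch; the PSE and TAE here are heavily simplified stand-ins, with the decoder dimensions borrowed from the `mlp4` default listed later):

```python
import torch
import torch.nn as nn

T, C, N, d, num_classes = 12, 4, 64, 128, 10   # illustrative sizes only

# Stand-in pixel-set encoder: maps each time step's C x N pixel set to a d-dim embedding.
pse = nn.Sequential(nn.Flatten(), nn.Linear(C * N, d), nn.ReLU())

# Stand-in temporal attention encoder: a pooled "master query" attends over the sequence.
attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)

# Decoder MLP producing class logits (dimensions follow the mlp4 default [64, 32]).
decoder = nn.Sequential(nn.Linear(d, 64), nn.ReLU(),
                        nn.Linear(64, 32), nn.ReLU(),
                        nn.Linear(32, num_classes))

x = torch.rand(T, C, N)                # spectro-temporal tensor for one parcel
seq = pse(x).unsqueeze(0)              # (1, T, d): shared PSE applied to every time step
query = seq.mean(dim=1, keepdim=True)  # (1, 1, d): pooled master query
summary, _ = attn(query, seq, seq)     # single embedding for the whole time-series
logits = decoder(summary.squeeze(1))   # class logits, shape (1, num_classes)
```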
193193
{
@@ -214,7 +214,7 @@
214214
"* `n_temporal` - optional for multi-dimensional, required for composite raster. *Number of temporal observations or time steps or number of composited rasters*. \n",
215215
"* `min_points` - optional. *Number of pixels equal to or multiples of 64 to sample from the each labelled region of training data i.e. 64, 128 etc*\n",
216216
"* `batch_size` - optional. *suggested batch size for this model is around 128*\n",
217-
"* `n_temporal_dates` - optional for multi-dimensional, required for composite raster. *The dates of the observations will be used for the positional encoding and should be stored as a list of dates strings in YYYY-MM-DD format. For example, If we have stacked imagery of n bands each from two dates then, ['YYYY-MM-DD','YYYY-MM-DD'].*\n",
217+
"* `n_temporal_dates` - optional for multi-dimensional, required for composite raster. *The dates of the observations will be used for the positional encoding and should be stored as a list of date strings in YYYY-MM-DD format*.\n",
218218
"* `dataset_type` - required. *type of dataset in-sync with the model*"
219219
]
220220
},
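A hedged example of such a call is shown below; the path, number of time steps, and dates are invented for illustration, and the `dataset_type` value is assumed to match this model:

```python
from arcgis.learn import prepare_data

# Illustrative values only: the folder is the output of "Export Training Data for Deep Learning".
data = prepare_data(
    path=r"C:\data\crop_timeseries",
    dataset_type="PSETAE",            # assumed value; use the dataset type matching this model
    batch_size=128,
    n_temporal=6,                     # number of composited rasters / time steps
    min_points=64,                    # pixels sampled per labelled region (multiple of 64)
    n_temporal_dates=["2020-04-01", "2020-05-01", "2020-06-01",
                      "2020-07-01", "2020-08-01", "2020-09-01"],
)
```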
@@ -226,27 +226,22 @@
226226
"\n",
227227
"`model = arcgis.learn.PSETAE(data=data)`\n",
228228
"\n",
229-
"Default values for optimal performance are set for model's hyperparmeters. \n",
229+
"model parameters that can be passed using keyword arguments:\n",
230230
"\n",
231-
"Here data is the object returned from `prepare_data` function.\n",
231+
"* `mlp1` - Optional list. Dimensions of the successive feature spaces of MLP1. default set to `[32, 64]`.\n",
232+
"* `pooling` - Optional string. Pixel-embedding pooling strategy, can be chosen in ('mean','std','max','min'). default set to 'mean'.\n",
233+
"* `mlp2` - Optional list. Dimensions of the successive feature spaces of MLP2. default set to `[128, 128]`.\n",
234+
"* `n_head` - Optional integer. Number of attention heads. default set to 4.\n",
235+
"* `d_k` - Optional integer. Dimension of the key and query vectors. default set to 32.\n",
236+
"* `dropout` - Optional float. dropout. default set to 0.2.\n",
237+
"* `T` - Optional integer. Period to use for the positional encoding. default set to 1000.\n",
238+
"* `mlp4` - Optional list. dimensions of decoder mlp .default set to `[64, 32]`.\n",
232239
"\n",
233-
"Than, the basic `arcgis.learn` workflow can be followed.\n",
240+
"Default values for optimal performance are set for model's hyperparmeters. \n",
234241
"\n",
235-
"For more information about the API & modify model's keyword arguments, please go to the [API reference](https://developers.arcgis.com/python/api-reference/arcgis.learn.toc.html)."
236-
]
237-
},
238-
{
239-
"cell_type": "markdown",
240-
"metadata": {},
241-
"source": [
242-
"## Summary "
243-
]
244-
},
245-
{
246-
"cell_type": "markdown",
247-
"metadata": {},
248-
"source": [
249-
"In this guide, we learned about the various details of the psetae model, its working and how we can initialize the model in `arcgis.learn`."
242+
"Here, `data` is the object returned from `prepare_data` function.\n",
243+
"\n",
244+
"For more information about the API, please go through the [API reference](https://developers.arcgis.com/python/api-reference/arcgis.learn.toc.html)."
250245
]
251246
},
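A minimal sketch of overriding a few of the keyword arguments listed above (the values shown are just the documented defaults, not a recommendation):

```python
from arcgis.learn import PSETAE

model = PSETAE(
    data=data,          # object returned by prepare_data
    mlp1=[32, 64],
    pooling='mean',
    n_head=4,
    d_k=32,
    dropout=0.2,
)
```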
252247
{
