|
1 | 1 | {
|
2 | 2 | "cells": [
|
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# How PSETAE model works" |
| 8 | + ] |
| 9 | + }, |
3 | 10 | {
|
4 | 11 | "cell_type": "markdown",
|
5 | 12 | "metadata": {},
|
6 | 13 | "source": [
|
7 | 14 | "## Table of Contents\n",
|
8 | 15 | "\n",
|
9 |
| - "* [Satellite Image Time-Series](#Satellite-Image-Time-Series)\n", |
10 |
| - "* [Training Labels and Time-Series Satellite Imagery](#Training-Labels-and-Time-Series-Satellite-Imagery)\n", |
11 |
| - "* [Time-Series Classification](#Time-Series-Classification)\n", |
12 |
| - " * [Pixel-Set Encoder](#Pixel-Set-Encoder)\n", |
13 |
| - " * [Temporal Attention Encoder](#Temporal-Attention-Encoder)\n", |
| 16 | + "* [Satellite image time series](#Satellite-image-time-series)\n", |
| 17 | + "* [Prepare training data for satellite image time series](#Prepare-training-data-for-satellite-image-time-series)\n", |
| 18 | + "* [Architecture](#Architecture)\n", |
| 19 | + " * [Pixel-set encoder](#Pixel-set-encoder)\n", |
| 20 | + " * [Temporal attention encoder](#Temporal-attention-encoder)\n", |
14 | 21 | " * [Spectro-temporal classifier](#Spectro-temporal-classifier)\n",
|
15 | 22 | "* [Implementation in arcgis.learn](#Implementation-in-arcgis.learn) \n",
|
16 | 23 | "* [Summary](#Summary)\n",
|
|
21 | 28 | "cell_type": "markdown",
|
22 | 29 | "metadata": {},
|
23 | 30 | "source": [
|
24 |
| - "## Satellite Image Time-Series" |
| 31 | + "## Satellite image time series" |
25 | 32 | ]
|
26 | 33 | },
|
27 | 34 | {
|
28 | 35 | "cell_type": "markdown",
|
29 | 36 | "metadata": {},
|
30 | 37 | "source": [
|
31 |
| - "Earth observation data cube or time-series is referred to as collection of satellite images of a location from different time-periods, stacked vertically resulting in a 3-dimensional structure. The collection have a common projection and a consistent timeline. Each location in the space-time is a vector of values across a timeline as shown in figure 1." |
| 38 | + "Earth observation time series is referred to as collection of satellite images of a location from different time-periods, stacked vertically resulting in a 3-dimensional structure. The collection has a common projection and a consistent timeline. Each location in the space-time is a vector of values across a timeline as shown in figure 1:" |
32 | 39 | ]
|
33 | 40 | },
|
34 | 41 | {
|
|
42 | 49 | "cell_type": "markdown",
|
43 | 50 | "metadata": {},
|
44 | 51 | "source": [
|
45 |
| - "<center> Figure 1. Time-series of satellite imagery </center>" |
| 52 | + "<center> Figure 1. Time-series of satellite imagery. [2] </center>" |
46 | 53 | ]
|
47 | 54 | },
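| + { |
| + "cell_type": "markdown", |
| + "metadata": {}, |
| + "source": [ |
| + "As a minimal sketch of this structure (with illustrative sizes and placeholder imagery, not tied to any particular sensor), the cube can be held as a `numpy` array, and each location read out as a vector across the timeline:\n", |
| + "\n", |
| + "```python\n", |
| + "import numpy as np\n", |
| + "\n", |
| + "# stack T single-band images of H x W pixels into a T x H x W cube\n", |
| + "T, H, W = 6, 256, 256\n", |
| + "images = [np.random.rand(H, W) for _ in range(T)]  # placeholder imagery\n", |
| + "cube = np.stack(images, axis=0)                    # shape (6, 256, 256)\n", |
| + "\n", |
| + "# the time series at one location is a vector of T values\n", |
| + "series = cube[:, 100, 200]                         # shape (6,)\n", |
| + "```" |
| + ] |
| + }, |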
|
48 | 55 | {
|
49 | 56 | "cell_type": "markdown",
|
50 | 57 | "metadata": {},
|
51 | 58 | "source": [
|
52 |
| - "Combination of temporal component with spectral information allow for detection and classification of complex pattern in various applications such as crop type and condition classification, mineral and land-cover mapping etc" |
| 59 | + "Combination of temporal component with spectral information allow for detection and classification of complex pattern in various applications such as crop type and condition classification, mineral and land-cover mapping etc." |
53 | 60 | ]
|
54 | 61 | },
|
55 | 62 | {
|
56 | 63 | "cell_type": "markdown",
|
57 | 64 | "metadata": {},
|
58 | 65 | "source": [
|
59 |
| - "In this guide, we will focus on PSETAE, a transformer based deep learning model originally developed by [Garnot et al](https://openaccess.thecvf.com/content_CVPR_2020/papers/Garnot_Satellite_Image_Time_Series_Classification_With_Pixel-Set_Encoders_and_Temporal_CVPR_2020_paper.pdf) for agricultural parcels classification into different crop types in satellite image time-series." |
| 66 | + "In this guide, we will focus on `PSETAE`, a transformer based deep learning model originally developed by [Garnot et al](https://openaccess.thecvf.com/content_CVPR_2020/papers/Garnot_Satellite_Image_Time_Series_Classification_With_Pixel-Set_Encoders_and_Temporal_CVPR_2020_paper.pdf) for agricultural parcels classification into different crop types in satellite image time-series." |
60 | 67 | ]
|
61 | 68 | },
|
62 | 69 | {
|
63 | 70 | "cell_type": "markdown",
|
64 | 71 | "metadata": {},
|
65 | 72 | "source": [
|
66 |
| - "## Training Labels and Time-Series Satellite Imagery" |
| 73 | + "## Prepare training data for satellite image time series" |
67 | 74 | ]
|
68 | 75 | },
|
69 | 76 | {
|
70 | 77 | "cell_type": "markdown",
|
71 | 78 | "metadata": {},
|
72 | 79 | "source": [
|
73 |
| - "The [Export Training Data for Deep Learning](https://pro.arcgis.com/en/pro-app/latest/tool-reference/image-analyst/export-training-data-for-deep-learning.htm) is used to export training data for the model. The input satellite time-series is a [composite](https://pro.arcgis.com/en/pro-app/latest/tool-reference/data-management/composite-bands.htm) of rasters or [multi-dimensional raster](https://pro.arcgis.com/en/pro-app/latest/help/data/imagery/an-overview-of-multidimensional-raster-data.htm) from the required time periods or time steps. Here are the [steps](https://www.youtube.com/watch?v=HFbTFTnsMWM), to create multi-dimensional raster from collection of images. \n", |
| 80 | + "The [Export Training Data for Deep Learning](https://pro.arcgis.com/en/pro-app/latest/tool-reference/image-analyst/export-training-data-for-deep-learning.htm) tool is used to export training data for the model. The input satellite time-series is a [composite](https://pro.arcgis.com/en/pro-app/latest/tool-reference/data-management/composite-bands.htm) of rasters or [multi-dimensional raster](https://pro.arcgis.com/en/pro-app/latest/help/data/imagery/an-overview-of-multidimensional-raster-data.htm) from the required time periods or time steps. Here are the [steps](https://www.youtube.com/watch?v=HFbTFTnsMWM), to create multi-dimensional raster from collection of images. \n", |
74 | 81 | "\n",
|
75 | 82 | "Training labels can be created using the [Label objects for deep learning](https://pro.arcgis.com/en/pro-app/latest/help/analysis/image-analyst/label-objects-for-deep-learning.htm#:~:text=The%20Label%20Objects%20for%20Deep,is%20divided%20into%20two%20parts.) tool available inside `Classification Tools`. Pixels are labelled into different classes, based on the available information. Labelling of different crop types are shown in the figure. "
|
76 | 83 | ]
|
|
93 | 100 | "cell_type": "markdown",
|
94 | 101 | "metadata": {},
|
95 | 102 | "source": [
|
96 |
| - "## Time-Series Classification" |
| 103 | + "## Architecture" |
97 | 104 | ]
|
98 | 105 | },
|
99 | 106 | {
|
100 | 107 | "cell_type": "markdown",
|
101 | 108 | "metadata": {},
|
102 | 109 | "source": [
|
103 |
| - "PSETAE architecure is based on transfomers, originally developed for sequence-to-sequence modeling. The proposed architecture encodes time-series of multi-spectral images. The pixels under each class label is given by spectro-temporal tensor of size T x C x N, where T the number of temporal observation, C the number of spectral channels, and N the number of pixels. \n", |
| 110 | + "`PSETAE` model architecure is based on transfomers, originally developed for sequence-to-sequence modeling. The proposed architecture encodes time-series of multi-spectral images. The pixels under each class label is given by spectro-temporal tensor of size T x C x N, where T the number of temporal observation, C the number of spectral channels, and N the number of pixels. \n", |
104 | 111 | "\n",
|
105 |
| - "The architecture of PSETAE consists of a pixel-set encoder, temporal attention encoder and, classifier. The components are briefly described in following sections." |
| 112 | + "The architecture of `PSETAE` model consists of a pixel-set encoder, temporal attention encoder and, classifier. The components are briefly described in following sections." |
106 | 113 | ]
|
107 | 114 | },
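| + { |
| + "cell_type": "markdown", |
| + "metadata": {}, |
| + "source": [ |
| + "The following `numpy` sketch illustrates the shape of this tensor; the sizes and the random pixel sampling are illustrative assumptions, not the model's internals:\n", |
| + "\n", |
| + "```python\n", |
| + "import numpy as np\n", |
| + "\n", |
| + "T, C = 12, 4                         # temporal observations, spectral channels\n", |
| + "parcel = np.random.rand(T, C, 350)   # all pixels under one class label\n", |
| + "\n", |
| + "# draw N pixels at random from the labelled region\n", |
| + "N = 64\n", |
| + "idx = np.random.choice(parcel.shape[-1], size=N, replace=False)\n", |
| + "sample = parcel[:, :, idx]           # spectro-temporal tensor, shape (12, 4, 64)\n", |
| + "```" |
| + ] |
| + }, |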
|
108 | 115 | {
|
109 | 116 | "cell_type": "markdown",
|
110 | 117 | "metadata": {},
|
111 | 118 | "source": [
|
112 |
| - "## Model architecture" |
113 |
| - ] |
114 |
| - }, |
115 |
| - { |
116 |
| - "cell_type": "markdown", |
117 |
| - "metadata": {}, |
118 |
| - "source": [ |
119 |
| - "### Pixel-Set Encoder" |
| 119 | + "### Pixel-set encoder" |
120 | 120 | ]
|
121 | 121 | },
|
122 | 122 | {
|
|
139 | 139 | "cell_type": "markdown",
|
140 | 140 | "metadata": {},
|
141 | 141 | "source": [
|
142 |
| - "<center> Figure 3. Pixel-Set Encoder </center> " |
| 142 | + "<center> Figure 3. Pixel-Set Encoder. [1] </center> " |
143 | 143 | ]
|
144 | 144 | },
|
145 | 145 | {
|
146 | 146 | "cell_type": "markdown",
|
147 | 147 | "metadata": {},
|
148 | 148 | "source": [
|
149 |
| - "### Temporal Attention Encoder" |
| 149 | + "### Temporal attention encoder" |
150 | 150 | ]
|
151 | 151 | },
|
152 | 152 | {
|
|
171 | 171 | "cell_type": "markdown",
|
172 | 172 | "metadata": {},
|
173 | 173 | "source": [
|
174 |
| - "<center> Figure 4. Temporal Attention Encoder </center> " |
| 174 | + "<center> Figure 4. Temporal Attention Encoder. [1] </center> " |
175 | 175 | ]
|
176 | 176 | },
|
177 | 177 | {
|
|
194 | 194 | "cell_type": "markdown",
|
195 | 195 | "metadata": {},
|
196 | 196 | "source": [
|
197 |
| - "## Implementation in `arcgis.learn`" |
| 197 | + "## Implementation in `arcgis.learn`" |
198 | 198 | ]
|
199 | 199 | },
|
200 | 200 | {
|
|
203 | 203 | "source": [
|
204 | 204 | "Input Raster - time-series raster is a [composite](https://pro.arcgis.com/en/pro-app/latest/tool-reference/data-management/composite-bands.htm) or [multi-dimensional](https://pro.arcgis.com/en/pro-app/latest/help/data/imagery/an-overview-of-multidimensional-raster-data.htm) raster from the required time periods or time steps. \n",
|
205 | 205 | "\n",
|
206 |
| - "Export - use the input raster to export the raster chips in `RCNN Masks` metadata format using [Export Training Data for Deep Learning](https://pro.arcgis.com/en/pro-app/latest/tool-reference/image-analyst/export-training-data-for-deep-learning.htm) tool available in `ArcGIS Pro`.\n", |
207 |
| - "\n", |
208 |
| - "The resulting path from from export tool is provided to [prepare_data](https://developers.arcgis.com/python/api-reference/arcgis.learn.toc.html#prepare-data) function in `arcgis.learn` to create a databunch.\n", |
| 206 | + "Export - use the input raster to export the raster chips in `RCNN Masks` metadata format using [Export Training Data for Deep Learning](https://pro.arcgis.com/en/pro-app/latest/tool-reference/image-analyst/export-training-data-for-deep-learning.htm) tool available in `ArcGIS Pro`. The resulting path from from export tool is provided to [prepare_data](https://developers.arcgis.com/python/api-reference/arcgis.learn.toc.html#prepare-data) function in `arcgis.learn` to create a databunch.\n", |
209 | 207 | "\n",
|
210 | 208 | "`data = prepare_data(path=r\"path/to/exported/data\", n_temporal, min_points, batch_size, n_temporal_dates, dataset_type=\"PSETAE\")`\n",
|
211 | 209 | " \n",
|
212 | 210 | "where,\n",
|
213 | 211 | "\n",
|
214 |
| - "* `n_temporal` - optional for multi-dimensional, required for composite raster. *Number of temporal observations or time steps or number of composited rasters*. \n", |
215 |
| - "* `min_points` - optional. *Number of pixels equal to or multiples of 64 to sample from the each labelled region of training data i.e. 64, 128 etc*\n", |
216 |
| - "* `batch_size` - optional. *suggested batch size for this model is around 128*\n", |
217 |
| - "* `n_temporal_dates` - optional for multi-dimensional, required for composite raster. *The dates of the observations will be used for the positional encoding and should be stored as a list of date strings in YYYY-MM-DD format*.\n", |
218 |
| - "* `dataset_type` - required. *type of dataset in-sync with the model*" |
| 212 | + "- `n_temporal` - Optional for multi-dimensional, required for composite raster. Number of temporal observations or time steps or number of composited rasters. \n", |
| 213 | + "- `min_points` - Optional. Number of pixels equal to or multiples of 64 to sample from the each labelled region of training data i.e. 64, 128, etc.\n", |
| 214 | + "- `batch_size` - Optional. Suggested batch size for this model is around 128.\n", |
| 215 | + "- `n_temporal_dates` - Optional for multi-dimensional, required for composite raster. The dates of the observations will be used for the positional encoding and should be stored as a list of date strings in YYYY-MM-DD format.\n", |
| 216 | + "- `dataset_type` - Required. Type of dataset in-sync with the model." |
219 | 217 | ]
|
220 | 218 | },
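| + { |
| + "cell_type": "markdown", |
| + "metadata": {}, |
| + "source": [ |
| + "For example, a call for a composite raster of three time steps might look as follows; the path is the placeholder from above, and the dates and values are purely illustrative:\n", |
| + "\n", |
| + "```python\n", |
| + "from arcgis.learn import prepare_data\n", |
| + "\n", |
| + "data = prepare_data(path=r\"path/to/exported/data\",\n", |
| + "                    n_temporal=3,\n", |
| + "                    min_points=64,\n", |
| + "                    batch_size=128,\n", |
| + "                    n_temporal_dates=[\"2021-04-01\", \"2021-07-01\", \"2021-10-01\"],\n", |
| + "                    dataset_type=\"PSETAE\")\n", |
| + "```" |
| + ] |
| + }, |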
|
221 | 219 | {
|
|
224 | 222 | "source": [
|
225 | 223 | "By default, initialization of the `PSETAE` object as shown below:\n",
|
226 | 224 | "\n",
|
227 |
| - "`model = arcgis.learn.PSETAE(data=data)`\n", |
| 225 | + "`model = arcgis.learn.PSETAE(data)`\n", |
228 | 226 | "\n",
|
229 | 227 | "model parameters that can be passed using keyword arguments:\n",
|
230 | 228 | "\n",
|
|
262 | 260 | ],
|
263 | 261 | "metadata": {
|
264 | 262 | "kernelspec": {
|
265 |
| - "display_name": "Python 3", |
| 263 | + "display_name": "Python 3 (ipykernel)", |
266 | 264 | "language": "python",
|
267 | 265 | "name": "python3"
|
268 | 266 | },
|
|
276 | 274 | "name": "python",
|
277 | 275 | "nbconvert_exporter": "python",
|
278 | 276 | "pygments_lexer": "ipython3",
|
279 |
| - "version": "3.7.11" |
| 277 | + "version": "3.9.15" |
280 | 278 | },
|
281 | 279 | "toc": {
|
282 | 280 | "base_numbering": 1,
|
|