Skip to content

Commit ea8974c

Browse files
committed
Add landsat masking section to masking docs page
Signed-off-by: Jason T. Brown <[email protected]>
1 parent 5529d48 commit ea8974c

File tree

1 file changed

+70
-5
lines changed

1 file changed

+70
-5
lines changed

pyrasterframes/src/main/python/docs/masking.pymd

Lines changed: 70 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -14,8 +14,10 @@ spark = create_rf_spark_session()
1414
```
1515

1616
Masking is a common operation in raster processing. It is setting certain cells to the @ref:[NoData value](nodata-handling.md). This is usually done to remove low-quality observations from the raster processing. Another related use case is to @ref:["clip"](masking.md#clipping) a raster to a given polygon.
17+
18+
In this section we will demonstrate two common schemes for masking. In Sentinel 2, there is a separate classification raster that defines low quality areas. In Landsat 8, several quality factors are measured and the indications are packed into a single integer, which we have to unpack.
1719

18-
## Masking Example
20+
## Masking Sentinel 2
1921

2022
Let's demonstrate masking with a pair of bands of Sentinel-2 data. The measurement bands we will use, blue and green, have no defined NoData. They share quality information from a separate file called the scene classification (SCL), which delineates areas of missing data and probable clouds. For more information on this, see the [Sentinel-2 algorithm overview](https://earth.esa.int/web/sentinel/technical-guides/sentinel-2-msi/level-2a/algorithm). Figure 3 tells us how to interpret the scene classification. For this example, we will exclude NoData, defective pixels, probable clouds, and cirrus clouds: values 0, 1, 8, 9, and 10.
2123

@@ -40,7 +42,7 @@ unmasked.printSchema()
4042
unmasked.select(rf_cell_type('blue'), rf_cell_type('scl')).distinct()
4143
```
4244

43-
## Define CellType for Masked Tile
45+
### Define CellType for Masked Tile
4446

4547
Because there is not a NoData already defined for the blue band, we must choose one. In this particular example, the minimum value is greater than zero, so we can use 0 as the NoData value. We will construct a new `CellType` object to represent this.
4648

@@ -59,7 +61,7 @@ We next convert the blue band to this cell type.
5961
converted = unmasked.select('scl', 'green', rf_convert_cell_type('blue', masked_blue_ct).alias('blue'))
6062
```
6163

62-
## Apply Mask from Quality Band
64+
### Apply Mask from Quality Band
6365

6466
Now we set cells of our `blue` column to NoData for all locations where the `scl` tile is in our set of undesirable values. This is the actual _masking_ operation.
6567

@@ -89,7 +91,7 @@ And the original SCL data. The bright yellow is a cloudy region in the original
8991
display(sample[1])
9092
```
9193

92-
## Transferring Mask
94+
### Transferring Mask
9395

9496
We can now apply the same mask from the blue column to the green column. Note here we have supressed the step of explicitly checking what a "safe" NoData value for the green band should be.
9597

@@ -98,9 +100,72 @@ masked.withColumn('green_masked', rf_mask(rf_convert_cell_type('green', masked_b
98100
.orderBy(-rf_no_data_cells('blue_masked'))
99101
```
100102

103+
## Masking Landsat 8
104+
105+
106+
We will work with the Landsat scene [here](https://landsat-pds.s3.us-west-2.amazonaws.com/c1/L8/153/075/LC08_L1TP_153075_20190718_20190731_01_T1/index.html). For simplicity, we will just use two of the seven 30m bands. The quality mask for all bands is all contained in the `BQA` band.
107+
108+
109+
```python, build_l8_df
110+
base_url = 'https://landsat-pds.s3.us-west-2.amazonaws.com/c1/L8/153/075/LC08_L1TP_153075_20190718_20190731_01_T1/LC08_L1TP_153075_20190718_20190731_01_T1_'
111+
data4 = base_url + 'B4.TIF'
112+
data2 = base_url + 'B2.TIF'
113+
mask = base_url + 'BQA.TIF'
114+
l8_df = spark.read.raster([[data4, data2, mask]]) \
115+
.withColumnRenamed('proj_raster_0', 'data') \
116+
.withColumnRenamed('proj_raster_1', 'data2') \
117+
.withColumnRenamed('proj_raster_2', 'mask')
118+
```
119+
120+
Masking is described [on the Landsat Missions page](https://www.usgs.gov/land-resources/nli/landsat/landsat-collection-1-level-1-quality-assessment-band). It is pretty dense. Focus for this data set is the Collection 1 Level-1 for Landsat 8.
121+
122+
There are several inter-related factors to consider. In this exercise we will mask away the following.
123+
124+
* Designated Fill = yes
125+
* Cloud = yes
126+
* Cloud Shadow Confidence = Medium or High
127+
* Cirrus Confidence = Medium or High
128+
129+
Note that you should consider your application and do your own exploratory analysis to determine the most appropriate mask!
130+
131+
According to the information on the Landsat site this translates to masking by bit values in the BQA according to the following table.
132+
133+
| Description | Value | Bits | Bit values |
134+
|-------------------- |---------- |------- |---------------- |
135+
| Designated fill | yes | 0 | 1 |
136+
| Cloud | yes | 4 | 1 |
137+
| Cloud shadow conf. | med / hi | 7-8 | 10, 11 (2, 3) |
138+
| Cirrus conf. | med / hi | 11-12 | 10, 11 (2, 3) |
139+
140+
141+
In this case, we will use the value of 0 as the NoData in the band data. Inspecting the associated [MTL txt file](https://landsat-pds.s3.us-west-2.amazonaws.com/c1/L8/153/075/LC08_L1TP_153075_20190718_20190731_01_T1/LC08_L1TP_153075_20190718_20190731_01_T1_MTL.txt) By inspection, we can discover that the minimum value in the band will be 1, thus allowing our use of 0 for NoData.
142+
143+
The code chunk below works through each of the rows in the table above. The first expression sets the cell type to have the selected NoData. The @ref:[`rf_mask_by_bit`](reference.md#rf-mask-by-bit) and @ref:[`rf_mask_by_bits`](reference.md#rf-mask-by-bits) functions extract the selected bit or bits from the `mask` cells and compare them to the provided values.
144+
145+
```python, build_l8_mask
146+
l8_df = l8_df.withColumn('data_masked', # set to cell type that has a nodata
147+
rf_convert_cell_type('data', CellType.uint16())) \
148+
.withColumn('data_masked', # fill yes
149+
rf_mask_by_bit('data_masked', 'mask', 0, 1)) \
150+
.withColumn('data_masked', # cloud yes
151+
rf_mask_by_bit('data_masked', 'mask', 4, 1)) \
152+
.withColumn('data_masked', # cloud shadow conf is medium or high
153+
rf_mask_by_bits('data_masked', 'mask', 7, 2, [2, 3])) \
154+
.withColumn('data_masked', # cloud shadow conf is medium or high
155+
rf_mask_by_bits('data_masked', 'mask', 11, 2, [2, 3])) \
156+
.withColumn('data2', # mask other data col against the other band
157+
rf_mask(rf_convert_cell_type('data2',CellType.uint16()), 'data_masked')) \
158+
.filter(rf_data_cells('data_masked') > 0) # remove any entirely ND rows
159+
160+
# Inspect a sample
161+
l8_df.select('data', 'mask', 'data_masked', 'data2', rf_extent('data_masked')) \
162+
.filter(rf_data_cells('data_masked') > 32000)
163+
```
164+
165+
101166
## Clipping
102167

103-
Clipping is the use of a polygon to determine the areas to mask in a raster. Typically the areas inside a polygon are retained and the cells outside are set to NoData. Given a geometry column on our DataFrame, we have to carry out three basic steps. First we have to ensure the vector geometry is correctly projected to the same @ref:[CRS](concepts.md#coordinate-reference-system-crs) as the raster. We'll continue with our example creating a simple polygon. Buffering a point will create an approximate circle.
168+
Clipping is the use of a polygon to determine the areas to mask in a raster. Typically the areas inside a polygon are retained and the cells outside are set to NoData. Given a geometry column on our DataFrame, we have to carry out three basic steps. First we have to ensure the vector geometry is correctly projected to the same @ref:[CRS](concepts.md#coordinate-reference-system-crs) as the raster. We'll continue with our Sentinel 2 example, creating a simple polygon. Buffering a point will create an approximate circle.
104169

105170

106171
```python, reproject_geom

0 commit comments

Comments
 (0)