Commit 9e07cae

author
Eric Culbertson
committed
Add NoData sum section. Fixed ambiguity in adding tiles with different NoData
Signed-off-by: Eric Culbertson <[email protected]>
1 parent c09ee49 commit 9e07cae

1 file changed

+72
-35
lines changed

pyrasterframes/src/main/python/docs/nodata-handling.pymd

Lines changed: 72 additions & 35 deletions
@@ -146,6 +146,37 @@ And the original SCL data. The bright yellow is a cloudy region in the original
146146
display(sample[1])
147147
```
148148

149+
## NoData and Local Arithmetic
150+
151+
Let's now explore how the presence of NoData affects local arithmetic operations. To demonstrate the behaviour, let's create two tiles. One tile will have values of 0 and 1, and the other will have values of just 0.
152+
153+
154+
```python
155+
tile_size = 100
156+
x = np.zeros((tile_size, tile_size), dtype='int16')
157+
x[:,tile_size//2:] = 1
158+
x = Tile(x)
159+
y = Tile(np.zeros((tile_size, tile_size), dtype='int16'))
160+
161+
rf = spark.createDataFrame([Row(x=x, y=y)])
162+
print('x')
163+
display(x)
164+
print('y')
165+
display(y)
166+
```
167+
168+
Now, let's create a new column from `x` with the value of 1 changed to NoData. Then, we will add this new column with NoData to the `y` column. As shown below, the result of the sum also has NoData (represented in white). In general, for arithmetic operations, Data + NoData = NoData. For more information about possible operations on Tile columns, see the @ref:[local map algebra](local-algebra.md) doc.
169+
170+
```python
171+
masked_rf = rf.withColumn('x_nd', rf_mask_by_value('x', 'x', lit(1)) )
172+
masked_rf = masked_rf.withColumn('x_nd_y_sum', rf_local_add('x_nd', 'y'))
173+
row = masked_rf.collect()[0]
174+
print('x with NoData')
175+
display(row.x_nd)
176+
print('x with NoData and y sum')
177+
display(row.x_nd_y_sum)
178+
```
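
The rule Data + NoData = NoData can also be illustrated without Spark. The following is a minimal sketch that uses NumPy masked arrays as a stand-in for NoData cells. This is an analogy only: `np.ma` is an assumption made for illustration and is not how RasterFrames represents NoData internally.

```python
import numpy as np

tile_size = 100
x = np.zeros((tile_size, tile_size), dtype='int16')
x[:, tile_size // 2:] = 1
y = np.zeros((tile_size, tile_size), dtype='int16')

# Mask every cell of x equal to 1; masked cells stand in for NoData here.
x_nd = np.ma.masked_equal(x, 1)

# Adding an unmasked array keeps the mask: Data + NoData = NoData.
xy_sum = x_nd + y

masked_cells = int(np.ma.getmaskarray(xy_sum).sum())
data_cells = int(xy_sum.count())
print(masked_cells, data_cells)
```

The right half of `x` is masked, so half of the cells in the sum are masked as well, even though every cell of `y` holds valid data.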
179+
149180
## Changing a Tile's NoData Values
150181

151182
One way to mask a tile is to make a new tile with a user-defined NoData value. We will explore this method below. First, let's create a rasterframe from a tile with values of 0, 1, 2, and 3. We will use numpy to create a 100x100 Tile with columns of 0, 1, 2, and 3.
@@ -191,36 +222,6 @@ And the tile named `tile_nd_2` has the values of 1 and 2 masked out. This is bec
191222
display(collected[0].tile_nd_2)
192223
```
193224

194-
## NoData Values in Aggregation
195-
196-
Let's use the same tile as before to demonstrate how NoData values affect tile aggregations.
197-
198-
```python
199-
tile_size = 100
200-
x = np.zeros((tile_size, tile_size), dtype='int16')
201-
for i in range(4):
202-
x[:, i*tile_size//4:(i+1)*tile_size//4] = i
203-
x = Tile(x)
204-
205-
rf = spark.createDataFrame([Row(tile=x)])
206-
display(x)
207-
```
208-
209-
First we create the two new masked tile columns as before: one with only the value of 1 masked, and the other with the values of 1 and 2 masked.
210-
211-
```python
212-
masked_rf = rf.withColumn('tile_nd_1',
213-
rf_convert_cell_type('tile', get_nodata_ct(1))) \
214-
.withColumn('tile_nd_2',
215-
rf_convert_cell_type('tile_nd_1', get_nodata_ct(2)))
216-
```
217-
218-
The results of `rf_tile_sum` vary depending on which cells were masked. This is because any cells with NoData values are ignored in the aggregation. Note that `tile_nd_2` has the lowest sum, since it has the fewest data cells.
219-
220-
```python
221-
masked_rf.select(rf_tile_sum('tile'), rf_tile_sum('tile_nd_1'), rf_tile_sum('tile_nd_2')).show()
222-
```
223-
224225

225226
## Combining Tiles with Different Data Types
226227

@@ -242,21 +243,57 @@ When performing a local operation between tile columns with cell types `int` and
242243
rf_added = rf.withColumn('xy_sum', rf_local_add('y', 'x'))
243244
rf_added.select(rf_cell_type('xy_sum'), rf_cell_type('y'), rf_cell_type('x')).distinct().show()
244245
```
245-
Combining tile columns of different cell types gets a little trickier when user-defined NoData cell types are involved. Let's create 3 tile columns: one without a defined NoData value, one with a NoData value of 1, and one with a NoData value of 2.
246246

247+
Combining tile columns of different cell types gets a little trickier when user-defined NoData cell types are involved. Let's create 2 tile columns: one with a NoData value of 1, and one with a NoData value of 2.
247248

248249
```python
249250
x = Tile((np.ones((100, 100))*3).astype('int16'))
250-
251251
rf = spark.createDataFrame([Row(x=x)])
252252

253253
rf_nd = rf.withColumn('x_nd_1', rf_convert_cell_type('x', get_nodata_ct(1))) \
254254
.withColumn('x_nd_2', rf_convert_cell_type('x', get_nodata_ct(2)))
255255
```
256256

257-
Let's try adding the tile column without a defined NoData value to a tile column with a defined NoData value. When there is an inconsistent NoData value in the two columns, the NoData value of the right-hand side of the sum is kept. In this case, this means the result has no defined NoData value.
257+
Let's try adding the tile columns with different NoData values. When there is an inconsistent NoData value in the two columns, the NoData value of the right-hand side of the sum is kept. In this case, this means the result has a NoData value of 1.
258+
259+
```python
260+
rf_nd_sum = rf_nd.withColumn('x_nd_sum', rf_local_add('x_nd_2', 'x_nd_1'))
261+
rf_nd_sum.select(rf_cell_type('x_nd_sum')).distinct().show()
262+
```
263+
264+
Reversing the order of the sum changes the NoData value of the resulting column to 2.
265+
266+
```python
267+
rf_nd_sum = rf_nd.withColumn('x_nd_sum', rf_local_add('x_nd_1', 'x_nd_2'))
268+
rf_nd_sum.select(rf_cell_type('x_nd_sum')).distinct().show()
269+
```
270+
271+
## NoData Values in Aggregation
272+
273+
Let's use the same tile as before to demonstrate how NoData values affect tile aggregations.
274+
275+
```python
276+
tile_size = 100
277+
x = np.zeros((tile_size, tile_size), dtype='int16')
278+
for i in range(4):
279+
x[:, i*tile_size//4:(i+1)*tile_size//4] = i
280+
x = Tile(x)
281+
282+
rf = spark.createDataFrame([Row(tile=x)])
283+
display(x)
284+
```
285+
286+
First we create the two new masked tile columns as before: one with only the value of 1 masked, and the other with the values of 1 and 2 masked.
287+
288+
```python
289+
masked_rf = rf.withColumn('tile_nd_1',
290+
rf_convert_cell_type('tile', get_nodata_ct(1))) \
291+
.withColumn('tile_nd_2',
292+
rf_convert_cell_type('tile_nd_1', get_nodata_ct(2)))
293+
```
294+
295+
The results of `rf_tile_sum` vary depending on which cells were masked. This is because any cells with NoData values are ignored in the aggregation. Note that `tile_nd_2` has the lowest sum, since it has the fewest data cells.
258296

259297
```python
260-
rf_nd_sum = rf_nd.withColumn('x_nd_sum', rf_local_add('x_nd_1', 'x'))
261-
rf_nd_sum.select(rf_cell_type('x_nd_sum'), rf_cell_type('x'), rf_cell_type('x_nd_1')).distinct().show()
298+
masked_rf.select(rf_tile_sum('tile'), rf_tile_sum('tile_nd_1'), rf_tile_sum('tile_nd_2')).show()
262299
```
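
As a cross-check on the aggregation example, the sums one would expect can be worked out with NumPy alone. This is a sketch under the assumption stated in the text, that NoData cells are simply skipped by the aggregation. Masked arrays stand in for NoData here, and the variable names `sum_all`, `sum_nd_1`, and `sum_nd_2` are hypothetical.

```python
import numpy as np

tile_size = 100
x = np.zeros((tile_size, tile_size), dtype='int16')
for i in range(4):
    x[:, i * tile_size // 4:(i + 1) * tile_size // 4] = i

# No cells masked: every value contributes to the sum.
sum_all = int(x.sum())

# Cells equal to 1 treated as NoData and skipped.
sum_nd_1 = int(np.ma.masked_equal(x, 1).sum())

# Cells equal to 1 or 2 treated as NoData and skipped.
sum_nd_2 = int(np.ma.masked_where((x == 1) | (x == 2), x).sum())

print(sum_all, sum_nd_1, sum_nd_2)
```

Each value occupies 2,500 cells, so the three sums are 15000, 12500, and 7500: masking more cells can only lower the sum of a non-negative tile.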
