
Commit 631d1b9

Misc tweaks to address PR feedback.
1 parent 8d91f0f commit 631d1b9

12 files changed (+58, -41 lines)


core/src/main/scala/org/locationtech/rasterframes/util/package.scala

Lines changed: 14 additions & 5 deletions
@@ -204,14 +204,19 @@ package object util {
       val header = cols.map(_.name).mkString("| ", " | ", " |") + "\n" + ("|---" * cols.length) + "|\n"
       val stringifiers = stringifyRowElements(cols, truncate)
       val cat = concat_ws(" | ", stringifiers: _*)
-      val body = df
-        .select(cat).limit(numRows)
+      val rows = df
+        .select(cat)
+        .limit(numRows)
         .as[String]
         .collect()
         .map(_.replaceAll("\\[", "\\\\["))
         .map(_.replace('\n', '↩'))
+
+      val body = rows
         .mkString("| ", " |\n| ", " |")
-      header + body
+
+      val caption = if (rows.length >= numRows) s"\n_Showing only top $numRows rows_.\n\n" else ""
+      caption + header + body
     }
 
     def toHTML(numRows: Int = 5, truncate: Boolean = false): String = {
@@ -220,13 +225,17 @@
       val header = "<thead>\n" + cols.map(_.name).mkString("<tr><th>", "</th><th>", "</th></tr>\n") + "</thead>\n"
       val stringifiers = stringifyRowElements(cols, truncate)
       val cat = concat_ws("</td><td>", stringifiers: _*)
-      val body = df
+      val rows = df
         .select(cat).limit(numRows)
         .as[String]
         .collect()
+
+      val body = rows
         .mkString("<tr><td>", "</td></tr>\n<tr><td>", "</td></tr>\n")
 
-      "<table>\n" + header + "<tbody>\n" + body + "</tbody>\n" + "</table>"
+      val caption = if (rows.length >= numRows) s"<caption>Showing only top $numRows rows</caption>\n" else ""
+
+      "<table>\n" + caption + header + "<tbody>\n" + body + "</tbody>\n" + "</table>"
     }
   }
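
The substance of this change to `toMarkdown`/`toHTML` is a truncation hint: when the number of collected rows reaches `numRows`, a caption ("Showing only top N rows") is prepended to the rendered table. A minimal standalone sketch of that rule in plain Python (not RasterFrames code; the single-column table is purely illustrative):

```python
# Illustration of the caption rule introduced above: if the collected row count
# hits the limit, the output may be truncated, so emit a caption.
def to_markdown(values, num_rows=5):
    rows = values[:num_rows]
    caption = "\n_Showing only top {} rows_.\n\n".format(num_rows) if len(rows) >= num_rows else ""
    header = "| value |\n|---|\n"
    body = "\n".join("| {} |".format(v) for v in rows)
    return caption + header + body

print(to_markdown(list(range(10))))  # hits the limit -> caption emitted
print(to_markdown([1, 2, 3]))        # fewer than num_rows -> no caption
```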

docs/src/main/paradox/_template/page.st

Lines changed: 3 additions & 0 deletions
@@ -33,6 +33,9 @@
     .md-clear { clear: both; }
     table { font-size: 80%; }
     code { font-size: 0.75em !important; }
+    table a {
+      word-break: break-all;
+    }
   </style>
 </head>
 
pyrasterframes/src/main/python/docs/aggregation.pymd

Lines changed: 5 additions & 5 deletions
@@ -35,14 +35,14 @@ We use the @ref:[`rf_tile_mean`](reference.md#rf-tile-mean) function to compute
 
 ```python, tile_mean
 means = rf.select(F.col('id'), rf_tile_mean(F.col('tile')))
-display(means)
+means
 ```
 
 We use the @ref:[`rf_agg_mean`](reference.md#rf-agg-mean) function to compute the DataFrame aggregate, which averages 25 values of 1.0 and 25 values of 3.0, across the fifty cells in two rows. Note that only a single row is returned since the average is computed over the full DataFrame.
 
 ```python, agg_mean
 mean = rf.agg(rf_agg_mean(F.col('tile')))
-display(mean)
+mean
 ```
 
 We use the @ref:[`rf_agg_local_mean`](reference.md#rf-agg-local-mean) function to compute the element-wise local aggregate mean across the two rows. For this aggregation, we are computing the mean of one value of 1.0 and one value of 3.0 to arrive at the element-wise mean, but doing so twenty-five times, one for each position in the _tile_.
@@ -62,7 +62,7 @@ We can also count the total number of data and NoData cells over all the _tiles_
 ```python, cell_counts
 rf = spark.read.raster('https://s22s-test-geotiffs.s3.amazonaws.com/MCD43A4.006/11/05/2018233/MCD43A4.A2018233.h11v05.006.2018242035530_B02.TIF')
 stats = rf.agg(rf_agg_data_cells('proj_raster'), rf_agg_no_data_cells('proj_raster'))
-display(stats)
+stats
 ```
 
 ## Statistical Summaries
@@ -79,15 +79,15 @@ stats.printSchema()
 ```
 
 ```python, show_stats
-display(stats.select('stats.min', 'stats.max', 'stats.mean', 'stats.variance'))
+stats.select('stats.min', 'stats.max', 'stats.mean', 'stats.variance')
 ```
 
 The @ref:[`rf_agg_stats`](reference.md#rf-agg-stats) function aggregates over all of the _tiles_ in a DataFrame and returns a statistical summary of all cell values as shown below.
 
 ```python, agg_stats
 stats = rf.agg(rf_agg_stats('proj_raster').alias('stats')) \
     .select('stats.min', 'stats.max', 'stats.mean', 'stats.variance')
-display(stats)
+stats
 ```
 
 The @ref:[`rf_agg_local_stats`](reference.md#rf-agg-local-stats) function computes the element-wise local aggregate statistical summary as shown below. The DataFrame used in the previous two code blocks has unequal _tile_ dimensions, so a different DataFrame is used in this code block to avoid a runtime error.
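
For reference, the `rf` DataFrame these hunks operate on is described in the surrounding prose as two rows of 25-cell _tiles_ holding 1.0 and 3.0 respectively. A hedged sketch of how such a frame could be constructed, reusing the `Tile`/`CellType`/`Row` pattern that appears later in this commit (nodata-handling.pymd); the import path and column names are assumptions, not taken from aggregation.pymd itself:

```python
# Assumed setup matching the description above: two rows, one 5x5 tile of 1.0
# and one 5x5 tile of 3.0, so rf_tile_mean yields 1.0 and 3.0 per row while
# rf_agg_mean over the whole frame yields 2.0.
import numpy as np
from pyspark.sql import Row
from pyrasterframes.rf_types import Tile, CellType

rf = spark.createDataFrame([
    Row(id=1, tile=Tile(np.ones((5, 5)), CellType.float64())),
    Row(id=2, tile=Tile(np.ones((5, 5)) * 3, CellType.float64())),
])
```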

pyrasterframes/src/main/python/docs/getting-started.pymd

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ df = spark.read.raster('https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/201
 # Add 3 element-wise, show some rows of the DataFrame
 sample = df.withColumn('added', rf_local_add(df.proj_raster, lit(3))) \
     .select(rf_crs('added'), rf_extent('added'), rf_tile('added'))
-display(sample)
+sample
 ```
 
 This example is extended in the [getting started Jupyter notebook](https://nbviewer.jupyter.org/github/locationtech/rasterframes/blob/develop/rf-notebook/src/main/notebooks/Getting%20Started.ipynb).

pyrasterframes/src/main/python/docs/languages.pymd

Lines changed: 3 additions & 3 deletions
@@ -60,7 +60,7 @@ result = red_nir_tiles_monthly_2017 \
     .agg(rf_agg_stats(rf_normalized_difference(col('nir'), col('red'))).alias('ndvi_stats')) \
     .orderBy(col('month')) \
     .select('month', 'ndvi_stats.*')
-display(result)
+result
 ```
 
 ## SQL
@@ -87,7 +87,7 @@ SELECT granule_id, month(acquisition_date) as month, B01 as red, B02 as nir
 FROM modis
 WHERE year(acquisition_date) = 2017 AND day(acquisition_date) = 15 AND granule_id = 'h21v09'
 """)
-display(sql('DESCRIBE red_nir_monthly_2017'))
+sql('DESCRIBE red_nir_monthly_2017')
 ```
 
 ### Step 3: Read tiles
@@ -116,7 +116,7 @@ SELECT month, ndvi_stats.* FROM (
   ORDER BY month
 )
 """)
-display(grouped)
+grouped
 ```
 
 ## Scala
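
The SQL hunks above assume a `modis` temporary view and a `sql` helper that are set up earlier in languages.pymd and do not appear in this diff. A hedged sketch of the kind of setup that makes `sql('DESCRIBE red_nir_monthly_2017')` work; `modis_catalog` is a placeholder name:

```python
# Assumed setup (not shown in this diff): bind `sql` to the active session and
# register the catalog DataFrame as the `modis` view the queries above reference.
sql = spark.sql
modis_catalog.createOrReplaceTempView('modis')

# After the query above that defines red_nir_monthly_2017 has run:
sql('DESCRIBE red_nir_monthly_2017')
```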

pyrasterframes/src/main/python/docs/nodata-handling.pymd

Lines changed: 10 additions & 10 deletions
@@ -41,7 +41,7 @@ We can also inspect the cell type of a given _tile_ or `proj_raster` column.
 ```python, ct_from_sen
 cell_types = spark.read.raster('https://s22s-test-geotiffs.s3.amazonaws.com/luray_snp/B02.tif') \
     .select(rf_cell_type('proj_raster')).distinct()
-display(cell_types)
+cell_types
 ```
 
 ### Understanding Cell Types and NoData
@@ -96,7 +96,7 @@ unmasked.printSchema()
 
 ```python, show_cell_types
 cell_types = unmasked.select(rf_cell_type('blue'), rf_cell_type('scl')).distinct()
-display(cell_types)
+cell_types
 ```
 
 Drawing on @ref:[local map algebra](local-algebra.md) techniques, we will create new _tile_ columns that are indicators of unwanted pixels, as defined above. Since the mask column is an integer type, the addition is equivalent to a logical or, so the boolean true values are 1.
@@ -116,14 +116,14 @@ one_mask = mask_part.withColumn('mask', rf_local_add('nodata', 'defect')) \
     .withColumn('mask', rf_local_add('mask', 'cirrus'))
 
 cell_types = one_mask.select(rf_cell_type('mask')).distinct()
-display(cell_types)
+cell_types
 ```
 
 Because there is not a NoData already defined, we will choose one. In this particular example, the minimum value is greater than zero, so we can use 0 as the NoData value.
 
 ```python, pick_nd
 blue_min = one_mask.agg(rf_agg_stats('blue').min.alias('blue_min'))
-display(blue_min)
+blue_min
 ```
 
 We can now construct the cell type string for our blue band's cell type, designating 0 as NoData.
@@ -147,7 +147,7 @@ We can verify that the number of NoData cells in the resulting `blue_masked` col
 
 ```python, show_masked
 counts = masked.select(rf_no_data_cells('blue_masked'), rf_tile_sum('mask'))
-display(counts)
+counts
 ```
 
 It's also nice to view a sample. The white regions are areas of NoData.
@@ -258,7 +258,7 @@ y = Tile((np.ones((100, 100))*3), CellType.int32())
 rf = spark.createDataFrame([Row(x=x, y=y)])
 
 cell_types = rf.select(rf_cell_type('x'), rf_cell_type('y')).distinct()
-display(cell_types)
+cell_types
 ```
 
 When performing a local operation between _tile_ columns with cell types `int` and `float`, the resulting _tile_ cell type will be `float`. In local algebra over two _tiles_ of different "sized" cell types, the resulting cell type will be the larger of the two input _tiles'_ cell types.
@@ -269,7 +269,7 @@ sums = rf.select(
     rf_cell_type('y'),
     rf_cell_type(rf_local_add('x', 'y')).alias('xy_sum'),
 )
-display(sums)
+sums
 ```
 
 Combining _tile_ columns of different cell types gets a little trickier when user defined NoData cell types are involved. Let's create two _tile_ columns: one with a NoData value of 1, and one with a NoData value of 2 (using our previously defined `get_nodata_ct` function).
@@ -285,15 +285,15 @@ Let's try adding the _tile_ columns with different NoData values. When there is
 ```python, show_3
 rf_nd_sum = rf_nd.withColumn('x_nd_sum', rf_local_add('x_nd_2', 'x_nd_1'))
 cell_types = rf_nd_sum.select(rf_cell_type('x_nd_sum')).distinct()
-display(cell_types)
+cell_types
 ```
 
 Reversing the order of the sum changes the NoData value of the resulting column to 2.
 
 ```python, show_4
 rf_nd_sum = rf_nd.withColumn('x_nd_sum', rf_local_add('x_nd_1', 'x_nd_2'))
 cell_types = rf_nd_sum.select(rf_cell_type('x_nd_sum')).distinct()
-display(cell_types)
+cell_types
 ```
 
 ## NoData Values in Aggregation
@@ -324,5 +324,5 @@ The results of `rf_tile_sum` vary on the _tiles_ that were masked. This is becau
 
 ```python, show_5
 sums = masked_rf.select(rf_tile_sum('tile'), rf_tile_sum('tile_nd_1'), rf_tile_sum('tile_nd_2'))
-display(sums)
+sums
 ```
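
The `get_nodata_ct` helper referenced in these hunks is defined earlier in nodata-handling.pymd and is not part of this diff. Purely as an illustration of the idea (not the document's actual definition), a helper along these lines could build a cell type with a user-defined NoData value using the cell type string convention mentioned above, where e.g. `uint16ud1` designates 1 as NoData:

```python
# Hypothetical helper (NOT the definition used in the doc): construct a cell
# type whose NoData value is `nd`, via the '<base type>ud<value>' string form.
from pyrasterframes.rf_types import CellType

def get_nodata_ct(nd, base='uint16'):
    return CellType('{}ud{}'.format(base, nd))
```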

pyrasterframes/src/main/python/docs/raster-read.pymd

Lines changed: 8 additions & 8 deletions
@@ -36,15 +36,15 @@ parts = rf.select(
     rf_extent("proj_raster").alias("extent"),
     rf_tile("proj_raster").alias("tile")
 )
-display(parts)
+parts
 ```
 
 
 You can also see that the single raster has been broken out into many arbitrary non-overlapping regions. Doing so takes advantage of parallel in-memory reads from the cloud hosted data source and allows Spark to work on manageable amounts of data per task. The following code fragment shows us how many subtiles were created from a single source image.
 
 ```python, count_by_uri
 counts = rf.groupby(rf.proj_raster_path).count()
-display(counts)
+counts
 ```
 
 Let's select a single _tile_ and view it. The _tile_ preview image as well as the string representation provide some basic information about the _tile_: its dimensions as numbers of columns and rows and the cell type, or data type of all the cells in the _tile_. For more about cell types, refer to @ref:[this discussion](nodata-handling.md#cell-types).
@@ -106,7 +106,7 @@ print("Available scenes: ", modis_catalog.count())
 ```
 
 ```python, show_catalog
-display(modis_catalog)
+modis_catalog
 ```
 
 MODIS data products are delivered on a regular, consistent grid, making identification of a specific area over time easy using [`(h,v)`](https://modis-land.gsfc.nasa.gov/MODLAND_grid.html) grid coordinates (see below).
@@ -117,7 +117,7 @@ For example, MODIS data right above the equator is all grid coordinates with `v0
 
 ```python, catalog_filtering
 equator = modis_catalog.where(F.col('gid').like('%v07%'))
-display(equator.select('date', 'gid'))
+equator.select('date', 'gid')
 ```
 
 Now that we have prepared our catalog, we simply pass the DataFrame or CSV string to the `raster` DataSource to load the imagery. The `catalog_col_names` parameter gives the columns that contain the URI's to be read.
@@ -134,7 +134,7 @@ Observe the schema of the resulting DataFrame has a projected raster struct for
 
 ```python, cat_read_sample
 sample = rf.select('gid', rf_extent('red'), rf_extent('nir'), rf_tile('red'), rf_tile('nir'))
-display(sample.limit(3))
+sample.limit(3)
 ```
 
 ## Lazy Raster Reading
@@ -145,13 +145,13 @@ Consider the following two reads of the same data source. In the first, the lazy
 
 ```python, lazy_demo_1
 uri = 'https://s22s-test-geotiffs.s3.amazonaws.com/luray_snp/B02.tif'
-lazy = spark.read.raster(uri).select('proj_raster.tile').limit(1)
-display(lazy)
+lazy = spark.read.raster(uri).select('proj_raster.tile')
+lazy
 ```
 
 ```python, lazy_demo_2
 non_lazy = spark.read.raster(uri, lazy_tiles=False).select('proj_raster.tile')
-display(non_lazy)
+non_lazy
 ```
 
 In the initial examples on this page, you may have noticed that the realized (non-lazy) _tiles_ are shown, but we did not change `lazy_tiles`. Instead, we used @ref:[`rf_tile`](reference.md#rf-tile) to explicitly request the realized _tile_ from the lazy representation.
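
As the catalog hunks above describe, the prepared catalog DataFrame (or its CSV rendering) is handed to the `raster` reader together with `catalog_col_names`. A hedged sketch of that call; passing the DataFrame positionally and the URI column names (`B01`, `B02`) are assumptions, not necessarily what raster-read.pymd uses:

```python
# Assumed catalog read: each row of `equator` supplies one URI per listed
# column, and the reader yields a tile column per catalog column.
rf = spark.read.raster(equator, catalog_col_names=['B01', 'B02'])
rf.printSchema()
```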

pyrasterframes/src/main/python/docs/supervised-learning.pymd

Lines changed: 2 additions & 2 deletions
@@ -114,7 +114,7 @@ df_mask.printSchema()
 
 ## Create ML Pipeline
 
-We import various Spark components that we need to construct our [Pipeline](https://spark.apache.org/docs/latest/ml-pipeline.html). These are the objects that will work in sequence to conduct the data preparation and modeling.
+We import various Spark components that we need to construct our [`Pipeline`](https://spark.apache.org/docs/latest/ml-pipeline.html). These are the objects that will work in sequence to conduct the data preparation and modeling.
 
 ```python, imports, echo=True
 from pyrasterframes import TileExploder
@@ -186,7 +186,7 @@ cnf_mtrx = prediction_df.groupBy(classifier.getPredictionCol()) \
     .pivot(classifier.getLabelCol()) \
     .count() \
     .sort(classifier.getPredictionCol())
-display(cnf_mtrx)
+cnf_mtrx
 ```
 
 ## Visualize Prediction
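
As a companion to the confusion-matrix hunk above, overall accuracy can be computed from the same `prediction_df` with Spark's built-in evaluator. A hedged sketch (column names are pulled from the `classifier` object exactly as in the hunk; whether the doc does this elsewhere is not shown in this diff):

```python
# Assumed companion to the confusion matrix: score overall accuracy using the
# prediction and label columns the classifier was configured with.
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

evaluator = MulticlassClassificationEvaluator(
    predictionCol=classifier.getPredictionCol(),
    labelCol=classifier.getLabelCol(),
    metricName='accuracy')
print(evaluator.evaluate(prediction_df))
```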

pyrasterframes/src/main/python/docs/time-series.pymd

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ m.save(temp_folium)
 with open(temp_folium, 'rb') as f:
     b64 = base64.b64encode(f.read())
 with open('docs/static/cuya.md', 'w') as md:
-    md.write('<iframe src="data:text/html;charset=utf-8;base64,{}" allowfullscreen="" webkitallowfullscreen="" mozallowfullscreen="" style="position:relative;width:100%;height:500"></iframe>'.format(b64.decode('utf-8')))
+    md.write('<iframe src="data:text/html;charset=utf-8;base64,{}" allowfullscreen="" webkitallowfullscreen="" mozallowfullscreen="" style="position:relative;width:100%;height:500px"></iframe>'.format(b64.decode('utf-8')))
 # seems that the height is not correct?
 ```

pyrasterframes/src/main/python/docs/unsupervised-learning.pymd

Lines changed: 8 additions & 3 deletions
@@ -74,7 +74,7 @@ For this problem, we will use the K-means clustering algorithm and configure our
 kmeans = KMeans().setK(5).setFeaturesCol('features')
 ```
 
-We can combine the above stages into a single _pipeline_.
+We can combine the above stages into a single [`Pipeline`](https://spark.apache.org/docs/latest/ml-pipeline.html).
 
 ```python, pipeline
 pipeline = Pipeline().setStages([exploder, assembler, kmeans])
@@ -92,7 +92,12 @@ We can use the `transform` function to score the training data in the fitted _pi
 
 ```python, transform
 clustered = model.transform(df)
-display(clustered)
+```
+
+Now let's take a look at some sample output.
+
+```python, view_predictions
+clustered.select('prediction', 'extent', 'column_index', 'row_index', 'features')
 ```
 
 If we want to inspect the model statistics, the SparkML API requires us to go through this unfortunate contortion to access the clustering results:
@@ -126,7 +131,7 @@ retiled.printSchema()
 ```
 
 ```python, display
-display(retiled)
+retiled
 ```
 
 The resulting output is shown below.
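
Two small, hedged follow-ups to the scoring hunk above, assuming the `clustered` DataFrame and fitted `model` from that hunk: cluster sizes can be read straight off the scored DataFrame, and the fitted K-means stage can be pulled out of the pipeline model to inspect its centers.

```python
# How many exploded cells landed in each cluster (column name from the hunk above).
clustered.groupBy('prediction').count().orderBy('prediction')

# The fitted KMeansModel is the last stage of the fitted pipeline model.
kmeans_model = model.stages[-1]
print(kmeans_model.clusterCenters())
```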
