pyrasterframes/src/main/python/docs/aggregation.pymd (5 additions, 5 deletions)
@@ -35,14 +35,14 @@ We use the @ref:[`rf_tile_mean`](reference.md#rf-tile-mean) function to compute

```python, tile_mean
means = rf.select(F.col('id'), rf_tile_mean(F.col('tile')))
-display(means)
+means
```

We use the @ref:[`rf_agg_mean`](reference.md#rf-agg-mean) function to compute the DataFrame aggregate, which averages 25 values of 1.0 and 25 values of 3.0, across the fifty cells in two rows. Note that only a single row is returned since the average is computed over the full DataFrame.

```python, agg_mean
mean = rf.agg(rf_agg_mean(F.col('tile')))
-display(mean)
+mean
```
We use the @ref:[`rf_agg_local_mean`](reference.md#rf-agg-local-mean) function to compute the element-wise local aggregate mean across the two rows. For this aggregation, we are computing the mean of one value of 1.0 and one value of 3.0 to arrive at the element-wise mean, but doing so twenty-five times, one for each position in the _tile_.
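A minimal sketch of how this local aggregate can be invoked, assuming the same `rf` DataFrame and imports used in the hunk above:

```python
from pyspark.sql import functions as F
from pyrasterframes.rasterfunctions import rf_agg_local_mean

# Element-wise aggregate: returns a single row holding one tile whose cells
# are the per-position means across all tiles in the column.
local_mean = rf.agg(rf_agg_local_mean(F.col('tile')))
local_mean
```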
@@ -62,7 +62,7 @@ We can also count the total number of data and NoData cells over all the _tiles_
The @ref:[`rf_agg_stats`](reference.md#rf-agg-stats) function aggregates over all of the _tiles_ in a DataFrame and returns a statistical summary of all cell values as shown below.
The @ref:[`rf_agg_local_stats`](reference.md#rf-agg-local-stats) function computes the element-wise local aggregate statistical summary as shown below. The DataFrame used in the previous two code blocks has unequal _tile_ dimensions, so a different DataFrame is used in this code block to avoid a runtime error.
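As a hedged sketch of both forms, assuming a DataFrame `rf` with a `tile` column (uniform _tile_ dimensions are needed for the local variant):

```python
from pyspark.sql import functions as F
from pyrasterframes.rasterfunctions import rf_agg_stats, rf_agg_local_stats

# Scalar summary over every cell of every tile; the struct fields include
# counts of data/NoData cells plus min, max, mean, and variance.
rf.agg(rf_agg_stats(F.col('tile')).alias('stats')).select('stats.*')

# Element-wise summary: each struct field is itself a tile of per-position statistics.
rf.agg(rf_agg_local_stats(F.col('tile')).alias('local_stats')).select('local_stats.*')
```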
This example is extended in the [getting started Jupyter notebook](https://nbviewer.jupyter.org/github/locationtech/rasterframes/blob/develop/rf-notebook/src/main/notebooks/Getting%20Started.ipynb).
Drawing on @ref:[local map algebra](local-algebra.md) techniques, we will create new _tile_ columns that are indicators of unwanted pixels, as defined above. Since the mask columns are integer-typed indicators with boolean true represented as 1, adding them is equivalent to a logical or.
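The thresholds and column names below are placeholders; a rough sketch of building the indicator _tiles_ and combining them might look like this:

```python
from pyrasterframes.rasterfunctions import rf_local_greater, rf_local_less, rf_local_add

# Indicator tiles: 1 where a cell is suspect, 0 elsewhere (hypothetical limits).
flagged = df.withColumn('too_bright', rf_local_greater('blue', 20000)) \
            .withColumn('too_dark', rf_local_less('blue', 100))

# Because the indicators are integer 0/1 tiles, adding them behaves like a
# logical or: any nonzero cell marks an unwanted pixel.
flagged = flagged.withColumn('mask', rf_local_add('too_bright', 'too_dark'))
```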
Because there is no NoData value already defined, we will choose one. In this particular example, the minimum value is greater than zero, so we can use 0 as the NoData value.
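A sketch of declaring that NoData value, assuming a hypothetical `df` with an unsigned 16-bit `tile` column (the `ud0` suffix in the cell type name is how a user-defined NoData of 0 is spelled):

```python
from pyrasterframes.rasterfunctions import rf_convert_cell_type, rf_cell_type

# Re-declare the cell type so that 0 is treated as NoData from here on.
df_nd = df.withColumn('tile_nd', rf_convert_cell_type('tile', 'uint16ud0'))
df_nd.select(rf_cell_type('tile_nd')).distinct()
```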
When performing a local operation between _tile_ columns with cell types `int` and `float`, the resulting _tile_ cell type will be `float`. In local algebra over two _tiles_ of different "sized" cell types, the resulting cell type will be the larger of the two input _tiles'_ cell types.
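A small illustration of the promotion rule, using constant _tiles_ (the cell type names here are assumptions; any int/float pair shows the same behavior):

```python
from pyrasterframes.rasterfunctions import rf_make_constant_tile, rf_local_add, rf_cell_type

ones_int = rf_make_constant_tile(1, 5, 5, 'uint8')      # integer cells
twos_flt = rf_make_constant_tile(2.0, 5, 5, 'float32')  # floating-point cells

# The sum of an int tile and a float tile carries the wider float32 cell type.
spark.range(1).select(rf_cell_type(rf_local_add(ones_int, twos_flt)))
```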
Combining _tile_ columns of different cell types gets a little trickier when user defined NoData cell types are involved. Let's create two _tile_ columns: one with a NoData value of 1, and one with a NoData value of 2 (using our previously defined `get_nodata_ct` function).
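Without reproducing `get_nodata_ct` here, a roughly equivalent sketch can spell the user-defined NoData values directly in the cell type names (`ud1` and `ud2` are the assumed encodings for NoData values of 1 and 2):

```python
from pyrasterframes.rasterfunctions import rf_convert_cell_type

# Two tile columns over the same data, differing only in which value is NoData.
df_two = df.withColumn('tile_nd1', rf_convert_cell_type('tile', 'uint8ud1')) \
           .withColumn('tile_nd2', rf_convert_cell_type('tile', 'uint8ud2'))
```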
@@ -285,15 +285,15 @@ Let's try adding the _tile_ columns with different NoData values. When there is
pyrasterframes/src/main/python/docs/raster-read.pymd (8 additions, 8 deletions)
@@ -36,15 +36,15 @@ parts = rf.select(
    rf_extent("proj_raster").alias("extent"),
    rf_tile("proj_raster").alias("tile")
)
-display(parts)
+parts
```

You can also see that the single raster has been broken out into many arbitrary non-overlapping regions. Doing so takes advantage of parallel in-memory reads from the cloud hosted data source and allows Spark to work on manageable amounts of data per task. The following code fragment shows us how many subtiles were created from a single source image.

```python, count_by_uri
counts = rf.groupby(rf.proj_raster_path).count()
-display(counts)
+counts
```
Let's select a single _tile_ and view it. The _tile_ preview image as well as the string representation provide some basic information about the _tile_: its dimensions (number of columns and rows) and the cell type, i.e. the data type of all the cells in the _tile_. For more about cell types, refer to @ref:[this discussion](nodata-handling.md#cell-types).
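One way to pull a single realized _tile_ back to the driver for inspection, assuming the `rf` DataFrame read earlier on the page:

```python
from pyrasterframes.rasterfunctions import rf_tile

# In a notebook the returned Tile renders a preview image; its repr also
# reports the column/row dimensions and the cell type.
one_tile = rf.select(rf_tile('proj_raster').alias('tile')).first()['tile']
one_tile
```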
MODIS data products are delivered on a regular, consistent grid, making identification of a specific area over time easy using [`(h,v)`](https://modis-land.gsfc.nasa.gov/MODLAND_grid.html) grid coordinates (see below).
@@ -117,7 +117,7 @@ For example, MODIS data right above the equator is all grid coordinates with `v0
Now that we have prepared our catalog, we simply pass the DataFrame or CSV string to the `raster` DataSource to load the imagery. The `catalog_col_names` parameter gives the columns that contain the URIs to be read.
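A hedged sketch of that call; the band names and URIs below are placeholders, and a RasterFrames-enabled SparkSession is assumed:

```python
import pandas as pd

# Hypothetical two-band catalog: one row per scene, one named column per band URI.
catalog = spark.createDataFrame(pd.DataFrame([
    {'red': 'https://example.com/scene1_B04.tif',
     'nir': 'https://example.com/scene1_B08.tif'},
]))

rf = spark.read.raster(catalog, catalog_col_names=['red', 'nir'])
```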
@@ -134,7 +134,7 @@ Observe the schema of the resulting DataFrame has a projected raster struct for
In the initial examples on this page, you may have noticed that the realized (non-lazy) _tiles_ are shown, but we did not change `lazy_tiles`. Instead, we used @ref:[`rf_tile`](reference.md#rf-tile) to explicitly request the realized _tile_ from the lazy representation.
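For comparison, a sketch of both approaches (the URI is a placeholder):

```python
from pyrasterframes.rasterfunctions import rf_tile

# Default read is lazy: cell data is only fetched when an operation needs it.
rf_lazy = spark.read.raster('https://example.com/scene.tif')
rf_lazy.select(rf_tile('proj_raster'))   # realize tiles explicitly

# Or materialize the cells at read time instead.
rf_eager = spark.read.raster('https://example.com/scene.tif', lazy_tiles=False)
```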
pyrasterframes/src/main/python/docs/supervised-learning.pymd (2 additions, 2 deletions)
@@ -114,7 +114,7 @@ df_mask.printSchema()
## Create ML Pipeline

-We import various Spark components that we need to construct our [Pipeline](https://spark.apache.org/docs/latest/ml-pipeline.html). These are the objects that will work in sequence to conduct the data preparation and modeling.
+We import various Spark components that we need to construct our [`Pipeline`](https://spark.apache.org/docs/latest/ml-pipeline.html). These are the objects that will work in sequence to conduct the data preparation and modeling.
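As an assumed minimal version of those imports and the pipeline assembly (the column names and stage list are placeholders, not the notebook's actual ones):

```python
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import DecisionTreeClassifier

# Hypothetical stages; the real pipeline also includes RasterFrames-specific
# transformers that explode tiles into per-cell rows before assembling features.
assembler = VectorAssembler(inputCols=['band_1', 'band_2'], outputCol='features')
classifier = DecisionTreeClassifier(labelCol='label', featuresCol='features')
pipeline = Pipeline(stages=[assembler, classifier])
```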