You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
from pyrasterframes.utils import create_rf_spark_session
8
+
from pyrasterframes.rasterfunctions import *
9
+
from pyspark.sql.functions import *
10
+
spark = create_rf_spark_session()
11
+
12
+
```
13
+
14
+
## Description
15
+
16
+
A common operation for raster data is reprojecting or warping the data to a different @ref:[CRS][CRS] with a specific @link:[transform](https://gdal.org/user/raster_data_model.html#affine-geotransform) { open=new }. In many use cases, the particulars of the warp operation depend on another set of raster data. Furthermore, the warp is done to put both sets of raster data to a common set of grid to enable manipulation of the datasets together.
17
+
18
+
In RasterFrames, you can perform a **Raster Join** on two DataFrames containing raster data.
19
+
The operation will perform a _spatial join_ based on the [CRS][CRS] and [extent][extent] data in each DataFrame. By default it is a left join and uses an intersection operator.
20
+
For each candidate row, all _tile_ columns on the right hand side are warped to match the left hand side's [CRS][CRS], [extent][extent], and dimensions. Warping relies on GeoTrellis library code and uses nearest neighbor resampling method.
21
+
The operation is also an aggregate, with multiple intersecting right-hand side tiles `merge`d into the result. There is no guarantee about the ordering of tiles used to select cell values in the case of overlapping tiles.
22
+
When using the @ref:[`raster` DataSource](raster-join.md) you will automatically get the @ref:[CRS][CRS] and @ref:[extent][extent] information needed to do this operation.
23
+
24
+
25
+
## Example Code
26
+
27
+
Because the raster join is a distributed spatial join, indexing of both DataFrames using the [spatial index][spatial-index] is crucial for performance.
28
+
29
+
```python, example_raster_join
30
+
# Southern Mozambique December 29, 2016
31
+
modis = spark.read.raster('s3://astraea-opendata/MCD43A4.006/21/11/2016297/MCD43A4.A2016297.h21v11.006.2016306075821_B01.TIF',
* `left_extent` - the column on the left-hand DataFrame giving the [extent][extent] of the tile columns
52
+
* `left_crs` - the column on the left-hand DataFrame giving the [CRS][CRS] of the tile columns
53
+
* `right_extent` - the column on the right-hand DataFrame giving the [extent][extent] of the tile columns
54
+
* `right_crs` - the column on the right-hand DataFrame giving the [CRS][CRS] of the tile columns
55
+
* `join_exprs` - a single column expression as would be used in the [`on` parameter of `join`](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.join)
56
+
57
+
58
+
Note that the `join_exprs` will override the join behavior described above. By default the expression is equivalent to:
Copy file name to clipboardExpand all lines: pyrasterframes/src/main/python/docs/raster-read.pymd
+7-1Lines changed: 7 additions & 1 deletion
Original file line number
Diff line number
Diff line change
@@ -217,13 +217,19 @@ In the initial examples on this page, you may have noticed that the realized (no
217
217
218
218
## Spatial Indexing and Partitioning
219
219
220
-
It's often desirable to take extra steps in ensuring your data is effectively distributed over your computing resources. One way of doing that is using something called a ["space filling curve"](https://en.wikipedia.org/wiki/Space-filling_curve), which turns an N-dimensional value into a one dimensional value, with properties that favor keeping entities near each other in N-space near each other in index space. To have RasterFrames add a spatial index based partitioning on a raster reads, use the `spatial_index_partitions` parameter:
220
+
It's often desirable to take extra steps in ensuring your data is effectively distributed over your computing resources. One way of doing that is using something called a ["space filling curve"](https://en.wikipedia.org/wiki/Space-filling_curve), which turns an N-dimensional value into a one dimensional value, with properties that favor keeping entities near each other in N-space near each other in index space. In particular RasterFrames support space-filling curves mapping the geographic location of _tiles_ to a one-dimensional index space called [`xz2`](https://www.geomesa.org/documentation/user/datastores/index_overview.html). To have RasterFrames add a spatial index based partitioning on a raster reads, use the `spatial_index_partitions` parameter. By default it will use the same number of partitions as configured in [`spark.sql.shuffle.partitions`](https://spark.apache.org/docs/latest/sql-performance-tuning.html#other-configuration-options).
0 commit comments