Commit c14f60d: Started work on SQL document section.
1 parent (767624b) · 1 file changed, 99 additions, 0 deletions.

# API Languages

One of the great powers of RasterFrames, afforded by Spark SQL, is the ability to express computation in multiple programming languages. This manual is centered around Python because that's the most common language used in data science and GIS analytics. However, Scala (the implementation language of RasterFrames) and SQL are also fully supported. Examples in Python can be mechanically translated into the other two languages without much difficulty once the naming conventions are understood. In the sections below we will show the same example program (computing average NDVI per month for a single tile in Tanzania) in each of these languages.

```python, imports, echo=False
from pyspark.sql.functions import col, lit, month, dayofmonth, year
from pyrasterframes.utils import create_rf_spark_session
from pyrasterframes.rasterfunctions import *
import pyrasterframes.rf_ipython
import pandas as pd
import os

spark = create_rf_spark_session()
```

## Python

Step 1: Load the catalog

```python, step_1_python
modis = spark.read.format('aws-pds-modis-catalog').load()
```

Step 2: Down-select data by month

```python, step_2_python
red_nir_monthly_2017 = modis \
    .select('granule_id', month('acquisition_date').alias('month'), col('B01').alias('red'), col('B02').alias('nir')) \
    .where((year('acquisition_date') == 2017) & (dayofmonth('acquisition_date') == 15) & (col('granule_id') == 'h21v09'))
```

Step 3: Read tiles

```python, step_3_python
red_nir_tiles_monthly_2017 = spark.read.raster(catalog=red_nir_monthly_2017, catalog_col_names=['red', 'nir'])
```

Step 4: Compute aggregates

```python, step_4_python
result = red_nir_tiles_monthly_2017 \
    .where(st_intersects(
        st_reproject(rf_geometry(col('red')), rf_crs(col('red')), rf_mk_crs('EPSG:4326')),
        st_makePoint(lit(34.870605), lit(-4.729727)))) \
    .groupBy('month') \
    .agg(rf_agg_stats(rf_normalized_difference(col('nir'), col('red'))).alias('ndvi_stats')) \
    .orderBy(col('month'))
result.show()
```

## SQL

For convenience we're going to evaluate SQL from the Python environment. The SQL fragments should work in the `spark-sql` shell just the same.

```python, sql_setup
def sql(stmt):
    return spark.sql(stmt)
```

Step 1: Load the catalog

```python, step_1_sql
sql("CREATE OR REPLACE TEMPORARY VIEW modis USING `aws-pds-modis-catalog`")
```

Step 2: Down-select data by month

```python, step_2_sql
sql("""
CREATE OR REPLACE TEMPORARY VIEW red_nir_monthly_2017 AS
SELECT granule_id, month(acquisition_date) as month, B01 as red, B02 as nir
FROM modis
WHERE year(acquisition_date) = 2017 AND day(acquisition_date) = 15 AND granule_id = 'h21v09'
""")
sql('DESCRIBE red_nir_monthly_2017').show()
```

Step 3: Read tiles

```python, step_3_sql
sql("""
CREATE OR REPLACE TEMPORARY VIEW red_nir_tiles_monthly_2017
USING raster
OPTIONS (catalog_table='red_nir_monthly_2017', catalog_col_names='red,nir')
""")
```

Step 4: Compute aggregates

```python, step_4_sql
sql("""
SELECT month, ndvi_stats.* FROM (
    SELECT month, rf_agg_stats(rf_normalized_difference(nir, red)) as ndvi_stats
    FROM red_nir_tiles_monthly_2017
    WHERE st_intersects(st_reproject(rf_geometry(red), rf_crs(red), 'EPSG:4326'), st_makePoint(34.870605, -4.729727))
    GROUP BY month
)
ORDER BY month
""").show()
```
## Scala
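
Below is a sketch of the same program in Scala, produced by mechanically translating the Python steps above. It assumes the Scala column functions carry the same names as their Python/SQL counterparts (`rf_agg_stats`, `rf_normalized_difference`, `rf_mk_crs`, `st_intersects`, and so on), that the raster datasource exposes a `fromCatalog` reader method, and that the spatial `st_*` functions come into scope with the `rasterframes` import (some versions may need an additional GeoMesa JTS import). Treat it as illustrative rather than tested.

```scala
// Mechanical translation of the Python example above (untested sketch).
// Function names and the `fromCatalog` reader are assumed to mirror the
// Python/SQL API; verify against the RasterFrames Scala API docs.
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import org.locationtech.rasterframes._
import org.locationtech.rasterframes.datasource.raster._

implicit val spark = SparkSession.builder()
  .master("local[*]").appName("RasterFrames")
  .withKryoSerialization
  .getOrCreate()
  .withRasterFrames
import spark.implicits._

// Step 1: Load the catalog
val modis = spark.read.format("aws-pds-modis-catalog").load()

// Step 2: Down-select data by month
val redNirMonthly2017 = modis
  .select($"granule_id", month($"acquisition_date") as "month", $"B01" as "red", $"B02" as "nir")
  .where(year($"acquisition_date") === 2017 &&
    dayofmonth($"acquisition_date") === 15 &&
    $"granule_id" === "h21v09")

// Step 3: Read tiles
val redNirTilesMonthly2017 = spark.read.raster
  .fromCatalog(redNirMonthly2017, "red", "nir")
  .load()

// Step 4: Compute aggregates
val result = redNirTilesMonthly2017
  .where(st_intersects(
    st_reproject(rf_geometry($"red"), rf_crs($"red"), rf_mk_crs("EPSG:4326")),
    st_makePoint(lit(34.870605), lit(-4.729727))))
  .groupBy("month")
  .agg(rf_agg_stats(rf_normalized_difference($"nir", $"red")) as "ndvi_stats")
  .orderBy("month")

result.show()
```

As in the Python and SQL versions, the `ndvi_stats` column is a struct of aggregate statistics that can be expanded into individual columns.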