Skip to content

Commit 6ce7df2

Browse files
committed
address review
1 parent 0fb9dc9 commit 6ce7df2

File tree

2 files changed

+66
-57
lines changed

2 files changed

+66
-57
lines changed

content/tutorials/get_started/fast_track_grass_and_python.qmd

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -101,9 +101,9 @@ i.e., there's an existing GRASS project.
101101
Be sure you also have the following Python libraries installed in your
102102
environment: `folium` or `ipyleaflet`, `numpy`, `seaborn`, `matplotlib`, `pandas`.
103103

104-
The first thing we need to do is to
105-
*import GRASS python packages*. In order to do so, we need to
106-
*add GRASS python package to PATH*. Let's see how we do that.
104+
The first thing we need to do (for some environments) is to
105+
*import GRASS Python packages*. In order to do so, we need to
106+
*add GRASS Python package to path*. Let's see how we do that.
107107

108108
```{python}
109109
# import standard Python packages
@@ -114,14 +114,14 @@ from pathlib import Path
114114
```
115115

116116
```{python}
117-
# check where GRASS python packages are and add them to PATH
117+
# check where GRASS Python packages are and add them to path
118118
sys.path.append(
119119
subprocess.check_output(["grass", "--config", "python_path"], text=True).strip()
120120
)
121121
```
122122

123123
```{python}
124-
# import GRASS python packages
124+
# import GRASS Python packages
125125
import grass.script as gs
126126
import grass.jupyter as gj
127127
```

content/tutorials/parallelization/SIMWE_parallelization.qmd

Lines changed: 61 additions & 52 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,9 @@ jupyter: python3
2626

2727
In this tutorial, we will model overland water flow as well as erosion and
2828
deposition patterns along the Yadkin River in North Carolina, USA.
29+
Subwatersheds that have larger erosion values are potentially
30+
the source of increased sediment loads downstream,
31+
contributing to water quality degradation and habitat disruption.
2932

3033
To simulate these processes, we will use the SIMWE model, with simplified
3134
runoff inputs derived from the
@@ -70,19 +73,20 @@ and [Get started with GRASS in Google Colab](../get_started/grass_gis_in_google_
7073
# Setup
7174

7275
Start with importing Python packages.
73-
To import the grass package, you need to tell Python where the GRASS Python package is.
76+
To import the _grass_ package, you need to tell Python where the GRASS Python package is
77+
(can be skipped for some environment).
7478

7579
```{python}
7680
# import standard Python packages
7781
import os
7882
import sys
7983
import subprocess
8084
81-
# check where GRASS python packages are and add them to PATH
85+
# check where GRASS Python packages are and add them to path
8286
sys.path.append(
8387
subprocess.check_output(["grass", "--config", "python_path"], text=True).strip()
8488
)
85-
# import GRASS GIS python packages
89+
# import GRASS Python packages
8690
import grass.script as gs
8791
import grass.jupyter as gj
8892
from grass.tools import Tools
@@ -101,7 +105,7 @@ We will create a temporary folder, that will store our GRASS projects, since we
101105
import tempfile
102106
103107
tempdir = tempfile.TemporaryDirectory()
104-
project_path = tempdir.name
108+
path = tempdir.name
105109
```
106110

107111
# Input data download and processing
@@ -126,9 +130,7 @@ nlcd_filename = Path(url).stem + ".tif"
126130
Create a GRASS project using the coordinate reference system (CRS) of the NLCD dataset.
127131

128132
```{python}
129-
from pathlib import Path
130-
131-
nlcd_project = f"{Path(project_path, "nlcd")}"
133+
nlcd_project = Path(path, "nlcd")
132134
# create a project
133135
gs.create_project(nlcd_project, filename=nlcd_filename)
134136
# initialize GRASS session in that project
@@ -149,20 +151,19 @@ We will use NLCD data later, now we process hydrography dataset.
149151
We will download and unzip [National Hydrography Dataset](https://www.usgs.gov/national-hydrography/access-national-hydrography-products) for North Carolina and create a GRASS project, in which we will extract the river and adjacent subwatersheds.
150152

151153
```{python}
152-
import zipfile
153-
154154
url = "https://prd-tnm.s3.amazonaws.com/StagedProducts/Hydrography/NHD/State/GPKG/NHD_H_North_Carolina_State_GPKG.zip"
155155
hydro_filename, headers = urllib.request.urlretrieve(url)
156156
with zipfile.ZipFile(hydro_filename, "r") as zip_ref:
157157
zip_ref.extractall()
158158
os.remove(hydro_filename)
159159
hydro_filename = Path(url).stem + ".gpkg"
160160
```
161-
162-
Create the GRASS project called "hydro" based on the hydrography data, the data is in latitude-longitude CRS.
161+
We will create another project, called "hydro", to do the initial processing
162+
of the hydrography data in the original CRS of the data
163+
(latitude-longitude CRS).
163164

164165
```{python}
165-
hydro_project = f"{Path(project_path, "hydro")}"
166+
hydro_project = Path(path, "hydro")
166167
167168
gs.create_project(hydro_project, filename=hydro_filename)
168169
session = gj.init(hydro_project)
@@ -219,7 +220,9 @@ tools.v_in_ogr(
219220
```
220221

221222
Next, we want to extract the subwatersheds along the river.
222-
If we simply overlap (with [v.select](https://grass.osgeo.org/grass-devel/manuals/v.select.html)) the river and the subwatersheds, we will miss some of them:
223+
If we simply overlap (with [v.select](https://grass.osgeo.org/grass-devel/manuals/v.select.html))
224+
the river and the subwatersheds, we will miss some of them
225+
because the river data don't always overlap or touch the subwatersheds.
223226

224227
```{python}
225228
tools.v_select(
@@ -263,8 +266,9 @@ The rest of the workflow will be done in a CRS used in North Carolina (EPSG 3358
263266
Since we want our project to use a different CRS more suitable for our study area (EPSG:3358 for North Carolina), we will create it now:
264267

265268
```{python}
266-
gs.create_project(path=project_path, name="NC", epsg="3358")
267-
session = gj.init(f"{Path(project_path, 'NC')}")
269+
nc_project = path / "NC"
270+
gs.create_project(nc_project, epsg="3358")
271+
session = gj.init(nc_project)
268272
```
269273

270274
We will reproject the river subwatersheds vector with [v.proj]((https://grass.osgeo.org/grass-devel/manuals/v.proj.html)):
@@ -551,23 +555,10 @@ tools.r_mapcalc(
551555
f"erosion_{huc12} = if(erdep_{huc12} < 0, abs(erdep_{huc12}) * {simulation_time * 60}, null())"
552556
)
553557
erosion = tools.r_univar(map=f"erosion_{huc12}", format="json")
558+
print(erosion["mean"])
554559
```
555560

556-
```text
557-
'null_cells': 418835,
558-
'cells': 599907,
559-
'min': 3.4630666334578564e-08,
560-
'max': 154011.390625,
561-
'range': 154011.39062496537,
562-
'mean': 98.83270611141184,
563-
'mean_of_abs': 98.83270611141184,
564-
'stddev': 1045.8673705761544,
565-
'variance': 1093838.556835879,
566-
'coeff_var': 1058.2199068769523,
567-
'sum': 17895835.761005566}
568-
```
569-
570-
Deactivate the active raster mask, restoring operations to apply across the full region.
561+
Deactivate the active raster mask, restoring operations to apply across the full computational region.
571562

572563
```{python}
573564
mask.deactivate()
@@ -591,24 +582,26 @@ we will use the workflow we just ran and create a
591582
script that uses Python's `multiprocessing` module to parallelize the workflow.
592583
Each subwatershed is processed independently in its own environment, which allows computations to run concurrently without interference.
593584

594-
Each subwatershed needs to set different mask and computational region, however normally, those settings are global, and so for different mask and region for each parallel process, we will use mask and region context managers.
595-
596-
* Masking is handled using `MaskManager`, a [context manager for setting and managing raster mask](https://grass.osgeo.org/grass-devel/manuals/libpython/grass.script.html#grass.script.MaskManager),
597-
making it possible to have custom mask for the current process. This feature is available only since GRASS 8.5.
598-
599-
```python
600-
with gs.MaskManager(mask_name=f"basin_{huc12}"):
601-
# Run actual computation with active mask.
602-
gs.run_command(...)
603-
```
585+
Each subwatershed needs to set different computational region and mask.
586+
However those setting are usually global for each mapset.
587+
So, to use different regions and masks for each parallel process, we will use the region and mask context managers.
604588

605589
* Computational region is handled using `RegionManager`, a [context manager for setting and managing computational region](https://grass.osgeo.org/grass-devel/manuals/libpython/grass.script.html#grass.script.RegionManager),
606590
making it possible to have custom region for the current process. This feature is available only since GRASS 8.5.
607591

608592
```python
609593
with gs.RegionManager(vector=f"basin_{huc12}"):
610594
# Run actual computation in the specified region.
611-
gs.run_command(...)
595+
tools.r_sim_water(...)
596+
```
597+
598+
* Masking is handled using `MaskManager`, a [context manager for setting and managing raster mask](https://grass.osgeo.org/grass-devel/manuals/libpython/grass.script.html#grass.script.MaskManager),
599+
making it possible to have custom mask for the current process. This feature is available only since GRASS 8.5.
600+
601+
```python
602+
with gs.MaskManager(mask_name=f"basin_{huc12}"):
603+
# Run actual computation with active mask.
604+
tools.r_sim_water(...)
612605
```
613606

614607
## Putting it all together
@@ -638,20 +631,31 @@ def compute(huc12):
638631
flags="t",
639632
)
640633
# Set the computational region to match the basin
634+
# while using the NED raster cell size and alignment
641635
with gs.RegionManager(vector=f"basin_{huc12}", raster="ned"):
642636
tools.v_to_rast(
643637
input=f"basin_{huc12}",
644638
output=f"basin_{huc12}",
645639
use="val",
646640
)
641+
tools.r_proj(
642+
project="nlcd",
643+
mapset="PERMANENT",
644+
input="nlcd",
645+
output=f"nlcd_{huc12}",
646+
)
647+
tools.r_recode(
648+
input=f"nlcd_{huc12}",
649+
output=f"mannings_{huc12}",
650+
rules="mannings.txt",
651+
)
652+
tools.r_recode(
653+
input=f"nlcd_{huc12}",
654+
output=f"runoff_{huc12}",
655+
rules="runoff.txt",
656+
)
647657
with gs.MaskManager(mask_name=f"basin_{huc12}"):
648658
# Run actual computation with active mask.
649-
tools.r_proj(
650-
project="nlcd",
651-
mapset="PERMANENT",
652-
input="nlcd",
653-
output=f"nlcd_{huc12}",
654-
)
655659
tools.r_recode(
656660
input=f"nlcd_{huc12}",
657661
output=f"mannings_{huc12}",
@@ -713,11 +717,11 @@ def compute(huc12):
713717
if __name__ == "__main__":
714718
tools = Tools()
715719
# The entire workflow will run in parallel,
716-
# so this limits the number of threads the tools can use to 1.
720+
# so this limits the number of threads each individual tool can use to 1.
717721
tools.g_gisenv(set="NPROCS=1")
718722
basins = tools.v_db_select(format="json", map="river_basins")["records"]
719723
huc12s = [basin["huc12"] for basin in basins]
720-
# set the number of processes:
724+
# set the number of processes to be used for the computation in total
721725
with Pool(processes=cpu_count()) as pool:
722726
result = list(tqdm(pool.imap(compute, huc12s), total=len(huc12s)))
723727
with open("result.json", "w") as fp:
@@ -744,15 +748,20 @@ df["normalized_erosion"] = df.erosion_mean / max(df.erosion_mean)
744748
df
745749
```
746750

747-
Load the subwatershed layers into geopandas for visualization. Join the dataframe with the simulation values using the _huc12_ key and keep only subwatersheds that have computed erosion values.
751+
Load the subwatershed layers into geopandas for visualization.
752+
Join the dataframe with the simulation values using the _huc12_ key
753+
and from all NC subwatersheds filter only those we initially selected
754+
that have now computed erosion values.
748755

749756
```{python}
750757
import geopandas as gpd
751758
752759
gdf = gpd.read_file(hydro_filename, layer="WBDHU12")
753760
gdf = gdf.merge(df, on="huc12", how="left")
754-
# JSON does not support Timestamp type
761+
# drop loaddate column because it's timestamp type
762+
# not supported by JSON, would later fail
755763
gdf = gdf.drop(columns=["loaddate"])
764+
# keep only subwatersheds with computed values
756765
gdf = gdf[gdf["erosion_total"].notna()]
757766
```
758767

@@ -801,4 +810,4 @@ m
801810
![](SIMWE_images/webmap.webp)
802811

803812
We can now identify subwatersheds that have larger erosion values and are potentially
804-
the source of the source of increased sediment loads downstream, contributing to water quality degradation and habitat disruption.
813+
the source of increased sediment loads downstream, contributing to water quality degradation and habitat disruption.

0 commit comments

Comments
 (0)