
Commit c91fde0

transition to named tutorial dirs and minor typos (#33)

* fix minor typos on software install page
* typos on motivation page
* change tutorial1 to itslive
* change tutorial2 to sentinel1
* update directory names
* stop tracking data files

---------

Co-authored-by: e-marshall <em.marshall.108@gmail.com>
1 parent b6f9f6f commit c91fde0

3,633 files changed (+42 / -7979 lines changed)

.gitignore
Lines changed: 1 addition & 1 deletion

@@ -6,7 +6,7 @@ _static
 **/data/*
 
 #Extra nbs
-tutorial2/subste_nbs
+sentinel1/subste_nbs
 
 # Byte-compiled / optimized / DLL files
 __pycache__/

book/_toc.yml
Lines changed: 13 additions & 13 deletions

@@ -15,22 +15,22 @@ parts:
   - file: background/tutorial_data
   - file: intro/software
   - file: background/relevant_concepts
-- caption: Part 1
+- caption: "Part 1: ITS_LIVE"
   chapters:
-  - file: tutorial1/itslive_intro
-  - file: tutorial1/nbs/1_accessing_itslive_s3_data
-  - file: tutorial1/nbs/2_larger_than_memory_data
-  - file: tutorial1/nbs/3_combining_raster_vector_data
-  - file: tutorial1/nbs/4_exploratory_data_analysis_single
-  - file: tutorial1/nbs/5_exploratory_data_analysis_group
+  - file: itslive/itslive_intro
+  - file: itslive/nbs/1_accessing_itslive_s3_data
+  - file: itslive/nbs/2_larger_than_memory_data
+  - file: itslive/nbs/3_combining_raster_vector_data
+  - file: itslive/nbs/4_exploratory_data_analysis_single
+  - file: itslive/nbs/5_exploratory_data_analysis_group
 - caption: Part 2
   chapters:
-  - file: tutorial2/s1_intro
-  - file: tutorial2/nbs/1_read_asf_data
-  - file: tutorial2/nbs/2_wrangle_metadata
-  - file: tutorial2/nbs/3_asf_exploratory_analysis
-  - file: tutorial2/nbs/4_read_pc_data
-  - file: tutorial2/nbs/5_comparing_s1_rtc_datasets
+  - file: sentinel1/s1_intro
+  - file: sentinel1/nbs/1_read_asf_data
+  - file: sentinel1/nbs/2_wrangle_metadata
+  - file: sentinel1/nbs/3_asf_exploratory_analysis
+  - file: sentinel1/nbs/4_read_pc_data
+  - file: sentinel1/nbs/5_comparing_s1_rtc_datasets
 - caption: Summary + Conclusion
   chapters:
   - file: pt4/summary

book/background/context_motivation.md
Lines changed: 4 additions & 5 deletions

@@ -7,9 +7,8 @@ This book demonstrates scientific workflows using publicly-available, cloud-opti
 Technological developments in recent decades have engendered fundamental shifts in the nature of scientific data and how it is used for analysis.
 
 ```{epigraph}
-"Traditionally, scientific data have been distributed via a “download model,” wherein scientists download individual data files to local computers for analysis.After downloading many files, scientists typically have to do extensive processing and organizing to make them useful for the data analysis; this creates a barrier to reproducibility, since a scientist’s analysis code must account for this unique “local” organization. Furthermore, the sheer size of the datasets (many terabytes to petabytes) can make downloading effectively impossible. Analysis of such data volumes also can benefit from parallel / distributed computing, which is not always readily available on local computers. Finally, this model reinforces inequality between privileged institutions that have the resources to host local copies of the data and those that don’t. This restricts who can participate in science."
-
--- {cite:t}`abernathey_2021_cloud`
+"Traditionally, scientific data have been distributed via a “download model,” wherein scientists download individual data files to local computers for analysis. After downloading many files, scientists typically have to do extensive processing and organizing to make them useful for the data analysis; this creates a barrier to reproducibility, since a scientist’s analysis code must account for this unique “local” organization. Furthermore, the sheer size of the datasets (many terabytes to petabytes) can make downloading effectively impossible. Analysis of such data volumes also can benefit from parallel / distributed computing, which is not always readily available on local computers. Finally, this model reinforces inequality between privileged institutions that have the resources to host local copies of the data and those that don’t. This restricts who can participate in science."
+-- {cite}`abernathey_2021_cloud`
 ```
 
 ### *II. Increasingly large, cloud-optimized data means new tools and approaches for data management*

@@ -24,6 +23,6 @@ Volume of NASA Earth Science Data archives, including growth of existing-mission
 
 ### *III. Asking questions of complex datasets*
 
-Scientific workflows involve asking complex questions of diverse types of data. Earth observation and related datasets often contain two types of information: measurements of a physical observable (eg. temperature) and metadata that provides auxiliary information that required in order to interpret the physical observable (time and location of measurement, information about the sensor, etc.). With increasingly complex and large volumes of earth observation data that is currently available, storing, managing and organizing these types of data can very quickly become a complex and challenging task, especially for students and early-career analysts ({cite:t}`mathieu_esas_2017,palumbo_2017_building,Sudmanns_2020_big,wagemann_2021_user`).
+Scientific workflows involve asking complex questions of diverse types of data. Earth observation and related datasets often contain two types of information: measurements of a physical observable (e.g. temperature) and metadata that provides auxiliary information that required in order to interpret the physical observable (time and location of measurement, information about the sensor, etc.). With the increasingly complex and large volume of earth observation data that is currently available, storing, managing and organizing these types of data can very quickly become a complex and challenging task, especially for students and early-career analysts {cite}`mathieu_esas_2017,palumbo_2017_building,Sudmanns_2020_big,wagemann_2021_user`.
 
-This book provides detailed examples of scientific workflow steps that ingest complex, multi-dimensional datastets, introduce users to the landscape of popular, actively-maintained opens-source software packages for working with geospatial data in Python, and include strategies for working with larger-than memory data stored in publicly available cloud-hosted repositories. These demonstrations are accompanied by detailed discussion of concepts involved in analyzing earth observation data such as dataset inspection, manipulation, and exploratory analysis and visualization. Overall, we emphasize the importance of understanding the structure of multi-dimensional earth observation datasets within the context of a given data model and demonstrate how such an understanding can enable more efficient and intuitive scientific workflows.
+This book provides detailed examples of scientific workflow steps that ingest complex, multi-dimensional datastets, introduce users to the landscape of popular, actively-maintained open-source software packages for working with geospatial data in Python, and include strategies for working with larger-than memory data stored in publicly available, cloud-hosted repositories. These demonstrations are accompanied by detailed discussion of concepts involved in analyzing earth observation data such as dataset inspection, manipulation, and exploratory analysis and visualization. Overall, we emphasize the importance of understanding the structure of multi-dimensional earth observation datasets within the context of a given data model and demonstrate how such an understanding can enable more efficient and intuitive scientific workflows.

book/background/data_cubes.md
Lines changed: 1 addition & 1 deletion

@@ -68,7 +68,7 @@ The development and increasing adoption of analysis-ready specifications for sat
 However, many legacy datasets still require significant effort in order to be considered 'analysis-ready'. Furthermore, for analysts, 'analysis-ready' can be a subjective and evolving label. Semantically, from a user-perspective, analysis-ready data can be thought of as data whose structure is conducive to scientific analysis.
 
 ## *III. Analysis-ready data cubes & this book*
-The tutorials in this book contain examples of data at various degrees of 'analysis-ready'. [Tutorial 1](../tutorial1/itslive_intro.md) uses a dataset of multi-sensor observations that is already organized as a `(x,y,time)` cube with a common grid. In [tutorial 2](../tutorial2/s1_intro.md), we will see an example of a dataset that has undergone intensive processing to make it 'analysis-ready' but requires further manipulation to arrive at the `(x,y,time)` cube format that will be easist to work with.
+The tutorials in this book contain examples of data at various degrees of 'analysis-ready'. [Tutorial 1: ITS_LIVE](../itslive/itslive_intro.md) uses a dataset of multi-sensor observations that is already organized as a `(x,y,time)` cube with a common grid. In [Tutorial 2: Sentinel-1](../sentinel1/s1_intro.md), we will see an example of a dataset that has undergone intensive processing to make it 'analysis-ready' but requires further manipulation to arrive at the `(x,y,time)` cube format that will be easist to work with.
 
 ### References
 - {cite:t}`montero_2024_EarthSystemData`
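
For readers skimming this commit who are new to the layout the diff above refers to, here is a minimal, synthetic illustration of the `(x,y,time)` cube structure using Xarray; the variable name `v` and all coordinate values are hypothetical, not the tutorial's data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic (x, y, time) cube: the 'analysis-ready' layout both tutorials
# work toward. Variable name and coordinate values are illustrative only.
cube = xr.DataArray(
    np.random.rand(4, 3, 2),
    dims=("x", "y", "time"),
    coords={
        "x": np.arange(4) * 120.0,  # projected x coordinates (m)
        "y": np.arange(3) * 120.0,  # projected y coordinates (m)
        "time": pd.date_range("2021-01-01", periods=2),
    },
    name="v",
)

# With a common grid, selection and reduction follow the same pattern
# regardless of which sensor contributed each time step.
print(cube.sel(time="2021-01-01").mean(dim=["x", "y"]))
```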

book/endmatter/appendix.md
Lines changed: 2 additions & 2 deletions

@@ -8,7 +8,7 @@ In the first tutorial, while making an [interactive visualization of vector data
 
 ## [2. Reading a stack of files with `xr.open_mfdataset()` (Sentinel-1 tutorial)](nbs/2_read_w_xropen_mfdataset.ipynb)
 
-Xarray's `xr.open_mfdataset()` [function](https://docs.xarray.dev/en/stable/generated/xarray.open_mfdataset.html) allows the user to read in and combine multiple files at once to produce a single `xr.DataArray` object. This approach was explore when developing the [Read ASF-processed Sentinel-1 RTC data notebook](../tutorial2/nbs/1_read_asf_data.ipynb). However, `xr.open_mfdataset() didn't work well for this purpose because, while the stack of raster files used in this example covers a common area of interest, it includes several different spatial footprints. This creates problems when specifying a chunking strategy.
+Xarray's `xr.open_mfdataset()` [function](https://docs.xarray.dev/en/stable/generated/xarray.open_mfdataset.html) allows the user to read in and combine multiple files at once to produce a single `xr.DataArray` object. This approach was explore when developing the [Read ASF-processed Sentinel-1 RTC data notebook](../sentinel1/nbs/1_read_asf_data.ipynb). However, `xr.open_mfdataset() didn't work well for this purpose because, while the stack of raster files used in this example covers a common area of interest, it includes several different spatial footprints. This creates problems when specifying a chunking strategy.
 
 `xr.open_mfdataset()` takes a 'preprocess' argument that allows the user to write a function to specify how each raster file should be read so that the structure and metadata of the returned object matches the desired format. However, because it applies the same preprocessing steps to each file, the chunking strategyy is defined off of the first file in the stack. With files that cover different spatial footprints, different chunking strategies will be required. The processing works fine for lazy steps, but a memory 'blow-up' occurs at computation time.
 

@@ -28,4 +28,4 @@ If you wanted to select scenes from a single viewing geometry at the expense of
 ```
 
 ## [3. Another regridding approach using `xESMF` (Sentinel-1 tutorial)](nbs/3_regridding_w_xesmf.ipynb)
-This notebook demonstrates an alternative approach to the regridding shown in [noteboook 5](../tutorial2/nbs/5_comparing_s1_rtc_datasets.ipynb) of Tutorial 2, but this time using a different regridding package.
+This notebook demonstrates an alternative approach to the regridding shown in [noteboook 5](../sentinel1/nbs/5_comparing_s1_rtc_datasets.ipynb) of Tutorial 2, but this time using a different regridding package.
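
To make the `preprocess` pattern described in the appendix text above concrete, here is a minimal sketch; it is not the book's actual implementation, and the file names, the date-parsing scheme, and the use of the `rioxarray`-provided `rasterio` engine are all assumptions.

```python
from pathlib import Path

import pandas as pd
import rioxarray  # noqa: F401 -- registers the 'rasterio' engine used below
import xarray as xr

# Hypothetical file names; assume each encodes its acquisition date.
paths = ["asf_rtcs/S1A_20210504_VV.tif", "asf_rtcs/S1A_20210516_VV.tif"]


def preprocess(ds: xr.Dataset) -> xr.Dataset:
    # Runs on every file before concatenation: parse the acquisition date
    # from the (hypothetical) file name and add it as a 'time' dimension
    # so the files can be stacked along it.
    stem = Path(ds.encoding["source"]).stem  # e.g. 'S1A_20210504_VV'
    date = pd.to_datetime(stem.split("_")[1], format="%Y%m%d")
    return ds.expand_dims(time=[date])


# Chunking is inferred from the first file -- the caveat noted above: if
# footprints differ across files, one chunking scheme may not suit all of
# them, and the lazy stack can 'blow up' in memory once compute is triggered.
stack = xr.open_mfdataset(
    paths,
    engine="rasterio",
    preprocess=preprocess,
    combine="nested",
    concat_dim="time",
)
```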

book/endmatter/nbs/1_handle_mult_geom_types.ipynb
Lines changed: 2 additions & 2 deletions

@@ -151,10 +151,10 @@
 "outputs": [],
 "source": [
 "# Read initial gdf\n",
-"se_asia = gpd.read_parquet(\"../data/tutorial1/rgi7_region15_south_asia_east.parquet\")\n",
+"se_asia = gpd.read_parquet(\"../data/itslive/rgi7_region15_south_asia_east.parquet\")\n",
 "\n",
 "# Read bbox of ITS_LIVE data cube\n",
-"bbox_dc = gpd.read_file(\"../data/tutorial1/bbox_dc.geojson\")\n",
+"bbox_dc = gpd.read_file(\"../data/itslive/bbox_dc.geojson\")\n",
 "\n",
 "# Project the rgi outlines so that its CRS matches the CRS of the bbox\n",
 "se_asia_prj = se_asia.to_crs(bbox_dc.crs)\n",

book/endmatter/nbs/2_read_w_xropen_mfdataset.ipynb
Lines changed: 1 addition & 1 deletion

@@ -305,7 +305,7 @@
 "outputs": [],
 "source": [
 "timeseries_type = \"full\"\n",
-"path_to_rtcs = f\"tutorial2/data/{timeseries_type}_timeseries/asf_rtcs\""
+"path_to_rtcs = f\"sentinel1/data/{timeseries_type}_timeseries/asf_rtcs\""
 ]
 },
 {

book/endmatter/nbs/3_regridding_w_xesmf.ipynb
Lines changed: 1 addition & 1 deletion

@@ -63,7 +63,7 @@
 "source": [
 "timeseries_type = 'full'\n",
 "\n",
-"asf_cube = xr.open_dataset(f'../../tutorial2/data/{timeseries_type}_timeseries/intermediate_cubes/s1_asf_clipped_cube.zarr',\n",
+"asf_cube = xr.open_dataset(f'../../sentinel1/data/{timeseries_type}_timeseries/intermediate_cubes/s1_asf_clipped_cube.zarr',\n",
 " engine='zarr',chunks='auto', decode_coords='all')\n",
 "asf_cube = asf_cube.rename({'acq_date':'time'})\n",
 "asf_cube = asf_cube.compute()"

book/intro/getting_started.md
Lines changed: 3 additions & 3 deletions

@@ -14,7 +14,7 @@ Head to the [software](software.md) page for detailed instructions on how to get
 
 Most of the examples in this book use data accessed programmatically from cloud-object storage. We make subsets of the data available in this book's Github repository to remove the need for computationally-intensive operations in the tutorials.
 
-Several notebooks in the second tutorial use Sentinel-1 data processed by the Alaska Satellite Facility and downloaded locally. Users who would like to follow these steps on their own may do so (and access the data [here]()), but a smaller subset of the dataset is also made available [here](). For more detail on different ways to work through the Sentinel-1 tutorial, head [here](../tutorial2/s1_intro.md), and for more background on all of the datasets used in this book see [Tutorial Data](../background/tutorial_data.md).
+Several notebooks in the second tutorial use Sentinel-1 data processed by the Alaska Satellite Facility and downloaded locally. Users who would like to follow these steps on their own may do so (and access the data [here]()), but a smaller subset of the dataset is also made available [here](). For more detail on different ways to work through the Sentinel-1 tutorial, head [here](../sentinel1/s1_intro.md), and for more background on all of the datasets used in this book see [Tutorial Data](../background/tutorial_data.md).
 
 :::{important}
 The datasets used in these tutorials can be complicated to work with and require significant background knowledge in order to understand their limitations and how best to interpret them. **It is the responsibility of the user** to understand the physical principles that underpin remote sensing datasets and how they should be used and interpreted. See the [Tutorial Data](../background/tutorial_data.md) section for detailed discussion of these datasets and links to important background information.

@@ -46,10 +46,10 @@ Background on data cubes and an introduction to array-based geoscience data and
 Each tutorial focuses on a different type of remote sensing dataset and demonstrates how to assess and work through the nuances, details and challenges that can arise from each. A common characteristic of each dataset that is emphasized throughout the notebooks is working with larger-than-memory datasets on the computational resources of a standard laptop.
 
 #### Part 1: {{part2_title}}
-A [tutorial](../tutorial1/itslive_intro.md) focusing on [ITS_LIVE](https://its-live.jpl.nasa.gov/), a NASA MEASURES project and publicly accessible dataset stored in an AWS S3 repo as Zarr data cubes.
+A [tutorial](../itslive/itslive_intro.md) focusing on [ITS_LIVE](https://its-live.jpl.nasa.gov/), a NASA MEASURES project and publicly accessible dataset stored in an AWS S3 repo as Zarr data cubes.
 
 #### Part 2: {{part3_title}}
-This [tutorial](../tutorial2/s1_intro.md) focuses on another satellite dataset: [Sentinel-1](https://www.esa.int/Applications/Observing_the_Earth/Copernicus/Sentinel-1) Radiometric Terrain Corrected imagery. Sentinel-1 is a satellite-based imaging radar. More specifically, it is a synthetic aperture radar (SAR). SAR sensor look to the side rather than straight-down like conentional optical and infrared satellite sensors. This side-looking geometry causes geometric distortions that need to be addressed prior to analysis. SAR data undergoes different types of processing for different scientific applications. Part 2 demonstrates how to access this data from two publicly available, online respositories: Alaska Satellite Facility and Microsoft Planetary Computer. These notebooks demonstrate the different ways to read this data and prepare it for analysis, as well as an initial comparison of the two datasets.
+This [tutorial](../sentinel1/s1_intro.md) focuses on another satellite dataset: [Sentinel-1](https://www.esa.int/Applications/Observing_the_Earth/Copernicus/Sentinel-1) Radiometric Terrain Corrected imagery. Sentinel-1 is a satellite-based imaging radar. More specifically, it is a synthetic aperture radar (SAR). SAR sensor look to the side rather than straight-down like conentional optical and infrared satellite sensors. This side-looking geometry causes geometric distortions that need to be addressed prior to analysis. SAR data undergoes different types of processing for different scientific applications. Part 2 demonstrates how to access this data from two publicly available, online respositories: Alaska Satellite Facility and Microsoft Planetary Computer. These notebooks demonstrate the different ways to read this data and prepare it for analysis, as well as an initial comparison of the two datasets.
 
 ### {{part4_title}}
 
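
As a pointer for the Microsoft Planetary Computer access path mentioned in the Part 2 description above, here is a hedged sketch of a STAC query for Sentinel-1 RTC items; the bounding box and date range are placeholders, and this is not necessarily how the tutorial itself performs the search.

```python
import planetary_computer
import pystac_client

# Open the Planetary Computer STAC API, signing item assets for access.
catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

# Placeholder area of interest and date range.
search = catalog.search(
    collections=["sentinel-1-rtc"],
    bbox=[85.5, 27.9, 85.7, 28.1],
    datetime="2021-05-01/2021-06-01",
)
items = search.item_collection()
print(len(items), "Sentinel-1 RTC items found")
```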

book/intro/software.md
Lines changed: 3 additions & 3 deletions

@@ -26,10 +26,10 @@ There are two options for creating a software environment: [pixi](https://pixi.s
 ```cd cloud-open-source-geospatial-datacube-workflows/book```
 
 3. Create and activate a conda environment from the `environment.yml` file located in the repo:
-```conda env create -f .binder/environment.yaml```
+```conda env create -f .binder/environment.yml```
 
-4. Start Jupyterlab and navigate to the directories containing the jupyter notebooks (`itslive_nbs` and `s1_nbs`):
+4. Start Jupyterlab and navigate to the directories containing the Jupyter notebooks (`itslive/nbs` and `s1/nbs`):
 ```jupyterlab```
 
-Both tutorials also uses functions that are stored in scripts associated with each dataset. You can find these scripts here: [`itslive_tools.py`](../itslive_nbs/itslive_tools.py) and [`s1_tools.py`](../s1_nbs/s1_tools.py).
+Both tutorials use functions that are stored in scripts associated with each dataset. You can find these scripts here: [`itslive_tools.py`](../itslive/nbs/itslive_tools.py) and [`s1_tools.py`](../s1/nbs/s1_tools.py).
