Commit 100db13

add figure refs (#73)
1 parent 2c2e2f6 commit 100db13

File tree

7 files changed

+51
-14
lines changed

book/background/1_context_motivation.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -13,10 +13,11 @@ Technological developments in recent decades have engendered fundamental shifts
1313

1414
## *Increasingly large, cloud-optimized data means new tools and approaches for data management*
1515

16-
The increase in publicly available earth observation data has transformed scientific workflows across a range of fields, prompting analysts to gain new skills in order to work with larger volumes of data in new formats and locations, and to use distributed cloud-computational resources in their analysis ({cite:t}`abernathey_2021_cloud,Boulton02012018,gentemann_2021_science,mathieu_2017_esas,ramachandran_2021_open,Sudmanns_2020_big,wagemann_2021_user`).
16+
The increasing volume of publicly available earth observation data has transformed scientific workflows across a range of fields, prompting analysts to gain new skills in order to work with larger volumes of data in new formats and locations, and to use distributed cloud-computational resources in their analysis ({cite:t}`abernathey_2021_cloud,Boulton02012018,gentemann_2021_science,mathieu_2017_esas,ramachandran_2021_open,Sudmanns_2020_big,wagemann_2021_user,wagemann_2022_FiveGuidingPrinciples`). {numref}`eo_data_trend` shows the recent trend and projected continued increases in the volume of NASA Earth Science data archives. New satellites like [NISAR](https://nisar.jpl.nasa.gov/) will add to the growth of the data archives.
1717

1818
```{figure} imgs/fy24-projection-chart.png
1919
---
20+
name: eo_data_trend
2021
---
2122
Volume of NASA Earth Science Data archives, including growth of existing-mission archives and new missions, projected through 2029. Source: [NASA EarthData - Open Science](https://www.earthdata.nasa.gov/about/open-science).
2223
```

book/background/2_data_cubes.md

Lines changed: 5 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -18,11 +18,10 @@ Fundamentally, many of these complexities can be reduced to one distinction: is
1818

1919
### *An example dataset as a Xarray cube*
2020

21-
Imagine we have a time series of [NDVI](https://www.usgs.gov/landsat-missions/landsat-normalized-difference-vegetation-index) imagery generated from a stack of Landsat scenes. Before a user accesses a satellite imagery dataset, it has likely already undergone many levels of processing, transformation and re-organization. For more background on these steps, see Montero et al. {cite:t}`montero_2024_EarthSystemData`, *Section 3: 'The Earth System Data Cube Life cycle'*.
22-
23-
In this example, we're accessing the dataset at a common dissemination point, an 'image collection'[^mynote1]. It looks something like this:
24-
```{figure} imgs/image_stack.png
21+
Imagine we have a time series of [NDVI](https://www.usgs.gov/landsat-missions/landsat-normalized-difference-vegetation-index) imagery generated from a stack of Landsat scenes. Before a user accesses a satellite imagery dataset, it has likely already undergone many levels of processing, transformation and re-organization. For more background on these steps, see Montero et al. {cite:t}`montero_2024_EarthSystemData`, *Section 3: 'The Earth System Data Cube Life cycle'*. In this example, we're accessing the dataset at a common dissemination point, an 'image collection'[^mynote1], a schematic of which is shown in {numref}`2d-stack`.
22+
```{figure} imgs/2d_collection.png
2523
---
24+
name: 2d-stack
2625
---
2726
Illustration of earth observation time series as a stack of 2-d images and associated metadata.
2827
@@ -34,11 +33,11 @@ Without coordinate information and metadata, the image data are abstract arrays,
3433
To use this data for scientific analysis, we need to construct it into the form of a cube. This requires a comprehensive understanding of the different pieces of information contained in the dataset and how they relate to one another in order to map the components of the dataset onto a cube structure.
3534
```{figure} imgs/cube.png
3635
---
36+
name: 3d-cube
3737
---
3838
Illustration of earth observation time series organized as a 3-d Xarray data cube. Source: Adapted from [Xarray Dev](https://xarray.dev/).
3939
```
40-
41-
In the context of the Xarray data model, univariate data cubes can be represented by an `xr.DataArray` or a `xr.Dataset` with one `data_variable`. Multivariate data cubes should be represented by `xr.Dataset` objects. The building blocks of `xr.DataArrays` and `xr.Datasets` are dimensions, coordinates, data variables, attributes. We recommend the Xarray [terminology](https://docs.xarray.dev/en/stable/user-guide/terminology.html) for a detailed overview of Xarray objects and common operations.
40+
In the context of the Xarray data model, univariate data cubes can be represented by an `xr.DataArray` or a `xr.Dataset` with one `data_variable`. {numref}`3d-cube` illustrates how to represent multivariate data cubes using `xr.Dataset` objects. The building blocks of `xr.DataArrays` and `xr.Datasets` are dimensions, coordinates, data variables, attributes. We recommend the Xarray [terminology](https://docs.xarray.dev/en/stable/user-guide/terminology.html) for a detailed overview of Xarray objects and common operations.
4241

4342
We've just discussed what a data cube is in the context of a standard earth observation dataset and how to use the Xarray data model to efficiently represent this kind of data. Another way of describing those steps is preparing the dataset so that it is fit for analysis.
4443
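A minimal sketch of the distinction drawn above between univariate and multivariate cubes, assuming synthetic data: the array values, grid coordinates, and the `cloud_mask` variable are hypothetical and are not part of the book's datasets.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dimensions: 4 time steps on a 3 x 5 spatial grid.
rng = np.random.default_rng(0)
time = pd.date_range("2020-01-01", periods=4, freq="16D")
y = np.linspace(50.0, 48.0, 3)
x = np.linspace(10.0, 14.0, 5)

# Univariate cube: one variable with named dimensions, coordinates and attributes.
ndvi = xr.DataArray(
    rng.uniform(-1.0, 1.0, size=(4, 3, 5)),
    dims=("time", "y", "x"),
    coords={"time": time, "y": y, "x": x},
    name="ndvi",
    attrs={"long_name": "Normalized Difference Vegetation Index"},
)

# Multivariate cube: several data variables sharing the same dimensions.
# The cloud mask here is random and purely illustrative.
cube = xr.Dataset(
    {
        "ndvi": ndvi,
        "cloud_mask": (("time", "y", "x"), rng.integers(0, 2, size=(4, 3, 5))),
    }
)

# Reducing along a named dimension leaves a 2-d (y, x) array.
mean_ndvi = cube["ndvi"].mean(dim="time")
```

Because the dimensions are named, the reduction refers to `time` directly instead of relying on axis position, which is the main practical benefit of mapping an image stack onto a cube.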

book/background/4_tutorial_data.md

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -17,11 +17,12 @@ ITS_LIVE is a dataset of ice velocity observations derived from applying a featu
1717

1818
```{figure} imgs/lopez06-3341335.png
1919
---
20+
name: ITS_LIVE-time-series
2021
---
21-
Example of a ice velocity time series along centerline profile of Malaspina Glacier featuring velocity observations from a range of satellite sensors. Source: Reproduced with permission from {cite:t}`lopez_2023_itslive`.
22+
Example of an ice velocity time series along a profile of Malaspina Glacier featuring velocity observations from a range of satellite sensors. Source: Reproduced with permission from {cite:t}`lopez_2023_itslive`.
2223
```
2324

24-
Part of what is so exciting about ITS_LIVE is that it combines image pairs from a number of satellites, including imagery from optical (Landsat 4,5,7,8,9 & Sentinel-2) and synthetic aperture radar (Sentinel-1) sensors. For this reason, ITS_LIVE time series data can be quite large. Another exciting aspect of the ITS_LIVE dataset is that the image pair time series data is made available as Zarr data cubes stored in cloud object storage on Amazon Web Services (AWS), meaning that users don't need to download massive files to start working with the data!
25+
{numref}`ITS_LIVE-time-series` shows an ITS_LIVE time series at various locations on the Malaspina glacier and the satellite sensors that contribute observations throughout the time series. Part of what is so exciting about ITS_LIVE is that it combines image pairs from a number of satellites, including imagery from optical (Landsat 4,5,7,8,9 & Sentinel-2) and synthetic aperture radar (Sentinel-1) sensors. For this reason, ITS_LIVE time series data can be quite large. Another exciting aspect of the ITS_LIVE dataset is that the image pair time series data is made available as Zarr data cubes stored in cloud object storage on Amazon Web Services (AWS), meaning that users don't need to download massive files to start working with the data!
2526

2627

2728
:::{admonition} A note about working with image pair time series
@@ -64,13 +65,13 @@ We provide a very brief overview of RTC processing below but it is not intended
6465
---
6566
height: 250 px
6667
figclass: margin-caption
67-
name: SAR diagram
68+
name: SAR-diagram
6869
---
6970
Schematic of observation geometry used to form a SAR image.
70-
Credit: [NASA EarthData / NASA SAR Handbook](https://www.earthdata.nasa.gov/learn/earth-observation-data-basics/sar).
71+
Source: [NASA EarthData / NASA SAR Handbook](https://www.earthdata.nasa.gov/learn/earth-observation-data-basics/sar).
7172
```
7273

73-
SAR data is collected in slant range, which is the viewing geometry of the side-looking sensor and has two dimensions: range and azimuth. These are the along-track and across-track directions of the imaged swath. As data is transformed from radar coordinates (slant range) to geocoded coordinates, the spaces represented by individual pixels in the two coordinate systems do not always align, and distortions can arise due to certain viewing angle geometries and surface topography features. In addition, radiometric distortion can arise due to scattering responses from multiple scattering features within a single pixel. Radiometric terrain correction is a processing step that accounts for these distortions and the transformation from radar to geocoded coordinates that prepares SAR data for analysis.
74+
SAR data is collected in slant range, which is the viewing geometry of the side-looking sensor and has two dimensions: range and azimuth. These are the across-track and along-track directions of the imaged swath, respectively. {numref}`SAR-diagram` illustrates the viewing geometry of a SAR image. As data is transformed from radar coordinates (slant range) to geocoded coordinates, the spaces represented by individual pixels in the two coordinate systems do not always align, and distortions can arise due to certain viewing angle geometries and surface topography features. In addition, radiometric distortion can arise due to scattering responses from multiple scattering features within a single pixel. Radiometric terrain correction is a processing step that accounts for these distortions and the transformation from radar to geocoded coordinates that prepares SAR data for analysis.
7475

7576
### Sentinel-1 RTC datasets
7677
::::{tab-set}

book/book_refs.bib

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -545,6 +545,23 @@ @article{wagemann_2021_user
545545
year = {2021},
546546
pages = {1758--1774},
547547
}
548+
@article{wagemann_2022_FiveGuidingPrinciples,
549+
title = {Five {{Guiding Principles}} to {{Make Jupyter Notebooks Fit}} for {{Earth Observation Data Education}}},
550+
author = {Wagemann, Julia and Fierli, Federico and Mantovani, Simone and Siemen, Stephan and Seeger, Bernhard and Bendix, J{\"o}rg},
551+
year = {2022},
552+
month = jul,
553+
journal = {Remote Sensing},
554+
volume = {14},
555+
number = {14},
556+
pages = {3359},
557+
issn = {2072-4292},
558+
doi = {10.3390/rs14143359},
559+
urldate = {2025-04-28},
560+
abstract = {There is a growing demand to train Earth Observation (EO) data users in how to access and use existing and upcoming data. A promising tool for data-related training is computational notebooks, which are interactive web applications that combine text, code and computational output. Here, we present the Learning Tool for Python (LTPy), which is a training course (based on Jupyter notebooks) on atmospheric composition data. LTPy consists of more than 70 notebooks and has taught over 1000 EO data users so far, whose feedback is overall positive. We adapted five guiding principles from different fields (mainly scientific computing and Jupyter notebook research) to make the Jupyter notebooks more educational and reusable. The Jupyter notebooks developed (i) follow the literate programming paradigm by a text/code ratio of 3, (ii) use instructional design elements to improve navigation and user experience, (iii) modularize functions to follow best practices for scientific computing, (iv) leverage the wider Jupyter ecosystem to make content accessible and (v) aim for being reproducible. We see two areas for future developments: first, to collect feedback and evaluate whether the instructional design elements proposed meet their objective; and second, to develop tools that automatize the implementation of best practices.},
561+
copyright = {https://creativecommons.org/licenses/by/4.0/},
562+
langid = {english},
563+
}
564+
548565

549566
@Article{Wickham_2014_Tidy,
550567
author = {Hadley Wickham},

book/introduction.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -13,4 +13,6 @@ Underpinning these examples is a focus on understanding the different components
1313

1414
```{figure} background/imgs/cube.png
1515
:width: 75%
16+
17+
Illustration of a Xarray 3-d data cube.
1618
```

paper.bib

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -263,6 +263,23 @@ @article{wagemann_2021_user
263263
pages = {1758--1774},
264264
}
265265

266+
@article{wagemann_2022_FiveGuidingPrinciples,
267+
title = {Five {{Guiding Principles}} to {{Make Jupyter Notebooks Fit}} for {{Earth Observation Data Education}}},
268+
author = {Wagemann, Julia and Fierli, Federico and Mantovani, Simone and Siemen, Stephan and Seeger, Bernhard and Bendix, J{\"o}rg},
269+
year = {2022},
270+
month = jul,
271+
journal = {Remote Sensing},
272+
volume = {14},
273+
number = {14},
274+
pages = {3359},
275+
issn = {2072-4292},
276+
doi = {10.3390/rs14143359},
277+
urldate = {2025-04-28},
278+
abstract = {There is a growing demand to train Earth Observation (EO) data users in how to access and use existing and upcoming data. A promising tool for data-related training is computational notebooks, which are interactive web applications that combine text, code and computational output. Here, we present the Learning Tool for Python (LTPy), which is a training course (based on Jupyter notebooks) on atmospheric composition data. LTPy consists of more than 70 notebooks and has taught over 1000 EO data users so far, whose feedback is overall positive. We adapted five guiding principles from different fields (mainly scientific computing and Jupyter notebook research) to make the Jupyter notebooks more educational and reusable. The Jupyter notebooks developed (i) follow the literate programming paradigm by a text/code ratio of 3, (ii) use instructional design elements to improve navigation and user experience, (iii) modularize functions to follow best practices for scientific computing, (iv) leverage the wider Jupyter ecosystem to make content accessible and (v) aim for being reproducible. We see two areas for future developments: first, to collect feedback and evaluate whether the instructional design elements proposed meet their objective; and second, to develop tools that automatize the implementation of best practices.},
279+
copyright = {https://creativecommons.org/licenses/by/4.0/},
280+
langid = {english},
281+
}
282+
266283
@misc{Gardner_Scambos_2022,
267284
title={MEaSUREs ITS_LIVE Landsat Image-Pair Glacier and Ice Sheet Surface Velocities, Version 1},
268285
url={https://nsidc.org/data/NSIDC-0775/versions/1}, DOI={10.5067/IMR9D3PEI28U},

paper.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -39,10 +39,10 @@ bibliography: "paper.bib"
3939
---
4040

4141
# Summary
42-
Advances in cloud computing, remote sensing, and engineering are transforming earth system science into an increasingly data-intensive field, requiring students and scientists to learn a broad range of new skills related to scientific programming, data management, and cloud infrastructure [@abernathey_2021_cloud; @gentemann_2021_science; @guo_2017_big; @mathieu_2017_esas; @ramachandran_2021_open; @wagemann_2021_user]. This work contains educational modules designed to reduce barriers to interacting with large, complex, cloud-hosted remote sensing datasets using open-source computational tools and software. The goal of these materials is to demonstrate and promote the rigorous investigation of n-dimensional multi-sensor satellite imagery datasets through scientific programming. These tutorials feature publicly available satellite imagery with global coverage and commonly used sensors such as optical and synthetic aperture radar data with different levels of processing. We include thorough discussions of specific data formats and demonstrate access patterns for two popular cloud infrastructure platforms (Amazon Web Services and Microsoft Planetary Computer) as well as public cloud computational resources for remote sensing data processing at Alaska Satellite Facility (ASF).
42+
Advances in cloud computing, remote sensing, and engineering are transforming earth system science into an increasingly data-intensive field, requiring students and scientists to learn a broad range of new skills related to scientific programming, data management, and cloud infrastructure [@abernathey_2021_cloud; @gentemann_2021_science; @guo_2017_big; @mathieu_2017_esas; @ramachandran_2021_open; @wagemann_2021_user; @wagemann_2022_FiveGuidingPrinciples]. This work contains educational modules designed to reduce barriers to interacting with large, complex, cloud-hosted remote sensing datasets using open-source computational tools and software. The goal of these materials is to demonstrate and promote the rigorous investigation of n-dimensional multi-sensor satellite imagery datasets through scientific programming. These tutorials feature publicly available satellite imagery with global coverage and commonly used sensors such as optical and synthetic aperture radar data with different levels of processing. We include thorough discussions of specific data formats and demonstrate access patterns for two popular cloud infrastructure platforms (Amazon Web Services and Microsoft Planetary Computer) as well as public cloud computational resources for remote sensing data processing at Alaska Satellite Facility (ASF).
4343

4444
# Statement of Need
45-
Research on the transition to data-intensive, cloud-based science highlights the need for knowledge development to accompany technological advances in order to realize the benefit of these transformations [@abernathey_2021_cloud; @gentemann_2021_science; @guo_2017_big; @mathieu_2017_esas; @palumbo_2017_building; @radocaj_2020_global; @ramachandran_2021_open; @Sudmanns_2020_big; @wagemann_2021_user].
45+
Research on the transition to data-intensive, cloud-based science highlights the need for knowledge development to accompany technological advances in order to realize the benefit of these transformations [@abernathey_2021_cloud; @gentemann_2021_science; @guo_2017_big; @mathieu_2017_esas; @palumbo_2017_building; @radocaj_2020_global; @ramachandran_2021_open; @Sudmanns_2020_big; @wagemann_2021_user; @wagemann_2022_FiveGuidingPrinciples].
4646

4747
These educational modules address this need and are guided by principles identified in Diataxis [@Procida_Diataxis_documentation_framework] in order to help analysts engage in data-driven scientific discovery using cloud-based data and open-source tools.
4848
