book/background/1_context_motivation.md (2 additions, 1 deletion)
@@ -13,10 +13,11 @@ Technological developments in recent decades have engendered fundamental shifts
## *Increasingly large, cloud-optimized data means new tools and approaches for data management*
- The increase in publicly available earth observation data has transformed scientific workflows across a range of fields, prompting analysts to gain new skills in order to work with larger volumes of data in new formats and locations, and to use distributed cloud-computational resources in their analysis ({cite:t}`abernathey_2021_cloud,Boulton02012018,gentemann_2021_science,mathieu_2017_esas,ramachandran_2021_open,Sudmanns_2020_big,wagemann_2021_user`).
+ The increasing volume of publicly available earth observation data has transformed scientific workflows across a range of fields, prompting analysts to gain new skills in order to work with larger volumes of data in new formats and locations, and to use distributed cloud-computational resources in their analysis ({cite:t}`abernathey_2021_cloud,Boulton02012018,gentemann_2021_science,mathieu_2017_esas,ramachandran_2021_open,Sudmanns_2020_big,wagemann_2021_user,wagemann_2022_FiveGuidingPrinciples`). {numref}`eo_data_trend` shows the recent and projected growth in the volume of NASA Earth Science data archives; new missions such as [NISAR](https://nisar.jpl.nasa.gov/) will add to this growth.
```{figure} imgs/fy24-projection-chart.png
---
+ name: eo_data_trend
---
Volume of NASA Earth Science Data archives, including growth of existing-mission archives and new missions, projected through 2029. Source: [NASA EarthData - Open Science](https://www.earthdata.nasa.gov/about/open-science).
book/background/2_data_cubes.md (5 additions, 6 deletions)
@@ -18,11 +18,10 @@ Fundamentally, many of these complexities can be reduced to one distinction: is
### *An example dataset as a Xarray cube*
- Imagine we have a time series of [NDVI](https://www.usgs.gov/landsat-missions/landsat-normalized-difference-vegetation-index) imagery generated from a stack of Landsat scenes. Before a user accesses a satellite imagery dataset, it has likely already undergone many levels of processing, transformation and re-organization. For more background on these steps, see Montero et al. {cite:t}`montero_2024_EarthSystemData`, *Section 3: 'The Earth System Data Cube Life cycle'*.
-
- In this example, we're accessing the dataset at a common dissemination point, an 'image collection'[^mynote1]. It looks something like this:
- ```{figure} imgs/image_stack.png
+ Imagine we have a time series of [NDVI](https://www.usgs.gov/landsat-missions/landsat-normalized-difference-vegetation-index) imagery generated from a stack of Landsat scenes. Before a user accesses a satellite imagery dataset, it has likely already undergone many levels of processing, transformation and re-organization. For more background on these steps, see Montero et al. {cite:t}`montero_2024_EarthSystemData`, *Section 3: 'The Earth System Data Cube Life cycle'*. In this example, we're accessing the dataset at a common dissemination point, an 'image collection'[^mynote1], a schematic of which is shown in {numref}`2d-stack`.
+ ```{figure} imgs/2d_collection.png
---
+ name: 2d-stack
---
Illustration of earth observation time series as a stack of 2-d images and associated metadata.
@@ -34,11 +33,11 @@ Without coordinate information and metadata, the image data are abstract arrays,
To use this data for scientific analysis, we need to construct it into the form of a cube. This requires a comprehensive understanding of the different pieces of information contained in the dataset and how they relate to one another in order to map the components of the dataset onto a cube structure.
```{figure} imgs/cube.png
---
+ name: 3d-cube
---
Illustration of earth observation time series organized as a 3-d Xarray data cube. Source: Adapted from [Xarray Dev](https://xarray.dev/).
```
-
- In the context of the Xarray data model, univariate data cubes can be represented by an `xr.DataArray` or a `xr.Dataset` with one `data_variable`. Multivariate data cubes should be represented by `xr.Dataset` objects. The building blocks of `xr.DataArrays` and `xr.Datasets` are dimensions, coordinates, data variables, attributes. We recommend the Xarray [terminology](https://docs.xarray.dev/en/stable/user-guide/terminology.html) for a detailed overview of Xarray objects and common operations.
+ In the context of the Xarray data model, univariate data cubes can be represented by an `xr.DataArray` or an `xr.Dataset` with one `data_variable`, while multivariate data cubes are represented by `xr.Dataset` objects; {numref}`3d-cube` illustrates this structure. The building blocks of `xr.DataArray` and `xr.Dataset` objects are dimensions, coordinates, data variables, and attributes. We recommend the Xarray [terminology](https://docs.xarray.dev/en/stable/user-guide/terminology.html) documentation for a detailed overview of Xarray objects and common operations.
We've just discussed what a data cube is in the context of a standard earth observation dataset and how to use the Xarray data model to efficiently represent this kind of data. Another way of describing those steps is preparing the dataset so that it is fit for analysis.
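To make the building blocks above concrete, here is a minimal sketch (with made-up values, not a dataset from the tutorials) of a multivariate cube represented as an `xr.Dataset` with shared `(time, y, x)` dimensions:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Illustrative coordinates only; a real cube would carry projected x/y values
# and acquisition timestamps taken from the source imagery.
time = pd.date_range("2020-01-01", periods=3, freq="MS")
y = np.linspace(100.0, 0.0, 5)
x = np.linspace(0.0, 100.0, 5)

ds = xr.Dataset(
    data_vars={
        # each data variable shares the same (time, y, x) dimensions
        "ndvi": (("time", "y", "x"), np.random.rand(3, 5, 5)),
        "cloud_fraction": (("time", "y", "x"), np.random.rand(3, 5, 5)),
    },
    coords={"time": time, "y": y, "x": x},
    attrs={"description": "toy multivariate data cube (illustrative values)"},
)

ndvi = ds["ndvi"]  # a univariate cube is a single xr.DataArray
print(ds)
```

Label-based selection and reduction then work across any dimension, for example `ds["ndvi"].sel(time="2020-02")` or `ds.mean(dim="time")`.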
book/background/4_tutorial_data.md (6 additions, 5 deletions)
@@ -17,11 +17,12 @@ ITS_LIVE is a dataset of ice velocity observations derived from applying a featu
```{figure} imgs/lopez06-3341335.png
---
+ name: ITS_LIVE-time-series
---
- Example of a ice velocity time series along centerline profile of Malaspina Glacier featuring velocity observations from a range of satellite sensors. Source: Reproduced with permission from {cite:t}`lopez_2023_itslive`.
+ Example of an ice velocity time series along a profile of Malaspina Glacier featuring velocity observations from a range of satellite sensors. Source: Reproduced with permission from {cite:t}`lopez_2023_itslive`.
```
- Part of what is so exciting about ITS_LIVE is that it combines image pairs from a number of satellites, including imagery from optical (Landsat 4,5,7,8,9 & Sentinel-2) and synthetic aperture radar (Sentinel-1) sensors. For this reason, ITS_LIVE time series data can be quite large. Another exciting aspect of the ITS_LIVE dataset is that the image pair time series data is made available as Zarr data cubes stored in cloud object storage on Amazon Web Services (AWS), meaning that users don't need to download massive files to start working with the data!
+ {numref}`ITS_LIVE-time-series` shows an ITS_LIVE time series at various locations on Malaspina Glacier and the satellite sensors that contribute observations throughout the time series. Part of what is so exciting about ITS_LIVE is that it combines image pairs from a number of satellites, including imagery from optical (Landsat 4,5,7,8,9 & Sentinel-2) and synthetic aperture radar (Sentinel-1) sensors. For this reason, ITS_LIVE time series data can be quite large. Another exciting aspect of the ITS_LIVE dataset is that the image pair time series data is made available as Zarr data cubes stored in cloud object storage on Amazon Web Services (AWS), meaning that users don't need to download massive files to start working with the data!
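As a rough sketch of what this cloud-hosted access pattern can look like (the cube URL below is a placeholder rather than a real ITS_LIVE path; actual cube locations are found through the ITS_LIVE catalog, and s3fs plus a recent Xarray are assumed), a single Zarr data cube can be opened lazily from S3:

```python
import xarray as xr

# Placeholder URL for illustration only; look up real cube paths in the ITS_LIVE catalog.
cube_url = "s3://its-live-data/datacubes/v2/example_cube.zarr"

# Lazily open the cube from public object storage; no velocity data is
# downloaded until values are actually computed or plotted.
dc = xr.open_zarr(cube_url, storage_options={"anon": True})

print(dc.sizes)              # cube dimensions
print(list(dc.data_vars)[:5])  # a few of the available variables
```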
:::{admonition} A note about working with image pair time series
@@ -64,13 +65,13 @@ We provide a very brief overview of RTC processing below but it is not intended
---
height: 250 px
figclass: margin-caption
- name: SARdiagram
+ name: SAR-diagram
---
Schematic of observation geometry used to form a SAR image.
- Credit: [NASA EarthData / NASA SAR Handbook](https://www.earthdata.nasa.gov/learn/earth-observation-data-basics/sar).
+ Source: [NASA EarthData / NASA SAR Handbook](https://www.earthdata.nasa.gov/learn/earth-observation-data-basics/sar).
```
- SAR data is collected in slant range, which is the viewing geometry of the side-looking sensor and has two dimensions: range and azimuth. These are the along-track and across-track directions of the imaged swath. As data is transformed from radar coordinates (slant range) to geocoded coordinates, the spaces represented by individual pixels in the two coordinate systems do not always align, and distortions can arise due to certain viewing angle geometries and surface topography features. In addition, radiometric distortion can arise due to scattering responses from multiple scattering features within a single pixel. Radiometric terrain correction is a processing step that accounts for these distortions and the transformation from radar to geocoded coordinates that prepares SAR data for analysis.
+ SAR data is collected in slant range, the viewing geometry of the side-looking sensor, which has two dimensions: range and azimuth. These are the across-track and along-track directions of the imaged swath, respectively. {numref}`SAR-diagram` illustrates the viewing geometry of a SAR image. As data is transformed from radar coordinates (slant range) to geocoded coordinates, the areas represented by individual pixels in the two coordinate systems do not always align, and distortions can arise due to certain viewing-angle geometries and surface topography. In addition, radiometric distortion can arise from the scattering responses of multiple scattering features within a single pixel. Radiometric terrain correction is a processing step that accounts for these distortions and performs the transformation from radar to geocoded coordinates, preparing SAR data for analysis.
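Downstream of RTC processing, a common first step when exploring the corrected backscatter values is converting from linear power to decibels. A small, generic sketch (not code from the tutorials, and assuming a gamma-nought backscatter `xr.DataArray` in linear power) might look like:

```python
import numpy as np
import xarray as xr

def power_to_db(gamma0: xr.DataArray) -> xr.DataArray:
    """Convert linear-power backscatter to decibels, masking non-positive values."""
    return 10.0 * np.log10(gamma0.where(gamma0 > 0))
```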
book/book_refs.bib (17 additions)
@@ -545,6 +545,23 @@ @article{wagemann_2021_user
year = {2021},
pages = {1758--1774},
}
+ @article{wagemann_2022_FiveGuidingPrinciples,
+ title = {Five {{Guiding Principles}} to {{Make Jupyter Notebooks Fit}} for {{Earth Observation Data Education}}},
+ author = {Wagemann, Julia and Fierli, Federico and Mantovani, Simone and Siemen, Stephan and Seeger, Bernhard and Bendix, J{\"o}rg},
+ year = {2022},
+ month = jul,
+ journal = {Remote Sensing},
+ volume = {14},
+ number = {14},
+ pages = {3359},
+ issn = {2072-4292},
+ doi = {10.3390/rs14143359},
+ urldate = {2025-04-28},
+ abstract = {There is a growing demand to train Earth Observation (EO) data users in how to access and use existing and upcoming data. A promising tool for data-related training is computational notebooks, which are interactive web applications that combine text, code and computational output. Here, we present the Learning Tool for Python (LTPy), which is a training course (based on Jupyter notebooks) on atmospheric composition data. LTPy consists of more than 70 notebooks and has taught over 1000 EO data users so far, whose feedback is overall positive. We adapted five guiding principles from different fields (mainly scientific computing and Jupyter notebook research) to make the Jupyter notebooks more educational and reusable. The Jupyter notebooks developed (i) follow the literate programming paradigm by a text/code ratio of 3, (ii) use instructional design elements to improve navigation and user experience, (iii) modularize functions to follow best practices for scientific computing, (iv) leverage the wider Jupyter ecosystem to make content accessible and (v) aim for being reproducible. We see two areas for future developments: first, to collect feedback and evaluate whether the instructional design elements proposed meet their objective; and second, to develop tools that automatize the implementation of best practices.},
paper.bib (17 additions)
@@ -263,6 +263,23 @@ @article{wagemann_2021_user
pages = {1758--1774},
}
+ @article{wagemann_2022_FiveGuidingPrinciples,
+ title = {Five {{Guiding Principles}} to {{Make Jupyter Notebooks Fit}} for {{Earth Observation Data Education}}},
+ author = {Wagemann, Julia and Fierli, Federico and Mantovani, Simone and Siemen, Stephan and Seeger, Bernhard and Bendix, J{\"o}rg},
+ year = {2022},
+ month = jul,
+ journal = {Remote Sensing},
+ volume = {14},
+ number = {14},
+ pages = {3359},
+ issn = {2072-4292},
+ doi = {10.3390/rs14143359},
+ urldate = {2025-04-28},
+ abstract = {There is a growing demand to train Earth Observation (EO) data users in how to access and use existing and upcoming data. A promising tool for data-related training is computational notebooks, which are interactive web applications that combine text, code and computational output. Here, we present the Learning Tool for Python (LTPy), which is a training course (based on Jupyter notebooks) on atmospheric composition data. LTPy consists of more than 70 notebooks and has taught over 1000 EO data users so far, whose feedback is overall positive. We adapted five guiding principles from different fields (mainly scientific computing and Jupyter notebook research) to make the Jupyter notebooks more educational and reusable. The Jupyter notebooks developed (i) follow the literate programming paradigm by a text/code ratio of 3, (ii) use instructional design elements to improve navigation and user experience, (iii) modularize functions to follow best practices for scientific computing, (iv) leverage the wider Jupyter ecosystem to make content accessible and (v) aim for being reproducible. We see two areas for future developments: first, to collect feedback and evaluate whether the instructional design elements proposed meet their objective; and second, to develop tools that automatize the implementation of best practices.},
paper.md (2 additions, 2 deletions)
@@ -39,10 +39,10 @@ bibliography: "paper.bib"
---
# Summary
- Advances in cloud computing, remote sensing, and engineering are transforming earth system science into an increasingly data-intensive field, requiring students and scientists to learn a broad range of new skills related to scientific programming, data management, and cloud infrastructure [@abernathey_2021_cloud; @gentemann_2021_science; @guo_2017_big; @mathieu_2017_esas; @ramachandran_2021_open; @wagemann_2021_user]. This work contains educational modules designed to reduce barriers to interacting with large, complex, cloud-hosted remote sensing datasets using open-source computational tools and software. The goal of these materials is to demonstrate and promote the rigorous investigation of n-dimensional multi-sensor satellite imagery datasets through scientific programming. These tutorials feature publicly available satellite imagery with global coverage and commonly used sensors such as optical and synthetic aperture radar data with different levels of processing. We include thorough discussions of specific data formats and demonstrate access patterns for two popular cloud infrastructure platforms (Amazon Web Services and Microsoft Planetary Computer) as well as public cloud computational resources for remote sensing data processing at Alaska Satellite Facility (ASF).
+ Advances in cloud computing, remote sensing, and engineering are transforming earth system science into an increasingly data-intensive field, requiring students and scientists to learn a broad range of new skills related to scientific programming, data management, and cloud infrastructure [@abernathey_2021_cloud; @gentemann_2021_science; @guo_2017_big; @mathieu_2017_esas; @ramachandran_2021_open; @wagemann_2021_user; @wagemann_2022_FiveGuidingPrinciples]. This work contains educational modules designed to reduce barriers to interacting with large, complex, cloud-hosted remote sensing datasets using open-source computational tools and software. The goal of these materials is to demonstrate and promote the rigorous investigation of n-dimensional multi-sensor satellite imagery datasets through scientific programming. These tutorials feature publicly available satellite imagery with global coverage from commonly used optical and synthetic aperture radar sensors at different levels of processing. We include thorough discussions of specific data formats and demonstrate access patterns for two popular cloud infrastructure platforms (Amazon Web Services and Microsoft Planetary Computer) as well as public cloud computational resources for remote sensing data processing at Alaska Satellite Facility (ASF).
# Statement of Need
- Research on the transition to data-intensive, cloud-based science highlights the need for knowledge development to accompany technological advances in order to realize the benefit of these transformations [@abernathey_2021_cloud; @gentemann_2021_science; @guo_2017_big; @mathieu_2017_esas; @palumbo_2017_building; @radocaj_2020_global; @ramachandran_2021_open; @Sudmanns_2020_big; @wagemann_2021_user].
+ Research on the transition to data-intensive, cloud-based science highlights the need for knowledge development to accompany technological advances in order to realize the benefit of these transformations [@abernathey_2021_cloud; @gentemann_2021_science; @guo_2017_big; @mathieu_2017_esas; @palumbo_2017_building; @radocaj_2020_global; @ramachandran_2021_open; @Sudmanns_2020_big; @wagemann_2021_user; @wagemann_2022_FiveGuidingPrinciples].
These educational modules address this need and are guided by principles identified in Diataxis [@Procida_Diataxis_documentation_framework] in order to help analysts engage in data-driven scientific discovery using cloud-based data and open-source tools.