Merge pull request #211 from troyraen/raen/issues/191/add-euclid-hats-tutorials-II

troyraen · web-flow · commit 02a9ff9b02a1 · 2025-12-23T17:26:31.000-08:00
Add Euclid HATS Magnitude tutorial
diff --git a/toc.yml b/toc.yml
@@ -16,7 +16,9 @@ project:
         - file: tutorials/euclid_access/2_Euclid_intro_MER_catalog.md
         - file: tutorials/euclid_access/4_Euclid_intro_PHZ_catalog.md
         - file: tutorials/euclid_access/5_Euclid_intro_SPE_catalog.md
-        - file: tutorials/parquet-catalog-demos/euclid-q1-hats/1-euclid-q1-hats-intro.md
+        - title: Merged Objects HATS Catalog
+          children:
+            - pattern: tutorials/parquet-catalog-demos/euclid-q1-hats/*-euclid-q1-hats-*.md
         - file: tutorials/cloud_access/euclid-cloud-access.md
         - file: tutorials/euclid_access/Euclid_ERO.md
     - title: WISE
diff --git a/tutorials/euclid_access/euclid.md b/tutorials/euclid_access/euclid.md
@@ -24,7 +24,9 @@ Data products include MERged mosaics of calibrated and stacked frames; combined
 - [PHZ Catalogs](4_Euclid_intro_PHZ_catalog.md) — Join the PHZ and MER catalogs and do a box search for galaxies with quality redshifts, load a MER mosaic cutout of the box, and plot the cutout with the catalog results overlaid.
   Then plot the SIR spectrum of the brightest galaxy and look at a MER mosaic cutout of the galaxy in Firefly.
 - [SPE Catalogs](5_Euclid_intro_SPE_catalog.md) — Join the SPE and MER catalogs and query for galaxies with H-alpha line detections, then plot the SIR spectrum of a galaxy with a high SNR H-alpha line measurement.
-- [Merged Objects HATS Catalog](../parquet-catalog-demos/euclid-q1-hats/1-euclid-q1-hats-intro.md) — Understand the content and format of the Euclid Q1 Merged Objects HATS Catalog, then perform a basic query.
+- **Merged Objects HATS Catalog** — This product was created by IRSA and contains the Euclid MER, PHZ, and SPE catalogs in a single [HATS](https://hats.readthedocs.io/en/latest/) catalog.
+  - [Introduction](../parquet-catalog-demos/euclid-q1-hats/1-euclid-q1-hats-intro.md) — Understand the content and format of the Euclid Q1 Merged Objects HATS Catalog, then perform a basic query.
+  - [Magnitudes](../parquet-catalog-demos/euclid-q1-hats/4-euclid-q1-hats-magnitudes.md) — Review the types of flux measurements available, load template-fit and aperture magnitudes, and plot distributions and comparisons for different object types.
 
 ## Special Topics
 
diff --git a/tutorials/parquet-catalog-demos/euclid-q1-hats/1-euclid-q1-hats-intro.md b/tutorials/parquet-catalog-demos/euclid-q1-hats/1-euclid-q1-hats-intro.md
@@ -1,11 +1,12 @@
 ---
-short_title: "Merged Objects HATS Catalog"
+short_title: Introduction
 jupytext:
   text_representation:
     extension: .md
     format_name: myst
     format_version: 0.13
     jupytext_version: 1.18.1
+  root_level_metadata_filter: -short_title
 kernelspec:
   display_name: Python 3 (ipykernel)
   language: python
@@ -18,16 +19,17 @@ kernelspec:
 
 This tutorial is an introduction to the content and format of the Euclid Q1 Merged Objects HATS Catalog.
 Later tutorials in this series will show how to load quality samples.
+See [Euclid Tutorial Notebooks: Catalogs](../../euclid_access/euclid.md#catalogs) for a list of tutorials in this series.
 
 +++
 
 ## Learning Goals
 
 In this tutorial, we will:
 
-- Learn about the Euclid Merged Objects catalog that IRSA created by combining information from multiple Euclid Quick Release 1 catalogs
+- Learn about the Euclid Merged Objects catalog that IRSA created by combining information from multiple Euclid Quick Release 1 (Q1) catalogs.
 - Find columns of interest.
-- Perform a basic spatial query in each of the Euclid Deep Fields using the Python library PyArrow.
+- Perform a basic query using the Python library PyArrow.
 
 +++
 
@@ -51,12 +53,12 @@ Access is free and no credentials are required.
 
 ## 2. Imports
 
-```{code-cell} python3
+```{code-cell} ipython3
 # # Uncomment the next line to install dependencies if needed.
 # %pip install hpgeom pandas pyarrow
 ```
 
-```{code-cell} python3
+```{code-cell} ipython3
 import hpgeom  # Find HEALPix indexes from RA and Dec
 import pyarrow.compute as pc  # Filter the catalog
 import pyarrow.dataset  # Load the catalog
@@ -70,7 +72,7 @@ First we'll load the Parquet schema (column information) of the Merged Objects c
 The Parquet schema is accessible from a few locations, all of which include the column names and types.
 Here, we load it from the `_common_metadata` file because it also includes the column units and descriptions.
 
-```{code-cell} python3
+```{code-cell} ipython3
 # AWS S3 paths.
 s3_bucket = "nasa-irsa-euclid-q1"
 dataset_prefix = "contributed/q1/merged_objects/hats/euclid_q1_merged_objects-hats/dataset"
@@ -82,7 +84,7 @@ schema_path = f"{dataset_path}/_common_metadata"
 s3 = pyarrow.fs.S3FileSystem(anonymous=True)
 ```
 
-```{code-cell} python3
+```{code-cell} ipython3
 # Load the Parquet schema.
 schema = pyarrow.parquet.read_schema(schema_path, filesystem=s3)
 
@@ -136,7 +138,7 @@ The tables are:
 
 Find all columns from these tables in the Parquet schema:
 
-```{code-cell} python3
+```{code-cell} ipython3
 mer_prefixes = ["mer_", "morph_", "cutouts_"]
 mer_col_counts = {p: len([n for n in schema.names if n.startswith(p)]) for p in mer_prefixes}
 
@@ -193,7 +195,7 @@ The tables are:
 
 Find all columns from these tables in the Parquet schema:
 
-```{code-cell} python3
+```{code-cell} ipython3
 phz_prefixes = ["phz_", "class_", "physparam_", "galaxysed_", "physparamqso_",
                 "starclass_", "starsed_", "physparamnir_"]
 phz_col_counts = {p: len([n for n in schema.names if n.startswith(p)]) for p in phz_prefixes}
@@ -240,7 +242,7 @@ The tables are:
 
 Find all columns from these tables in the Parquet schema:
 
-```{code-cell} python3
+```{code-cell} ipython3
 spe_prefixes = ["z_", "lines_", "models_"]
 spe_col_counts = {p: len([n for n in schema.names if n.startswith(p)]) for p in spe_prefixes}
 
@@ -272,7 +274,7 @@ They are useful for spatial queries, as demonstrated in the Euclid Deep Fields s
 
 The HEALPix, Euclid object ID, and Euclid tile ID columns appear first:
 
-```{code-cell} python3
+```{code-cell} ipython3
 schema.names[:5]
 ```
 
@@ -288,7 +290,7 @@ However, PyArrow automatically makes them available as regular columns when the
 
 The HATS columns appear at the end:
 
-```{code-cell} python3
+```{code-cell} ipython3
 schema.names[-3:]
 ```
 
@@ -297,12 +299,12 @@ schema.names[-3:]
 The subsections above show how to find all columns from a given Euclid table as well as the additional columns.
 Here we show some additional techniques for finding columns.
 
-```{code-cell} python3
+```{code-cell} ipython3
 # Access the data type using the `field` method.
 schema.field("mer_flux_y_2fwhm_aper")
 ```
 
-```{code-cell} python3
+```{code-cell} ipython3
 # The column metadata includes unit and description.
 # Parquet metadata is always stored as bytestrings, which are denoted by a leading 'b'.
 schema.field("mer_flux_y_2fwhm_aper").metadata
@@ -311,7 +313,7 @@ schema.field("mer_flux_y_2fwhm_aper").metadata
 Euclid Q1 offers many flux measurements, both from Euclid detections and from external ground-based surveys.
 They are given in microjanskys, so all flux columns can be found by searching the metadata for this unit.
 
-```{code-cell} python3
+```{code-cell} ipython3
 # Find all flux columns.
 flux_columns = [field.name for field in schema if field.metadata[b"unit"] == b"uJy"]
 
@@ -321,7 +323,7 @@ flux_columns[:4]
 
 Columns associated with external surveys are identified by the inclusion of "ext" in the name.
 
-```{code-cell} python3
+```{code-cell} ipython3
 external_flux_columns = [name for name in flux_columns if "ext" in name]
 print(f"{len(external_flux_columns)} flux columns from external surveys. First four are:")
 external_flux_columns[:4]
@@ -332,14 +334,14 @@ external_flux_columns[:4]
 +++
 
 Euclid Q1 includes data from three Euclid Deep Fields: EDF-N (North), EDF-S (South), EDF-F (Fornax; also in the southern hemisphere).
-There is also a small amount of data from a fourth field: LDN1641 (Lynds' Dark Nebula 1641), which was observed for technical reasons during Euclid's verification phase and mostly ignored here.
+There is also a small amount of data from a fourth field: LDN1641 (Lynds' Dark Nebula 1641), which was observed for technical reasons during Euclid's verification phase.
 The fields are described in [Euclid Collaboration: Aussel et al., 2025](https://arxiv.org/pdf/2503.15302) and can be seen on this [skymap](https://irsa.ipac.caltech.edu/data/download/parquet/euclid/q1/merged_objects/hats/euclid_q1_merged_objects-hats/skymap.png).
 
 The regions are well separated, so we can distinguish them using a simple cone search without having to be too picky about the radius.
 We can load data more efficiently using the HEALPix order 9 pixels that cover each area rather than using RA and Dec values directly.
 These will be used in later tutorials.
 
-```{code-cell} python3
+```{code-cell} ipython3
 # EDF-N (Euclid Deep Field - North)
 ra, dec, radius = 269.733, 66.018, 4  # 20 sq deg
 edfn_k9_pixels = hpgeom.query_circle(hpgeom.order_to_nside(9), ra, dec, radius, inclusive=True)
@@ -360,9 +362,10 @@ ldn_k9_pixels = hpgeom.query_circle(hpgeom.order_to_nside(9), ra, dec, radius, i
 ## 6. Basic Query
 
 To demonstrate a basic query, we'll search for objects with a galaxy photometric redshift estimate of 6.0 (largest possible).
-Other tutorials in this series will show more complex queries and describe the redshifts and other data in more detail.
+Other tutorials in this series will show more complex queries, and describe the redshifts and other data in more detail.
+PyArrow dataset filters are described at [Filtering by Expressions](https://arrow.apache.org/docs/python/compute.html#filtering-by-expressions), and the list of available functions is at [Compute Functions](https://arrow.apache.org/docs/python/api/compute.html).
 
-```{code-cell} python3
+```{code-cell} ipython3
 dataset = pyarrow.dataset.dataset(dataset_path, partitioning="hive", filesystem=s3, schema=schema)
 
 highz_objects = dataset.to_table(
@@ -375,6 +378,6 @@ highz_objects
 
 **Authors:** Troy Raen, Vandana Desai, Andreas Faisst, Shoubaneh Hemmati, Jaladh Singhal, Brigitta Sipőcz, Jessica Krick, the IRSA Data Science Team, and the Euclid NASA Science Center at IPAC (ENSCI).
 
-**Updated:** 2025-12-22
+**Updated:** 2025-12-23
 
 **Contact:** [IRSA Helpdesk](https://irsa.ipac.caltech.edu/docs/help_desk.html)
diff --git a/tutorials/parquet-catalog-demos/euclid-q1-hats/4-euclid-q1-hats-magnitudes.md b/tutorials/parquet-catalog-demos/euclid-q1-hats/4-euclid-q1-hats-magnitudes.md