Skip to content

Commit 8413dda

Browse files
authored
Merge pull request #147 from Caltech-IPAC/raen/add/spherex-cutouts-and-psf
2 parents 9ae95a6 + 7a4d7fe commit 8413dda

File tree

5 files changed

+615
-0
lines changed

5 files changed

+615
-0
lines changed

deployed_notebooks_manifest.in

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,3 +24,5 @@ tutorials/parquet-catalog-demos/neowise-source-table-strategies.md
2424
tutorials/parquet-catalog-demos/wise-allwise-catalog-demo.md
2525
tutorials/roman_simulations/roman_hlss_number_density.md
2626
tutorials/spherex/spherex_intro.md
27+
tutorials/spherex/spherex_cutouts.md
28+
tutorials/spherex/spherex_psf.md

toc.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,8 @@ project:
66
- title: SPHEREx
77
children:
88
- file: tutorials/spherex/spherex_intro.md
9+
- file: tutorials/spherex/spherex_cutouts.md
10+
- file: tutorials/spherex/spherex_psf.md
911
- title: Euclid
1012
children:
1113
- title: Euclid Early Release Observations
Lines changed: 334 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,334 @@
1+
---
2+
jupytext:
3+
text_representation:
4+
extension: .md
5+
format_name: myst
6+
format_version: 0.13
7+
jupytext_version: 1.17.3
8+
kernelspec:
9+
display_name: Python 3 (ipykernel)
10+
language: python
11+
name: python3
12+
---
13+
14+
(spherex-cutouts)=
15+
# Download a collection of SPHEREx Spectral Image cutouts as a multi-extension FITS file
16+
17+
## 1. Learning Goals
18+
19+
- Perform a query for the list of SPHEREx Spectral Image Multi-Extension FITS files (MEFs) that overlap a given coordinate.
20+
- Retrieve cutouts for every entry in this list and package the cutouts as a new MEF.
21+
- Learn how to use parallel or serial processing to retrieve the cutouts.
22+
23+
## 2. SPHEREx Overview
24+
25+
SPHEREx is a NASA Astrophysics Medium Explorer mission that launched in March 2025.
26+
During its planned two-year mission, SPHEREx will obtain 0.75-5 micron spectroscopy over the entire sky, with deeper data in the SPHEREx Deep Fields.
27+
SPHEREx data will be used to:
28+
29+
* **constrain the physics of inflation** by measuring its imprints on the three-dimensional large-scale distribution of matter,
30+
* **trace the history of galactic light production** through a deep multi-band measurement of large-scale clustering,
31+
* **investigate the abundance and composition of water and biogenic ices** in the early phases of star and planetary disk formation.
32+
33+
The community will also mine SPHEREx data and combine it with synergistic data sets to address a variety of additional topics in astrophysics.
34+
35+
More information is available in the [SPHEREx Explanatory Supplement](https://irsa.ipac.caltech.edu/data/SPHEREx/docs/SPHEREx_Expsupp_QR.pdf).
36+
37+
## 3. Imports
38+
39+
The following packages must be installed to run this notebook.
40+
41+
```{code-cell} ipython3
42+
# Uncomment the next line to install dependencies if needed.
43+
# !pip install numpy astropy pyvo matplotlib
44+
```
45+
46+
```{code-cell} ipython3
47+
import concurrent
48+
import time
49+
50+
import astropy.units as u
51+
import matplotlib.pyplot as plt
52+
import numpy as np
53+
import pyvo
54+
from astropy.coordinates import SkyCoord
55+
from astropy.io import fits
56+
from astropy.table import Table
57+
from astropy.wcs import WCS
58+
59+
# Suppress logging temporarily to prevent astropy
60+
# from repeatedly printing out warning notices related to alternate WCSs
61+
import logging
62+
logging.getLogger('astropy').setLevel(logging.ERROR)
63+
```
64+
65+
## 4. Specify inputs and outputs
66+
67+
Specify a right ascension, declination, cutout size, and a SPHEREx bandpass (e.g., 'SPHEREx-D2').
68+
69+
In this example, we are creating cutouts of the Pinwheel galaxy (M101) for the SPHEREx detector D2.
70+
71+
```{code-cell} ipython3
72+
# Choose a position.
73+
ra = 210.80227 * u.degree
74+
dec = 54.34895 * u.degree
75+
76+
# Choose a cutout size.
77+
size = 0.1 * u.degree
78+
79+
# Choose the bandpass of interest.
80+
bandpass = 'SPHEREx-D2'
81+
82+
# Choose an output filename root for the output MEF file.
83+
output_filename = 'spherex_cutouts_mef.fits'
84+
```
85+
86+
## 5. Query IRSA for a list of cutouts that satisfy the criteria specified above.
87+
88+
Here we show how to use the `pyvo` TAP SQL query to retrieve all images that overlap with the position defined above.
89+
This query will retrieve a table of URLs that link to the MEF cutouts.
90+
Each row in the table corresponds to a single cutout and includes the data access URL and an observation timestamp.
91+
The results are sorted from oldest to newest.
92+
93+
```{code-cell} ipython3
94+
# Define the service endpoint for IRSA's Table Access Protocol (TAP)
95+
# so that we can query SPHEREx metadata tables.
96+
service = pyvo.dal.TAPService("https://irsa.ipac.caltech.edu/TAP")
97+
98+
# Define a query that will search the appropriate SPHEREx metadata tables
99+
# for spectral images that cover the chosen coordinates and match the
100+
# specified bandpass. Return the cutout data access URL and the time of observation.
101+
# Sort by observation time.
102+
query = f"""
103+
SELECT
104+
'https://irsa.ipac.caltech.edu/' || a.uri || '?center={ra.to(u.degree).value},{dec.to(u.degree).value}d&size={size.to(u.degree).value}' AS uri,
105+
p.time_bounds_lower
106+
FROM spherex.artifact a
107+
JOIN spherex.plane p ON a.planeid = p.planeid
108+
WHERE 1 = CONTAINS(POINT('ICRS', {ra.to(u.degree).value}, {dec.to(u.degree).value}), p.poly)
109+
AND p.energy_bandpassname = '{bandpass}'
110+
ORDER BY p.time_bounds_lower
111+
"""
112+
113+
# Execute the query and return as an astropy Table.
114+
t1 = time.time()
115+
results = service.search(query)
116+
print("Time to do TAP query: {:2.2f} seconds.".format(time.time() - t1))
117+
print("Number of images found: {}".format(len(results)))
118+
```
119+
120+
## 6. Define a function that processes a list of SPHEREx Spectral Image Cutouts
121+
122+
This function takes in a row of the catalog that we created above and does the following:
123+
124+
- It downloads the cutout
125+
- It computes the wavelength of the center pixel of the cutout (in micro-meters)
126+
- It combines the image HDUs into a new HDU and adds it to the table row.
127+
128+
Note that the values of the rows are being added in place.
129+
130+
```{code-cell} ipython3
131+
def process_cutout(row, ra, dec, cache):
132+
'''
133+
Downloads the cutouts given in a row of the table including all SPHEREx images overlapping with a position.
134+
135+
Parameters:
136+
===========
137+
138+
row : astropy.table row
139+
Row of a table that will be changed in place by this function. The table
140+
is created by the SQL TAP query.
141+
ra,dec : coordinates (astropy units)
142+
Ra and Dec coordinates (same as used for the TAP query) with attached astropy units
143+
cache : bool
144+
If set to `True`, the output of cached and the cutout processing will run faster next time.
145+
Turn this feature off by setting `cache = False`.
146+
'''
147+
148+
with fits.open(row["uri"], cache=cache) as hdulist:
149+
# There are seven HDUs:
150+
# 0 contains minimal metadata in the header and no data.
151+
# 1 through 6 are: IMAGE, FLAGS, VARIANCE, ZODI, PSF, WCS-WAVE
152+
header = hdulist[1].header
153+
154+
# Compute pixel coordinates corresponding to cutout position.
155+
spatial_wcs = WCS(header)
156+
x, y = spatial_wcs.world_to_pixel(SkyCoord(ra=ra, dec=dec, unit="deg", frame="icrs"))
157+
158+
# Compute wavelength at cutout position.
159+
spectral_wcs = WCS(header, fobj=hdulist, key="W")
160+
spectral_wcs.sip = None
161+
wavelength, bandpass = spectral_wcs.pixel_to_world(x, y)
162+
row["central_wavelength"] = wavelength.to(u.micrometer).value
163+
164+
# Collect the HDUs for this cutout and append the row's cutout_index to the EXTNAME.
165+
hdus = []
166+
for hdu in hdulist[1:]: # skip the primary header
167+
hdu.header["EXTNAME"] = f"{hdu.header['EXTNAME']}{row['cutout_index']}"
168+
hdus.append(hdu.copy()) # Copy so the data is available after the file is closed
169+
row["hdus"] = hdus
170+
```
171+
172+
## 7. Download the Cutouts
173+
174+
This process can take a while.
175+
If run in series, it can take about 5 minutes for 700 images on a typical laptop machine.
176+
Here, we therefore exploit two different methods.
177+
First we show the serial approach and next we show how to parallelize the methods.
178+
The latter can be run on many CPUs and is therefore significantly faster.
179+
180+
### 7.1 Serial Approach
181+
182+
First, we implement the serial approach -- a simple `for` loop.
183+
Before that, we turn the results into an astropy table and add some place holders that will be filled in by the `process_cutout()` function.
184+
185+
```{warning}
186+
Running the cell below may take a while for a large number of cutouts.
187+
Approximately 5-7 minutes for 700 images of cutout size 0.01 degree on a typical machine.
188+
```
189+
190+
```{tip}
191+
The astropy `fits.open()` supports a caching argument.
192+
This can be passed through in the `process_cutout()` function.
193+
If `cache=True` is set, the images are cached and the cutout creation is sped up next time the code is run (even if the Jupyter kernel is restarted!).
194+
The downside is that the images are saved on the machine where this notebook is run (usually in `~/.astropy/cache/`).
195+
If many cutouts are created, this can sum up to a large cached data volume, in which case `cache=False` is preferred.
196+
197+
To learn more about the cache please read the [astropy cache management documentation](https://docs.astropy.org/en/stable/utils/data.html#cache-management).
198+
```
199+
200+
```{code-cell} ipython3
201+
results_table_serial = results.to_table()
202+
results_table_serial["cutout_index"] = range(1, len(results_table_serial) + 1)
203+
results_table_serial["central_wavelength"] = np.full(len(results_table_serial), np.nan)
204+
results_table_serial["hdus"] = np.full(len(results_table_serial), None)
205+
206+
t1 = time.time()
207+
for row in results_table_serial:
208+
process_cutout(row, ra, dec, cache=False)
209+
print("Time to create cutouts in serial mode: {:2.2f} minutes.".format((time.time() - t1) / 60))
210+
```
211+
212+
### 7.2 Parallel Approach
213+
214+
Next, we implement parallel processing, which will make the cutout creation faster.
215+
The maximal number of workers can be limited by setting the `max_workers` argument.
216+
The choice of this value depends on the number of cores but also on the number of parallel calls that can be digested by the IRSA server.
217+
218+
```{tip}
219+
A good value for the maximum number of workers is between 7 and 12 for a machine with 8 cores.
220+
```
221+
222+
```{tip}
223+
The astropy `fits.open()` supports a caching argument.
224+
This can be passed through in the `process_cutout()` function.
225+
If `cache=True` is set, the images are cached and the cutout creation is sped up next time the code is run (even if the Jupyter kernel is restarted!).
226+
The downside is that the images are saved on the machine where this notebook is run (usually in `~/.astropy/cache/`).
227+
If many cutouts are created, this can sum up to a large cached data volume, in which case `cache=False` is preferred.
228+
229+
To learn more about the cache please read the [astropy cache management documentation](https://docs.astropy.org/en/stable/utils/data.html#cache-management).
230+
```
231+
232+
Again, before running the cutout processing we define some place holders.
233+
234+
```{code-cell} ipython3
235+
results_table_parallel = results.to_table()
236+
results_table_parallel["cutout_index"] = range(1, len(results_table_parallel) + 1)
237+
results_table_parallel["central_wavelength"] = np.full(len(results_table_parallel), np.nan)
238+
results_table_parallel["hdus"] = np.full(len(results_table_parallel), None)
239+
240+
t1 = time.time()
241+
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
242+
futures = [executor.submit(process_cutout, row, ra, dec, False) for row in results_table_parallel]
243+
concurrent.futures.wait(futures)
244+
print("Time to create cutouts in parallel mode: {:2.2f} minutes.".format((time.time() - t1) / 60))
245+
```
246+
247+
## 8. Create a summary table HDU with renamed columns
248+
249+
In the following, we continue to use the output of the parallel mode.
250+
The following cell does the following:
251+
252+
- Create a summary FITS table.
253+
- Create the final FITS HDU including the summary table.
254+
255+
```{code-cell} ipython3
256+
# Create a summary table HDU with renamed columns
257+
cols = fits.ColDefs([
258+
fits.Column(name="cutout_index", format="J", array=results_table_parallel["cutout_index"], unit=""),
259+
fits.Column(name="observation_date", format="D", array=results_table_parallel["time_bounds_lower"], unit="d"),
260+
fits.Column(name="central_wavelength", format="D", array=results_table_parallel["central_wavelength"], unit="um"),
261+
fits.Column(name="access_url", format="A200", array=results_table_parallel["uri"], unit=""),
262+
])
263+
table_hdu = fits.BinTableHDU.from_columns(cols)
264+
table_hdu.header["EXTNAME"] = "CUTOUT_INFO"
265+
```
266+
267+
## 9. Create the final MEF
268+
269+
Now, we create a primary HDU and combine it with the summary table HDU and the cutout image and wavelength HDUs to a final HDU list.
270+
271+
```{code-cell} ipython3
272+
primary_hdu = fits.PrimaryHDU()
273+
hdulist_list = [primary_hdu, table_hdu]
274+
hdulist_list.extend(hdu for fits_hdulist in results_table_parallel["hdus"] for hdu in fits_hdulist)
275+
combined_hdulist = fits.HDUList(hdulist_list)
276+
```
277+
278+
## 10. Write the final MEF
279+
280+
Finally, we save the full HDU list to a multi-extension FITS file to disk.
281+
282+
```{code-cell} ipython3
283+
# Write the final MEF
284+
combined_hdulist.writeto(output_filename, overwrite=True)
285+
```
286+
287+
## 11. Test and visualize the final result
288+
289+
We can now open the new MEF FITS file that we have created above.
290+
291+
Loading the summary table (including wavelength information of each cutout) is straight forward:
292+
293+
```{code-cell} ipython3
294+
summary_table = Table.read(output_filename , hdu=1)
295+
```
296+
297+
Let's also extract the first 10 images (or all of the extracted cutouts if less than 10).
298+
299+
```{code-cell} ipython3
300+
nbr_images = np.nanmin([10, len(summary_table)])
301+
imgs = []
302+
with fits.open(output_filename) as hdul:
303+
for ii in range(nbr_images):
304+
extname = "IMAGE{}".format(summary_table["cutout_index"][ii])
305+
imgs.append(hdul[extname].data)
306+
```
307+
308+
Plot the images with the wavelength corresponding to their central pixel.
309+
310+
```{code-cell} ipython3
311+
fig = plt.figure(figsize=(15, 5))
312+
axs = [fig.add_subplot(2, 5, ii + 1) for ii in range(10)]
313+
314+
for ii, img in enumerate(imgs):
315+
axs[ii].imshow(imgs[ii], norm="log", origin="lower")
316+
axs[ii].text(0.05, 0.05, r"$\lambda_{\rm center} = %2.4g \,{\rm \mu m}$" % summary_table["central_wavelength"][ii],
317+
va="bottom", ha="left", color="white", transform=axs[ii].transAxes)
318+
319+
plt.show()
320+
```
321+
322+
## Acknowledgements
323+
324+
- [Caltech/IPAC-IRSA](https://irsa.ipac.caltech.edu/)
325+
326+
## About this notebook
327+
328+
**Authors:** IRSA Data Science Team, including Vandana Desai, Andreas Faisst, Troy Raen, Brigitta Sipőcz, Jessica Krick, Shoubaneh Hemmati
329+
330+
**Updated:** 2025-09-30
331+
332+
**Contact:** [IRSA Helpdesk](https://irsa.ipac.caltech.edu/docs/help_desk.html) with questions or problems.
333+
334+
**Runtime:** As of the date above, this notebook takes about 3 minutes to run to completion on a machine with 8GB RAM and 4 CPU.

tutorials/spherex/spherex_intro.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@ kernelspec:
1212
name: python3
1313
---
1414

15+
(spherex-intro)=
1516
# Introduction to SPHEREx Spectral Images
1617

1718
+++

0 commit comments

Comments
 (0)