---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
  jupytext_version: 1.17.3
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---

(spherex-cutouts)=
# Download a collection of SPHEREx Spectral Image cutouts as a multi-extension FITS file

## 1. Learning Goals

- Perform a query for the list of SPHEREx Spectral Image Multi-Extension FITS files (MEFs) that overlap a given coordinate.
- Retrieve cutouts for every entry in this list and package the cutouts as a new MEF.
- Learn how to use parallel or serial processing to retrieve the cutouts.

## 2. SPHEREx Overview

SPHEREx is a NASA Astrophysics Medium Explorer mission that launched in March 2025.
During its planned two-year mission, SPHEREx will obtain 0.75-5 micron spectroscopy over the entire sky, with deeper data in the SPHEREx Deep Fields.
SPHEREx data will be used to:

* **constrain the physics of inflation** by measuring its imprints on the three-dimensional large-scale distribution of matter,
* **trace the history of galactic light production** through a deep multi-band measurement of large-scale clustering,
* **investigate the abundance and composition of water and biogenic ices** in the early phases of star and planetary disk formation.

The community will also mine SPHEREx data and combine it with synergistic data sets to address a variety of additional topics in astrophysics.

More information is available in the [SPHEREx Explanatory Supplement](https://irsa.ipac.caltech.edu/data/SPHEREx/docs/SPHEREx_Expsupp_QR.pdf).

## 3. Imports

The following packages must be installed to run this notebook.

```{code-cell} ipython3
# Uncomment the next line to install dependencies if needed.
# !pip install numpy astropy pyvo matplotlib
```

```{code-cell} ipython3
import concurrent.futures
import time

import astropy.units as u
import matplotlib.pyplot as plt
import numpy as np
import pyvo
from astropy.coordinates import SkyCoord
from astropy.io import fits
from astropy.table import Table
from astropy.wcs import WCS

# Suppress logging temporarily to prevent astropy
# from repeatedly printing out warning notices related to alternate WCSs.
import logging
logging.getLogger('astropy').setLevel(logging.ERROR)
```

## 4. Specify inputs and outputs

Specify a right ascension, declination, cutout size, and a SPHEREx bandpass (e.g., 'SPHEREx-D2').

In this example, we create cutouts of the Pinwheel galaxy (M101) for the SPHEREx detector D2.

```{code-cell} ipython3
# Choose a position.
ra = 210.80227 * u.degree
dec = 54.34895 * u.degree

# Choose a cutout size.
size = 0.1 * u.degree

# Choose the bandpass of interest.
bandpass = 'SPHEREx-D2'

# Choose a filename for the output MEF file.
output_filename = 'spherex_cutouts_mef.fits'
```

## 5. Query IRSA for a list of cutouts that satisfy the criteria specified above

Here we show how to use an ADQL query, submitted through `pyvo`'s Table Access Protocol (TAP) interface, to retrieve all images that overlap the position defined above.
The query returns a table of URLs that link to the MEF cutouts.
Each row in the table corresponds to a single cutout and includes the data access URL and an observation timestamp.
The results are sorted from oldest to newest.

```{code-cell} ipython3
# Define the service endpoint for IRSA's Table Access Protocol (TAP)
# so that we can query SPHEREx metadata tables.
service = pyvo.dal.TAPService("https://irsa.ipac.caltech.edu/TAP")

# Define a query that will search the appropriate SPHEREx metadata tables
# for spectral images that cover the chosen coordinates and match the
# specified bandpass. Return the cutout data access URL and the time of observation.
# Sort by observation time.
query = f"""
SELECT
  'https://irsa.ipac.caltech.edu/' || a.uri || '?center={ra.to(u.degree).value},{dec.to(u.degree).value}d&size={size.to(u.degree).value}' AS uri,
  p.time_bounds_lower
FROM spherex.artifact a
JOIN spherex.plane p ON a.planeid = p.planeid
WHERE 1 = CONTAINS(POINT('ICRS', {ra.to(u.degree).value}, {dec.to(u.degree).value}), p.poly)
  AND p.energy_bandpassname = '{bandpass}'
ORDER BY p.time_bounds_lower
"""

# Execute the query and return as an astropy Table.
t1 = time.time()
results = service.search(query)
print("Time to do TAP query: {:2.2f} seconds.".format(time.time() - t1))
print("Number of images found: {}".format(len(results)))
```
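
Before downloading anything, it can help to glance at the first few rows of the result. This quick inspection is optional and not part of the main workflow:

```{code-cell} ipython3
# Optional sanity check: each row holds a cutout access URL
# and the observation start time.
results.to_table()[:3]
```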

## 6. Define a function that processes a list of SPHEREx Spectral Image Cutouts

This function takes in a row of the table that we created above and does the following:

- It downloads the cutout.
- It computes the wavelength at the center pixel of the cutout (in micrometers).
- It collects the cutout's HDUs, renames their extensions, and stores them in the table row.

Note that the row values are modified in place.

```{code-cell} ipython3
def process_cutout(row, ra, dec, cache):
    '''
    Download the cutout given in a row of the table of all SPHEREx images overlapping a position.

    Parameters
    ----------
    row : astropy.table.Row
        Row of the table created by the TAP query. It is modified in place
        by this function.
    ra, dec : astropy.units.Quantity
        RA and Dec coordinates (the same as used for the TAP query) with
        attached astropy units.
    cache : bool
        If `True`, the downloaded file is cached and the cutout processing
        will run faster next time. Turn this feature off by setting
        `cache=False`.
    '''
    with fits.open(row["uri"], cache=cache) as hdulist:
        # There are seven HDUs:
        # 0 contains minimal metadata in the header and no data.
        # 1 through 6 are: IMAGE, FLAGS, VARIANCE, ZODI, PSF, WCS-WAVE
        header = hdulist[1].header

        # Compute pixel coordinates corresponding to the cutout position.
        spatial_wcs = WCS(header)
        x, y = spatial_wcs.world_to_pixel(SkyCoord(ra=ra, dec=dec, frame="icrs"))

        # Compute the wavelength at the cutout position using the alternate
        # spectral WCS (key "W").
        spectral_wcs = WCS(header, fobj=hdulist, key="W")
        spectral_wcs.sip = None
        wavelength, bandpass = spectral_wcs.pixel_to_world(x, y)
        row["central_wavelength"] = wavelength.to(u.micrometer).value

        # Collect the HDUs for this cutout and append the row's cutout_index to the EXTNAME.
        hdus = []
        for hdu in hdulist[1:]:  # skip the primary HDU
            hdu.header["EXTNAME"] = f"{hdu.header['EXTNAME']}{row['cutout_index']}"
            hdus.append(hdu.copy())  # Copy so the data is available after the file is closed.
        row["hdus"] = hdus
```

## 7. Download the Cutouts

This process can take a while.
Run serially, it can take about 5 minutes for 700 images on a typical laptop.
We therefore demonstrate two methods: first the serial approach, then a parallelized version.
The latter issues many download requests concurrently and is therefore significantly faster.

### 7.1 Serial Approach

First, we implement the serial approach -- a simple `for` loop.
Before that, we turn the results into an astropy table and add some placeholders that will be filled in by the `process_cutout()` function.

```{warning}
Running the cell below may take a while for a large number of cutouts:
approximately 5-7 minutes for 700 images of cutout size 0.01 degree on a typical machine.
```

```{tip}
The astropy `fits.open()` function supports a caching argument,
which can be passed through in the `process_cutout()` function.
If `cache=True` is set, the images are cached and the cutout creation is sped up the next time the code is run (even if the Jupyter kernel is restarted!).
The downside is that the images are saved on the machine where this notebook is run (usually in `~/.astropy/cache/`).
If many cutouts are created, this can add up to a large cached data volume, in which case `cache=False` is preferred.

To learn more about the cache, please read the [astropy cache management documentation](https://docs.astropy.org/en/stable/utils/data.html#cache-management).
```
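
If you do enable caching, astropy provides utilities for inspecting and clearing the download cache. Below is a minimal sketch using `astropy.utils.data`; it is only relevant if you run `process_cutout()` with `cache=True`:

```{code-cell} ipython3
# Inspect the astropy download cache (only populated if cache=True was used).
from astropy.utils.data import cache_total_size, clear_download_cache, get_cached_urls

print("Number of cached URLs: {}".format(len(get_cached_urls())))
print("Total cache size: {:.1f} MB".format(cache_total_size() / 1e6))

# Uncomment the next line to delete everything in the cache:
# clear_download_cache()
```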

```{code-cell} ipython3
results_table_serial = results.to_table()
results_table_serial["cutout_index"] = range(1, len(results_table_serial) + 1)
results_table_serial["central_wavelength"] = np.full(len(results_table_serial), np.nan)
results_table_serial["hdus"] = np.full(len(results_table_serial), None)

t1 = time.time()
for row in results_table_serial:
    process_cutout(row, ra, dec, cache=False)
print("Time to create cutouts in serial mode: {:2.2f} minutes.".format((time.time() - t1) / 60))
```

### 7.2 Parallel Approach

Next, we implement parallel processing, which makes the cutout creation faster.
The maximum number of workers can be limited by setting the `max_workers` argument.
A good choice for this value depends on the number of cores, but also on the number of parallel calls that the IRSA server can handle.

```{tip}
A good value for the maximum number of workers is between 7 and 12 for a machine with 8 cores.
```
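
As a rough starting point, you could derive `max_workers` from the machine's core count. The sketch below is a heuristic, not an IRSA-mandated limit; the "+ 2" oversubscription and the cap of 12 are illustrative assumptions:

```{code-cell} ipython3
import os

# Heuristic: slightly oversubscribe the cores since the work is I/O-bound,
# but cap the value to stay polite to the IRSA server.
# The "+ 2" and the cap of 12 are illustrative assumptions.
suggested_workers = min((os.cpu_count() or 4) + 2, 12)
print("Suggested max_workers: {}".format(suggested_workers))
```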

```{tip}
The `fits.open()` caching considerations described in the tip in Section 7.1 apply here as well.
```

Again, before running the cutout processing, we define some placeholders.

```{code-cell} ipython3
results_table_parallel = results.to_table()
results_table_parallel["cutout_index"] = range(1, len(results_table_parallel) + 1)
results_table_parallel["central_wavelength"] = np.full(len(results_table_parallel), np.nan)
results_table_parallel["hdus"] = np.full(len(results_table_parallel), None)

t1 = time.time()
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(process_cutout, row, ra, dec, cache=False) for row in results_table_parallel]
    concurrent.futures.wait(futures)
print("Time to create cutouts in parallel mode: {:2.2f} minutes.".format((time.time() - t1) / 60))
```

## 8. Create a summary table HDU with renamed columns

We continue with the output of the parallel run.
The following cell creates a FITS binary table that summarizes the cutouts, with columns renamed for clarity, and labels it with the extension name `CUTOUT_INFO`.

```{code-cell} ipython3
# Create a summary table HDU with renamed columns.
cols = fits.ColDefs([
    fits.Column(name="cutout_index", format="J", array=results_table_parallel["cutout_index"], unit=""),
    fits.Column(name="observation_date", format="D", array=results_table_parallel["time_bounds_lower"], unit="d"),
    fits.Column(name="central_wavelength", format="D", array=results_table_parallel["central_wavelength"], unit="um"),
    fits.Column(name="access_url", format="A200", array=results_table_parallel["uri"], unit=""),
])
table_hdu = fits.BinTableHDU.from_columns(cols)
table_hdu.header["EXTNAME"] = "CUTOUT_INFO"
```

## 9. Create the final MEF

Now, we create a primary HDU and combine it with the summary table HDU and the cutout HDUs (image, flags, variance, zodi, PSF, and wavelength WCS) into a final HDU list.

```{code-cell} ipython3
primary_hdu = fits.PrimaryHDU()
hdulist_list = [primary_hdu, table_hdu]
hdulist_list.extend(hdu for fits_hdulist in results_table_parallel["hdus"] for hdu in fits_hdulist)
combined_hdulist = fits.HDUList(hdulist_list)
```

## 10. Write the final MEF

Finally, we write the full HDU list to a multi-extension FITS file on disk.

```{code-cell} ipython3
# Write the final MEF.
combined_hdulist.writeto(output_filename, overwrite=True)
```
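
As a quick structural check of the file we just wrote, `fits.info()` prints one line per HDU. This is optional; Section 11 below does a fuller inspection:

```{code-cell} ipython3
# Optional: list every HDU in the new MEF (extension name, type, and dimensions).
fits.info(output_filename)
```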

## 11. Test and visualize the final result

We can now open the new multi-extension FITS file that we created above.

Loading the summary table (including the wavelength information for each cutout) is straightforward:

```{code-cell} ipython3
summary_table = Table.read(output_filename, hdu=1)
```
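
A quick look at the first few rows confirms the columns we wrote in Section 8:

```{code-cell} ipython3
# Inspect the first few rows of the summary table.
summary_table[:5]
```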

Let's also extract the first 10 images (or all of the cutouts if there are fewer than 10).

```{code-cell} ipython3
nbr_images = min(10, len(summary_table))
imgs = []
with fits.open(output_filename) as hdul:
    for ii in range(nbr_images):
        extname = "IMAGE{}".format(summary_table["cutout_index"][ii])
        imgs.append(hdul[extname].data)
```

Plot the images with the wavelength corresponding to their central pixel.

```{code-cell} ipython3
fig = plt.figure(figsize=(15, 5))
axs = [fig.add_subplot(2, 5, ii + 1) for ii in range(nbr_images)]

for ii, img in enumerate(imgs):
    axs[ii].imshow(img, norm="log", origin="lower")
    axs[ii].text(0.05, 0.05, r"$\lambda_{\rm center} = %2.4g \,{\rm \mu m}$" % summary_table["central_wavelength"][ii],
                 va="bottom", ha="left", color="white", transform=axs[ii].transAxes)

plt.show()
```

## Acknowledgements

- [Caltech/IPAC-IRSA](https://irsa.ipac.caltech.edu/)

## About this notebook

**Authors:** IRSA Data Science Team, including Vandana Desai, Andreas Faisst, Troy Raen, Brigitta Sipőcz, Jessica Krick, Shoubaneh Hemmati

**Updated:** 2025-09-30

**Contact:** [IRSA Helpdesk](https://irsa.ipac.caltech.edu/docs/help_desk.html) with questions or problems.

**Runtime:** As of the date above, this notebook takes about 3 minutes to run to completion on a machine with 8GB RAM and 4 CPUs.