-
Notifications
You must be signed in to change notification settings - Fork 5
Add spherex_cutouts and spherex_psf #147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
8206f5b
3763cb9
16daf32
a120ea2
9af68c6
fa99485
3caa02c
b9c0f3a
bc75f76
44f4bcb
9ca87f0
813d7f2
7a4d7fe
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,334 @@ | ||
| --- | ||
| jupytext: | ||
| text_representation: | ||
| extension: .md | ||
| format_name: myst | ||
| format_version: 0.13 | ||
| jupytext_version: 1.17.3 | ||
| kernelspec: | ||
| display_name: Python 3 (ipykernel) | ||
| language: python | ||
| name: python3 | ||
| --- | ||
|
|
||
| (spherex-cutouts)= | ||
| # Download a collection of SPHEREx Spectral Image cutouts as a multi-extension FITS file | ||
|
|
||
| ## 1. Learning Goals | ||
|
|
||
| - Perform a query for the list of SPHEREx Spectral Image Multi-Extension FITS files (MEFs) that overlap a given coordinate. | ||
| - Retrieve cutouts for every entry in this list and package the cutouts as a new MEF. | ||
| - Learn how to use parallel or serial processing to retrieve the cutouts. | ||
|
|
||
| ## 2. SPHEREx Overview | ||
|
|
||
| SPHEREx is a NASA Astrophysics Medium Explorer mission that launched in March 2025. | ||
| During its planned two-year mission, SPHEREx will obtain 0.75-5 micron spectroscopy over the entire sky, with deeper data in the SPHEREx Deep Fields. | ||
| SPHEREx data will be used to: | ||
|
|
||
| * **constrain the physics of inflation** by measuring its imprints on the three-dimensional large-scale distribution of matter, | ||
| * **trace the history of galactic light production** through a deep multi-band measurement of large-scale clustering, | ||
| * **investigate the abundance and composition of water and biogenic ices** in the early phases of star and planetary disk formation. | ||
|
|
||
| The community will also mine SPHEREx data and combine it with synergistic data sets to address a variety of additional topics in astrophysics. | ||
|
|
||
| More information is available in the [SPHEREx Explanatory Supplement](https://irsa.ipac.caltech.edu/data/SPHEREx/docs/SPHEREx_Expsupp_QR.pdf). | ||
|
|
||
| ## 3. Imports | ||
|
|
||
| The following packages must be installed to run this notebook. | ||
|
|
||
| ```{code-cell} ipython3 | ||
| # Uncomment the next line to install dependencies if needed. | ||
| # !pip install numpy astropy pyvo matplotlib | ||
| ``` | ||
|
|
||
| ```{code-cell} ipython3 | ||
| import concurrent | ||
| import time | ||
|
|
||
| import astropy.units as u | ||
| import matplotlib.pyplot as plt | ||
| import numpy as np | ||
| import pyvo | ||
| from astropy.coordinates import SkyCoord | ||
| from astropy.io import fits | ||
| from astropy.table import Table | ||
| from astropy.wcs import WCS | ||
|
|
||
| # Suppress logging temporarily to prevent astropy | ||
| # from repeatedly printing out warning notices related to alternate WCSs | ||
| import logging | ||
| logging.getLogger('astropy').setLevel(logging.ERROR) | ||
| ``` | ||
|
|
||
| ## 4. Specify inputs and outputs | ||
|
|
||
| Specify a right ascension, declination, cutout size, and a SPHEREx bandpass (e.g., 'SPHEREx-D2'). | ||
|
|
||
| In this example, we are creating cutouts of the Pinwheel galaxy (M101) for the SPHEREx detector D2. | ||
|
|
||
| ```{code-cell} ipython3 | ||
| # Choose a position. | ||
| ra = 210.80227 * u.degree | ||
| dec = 54.34895 * u.degree | ||
|
|
||
| # Choose a cutout size. | ||
| size = 0.1 * u.degree | ||
|
|
||
| # Choose the bandpass of interest. | ||
| bandpass = 'SPHEREx-D2' | ||
|
|
||
| # Choose an output filename root for the output MEF file. | ||
| output_filename = 'spherex_cutouts_mef.fits' | ||
| ``` | ||
|
|
||
| ## 5. Query IRSA for a list of cutouts that satisfy the criteria specified above. | ||
|
|
||
| Here we show how to use the `pyvo` TAP SQL query to retrieve all images that overlap with the position defined above. | ||
| This query will retrieve a table of URLs that link to the MEF cutouts. | ||
| Each row in the table corresponds to a single cutout and includes the data access URL and an observation timestamp. | ||
| The results are sorted from oldest to newest. | ||
|
|
||
| ```{code-cell} ipython3 | ||
| # Define the service endpoint for IRSA's Table Access Protocol (TAP) | ||
| # so that we can query SPHEREx metadata tables. | ||
| service = pyvo.dal.TAPService("https://irsa.ipac.caltech.edu/TAP") | ||
|
|
||
| # Define a query that will search the appropriate SPHEREx metadata tables | ||
| # for spectral images that cover the chosen coordinates and match the | ||
| # specified bandpass. Return the cutout data access URL and the time of observation. | ||
| # Sort by observation time. | ||
| query = f""" | ||
| SELECT | ||
| 'https://irsa.ipac.caltech.edu/' || a.uri || '?center={ra.to(u.degree).value},{dec.to(u.degree).value}d&size={size.to(u.degree).value}' AS uri, | ||
| p.time_bounds_lower | ||
| FROM spherex.artifact a | ||
| JOIN spherex.plane p ON a.planeid = p.planeid | ||
| WHERE 1 = CONTAINS(POINT('ICRS', {ra.to(u.degree).value}, {dec.to(u.degree).value}), p.poly) | ||
| AND p.energy_bandpassname = '{bandpass}' | ||
| ORDER BY p.time_bounds_lower | ||
| """ | ||
|
|
||
| # Execute the query and return as an astropy Table. | ||
| t1 = time.time() | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do we need this? |
||
| results = service.search(query) | ||
| print("Time to do TAP query: {:2.2f} seconds.".format(time.time() - t1)) | ||
| print("Number of images found: {}".format(len(results))) | ||
| ``` | ||
|
|
||
| ## 6. Define a function that processes a list of SPHEREx Spectral Image Cutouts | ||
|
|
||
| This function takes in a row of the catalog that we created above and does the following: | ||
|
|
||
| - It downloads the cutout | ||
| - It computes the wavelength of the center pixel of the cutout (in micro-meters) | ||
| - It combines the image HDUs into a new HDU and adds it to the table row. | ||
|
|
||
| Note that the values of the rows are being added in place. | ||
|
|
||
| ```{code-cell} ipython3 | ||
| def process_cutout(row, ra, dec, cache): | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder if we should instead use a SkyCoord everywhere, after all a SkyCoord is created in here, too, IMO it's just cleaner than dealing with separate ra and dec |
||
| ''' | ||
| Downloads the cutouts given in a row of the table including all SPHEREx images overlapping with a position. | ||
|
|
||
| Parameters: | ||
| =========== | ||
|
|
||
| row : astropy.table row | ||
| Row of a table that will be changed in place by this function. The table | ||
| is created by the SQL TAP query. | ||
| ra,dec : coordinates (astropy units) | ||
| Ra and Dec coordinates (same as used for the TAP query) with attached astropy units | ||
| cache : bool | ||
| If set to `True`, the output of cached and the cutout processing will run faster next time. | ||
| Turn this feature off by setting `cache = False`. | ||
| ''' | ||
|
|
||
| with fits.open(row["uri"], cache=cache) as hdulist: | ||
| # There are seven HDUs: | ||
| # 0 contains minimal metadata in the header and no data. | ||
| # 1 through 6 are: IMAGE, FLAGS, VARIANCE, ZODI, PSF, WCS-WAVE | ||
| header = hdulist[1].header | ||
|
|
||
| # Compute pixel coordinates corresponding to cutout position. | ||
| spatial_wcs = WCS(header) | ||
| x, y = spatial_wcs.world_to_pixel(SkyCoord(ra=ra, dec=dec, unit="deg", frame="icrs")) | ||
|
|
||
| # Compute wavelength at cutout position. | ||
| spectral_wcs = WCS(header, fobj=hdulist, key="W") | ||
| spectral_wcs.sip = None | ||
| wavelength, bandpass = spectral_wcs.pixel_to_world(x, y) | ||
| row["central_wavelength"] = wavelength.to(u.micrometer).value | ||
|
|
||
| # Collect the HDUs for this cutout and append the row's cutout_index to the EXTNAME. | ||
| hdus = [] | ||
| for hdu in hdulist[1:]: # skip the primary header | ||
| hdu.header["EXTNAME"] = f"{hdu.header['EXTNAME']}{row['cutout_index']}" | ||
| hdus.append(hdu.copy()) # Copy so the data is available after the file is closed | ||
| row["hdus"] = hdus | ||
| ``` | ||
|
|
||
| ## 7. Download the Cutouts | ||
|
|
||
| This process can take a while. | ||
| If run in series, it can take about 5 minutes for 700 images on a typical laptop machine. | ||
| Here, we therefore exploit two different methods. | ||
| First we show the serial approach and next we show how to parallelize the methods. | ||
| The latter can be run on many CPUs and is therefore significantly faster. | ||
|
|
||
| ### 7.1 Serial Approach | ||
|
|
||
| First, we implement the serial approach -- a simple `for` loop. | ||
| Before that, we turn the results into an astropy table and add some place holders that will be filled in by the `process_cutout()` function. | ||
|
|
||
| ```{warning} | ||
| Running the cell below may take a while for a large number of cutouts. | ||
| Approximately 5-7 minutes for 700 images of cutout size 0.01 degree on a typical machine. | ||
| ``` | ||
|
|
||
| ```{tip} | ||
| The astropy `fits.open()` supports a caching argument. | ||
| This can be passed through in the `process_cutout()` function. | ||
| If `cache=True` is set, the images are cached and the cutout creation is sped up next time the code is run (even if the Jupyter kernel is restarted!). | ||
| The downside is that the images are saved on the machine where this notebook is run (usually in `~/.astropy/cache/`). | ||
| If many cutouts are created, this can sum up to a large cached data volume, in which case `cache=False` is preferred. | ||
|
|
||
| To learn more about the cache please read the [astropy cache management documentation](https://docs.astropy.org/en/stable/utils/data.html#cache-management). | ||
| ``` | ||
|
|
||
| ```{code-cell} ipython3 | ||
| results_table_serial = results.to_table() | ||
| results_table_serial["cutout_index"] = range(1, len(results_table_serial) + 1) | ||
| results_table_serial["central_wavelength"] = np.full(len(results_table_serial), np.nan) | ||
| results_table_serial["hdus"] = np.full(len(results_table_serial), None) | ||
|
|
||
| t1 = time.time() | ||
| for row in results_table_serial: | ||
| process_cutout(row, ra, dec, cache=False) | ||
| print("Time to create cutouts in serial mode: {:2.2f} minutes.".format((time.time() - t1) / 60)) | ||
| ``` | ||
|
|
||
| ### 7.2 Parallel Approach | ||
|
|
||
| Next, we implement parallel processing, which will make the cutout creation faster. | ||
| The maximal number of workers can be limited by setting the `max_workers` argument. | ||
| The choice of this value depends on the number of cores but also on the number of parallel calls that can be digested by the IRSA server. | ||
|
|
||
| ```{tip} | ||
| A good value for the maximum number of workers is between 7 and 12 for a machine with 8 cores. | ||
| ``` | ||
|
|
||
| ```{tip} | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this box is repeated from the previous section, consider to just keep one of them? |
||
| The astropy `fits.open()` supports a caching argument. | ||
| This can be passed through in the `process_cutout()` function. | ||
| If `cache=True` is set, the images are cached and the cutout creation is sped up next time the code is run (even if the Jupyter kernel is restarted!). | ||
| The downside is that the images are saved on the machine where this notebook is run (usually in `~/.astropy/cache/`). | ||
| If many cutouts are created, this can sum up to a large cached data volume, in which case `cache=False` is preferred. | ||
|
|
||
| To learn more about the cache please read the [astropy cache management documentation](https://docs.astropy.org/en/stable/utils/data.html#cache-management). | ||
| ``` | ||
|
|
||
| Again, before running the cutout processing we define some place holders. | ||
|
|
||
| ```{code-cell} ipython3 | ||
| results_table_parallel = results.to_table() | ||
| results_table_parallel["cutout_index"] = range(1, len(results_table_parallel) + 1) | ||
| results_table_parallel["central_wavelength"] = np.full(len(results_table_parallel), np.nan) | ||
| results_table_parallel["hdus"] = np.full(len(results_table_parallel), None) | ||
|
|
||
| t1 = time.time() | ||
| with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor: | ||
| futures = [executor.submit(process_cutout, row, ra, dec, False) for row in results_table_parallel] | ||
| concurrent.futures.wait(futures) | ||
| print("Time to create cutouts in parallel mode: {:2.2f} minutes.".format((time.time() - t1) / 60)) | ||
| ``` | ||
|
|
||
| ## 8. Create a summary table HDU with renamed columns | ||
|
|
||
| In the following, we continue to use the output of the parallel mode. | ||
| The following cell does the following: | ||
|
|
||
| - Create a summary FITS table. | ||
| - Create the final FITS HDU including the summary table. | ||
|
|
||
| ```{code-cell} ipython3 | ||
| # Create a summary table HDU with renamed columns | ||
| cols = fits.ColDefs([ | ||
| fits.Column(name="cutout_index", format="J", array=results_table_parallel["cutout_index"], unit=""), | ||
| fits.Column(name="observation_date", format="D", array=results_table_parallel["time_bounds_lower"], unit="d"), | ||
| fits.Column(name="central_wavelength", format="D", array=results_table_parallel["central_wavelength"], unit="um"), | ||
| fits.Column(name="access_url", format="A200", array=results_table_parallel["uri"], unit=""), | ||
| ]) | ||
| table_hdu = fits.BinTableHDU.from_columns(cols) | ||
| table_hdu.header["EXTNAME"] = "CUTOUT_INFO" | ||
| ``` | ||
|
|
||
| ## 9. Create the final MEF | ||
|
|
||
| Now, we create a primary HDU and combine it with the summary table HDU and the cutout image and wavelength HDUs to a final HDU list. | ||
|
|
||
| ```{code-cell} ipython3 | ||
| primary_hdu = fits.PrimaryHDU() | ||
| hdulist_list = [primary_hdu, table_hdu] | ||
| hdulist_list.extend(hdu for fits_hdulist in results_table_parallel["hdus"] for hdu in fits_hdulist) | ||
| combined_hdulist = fits.HDUList(hdulist_list) | ||
| ``` | ||
|
|
||
| ## 10. Write the final MEF | ||
|
|
||
| Finally, we save the full HDU list to a multi-extension FITS file to disk. | ||
|
|
||
| ```{code-cell} ipython3 | ||
| # Write the final MEF | ||
| combined_hdulist.writeto(output_filename, overwrite=True) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are we sure about the overwrite here? I'm usually rather wary about overwriting data files. |
||
| ``` | ||
|
|
||
| ## 11. Test and visualize the final result | ||
|
|
||
| We can now open the new MEF FITS file that we have created above. | ||
|
|
||
| Loading the summary table (including wavelength information of each cutout) is straight forward: | ||
|
|
||
| ```{code-cell} ipython3 | ||
| summary_table = Table.read(output_filename , hdu=1) | ||
| ``` | ||
|
|
||
| Let's also extract the first 10 images (or all of the extracted cutouts if less than 10). | ||
|
|
||
| ```{code-cell} ipython3 | ||
| nbr_images = np.nanmin([10, len(summary_table)]) | ||
| imgs = [] | ||
| with fits.open(output_filename) as hdul: | ||
| for ii in range(nbr_images): | ||
| extname = "IMAGE{}".format(summary_table["cutout_index"][ii]) | ||
| imgs.append(hdul[extname].data) | ||
| ``` | ||
|
|
||
| Plot the images with the wavelength corresponding to their central pixel. | ||
|
|
||
| ```{code-cell} ipython3 | ||
| fig = plt.figure(figsize=(15, 5)) | ||
| axs = [fig.add_subplot(2, 5, ii + 1) for ii in range(10)] | ||
|
|
||
| for ii, img in enumerate(imgs): | ||
| axs[ii].imshow(imgs[ii], norm="log", origin="lower") | ||
| axs[ii].text(0.05, 0.05, r"$\lambda_{\rm center} = %2.4g \,{\rm \mu m}$" % summary_table["central_wavelength"][ii], | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there any way using native mpl plotting of astropy quantities? then there would be no need of manually setting units on the plot, and it ensures it always corresponds to the actual thing you plot |
||
| va="bottom", ha="left", color="white", transform=axs[ii].transAxes) | ||
|
|
||
| plt.show() | ||
| ``` | ||
|
|
||
| ## Acknowledgements | ||
|
|
||
| - [Caltech/IPAC-IRSA](https://irsa.ipac.caltech.edu/) | ||
|
|
||
| ## About this notebook | ||
|
|
||
| **Authors:** IRSA Data Science Team, including Vandana Desai, Andreas Faisst, Troy Raen, Brigitta Sipőcz, Jessica Krick, Shoubaneh Hemmati | ||
|
|
||
| **Updated:** 2025-09-30 | ||
|
|
||
| **Contact:** [IRSA Helpdesk](https://irsa.ipac.caltech.edu/docs/help_desk.html) with questions or problems. | ||
|
|
||
| **Runtime:** As of the date above, this notebook takes about 3 minutes to run to completion on a machine with 8GB RAM and 4 CPU. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -12,6 +12,7 @@ kernelspec: | |
| name: python3 | ||
| --- | ||
|
|
||
| (spherex-intro)= | ||
| # Introduction to SPHEREx Spectral Images | ||
|
|
||
| +++ | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use astroquery.ipac.irsa.Irsa.query_tap? Then there is no need to set the URL