Commit 88b1350

authored
Merge pull request #50 from jaladh-singhal/euclid-cloud-access
Add a Euclid cloud access notebook
2 parents 5e54558 + 91619c9 commit 88b1350

1 file changed

+317
-0
lines changed

@@ -0,0 +1,317 @@
1+
---
2+
jupytext:
3+
text_representation:
4+
extension: .md
5+
format_name: myst
6+
format_version: 0.13
7+
jupytext_version: 1.16.3
8+
kernelspec:
9+
display_name: Python 3 (ipykernel)
10+
language: python
11+
name: python3
12+
---
13+
14+
# Euclid Quick Release 1: Cloud Access
15+
16+
+++
17+
18+
## Learning Goals
19+
By the end of this tutorial, you will:
20+
- Learn where Euclid Q1 data are stored in the cloud.
21+
- Retrieve an image cutout from the cloud.
22+
- Retrieve a spectrum from the cloud.
23+
24+
+++
25+
26+
## 1. Introduction
27+
Euclid launched in July 2023 as a European Space Agency (ESA) mission with involvement by NASA. The primary science goals of Euclid are to better understand the composition and evolution of the dark Universe. The Euclid mission is providing space-based imaging and spectroscopy as well as supporting ground-based imaging to achieve these primary goals. These data will be archived by multiple global repositories, including IRSA, where they will support transformational work in many areas of astrophysics.
28+
29+
Euclid Quick Release 1 (Q1) consists of ~30 TB of imaging, spectroscopy, and catalogs covering four non-contiguous fields: Euclid Deep Field North (22.9 sq deg), Euclid Deep Field Fornax (12.1 sq deg), Euclid Deep Field South (28.1 sq deg), and LDN1641.
30+
31+
Euclid Q1 data were released on-premises at IPAC and in the cloud via Amazon Web Services' Open Data Repository (AWS ODR). This notebook introduces users to accessing Euclid Q1 data from the cloud. Additional Euclid-specific notebooks can be found in the ["Accessing Euclid data" section](https://caltech-ipac.github.io/irsa-tutorials/#accessing-euclid-data), and additional notebooks about how to access IRSA-curated data from the AWS ODR can be found in the ["Accessing IRSA's cloud holdings" section](https://caltech-ipac.github.io/irsa-tutorials/#accessing-irsa-s-cloud-holdings).
32+
33+
+++
34+
35+
## 2. Imports
36+
- `s3fs` for browsing S3 buckets
37+
- `astropy` for handling coordinates, units, FITS I/O, tables, images, etc.
38+
- `astroquery` for querying Euclid data products from IRSA
39+
- `matplotlib` for visualization
40+
- `json` for decoding JSON strings
41+
42+
```{code-cell} ipython3
43+
# Uncomment the next line to install dependencies if needed.
44+
# !pip install s3fs astropy astroquery matplotlib
45+
```
46+
47+
```{code-cell} ipython3
48+
import s3fs
49+
from astropy.coordinates import SkyCoord
50+
import astropy.units as u
51+
from astropy.visualization import ImageNormalize, PercentileInterval, AsinhStretch
52+
from astropy.io import fits
53+
from astropy.nddata import Cutout2D
54+
from astropy.wcs import WCS
55+
from astropy.table import Table
56+
from astroquery.ipac.irsa import Irsa
57+
from matplotlib import pyplot as plt
58+
import json
59+
```
60+
61+
## 3. Browse Euclid Q1 cloud-hosted data
62+
63+
```{code-cell} ipython3
64+
BUCKET_NAME = 'nasa-irsa-euclid-q1'
65+
```
66+
67+
[s3fs](https://s3fs.readthedocs.io/en/latest/) provides a filesystem-like Python interface for AWS S3 buckets. First we create an S3 filesystem client with anonymous access:
68+
69+
```{code-cell} ipython3
70+
s3 = s3fs.S3FileSystem(anon=True)
71+
```
72+
73+
Then we list the `q1` directory that contains Euclid Q1 data products:
74+
75+
```{code-cell} ipython3
76+
s3.ls(f'{BUCKET_NAME}/q1')
77+
```
78+
79+
Let's navigate to MER images (available as FITS files):
80+
81+
```{code-cell} ipython3
82+
s3.ls(f'{BUCKET_NAME}/q1/MER')[:10] # ls only top 10 to limit the long output
83+
```
84+
85+
```{code-cell} ipython3
86+
s3.ls(f'{BUCKET_NAME}/q1/MER/102018211') # pick any tile ID from above
87+
```
88+
89+
```{code-cell} ipython3
90+
s3.ls(f'{BUCKET_NAME}/q1/MER/102018211/VIS') # pick any instrument from above
91+
```
92+
93+
As described in the "Browsable Directories" section of the [user guide](https://irsa.ipac.caltech.edu/data/Euclid/docs/euclid_archive_at_irsa_user_guide.pdf), the background-subtracted mosaic images are found at `MER/{tile_id}/{instrument}/EUC_MER_BGSUB-MOSAIC*.fits`. Because these images are organized by tile ID, we first need to find the tile ID that covers the coordinates we are interested in. In the next section, we will use astroquery to do a spatial search that returns the FITS file paths for our coordinates.
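
The path pattern above can be turned into a glob string. The following is a minimal sketch, not part of the original notebook: `mosaic_glob` is a hypothetical helper, and the two entries in `listing` are made-up file names used only to show how the pattern selects the background-subtracted mosaic. With a live connection, the same pattern could be passed to `s3.glob(pattern)` instead of matching against a hand-written list.

```python
from fnmatch import fnmatchcase


def mosaic_glob(bucket, tile_id, instrument):
    """Build the glob pattern for background-subtracted MER mosaics (hypothetical helper)."""
    return f'{bucket}/q1/MER/{tile_id}/{instrument}/EUC_MER_BGSUB-MOSAIC*.fits'


pattern = mosaic_glob('nasa-irsa-euclid-q1', 102018211, 'VIS')

# Made-up listing for illustration only; these are not real file names in the bucket.
listing = [
    'nasa-irsa-euclid-q1/q1/MER/102018211/VIS/EUC_MER_BGSUB-MOSAIC-EXAMPLE.fits',
    'nasa-irsa-euclid-q1/q1/MER/102018211/VIS/EUC_MER_OTHER-PRODUCT-EXAMPLE.fits',
]

# Keep only the paths that match the mosaic pattern.
matches = [path for path in listing if fnmatchcase(path, pattern)]
print(matches)
```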
94+
95+
+++
96+
97+
## 4. Do a spatial search for MER mosaics
98+
99+
Pick a target and search radius:
100+
101+
```{code-cell} ipython3
102+
target_name = 'TYC 4429-1677-1'
103+
coord = SkyCoord.from_name(target_name)
104+
search_radius = 10 * u.arcsec
105+
```
106+
107+
List all Simple Image Access (SIA) collections for IRSA:
108+
109+
```{code-cell} ipython3
110+
collections = Irsa.list_collections(servicetype='SIA')
111+
len(collections)
112+
```
113+
114+
Filter to only those containing "euclid":
115+
116+
```{code-cell} ipython3
117+
collections[['euclid' in v for v in collections['collection']]]
118+
```
119+
120+
From the "Data Products Overview" section of the [user guide](https://irsa.ipac.caltech.edu/data/Euclid/docs/euclid_archive_at_irsa_user_guide.pdf) and the table above, we identify that the MER mosaics are available in the following collection:
121+
122+
```{code-cell} ipython3
123+
img_collection = 'euclid_DpdMerBksMosaic'
124+
```
125+
126+
Now query this collection for our target's coordinates and search radius:
127+
128+
```{code-cell} ipython3
129+
img_tbl = Irsa.query_sia(pos=(coord, search_radius), collection=img_collection).to_table()
130+
img_tbl
131+
```
132+
133+
Let's narrow it down to the images with science dataproduct subtype and Euclid facility:
134+
135+
```{code-cell} ipython3
136+
euclid_sci_img_tbl = img_tbl[[row['facility_name']=='Euclid'
137+
and row['dataproduct_subtype']=='science'
138+
for row in img_tbl]]
139+
euclid_sci_img_tbl
140+
```
141+
142+
We can see there's a `cloud_access` column that gives us the location info of the image files we are interested in. So let's extract the S3 bucket file path from it:
143+
144+
```{code-cell} ipython3
145+
def get_s3_fpath(cloud_access):
146+
cloud_info = json.loads(cloud_access) # converts str to dict
147+
bucket_name = cloud_info['aws']['bucket_name']
148+
key = cloud_info['aws']['key']
149+
150+
return f'{bucket_name}/{key}'
151+
```
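
To see what the helper does without querying IRSA, here is a minimal sketch that runs it on a made-up `cloud_access` string. The JSON layout (an `aws` object with `bucket_name` and `key`) is what the function above expects; the key value itself is a hypothetical placeholder, not a real file in the bucket.

```python
import json


def get_s3_fpath(cloud_access):
    cloud_info = json.loads(cloud_access)  # converts str to dict
    bucket_name = cloud_info['aws']['bucket_name']
    key = cloud_info['aws']['key']
    return f'{bucket_name}/{key}'


# Hypothetical cloud_access value, for illustration only.
sample_cloud_access = json.dumps({
    'aws': {
        'bucket_name': 'nasa-irsa-euclid-q1',
        'key': 'q1/MER/EXAMPLE_TILE/VIS/EXAMPLE_MOSAIC.fits',
    }
})

sample_fpath = get_s3_fpath(sample_cloud_access)
print(sample_fpath)
```

The returned string has no `s3://` scheme, which is why the notebook prepends it when opening the file.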
152+
153+
```{code-cell} ipython3
154+
[get_s3_fpath(row['cloud_access']) for row in euclid_sci_img_tbl]
155+
```
156+
157+
Let's also extract filter names to use when displaying the images:
158+
159+
```{code-cell} ipython3
160+
def get_filter_name(instrument, bandpass):
161+
return f'{instrument}_{bandpass}' if instrument!=bandpass else instrument
162+
```
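
A short sketch of how this helper behaves: when the instrument and bandpass names are identical, the label is not doubled. The (instrument, bandpass) pairs below are illustrative assumptions, not values read from the query result.

```python
def get_filter_name(instrument, bandpass):
    return f'{instrument}_{bandpass}' if instrument != bandpass else instrument


# Illustrative (instrument, bandpass) pairs; the actual table values may differ.
sample_pairs = [('VIS', 'VIS'), ('NISP', 'Y'), ('NISP', 'J'), ('NISP', 'H')]
sample_labels = [get_filter_name(inst, band) for inst, band in sample_pairs]
print(sample_labels)
```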
163+
164+
```{code-cell} ipython3
165+
[get_filter_name(row['instrument_name'], row['energy_bandpassname']) for row in euclid_sci_img_tbl]
166+
```
167+
168+
## 5. Efficiently retrieve mosaic cutouts
169+
These image files are very big (~1.4GB), so we use astropy's lazy-loading capability of FITS for better performance. (See [Obtaining subsets from cloud-hosted FITS files](https://docs.astropy.org/en/stable/io/fits/usage/cloud.html#fits-io-cloud).)
170+
171+
```{code-cell} ipython3
172+
cutout_size = 1 * u.arcmin
173+
```
174+
175+
```{code-cell} ipython3
176+
cutouts = []
177+
filters = []
178+
179+
for row in euclid_sci_img_tbl:
180+
s3_fpath = get_s3_fpath(row['cloud_access'])
181+
filter_name = get_filter_name(row['instrument_name'], row['energy_bandpassname'])
182+
183+
with fits.open(f's3://{s3_fpath}', fsspec_kwargs={"anon": True}) as hdul:
184+
print(f'Retrieving cutout for {filter_name} ...')
185+
cutout = Cutout2D(hdul[0].section,
186+
position=coord,
187+
size=cutout_size,
188+
wcs=WCS(hdul[0].header))
189+
cutouts.append(cutout)
190+
filters.append(filter_name)
191+
```
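
To get a rough sense of why reading only a section pays off, the arithmetic can be sketched as follows. The pixel scale (0.1 arcsec per pixel) and the 4-byte pixel size are assumptions made for this estimate, not values read from the file headers; the ~1.4 GB file size is the figure quoted above. The real transfer will be somewhat larger, since headers are read too and remote reads happen in blocks.

```python
# Assumed values for a back-of-the-envelope estimate (not read from the data).
pixel_scale_arcsec = 0.1    # assumed mosaic pixel scale
bytes_per_pixel = 4         # assumed 32-bit pixels
cutout_size_arcsec = 60.0   # 1 arcmin, as set in cutout_size
full_file_bytes = 1.4e9     # ~1.4 GB, as quoted in the text

# Cutout side length in pixels, and the raw size of the cutout pixel data.
n_pix = round(cutout_size_arcsec / pixel_scale_arcsec)
cutout_bytes = n_pix * n_pix * bytes_per_pixel

fraction = cutout_bytes / full_file_bytes
print(f'{n_pix} x {n_pix} pixels, {cutout_bytes / 1e6:.2f} MB, '
      f'{fraction:.2%} of the full file')
```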
192+
193+
```{code-cell} ipython3
194+
fig, axes = plt.subplots(2, 2, figsize=(4 * 2, 4 * 2), subplot_kw={'projection': cutouts[0].wcs})
195+
196+
for idx, ax in enumerate(axes.flat):
197+
norm = ImageNormalize(cutouts[idx].data, interval=PercentileInterval(99), stretch=AsinhStretch())
198+
ax.imshow(cutouts[idx].data, cmap='gray', origin='lower', norm=norm)
199+
ax.set_xlabel('RA')
200+
ax.set_ylabel('Dec')
201+
ax.text(0.95, 0.05, filters[idx], color='white', fontsize=14, transform=ax.transAxes, va='bottom', ha='right')
202+
203+
plt.tight_layout()
204+
```
205+
206+
## 6. Find the MER catalog for a given tile
207+
Let's navigate to the MER catalog in the Euclid Q1 bucket:
208+
209+
```{code-cell} ipython3
210+
s3.ls(f'{BUCKET_NAME}/q1/catalogs')
211+
```
212+
213+
```{code-cell} ipython3
214+
s3.ls(f'{BUCKET_NAME}/q1/catalogs/MER_FINAL_CATALOG')[:10] # ls only top 10 to limit the long output
215+
```
216+
217+
```{code-cell} ipython3
218+
mer_tile_id = 102160339 # from the image paths for the target we picked
219+
s3.ls(f'{BUCKET_NAME}/q1/catalogs/MER_FINAL_CATALOG/{mer_tile_id}')
220+
```
221+
222+
As described in the "Browsable Directories" section of the [user guide](https://irsa.ipac.caltech.edu/data/Euclid/docs/euclid_archive_at_irsa_user_guide.pdf), the catalogued objects are listed in `catalogs/MER_FINAL_CATALOG/{tile_id}/EUC_MER_FINAL-CAT*.fits`. We could read that FITS file as a table and filter on the ra and dec columns to find the object ID(s) for the target we picked, but that would be an expensive operation. Instead, we will use astroquery (in the next section) to do a spatial search in the MER catalog provided by IRSA.
223+
224+
```{note}
225+
Once the catalogs are available as Parquet files in the cloud, we can efficiently do spatial filtering directly on the cloud-hosted file to identify object ID(s) for our target. But for the time being, we can use catalog VO services through astroquery to do the same.
226+
```
227+
228+
+++
229+
230+
## 7. Find the MER Object ID for our target
231+
First, list the Euclid catalogs provided by IRSA:
232+
233+
```{code-cell} ipython3
234+
catalogs = Irsa.list_catalogs(full=True).to_table()
235+
len(catalogs)
236+
```
237+
238+
```{code-cell} ipython3
239+
catalogs[['euclid' in v for v in catalogs['schema_name']]]
240+
```
241+
242+
From this table, we can extract the MER catalog name. We also see several other interesting catalogs, so let's also extract the spectral file association catalog for retrieving spectra later.
243+
244+
```{code-cell} ipython3
245+
euclid_mer_catalog = 'euclid_q1_mer_catalogue'
246+
euclid_spec_association_catalog = 'euclid.objectid_spectrafile_association_q1'
247+
```
248+
249+
Now we do a cone search with a radius of 5 arcsec around our target to pinpoint its object ID in the Euclid MER catalog:
250+
251+
```{code-cell} ipython3
252+
search_radius = 5 * u.arcsec
253+
254+
mer_catalog_tbl = Irsa.query_region(coordinates=coord, spatial='Cone',
255+
catalog=euclid_mer_catalog, radius=search_radius)
256+
mer_catalog_tbl
257+
```
258+
259+
```{code-cell} ipython3
260+
object_id = int(mer_catalog_tbl['object_id'][0])
261+
object_id
262+
```
263+
264+
## 8. Find the spectrum of an object in the MER catalog
265+
Using the object ID(s) we extracted above, we can narrow down the spectral file association catalog to identify spectra file path(s). So we do the following TAP search:
266+
267+
```{code-cell} ipython3
268+
adql_query = f"SELECT * FROM {euclid_spec_association_catalog} \
269+
WHERE objectid = {object_id}"
270+
271+
spec_association_tbl = Irsa.query_tap(adql_query).to_table()
272+
spec_association_tbl
273+
```
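
If the cone search returns more than one object ID, the same query can be extended with an ADQL `IN` clause. This is a minimal sketch under that assumption: `build_association_query` is a hypothetical helper, and the object IDs below are made-up numbers rather than real Euclid identifiers.

```python
def build_association_query(catalog, object_ids):
    """Build an ADQL query selecting rows for one or more object IDs (hypothetical helper)."""
    ids = [int(i) for i in object_ids]  # int() guards against injecting arbitrary text
    if not ids:
        raise ValueError('object_ids must contain at least one ID')
    id_list = ', '.join(str(i) for i in ids)
    return f'SELECT * FROM {catalog} WHERE objectid IN ({id_list})'


# Made-up object IDs, for illustration only.
example_query = build_association_query(
    'euclid.objectid_spectrafile_association_q1', [1234567890, 1234567891])
print(example_query)
```

The resulting string can be passed to `Irsa.query_tap` in the same way as `adql_query` above.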
274+
275+
```{warning}
276+
If you picked a target other than the one this notebook uses, it's possible that there is no spectrum associated with your target's object ID. In that case, `spec_association_tbl` will contain 0 rows.
277+
```
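
One way to guard against the empty-table case is sketched below. `first_value` is a hypothetical helper, not part of the notebook; it only iterates over rows and indexes each row by column name, so it should work for an astropy table as well as for the plain list of dictionaries used here as a stand-in.

```python
def first_value(table, column):
    """Return the value of `column` in the first row, or None if there are no rows."""
    for row in table:
        return row[column]
    return None


# Stand-in data for illustration; a real run would pass spec_association_tbl.
empty_tbl = []
one_row_tbl = [{'uri': 'ibe/data/euclid/EXAMPLE.fits', 'hdu': 7}]

print(first_value(empty_tbl, 'uri'))    # no rows, so nothing to retrieve
print(first_value(one_row_tbl, 'hdu'))
```

Checking for `None` before indexing avoids an `IndexError` in the cells that follow when no spectrum is associated with the target.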
278+
279+
In the table above, the `uri` column gives the location of the spectra file on IBE. We can map it to an S3 bucket key to retrieve the spectra file from the cloud. This is a very large FITS file with multiple extensions, where each extension contains the spectrum of one object. The `hdu` column gives the extension number for our object. Let's extract both of these.
280+
281+
```{code-cell} ipython3
282+
spec_fpath_key = spec_association_tbl['uri'][0].replace('ibe/data/euclid/', '')
283+
spec_fpath_key
284+
```
285+
286+
```{code-cell} ipython3
287+
object_hdu_idx = int(spec_association_tbl['hdu'][0])
288+
object_hdu_idx
289+
```
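
The mapping from the IBE `uri` to an S3 key is a plain prefix removal. A slightly more defensive version is sketched here: `uri_to_s3_key` is a hypothetical helper that fails loudly if the expected prefix is missing, and the sample URI is a made-up placeholder rather than a real spectra file path.

```python
IBE_PREFIX = 'ibe/data/euclid/'  # prefix stripped in the cell above


def uri_to_s3_key(uri, prefix=IBE_PREFIX):
    """Strip the IBE prefix from a uri to get the S3 key (hypothetical helper)."""
    if prefix not in uri:
        raise ValueError(f'{prefix!r} not found in uri: {uri!r}')
    return uri.replace(prefix, '', 1)


# Made-up URI, for illustration only.
sample_uri = 'ibe/data/euclid/q1/EXAMPLE_DIR/EXAMPLE_SPECTRA.fits'
sample_key = uri_to_s3_key(sample_uri)
sample_s3_url = f's3://nasa-irsa-euclid-q1/{sample_key}'
print(sample_s3_url)
```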
290+
291+
Again, we use astropy's lazy-loading capability of FITS to only retrieve the spectrum table of our object from the S3 bucket.
292+
293+
```{code-cell} ipython3
294+
with fits.open(f's3://{BUCKET_NAME}/{spec_fpath_key}', fsspec_kwargs={'anon': True}) as hdul:
295+
spec_hdu = hdul[object_hdu_idx]
296+
spec_tbl = Table.read(spec_hdu)
297+
```
298+
299+
```{code-cell} ipython3
300+
spec_tbl
301+
```
302+
303+
```{code-cell} ipython3
304+
plt.plot(spec_tbl['WAVELENGTH'], spec_tbl['SIGNAL'])
305+
plt.xlabel(spec_tbl['WAVELENGTH'].unit.to_string('latex_inline'))
306+
plt.ylabel(spec_tbl['SIGNAL'].unit.to_string('latex_inline'))
307+
308+
plt.title(f'Spectrum of Target: {target_name}\n(Euclid Object ID: {object_id})');
309+
```
310+
311+
## About this Notebook
312+
313+
**Author:** Jaladh Singhal (IRSA Developer) in conjunction with Vandana Desai, Brigitta Sipőcz, Tiffany Meshkat and the IPAC Science Platform team
314+
315+
**Updated:** 2025-03-17
316+
317+
**Contact:** the [IRSA Helpdesk](https://irsa.ipac.caltech.edu/docs/help_desk.html) with questions or to report problems.
