Skip to content

Commit e706b63

Browse files
Add a Euclid cloud access notebook
1 parent f486eac commit e706b63

File tree

1 file changed

+303
-0
lines changed

1 file changed

+303
-0
lines changed
Lines changed: 303 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,303 @@
1+
---
2+
jupytext:
3+
text_representation:
4+
extension: .md
5+
format_name: myst
6+
format_version: 0.13
7+
jupytext_version: 1.16.3
8+
kernelspec:
9+
display_name: irsa-tutorials
10+
language: python
11+
name: python3
12+
---
13+
14+
# Euclid Quick Release 1: Cloud Access
15+
16+
+++
17+
18+
## Learning Goals
19+
- Learn where Euclid QR1 data is present in the cloud
20+
- Find an image and retireve its cutout from cloud
21+
- Find an object and retrieve its spectrum from cloud
22+
23+
+++
24+
25+
## Introduction
26+
TODO: fill with links of docs/specs?
27+
28+
+++
29+
30+
## Imports
31+
32+
```{code-cell} ipython3
33+
# Uncomment the next line to install dependencies if needed.
34+
# !pip install s3fs astropy astroquery matploltib
35+
```
36+
37+
```{code-cell} ipython3
38+
import s3fs
39+
from astropy.coordinates import SkyCoord
40+
import astropy.units as u
41+
from astropy.visualization import ImageNormalize, PercentileInterval, AsinhStretch
42+
from astropy.io import fits
43+
from astropy.nddata import Cutout2D
44+
from astropy.wcs import WCS
45+
from astropy.table import Table
46+
from astroquery.ipac.irsa import Irsa
47+
from matplotlib import pyplot as plt
48+
import json
49+
```
50+
51+
## Browse Euclid QR1 Bucket
52+
53+
```{code-cell} ipython3
54+
BUCKET_NAME='nasa-irsa-euclid-q1' # internal to IPAC until public release (use LAN or VPN w/ Tunnel-all)
55+
```
56+
57+
```{code-cell} ipython3
58+
s3 = s3fs.S3FileSystem(anon=True)
59+
```
60+
61+
```{code-cell} ipython3
62+
s3.ls(f'{BUCKET_NAME}/q1')
63+
```
64+
65+
## Find images for a coordinate search
66+
67+
+++
68+
69+
### Locate MER images in the bucket
70+
71+
```{code-cell} ipython3
72+
s3.ls(f'{BUCKET_NAME}/q1/MER')
73+
```
74+
75+
```{code-cell} ipython3
76+
s3.ls(f'{BUCKET_NAME}/q1/MER/102018211')
77+
```
78+
79+
```{code-cell} ipython3
80+
s3.ls(f'{BUCKET_NAME}/q1/MER/102018211/VIS')
81+
```
82+
83+
As per doc specification, we need `MER/{tile_id}/{instrument}/EUC_MER_BGSUB-MOSAIC*.fits` for displaying background-subtracted mosiac images. But these images are stored under TILE IDs so first we need to find TILE ID for a coordinate search we are interested in. We will use astroquery (in next section) to retrieve FITS file paths for our coordinates.
84+
85+
+++
86+
87+
### Get image file paths for a coordinate search of interest
88+
89+
```{code-cell} ipython3
90+
# Star TYC 4429-1677-1
91+
ra = 273.11730230 * u.deg
92+
dec = 68.20761349 * u.deg
93+
```
94+
95+
```{code-cell} ipython3
96+
coord = SkyCoord(ra=ra, dec=dec)
97+
search_radius = 10 * u.arcsec
98+
```
99+
100+
List all Simple Image Access (SIA) collections for IRSA.
101+
102+
```{code-cell} ipython3
103+
tbl = Irsa.list_collections(servicetype='SIA')
104+
len(tbl)
105+
```
106+
107+
Filter to only those containing "euclid":
108+
109+
```{code-cell} ipython3
110+
tbl[['euclid' in v for v in tbl['collection']]]
111+
```
112+
113+
We identify the collection we need for MER images:
114+
115+
```{code-cell} ipython3
116+
img_collection = 'euclid_DpdMerBksMosaic'
117+
```
118+
119+
```{code-cell} ipython3
120+
img_tbl = Irsa.query_sia(pos=(coord, search_radius), collection=img_collection).to_table()
121+
img_tbl
122+
```
123+
124+
Now we narrow it down to the images with science dataproduct subtype and Euclid facility:
125+
126+
```{code-cell} ipython3
127+
euclid_sci_img_tbl = img_tbl[[row['facility_name']=='Euclid' and row['dataproduct_subtype']=='science' for row in img_tbl]]
128+
euclid_sci_img_tbl
129+
```
130+
131+
We can see there's a `cloud_access` column that gives us the location info of the image files we are interested in. So let's extract the S3 bucket file path from it.
132+
133+
```{code-cell} ipython3
134+
def get_s3_fpath(cloud_access):
135+
cloud_info = json.loads(cloud_access) # converts str to dict
136+
bucket_name = cloud_info['aws']['bucket_name']
137+
key = cloud_info['aws']['key']
138+
139+
return f'{bucket_name}/{key}'
140+
```
141+
142+
```{code-cell} ipython3
143+
[get_s3_fpath(row['cloud_access']) for row in euclid_sci_img_tbl]
144+
```
145+
146+
Let's also extract filter names to use when displaying the images:
147+
148+
```{code-cell} ipython3
149+
def get_filter_name(instrument, bandpass):
150+
return f'{instrument}_{bandpass}' if instrument!=bandpass else instrument
151+
```
152+
153+
```{code-cell} ipython3
154+
[get_filter_name(row['instrument_name'], row['energy_bandpassname']) for row in euclid_sci_img_tbl]
155+
```
156+
157+
## Retrieve image cutouts from the cloud
158+
These image files are very big (~1.4GB), so we use astropy's lazy-loading capability of FITS for better performance. (See [Obtaining subsets from cloud-hosted FITS files](https://docs.astropy.org/en/stable/io/fits/usage/cloud.html#fits-io-cloud).)
159+
160+
```{code-cell} ipython3
161+
cutout_size = 1 * u.arcmin
162+
```
163+
164+
```{code-cell} ipython3
165+
cutouts = []
166+
filters = []
167+
168+
for row in euclid_sci_img_tbl:
169+
s3_fpath = get_s3_fpath(row['cloud_access'])
170+
filter_name = get_filter_name(row['instrument_name'], row['energy_bandpassname'])
171+
172+
with fits.open(f's3://{s3_fpath}', fsspec_kwargs={"anon": True}) as hdul:
173+
print(f'Retrieving cutout for {filter_name} ...')
174+
cutout = Cutout2D(hdul[0].section,
175+
position=coord,
176+
size=cutout_size,
177+
wcs=WCS(hdul[0].header))
178+
cutouts.append(cutout)
179+
filters.append(filter_name)
180+
```
181+
182+
```{code-cell} ipython3
183+
fig, axes = plt.subplots(2, 2, figsize=(4 * 2, 4 * 2), subplot_kw={'projection': cutouts[0].wcs})
184+
185+
for idx, ax in enumerate(axes.flat):
186+
norm = ImageNormalize(cutouts[idx].data, interval=PercentileInterval(99), stretch=AsinhStretch())
187+
ax.imshow(cutouts[idx].data, cmap='gray', origin='lower', norm=norm)
188+
ax.set_xlabel('RA')
189+
ax.set_ylabel('Dec')
190+
ax.text(0.95, 0.05, filters[idx], color='white', fontsize=14, transform=ax.transAxes, va='bottom', ha='right')
191+
192+
plt.tight_layout()
193+
```
194+
195+
## Find objects for the coordinates of our interest
196+
197+
+++
198+
199+
### Locate MER catalogs in the bucket
200+
201+
```{code-cell} ipython3
202+
s3.ls(f'{BUCKET_NAME}/q1/catalogs')
203+
```
204+
205+
```{code-cell} ipython3
206+
s3.ls(f'{BUCKET_NAME}/q1/catalogs/MER_FINAL_CATALOG')
207+
```
208+
209+
```{code-cell} ipython3
210+
s3.ls(f'{BUCKET_NAME}/q1/catalogs/MER_FINAL_CATALOG/102018211')
211+
```
212+
213+
As per doc specification, we need `catalogs/MER_FINAL_CATALOG/{tile_id}/EUC_MER_FINAL-CAT*.fits` for listing the objects catalogued. But we only need to find objects in our coordinates of interest so we will use astroquery to do a spatial search in MER catalog (combined for all tiles).
214+
215+
+++
216+
217+
### Get object IDs for the coordinates of our interest
218+
219+
```{code-cell} ipython3
220+
tbl_catalogs = Irsa.list_catalogs(full=True).to_table()
221+
len(tbl_catalogs)
222+
```
223+
224+
```{code-cell} ipython3
225+
tbl_catalogs[['euclid' in v for v in tbl_catalogs['schema_name']]]
226+
```
227+
228+
From this table, we can extract the MER catalog name. We also see several other interesting catalogs, let's also extract spectral file association catalog for retrieving spectra later.
229+
230+
```{code-cell} ipython3
231+
euclid_mer_catalog = 'euclid_q1_mer_catalogue'
232+
euclid_spec_association_catalog = 'euclid.objectid_spectrafile_association_q1'
233+
```
234+
235+
Now, we do a TAP search with spatial constraints for our coordinates. We use cone of 5 arcsec around our source to pinpoint its object ID in Euclid catalog.
236+
237+
```{code-cell} ipython3
238+
search_radius = (5 * u.arcsec).to('deg')
239+
240+
adql_query = f"SELECT * \
241+
FROM {euclid_mer_catalog} \
242+
WHERE CONTAINS(POINT('ICRS', ra, dec), \
243+
CIRCLE('ICRS', {ra.value}, {dec.value}, {search_radius.value})) = 1"
244+
245+
mer_catalog_tbl = Irsa.query_tap(query=adql_query).to_table()
246+
mer_catalog_tbl
247+
```
248+
249+
```{code-cell} ipython3
250+
object_id = int(mer_catalog_tbl['object_id'][0])
251+
object_id
252+
```
253+
254+
## Find spectra for the coordinates of our interest
255+
Using the object ID(s) we extracted above, we can narrow down the spectral file association catalog to identify spectra file path(s).
256+
257+
```{code-cell} ipython3
258+
adql_query = f"SELECT * FROM {euclid_spec_association_catalog} \
259+
WHERE objectid = {object_id}"
260+
261+
spec_association_tbl = Irsa.query_tap(adql_query).to_table()
262+
spec_association_tbl
263+
```
264+
265+
We can see the `uri` column that gives us location of spectra file on IBE, we can map it to S3 bucket key to retrieve spectra file from the cloud. This is a very big FITS spectra file with multiple extensions where each extension contains spectrum of one object. The `hdu` column gives us the extension number for our object. So let's extract both of these.
266+
267+
```{code-cell} ipython3
268+
spec_fpath_key = spec_association_tbl['uri'][0].replace('ibe/data/euclid/', '')
269+
spec_fpath_key
270+
```
271+
272+
```{code-cell} ipython3
273+
object_hdu_idx = int(spec_association_tbl['hdu'][0])
274+
object_hdu_idx
275+
```
276+
277+
## Retrieve spectrum from the cloud
278+
Again we use astropy's lazy-loading capability of FITS to only retrieve the spectrum table of our object from the S3 bucket.
279+
280+
```{code-cell} ipython3
281+
with fits.open(f's3://{BUCKET_NAME}/{spec_fpath_key}', fsspec_kwargs={'anon': True}) as hdul:
282+
spec_hdu = hdul[object_hdu_idx]
283+
spec_tbl = Table.read(spec_hdu)
284+
```
285+
286+
```{code-cell} ipython3
287+
spec_tbl
288+
```
289+
290+
```{code-cell} ipython3
291+
plt.plot(spec_tbl['WAVELENGTH'], spec_tbl['SIGNAL'])
292+
plt.xlabel(spec_tbl['WAVELENGTH'].unit.to_string('latex_inline'))
293+
plt.ylabel(spec_tbl['SIGNAL'].unit.to_string('latex_inline'))
294+
295+
plt.title(f'Euclid Object ID: {object_id}');
296+
```
297+
298+
## About this Notebook
299+
Author: Jaladh Singhal (IRSA Developer) in conjunction with Tiffany Meshkat, Vandana Desai, Brigitta Sipőcz, and the IPAC Science Platform team
300+
301+
Updated: 2025-03-13
302+
303+
Contact: the [IRSA Helpdesk](https://irsa.ipac.caltech.edu/docs/help_desk.html) with questions or reporting problems.

0 commit comments

Comments
 (0)