Skip to content

Commit 18dc42b

Browse files
committed
Swapping out pandas
1 parent ee3f93f commit 18dc42b

File tree

2 files changed

+89
-93
lines changed

2 files changed

+89
-93
lines changed

tutorials/euclid_access/1_Euclid_intro_MER_images.md

Lines changed: 21 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -65,7 +65,6 @@ Each MER image is approximately 1.47 GB. Downloading can take some time.
6565
import re
6666
6767
import numpy as np
68-
import pandas as pd
6968
7069
import matplotlib.pyplot as plt
7170
from matplotlib.patches import Ellipse
@@ -80,9 +79,6 @@ from astropy import units as u
8079
8180
from astroquery.ipac.irsa import Irsa
8281
import sep
83-
84-
# Copy-on-write is more performant and avoids unexpected modifications of the original DataFrame.
85-
pd.options.mode.copy_on_write = True
8682
```
8783

8884
## 1. Search for multiwavelength Euclid Q1 MER mosaics that cover the star HD 168151
@@ -123,20 +119,23 @@ science_images
123119
Note that 'access_estsize' is in units of kb
124120

125121
```{code-cell} ipython3
126-
filename = science_images[science_images['energy_bandpassname']=='VIS']['access_url'][0]
127-
filesize = science_images[science_images['energy_bandpassname']=='VIS']['access_estsize'][0]/1000000
128-
122+
filename = science_images[science_images['energy_bandpassname'] == 'VIS']['access_url'][0]
123+
filesize = science_images[science_images['energy_bandpassname'] == 'VIS']['access_estsize'][0] / 1000000
129124
print(filename)
130125
131126
print(f'Please note this image is {filesize} GB. With 230 Mbps internet download speed, it takes about 1 minute to download.')
132127
```
133128

129+
```{code-cell} ipython3
130+
science_images
131+
```
132+
134133
### Extract the tileID of this image from the filename
135134

136135
```{code-cell} ipython3
137-
tileID=re.search(r'TILE\s*(\d{9})', filename).group(1)
136+
tileID = science_images[science_images['energy_bandpassname'] == 'VIS']['obs_id'][0][:9]
138137
139-
print('The MER tile ID for this object is :',tileID)
138+
print(f'The MER tile ID for this object is : {tileID}')
140139
```
141140

142141
Retrieve the MER image -- note this file is about 1.46 GB
@@ -146,7 +145,7 @@ fname = download_file(filename, cache=True)
146145
hdu_mer_irsa = fits.open(fname)
147146
print(hdu_mer_irsa.info())
148147
149-
head_mer_irsa = hdu_mer_irsa[0].header
148+
header_mer_irsa = hdu_mer_irsa[0].header
150149
```
151150

152151
If you would like to save the MER mosaic to disk, uncomment the following cell.
@@ -160,21 +159,22 @@ Please also define a suitable download directory; by default it will be `data` a
160159
Have a look at the header information for this image.
161160

162161
```{code-cell} ipython3
163-
head_mer_irsa
162+
header_mer_irsa
164163
```
165164

166165
Lets extract just the primary image.
167166

168167
```{code-cell} ipython3
169-
im_mer_irsa=hdu_mer_irsa[0].data
168+
im_mer_irsa = hdu_mer_irsa[0].data
170169
171170
print(im_mer_irsa.shape)
172171
```
173172

174173
Due to the large field of view of the MER mosaic, let's cut out a smaller section (2"x2")of the MER mosaic to inspect the image
175174

176175
```{code-cell} ipython3
177-
plt.imshow(im_mer_irsa[0:1200,0:1200], cmap='gray', origin='lower', norm=ImageNormalize(im_mer_irsa[0:1200,0:1200], interval=PercentileInterval(99.9), stretch=AsinhStretch()))
176+
plt.imshow(im_mer_irsa[0:1200,0:1200], cmap='gray', origin='lower',
177+
norm=ImageNormalize(im_mer_irsa[0:1200,0:1200], interval=PercentileInterval(99.9), stretch=AsinhStretch()))
178178
colorbar = plt.colorbar()
179179
```
180180

@@ -203,21 +203,20 @@ urls
203203
Create an array with the instrument and filter name so we can add this to the plots.
204204

205205
```{code-cell} ipython3
206-
df_im_euclid.loc[:, "filters"] = df_im_euclid["instrument_name"] + "_" + df_im_euclid["energy_bandpassname"]
206+
science_images['filters'] = science_images['instrument_name'] + "_" + science_images['energy_bandpassname']
207207
208-
## Note that VIS_VIS appears in the filters, so update that filter to just say VIS
209-
df_im_euclid.loc[df_im_euclid["filters"] == "VIS_VIS", "filters"] = "VIS"
208+
# VIS_VIS appears in the filters, so update that filter to just say VIS
209+
science_images['filters'][science_images['filters']== 'VIS_VIS'] = "VIS"
210210
211-
filters = df_im_euclid['filters'].to_numpy()
212-
filters
211+
science_images['filters']
213212
```
214213

215214
## The image above is very large, so let's cut out a smaller image to inspect these data.
216215

217216
```{code-cell} ipython3
218217
######################## User defined section ############################
219218
## How large do you want the image cutout to be?
220-
im_cutout= 1.0 * u.arcmin
219+
im_cutout = 1.0 * u.arcmin
221220
222221
## What is the center of the cutout?
223222
## For now choosing a random location on the image
@@ -229,7 +228,7 @@ dec = 64.525
229228
# ra = 273.474451
230229
# dec = 64.397273
231230
232-
coords_cutout = SkyCoord(ra, dec, unit=(u.deg, u.deg), frame='icrs')
231+
coords_cutout = SkyCoord(ra, dec, unit='deg', frame='icrs')
233232
234233
##########################################################################
235234
@@ -296,13 +295,9 @@ plt.show()
296295
First we list all the filters so you can choose which cutout you want to extract sources on. We will choose VIS.
297296

298297
```{code-cell} ipython3
299-
filters
300-
```
301-
302-
```{code-cell} ipython3
303-
filt_index = np.where(filters == 'VIS')[0][0]
298+
filt_index = np.where(science_images['filters'] == 'VIS')[0][0]
304299
305-
img1=final_hdulist[filt_index].data
300+
img1 = final_hdulist[filt_index].data
306301
```
307302

308303
### Extract some sources from the cutout using sep (python package based on source extractor)

tutorials/euclid_access/5_Euclid_intro_SPE_catalog.md

Lines changed: 68 additions & 67 deletions
Original file line numberDiff line numberDiff line change
@@ -50,7 +50,7 @@ If you have questions about this notebook, please contact the [IRSA helpdesk](ht
5050

5151
```{code-cell} ipython3
5252
# Uncomment the next line to install dependencies if needed
53-
# !pip install matplotlib pandas astropy 'astroquery>=0.4.10'
53+
# !pip install matplotlib astropy 'astroquery>=0.4.10'
5454
```
5555

5656
```{code-cell} ipython3
@@ -59,14 +59,13 @@ import urllib
5959
6060
import matplotlib.pyplot as plt
6161
import numpy as np
62-
import pandas as pd
6362
6463
from astropy.coordinates import SkyCoord
6564
from astropy.io import fits
66-
from astropy.table import Table
65+
from astropy.table import QTable
6766
from astropy import units as u
6867
from astropy.utils.data import download_file
69-
from astropy.visualization import ImageNormalize, PercentileInterval, AsinhStretch
68+
from astropy.visualization import ImageNormalize, PercentileInterval, AsinhStretch, quantity_support
7069
7170
from astroquery.ipac.irsa import Irsa
7271
```
@@ -80,41 +79,41 @@ search_radius = 10 * u.arcsec
8079
coord = SkyCoord.from_name('HD 168151')
8180
```
8281

83-
### Use IRSA to search for all Euclid data on this target
82+
```{tip}
83+
The IRSA SIA collections can be listed using using the ``list_collections`` method, we can filter on the ones containing "euclid" in the collection name:
8484
85-
This searches specifically in the euclid_DpdMerBksMosaic "collection" which is the MER images and catalogs.
85+
Irsa.list_collections(filter='euclid')
86+
```
8687

87-
```{code-cell} ipython3
88-
im_table = Irsa.query_sia(pos=(coord, search_radius), collection='euclid_DpdMerBksMosaic')
88+
+++
8989

90-
## Convert the table to pandas dataframe
91-
df_im_irsa=im_table.to_pandas()
92-
```
90+
### Use IRSA to search for all Euclid data on this target
91+
92+
This searches specifically in the ``euclid_DpdMerBksMosaic`` collection which is the MER images and catalogs.
9393

9494
```{code-cell} ipython3
95-
## Change the settings so we can see all the columns in the dataframe and the full column width
96-
## (to see the full long URL)
97-
pd.set_option('display.max_columns', None)
98-
pd.set_option('display.max_colwidth', None)
95+
image_table = Irsa.query_sia(pos=(coord, search_radius), collection='euclid_DpdMerBksMosaic')
9996
```
10097

101-
#### This dataframe contains other non-Euclid datasets that have been "Euclidized", meaning they have been put on the same pixel scale as the Euclid data. For this example we just want to look at the Euclid data, so select Euclid for the facility name, and choose science as the data product subtype.
98+
This table lists all MER mosaic images available in this search position. These mosaics include the Euclid VIS, Y, J, H images, as well as ground-based telescopes which have been put on the same pixel scale. For more information, see the [Euclid documentation at IPAC](https://euclid.caltech.edu/page/euclid-faq-tech/).
10299

103-
```{code-cell} ipython3
104-
df_im_euclid=df_im_irsa[ (df_im_irsa['dataproduct_subtype']=='science') & (df_im_irsa['facility_name']=='Euclid')]
100+
Note that there are various image types are returned as well, we filter out the `science` images from these:
105101

106-
df_im_euclid.head()
102+
```{code-cell} ipython3
103+
science_images = image_table[image_table['dataproduct_subtype'] == 'science']
104+
science_images
107105
```
108106

109-
## Choose the VIS image and pull the filename:
107+
### Choose the VIS image and pull the Tile ID
110108

111-
```{code-cell} ipython3
112-
filename=df_im_euclid[df_im_euclid['energy_bandpassname']=='VIS']['access_url'].to_list()[0]
109+
+++
113110

114-
# ## Extract the tileID from the filename
115-
tileID=re.search(r'TILE\s*(\d{9})', filename).group(1)
111+
Extract the tile ID from the ``obs_id`` column. The values in this column are made a combination of the 9 digit tile ID and the abbreviation of the instrument.
116112

117-
print('The MER tile ID for this object is :',tileID)
113+
```{code-cell} ipython3
114+
tileID = science_images[science_images['energy_bandpassname'] == 'VIS']['obs_id'][0][:9]
115+
116+
print(f'The MER tile ID for this object is : {tileID}')
118117
```
119118

120119
## 2. Download SPE catalog from IRSA directly to this notebook
@@ -137,10 +136,14 @@ table_lines = 'euclid_q1_spe_lines_line_features'
137136
- List the column names
138137

139138
```{code-cell} ipython3
140-
columns_info = Irsa.list_columns(catalog=table_lines)
139+
columns_info = Irsa.list_columns(catalog=table_mer)
141140
print(len(columns_info))
142141
```
143142

143+
```{code-cell} ipython3
144+
Irsa.list_columns(catalog=table_1dspectra, full=True)
145+
```
146+
144147
```{code-cell} ipython3
145148
# Full list of columns and their description
146149
columns_info
@@ -159,89 +162,87 @@ We specify the following conditions on our search:
159162
Finally we sort the data by descending spe_line_snr_gf to have the largest SNR H-alpha lines detected at the top.
160163

161164
```{code-cell} ipython3
162-
adql = f"SELECT DISTINCT mer.object_id,mer.ra, mer.dec, mer.tileid, mer.flux_y_templfit, \
163-
lines.spe_line_snr_gf,lines.spe_line_snr_di, lines.spe_line_name, lines.spe_line_central_wl_gf,\
164-
lines.spe_line_ew_gf, galaxy.spe_z_err, galaxy.spe_z,galaxy.spe_z_prob, lines.spe_line_flux_gf, lines.spe_line_flux_err_gf \
165-
FROM {table_mer} AS mer \
166-
JOIN {table_lines} AS lines \
167-
ON mer.object_id = lines.object_id \
168-
JOIN {table_galaxy_candidates} AS galaxy \
169-
ON lines.object_id = galaxy.object_id AND lines.spe_rank = galaxy.spe_rank \
170-
WHERE lines.spe_line_snr_gf >5 \
171-
AND lines.spe_line_name = 'Halpha' \
172-
AND mer.tileid = {tileID} \
173-
AND galaxy.spe_z_prob > 0.99 \
174-
AND galaxy.spe_z BETWEEN 1.4 AND 1.6 \
175-
AND lines.spe_line_flux_gf > 2E-16 \
176-
ORDER BY lines.spe_line_snr_gf DESC \
177-
"
165+
adql_query = ("SELECT DISTINCT mer.object_id,mer.ra, mer.dec, mer.tileid, mer.flux_y_templfit, "
166+
"lines.spe_line_snr_gf,lines.spe_line_snr_di, lines.spe_line_name, lines.spe_line_central_wl_gf, "
167+
"lines.spe_line_ew_gf, galaxy.spe_z_err, galaxy.spe_z,galaxy.spe_z_prob, "
168+
"lines.spe_line_flux_gf, lines.spe_line_flux_err_gf "
169+
f"FROM {table_mer} AS mer "
170+
f"JOIN {table_lines} AS lines "
171+
"ON mer.object_id = lines.object_id "
172+
f"JOIN {table_galaxy_candidates} AS galaxy "
173+
"ON lines.object_id = galaxy.object_id AND lines.spe_rank = galaxy.spe_rank "
174+
"WHERE lines.spe_line_snr_gf >5 "
175+
"AND lines.spe_line_name = 'Halpha' "
176+
f"AND mer.tileid = {tileID} "
177+
"AND galaxy.spe_z_prob > 0.99 "
178+
"AND galaxy.spe_z BETWEEN 1.4 AND 1.6 "
179+
"AND lines.spe_line_flux_gf > 2E-16 "
180+
"ORDER BY lines.spe_line_snr_gf DESC ")
178181
179182
# Use TAP with this ADQL string
180-
result = Irsa.query_tap(adql)
181-
182-
# Convert table to pandas dataframe and drop duplicates
183-
result_table = result.to_qtable()
183+
result_table = Irsa.query_tap(adql_query).to_qtable()
184184
185185
result_table['spe_line_flux_gf'].info.format = ".8e" # Scientific notation with 8 decimal places
186186
result_table['spe_line_flux_err_gf'].info.format = ".8e"
187-
result_table['object_id'] = result['object_id'].astype('int64')
187+
result_table['object_id'] = result_table['object_id'].astype('int64')
188188
```
189189

190190
### Choose an object of interest, lets look at an object with a strong Halpha line detected with high SNR.
191191

192192
```{code-cell} ipython3
193193
obj_id = 2737659721646729968
194194
195-
obj_tab = result_table[(result_table['object_id'] == obj_id)]
195+
obj_row = result_table[(result_table['object_id'] == obj_id)]
196196
197-
obj_tab
197+
obj_row
198198
```
199199

200200
### Pull the spectrum of this object
201201

202202
```{code-cell} ipython3
203203
adql_object = f"SELECT * FROM {table_1dspectra} WHERE objectid = {obj_id}"
204204
205-
result2 = Irsa.query_tap(adql_object)
206-
df2 = result2.to_table().to_pandas()
207-
df2
205+
result_table2 = Irsa.query_tap(adql_object).to_qtable()
208206
```
209207

210208
### The following steps to read in the spectrum follows the 3_Euclid_intro_1D_spectra notebook.
211209

212210
This involves reading in the spectrum without readin in the full FITS file, just pulling the extension we want.
213211

214212
```{code-cell} ipython3
215-
file_uri = urllib.parse.urljoin(Irsa.tap_url, result2['uri'][0])
213+
file_uri = urllib.parse.urljoin(Irsa.tap_url, result_table2['uri'][0])
216214
file_uri
217215
```
218216

219217
```{code-cell} ipython3
220218
with fits.open(file_uri) as hdul:
221-
hdu = hdul[df2['hdu'].iloc[0]]
222-
dat = Table.read(hdu, format='fits', hdu=1)
223-
df_obj_irsa = dat.to_pandas()
219+
spectrum = QTable.read(hdul[result_table2['hdu'][0]], format='fits')
220+
spec_header = hdul[result_table2['hdu'][0]].header
224221
```
225222

226223
### Now the data are read in, plot the spectrum with the H-alpha line labeled
227224

228-
Divide by 10000 to convert from Angstrom to micron
225+
```{tip}
226+
As we use astropy.visualization's ``quantity_support``, matplotlib automatically picks up the axis units from the quantitites we plot.
227+
```
228+
229+
```{code-cell} ipython3
230+
quantity_support()
231+
```
229232

230233
```{code-cell} ipython3
231-
wavelengths = obj_tab['spe_line_central_wl_gf']/10000.
232-
line_names = obj_tab['spe_line_name']
233-
snr_gf = obj_tab['spe_line_snr_gf']
234+
# Note that the units are missing from the lines table, we manually add Angstrom
235+
line_wavelengths = obj_row['spe_line_central_wl_gf'] * u.angstrom
236+
line_names = obj_row['spe_line_name']
237+
snr_gf = obj_row['spe_line_snr_gf']
234238
235-
plt.plot(df_obj_irsa['WAVELENGTH']/10000., df_obj_irsa['SIGNAL'])
239+
plt.plot(spectrum['WAVELENGTH'].to(u.micron), spectrum['SIGNAL'])
236240
237-
for wl, name, snr in zip(np.atleast_1d(wavelengths), np.atleast_1d(line_names), np.atleast_1d(snr_gf)):
241+
for wl, name, snr in zip(np.atleast_1d(line_wavelengths), np.atleast_1d(line_names), np.atleast_1d(snr_gf)):
238242
plt.axvline(wl, color='b', linestyle='--', alpha=0.3)
239-
plt.text(wl+0.02, .2, name+' SNR='+str(round(snr)), rotation=90, ha='center', va='bottom', fontsize=10)
243+
plt.text(wl, .2, name+' SNR='+str(round(snr)), rotation=90, ha='center', va='bottom', fontsize=10)
240244
241-
plt.xlabel('Wavelength (microns)')
242-
plt.ylabel('Flux (erg / (s cm2))')
243-
plt.xlim(1.25, 1.85)
244-
plt.title('Object ID is '+str(obj_id))
245+
plt.title(f'Object ID {obj_id}')
245246
```
246247

247248
## About this Notebook

0 commit comments

Comments
 (0)