
Commit 668a59a

Wording changes & content reduction as per @jkrick feedback
1 parent d8a7dab commit 668a59a

1 file changed: +8 / -42 lines changed


tutorials/parquet-catalog-demos/irsa-hats-with-lsdb.md

Lines changed: 8 additions & 42 deletions
@@ -11,7 +11,7 @@ kernelspec:
 name: python3
 ---

-# Access IRSA HATS collections using lsdb
+# Access HATS Collections Using LSDB: Euclid Q1 and ZTF DR23

 +++

@@ -246,13 +246,9 @@ These can be useful for our **column filters** — the columns we want to SELECT
 We can also filter the schema DataFrame by name, unit, type, etc., to identify columns most relevant for our **row filters** — WHERE rows satisfy conditions on column values for our query.
 For example, let's explore the columns that are part of the PHZ (photometric redshift) catalog to identify photometric redshifts and source types:

-```{code-cell} ipython3
-euclid_schema_df[euclid_schema_df["name"].str.startswith("phz_")] # phz_ prefix is for PHZ catalog columns in this merged catalog
-```
-
 ```{code-cell} ipython3
 euclid_schema_df[
-euclid_schema_df["name"].str.startswith("phz_")
+euclid_schema_df["name"].str.startswith("phz_") # phz_ prefix is for PHZ catalog columns in this merged catalog
 & euclid_schema_df["type"].str.contains("int") # to see flag type columns
 ]
 ```
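The combined filter kept in the hunk above is ordinary pandas boolean masking on the schema DataFrame. Here is a minimal, self-contained sketch of the same pattern. The rows are hypothetical stand-ins for `euclid_schema_df`; only its `name`/`unit`/`type` layout is taken from the tutorial.

```python
import pandas as pd

# Hypothetical stand-in for euclid_schema_df (one row per catalog column).
schema_df = pd.DataFrame({
    "name": ["object_id", "phz_phz_median", "phz_phz_classification", "phz_flags", "flux_h"],
    "unit": [None, None, None, None, "uJy"],
    "type": ["int64", "double", "int32", "int64", "double"],
})

# Row mask: PHZ-prefixed columns AND integer (flag-like) types.
is_phz = schema_df["name"].str.startswith("phz_")
is_int = schema_df["type"].str.contains("int")
phz_int = schema_df[is_phz & is_int]

print(phz_int["name"].tolist())
```

`str.contains("int")` is a substring match, so it would also select unsigned types such as `uint8`.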
@@ -322,21 +318,13 @@ ztf_schema_df = pq_schema_to_df(ztf_schema)
 ztf_schema_df
 ```

-```{code-cell} ipython3
-ztf_schema_df[ztf_schema_df["unit"].str.contains("mag")] # to identify magnitude quantities
-```
-
-You can explore the schema further to identify other columns of interest.
+You can filter the schema further by units, type, etc. to identify other columns of interest.
 It's also useful to go through the [ZTF DR23 release notes](https://irsa.ipac.caltech.edu/data/ZTF/docs/releases/ztf_release_notes_latest) and [explanatory supplement](https://irsa.ipac.caltech.edu/data/ZTF/docs/ztf_explanatory_supplement.pdf) at IRSA for more details on column selections and caveats.

 For this tutorial, the following columns are most relevant to us:

 ```{code-cell} ipython3
 ztf_columns = ztf_schema_df["name"].tolist()[:6]
-ztf_columns
-```
-
-```{code-cell} ipython3
 ztf_columns.extend([
 'fid', 'filtercode',
 'ngoodobsrel',
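The hunk above drops the standalone example that filtered `ztf_schema_df` by unit. When filtering by unit yourself, columns without a unit may hold missing values. That is an assumption about how `pq_schema_to_df` fills the `unit` column. Depending on the pandas version, `str.contains` can return missing values for those rows, and such a mask cannot be used for boolean indexing. Passing `na=False` counts them as non-matches. Below is a minimal sketch on a hypothetical schema table; all column names and units are made up.

```python
import pandas as pd

# Hypothetical stand-in for ztf_schema_df; unitless columns have unit=None.
schema_df = pd.DataFrame({
    "name": ["objectid", "nepochs", "meanmag", "magrms", "chisq"],
    "unit": [None, None, "mag", "mag", None],
    "type": ["int64", "int32", "float", "float", "float"],
})

# na=False turns missing units into False, so the mask is purely boolean.
mag_mask = schema_df["unit"].str.contains("mag", na=False)
mag_cols = schema_df.loc[mag_mask, "name"].tolist()

# Filtering by type works the same way, e.g. floating-point columns only.
float_cols = schema_df.loc[schema_df["type"].isin(["float", "double"]), "name"].tolist()

print(mag_cols)
print(float_cols)
```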
@@ -436,7 +424,9 @@ with Client(n_workers=get_nworkers(euclid_x_ztf),
 euclid_x_ztf_df
 ```

-[Optional] Let's purify the crossmatched catalog by analyzing the distance between matched sources and removing the matches that don't meet a quality cut on percentile.
+### 5.3 [Optional] Filter the crossmatched catalog
+
+Let's purify the crossmatched catalog by analyzing the distance between matched sources and removing the matches that don't meet a quality cut on percentile.
 We also keep the matches that are outside this cutoff but are still within the same 19th order HEALPix tile.

 ```{code-cell} ipython3
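The optional step introduced in the hunk above can be sketched in plain pandas/NumPy. It keeps matches inside a percentile cut on separation, plus matches beyond the cut that still fall in the same order-19 HEALPix tile. This is not the tutorial's code. The column names `_dist_arcsec`, `_healpix_29_euclid`, and `_healpix_29_ztf`, the `PERCENTILE` value, and the sample rows are all hypothetical. The sketch assumes each side carries an order-29 NESTED HEALPix index. For such an index, the order-19 parent is obtained by dropping the lowest 2 * (29 - 19) = 20 bits.

```python
import numpy as np
import pandas as pd

ORDER_INDEX = 29  # assumed order of the per-row HEALPix index (NESTED scheme)
ORDER_TILE = 19   # tile order used to rescue matches beyond the cutoff
SHIFT = 2 * (ORDER_INDEX - ORDER_TILE)  # each order step down drops 2 bits
PERCENTILE = 75   # hypothetical cut, chosen so this tiny example has rows beyond it

# Hypothetical crossmatch output: separation plus each side's order-29 pixel.
tiles = np.arange(1000, 1008, dtype=np.int64)  # order-19 tile of each Euclid source
hp_euclid = (tiles << SHIFT) + 5
hp_ztf = (tiles << SHIFT) + 17
# Put two ZTF positions in a different order-19 tile: one close match, one distant.
hp_ztf[[1, 7]] = ((tiles[[1, 7]] + 1) << SHIFT) + 17

xmatch = pd.DataFrame({
    "_dist_arcsec": [0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.90, 1.50],
    "_healpix_29_euclid": hp_euclid,
    "_healpix_29_ztf": hp_ztf,
})

# 1) Percentile cut on separation.
dist = xmatch["_dist_arcsec"].to_numpy()
cutoff = np.percentile(dist, PERCENTILE)
within_cut = dist <= cutoff

# 2) Same order-19 parent tile: right-shift both order-29 indices by SHIFT bits.
tile_euclid = xmatch["_healpix_29_euclid"].to_numpy() >> SHIFT
tile_ztf = xmatch["_healpix_29_ztf"].to_numpy() >> SHIFT
same_tile = tile_euclid == tile_ztf

# Keep matches that pass the cut OR share a parent tile.
filtered = xmatch[within_cut | same_tile]
print(f"cutoff = {cutoff:.3f} arcsec; kept {len(filtered)} of {len(xmatch)} matches")
```

Because the rescue compares parent tiles rather than distances, a pair beyond the cutoff that straddles a tile boundary is dropped even if the two positions are physically close.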
@@ -467,7 +457,7 @@ Going forward, we will use this purified crossmatched catalog `euclid_x_ztf_filt

 +++

-### 5.3 Identify objects of interest from the crossmatch
+### 5.4 Identify objects of interest from the crossmatch

 +++

@@ -573,32 +563,8 @@ magrms_threshold
 variable_galaxies = euclid_x_ztf_filtered_df.query(
 f"chisq_ztf >= {chisq_threshold} & magrms_ztf >= {magrms_threshold}"
 ).sort_values("chisq_ztf", ascending=False) # sort by significant variability
-variable_galaxies
-```
-
-```{code-cell} ipython3
-plt.figure(figsize=(10, 6))
-all_redshifts = euclid_x_ztf_filtered_df["phz_phz_median_euclid"]
-bins = np.histogram_bin_edges(all_redshifts, bins=100)
-
-plt.hist(all_redshifts, bins=bins, label='Galaxies', histtype='step', alpha=0.8)
-plt.hist(
-variable_galaxies["phz_phz_median_euclid"],
-bins=bins,
-label=f'Variable Galaxies \nwith RMS mag > 95% ({magrms_threshold:.6f})\nand Chi-sq > 95% ({chisq_threshold:.6f})',
-histtype='step',
-alpha=0.8
-)
-
-plt.xlabel("Euclid redshift")
-plt.ylabel("Galaxy Count")
-plt.legend()
-plt.yscale("log")
-plt.xlim(0, 2.5)
-plt.show()
 ```

-Most of the variable galaxies histogram is concentrated in the low redshift (z < 1) range, as expected from the hexbin density plots above.
 Let's inspect the top variable galaxies that we can use for plotting their light curves:

 ```{code-cell} ipython3
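The selection kept in the hunk above is a two-threshold cut followed by a descending sort. Below is a sketch of the same pattern on synthetic data. It assumes that `chisq_threshold` and `magrms_threshold` are 95th percentiles, as the legend of the removed plot suggests ("RMS mag > 95%", "Chi-sq > 95%"). The random values and `N_TOP` are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
n = 1_000

# Synthetic stand-in for euclid_x_ztf_filtered_df; magrms is made to rise with
# chisq so that the two 95th-percentile cuts overlap on some rows.
chisq = rng.lognormal(mean=0.0, sigma=1.0, size=n)
df = pd.DataFrame({
    "chisq_ztf": chisq,
    "magrms_ztf": 0.02 * np.sqrt(chisq) * rng.lognormal(mean=0.0, sigma=0.2, size=n),
    "phz_phz_median_euclid": rng.uniform(0.0, 2.5, size=n),
})

# Assumed thresholds: the 95th percentile of each variability metric.
chisq_threshold = df["chisq_ztf"].quantile(0.95)
magrms_threshold = df["magrms_ztf"].quantile(0.95)

# Same query/sort pattern as the tutorial cell.
variable_galaxies = df.query(
    f"chisq_ztf >= {chisq_threshold} & magrms_ztf >= {magrms_threshold}"
).sort_values("chisq_ztf", ascending=False)

N_TOP = 5  # hypothetical number of light curves to inspect
top_variable = variable_galaxies.head(N_TOP)
print(f"{len(variable_galaxies)} candidates; top rows: {top_variable.index.tolist()}")
```

Because the frame is already sorted, `head(N_TOP)` returns the most significantly variable rows first. That is the order you would want when picking light curves to plot.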
@@ -645,7 +611,7 @@ ztf_lcs
 ```

 As earlier, this creates a lazy catalog object with the partition(s) that contains our IDs.
-We can load the light curves data into a DataFrame by using the `compute()` method:
+We can load the light curves data into a DataFrame by using the `compute()` method:

 ```{code-cell} ipython3
 # Uncomment following if ztf_lcs contain more than 1 partition
