IRSA-7104: Add "Access IRSA HATS using lsdb" notebook #136

jaladh-singhal · 2025-09-14T22:38:29Z

NOTE: The execution time of the notebook depends on the spatial filters defined at the beginning. I'm working with a cone at the Euclid Deep Field North center, with the following radius sizes:

| cone size | notebook executed where | total execution time | comments |
|---|---|---|---|
| 0.5 deg | locally | 4.5-5 min | ~2 min for each of two lsdb .compute() calls (was 4+4 min on Friday) |
| 0.5 deg | CircleCI rendering | 101s = 1.68 min (average of two runs) | surprisingly less! I was expecting it to be more than local. Is it because CircleCI is also on AWS or something else? |
| 0.5 deg | Fornax SP | 75s = 1.25 min | fastest of all as expected |
| 3 deg (entire field) | Fornax SP | 192s = 3.2 min (1.5 min for 1st compute()) | 6 times more cone radius is still faster than locally! Crossmatch results include more interesting candidates |
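
For context on the last row, a rough sketch of how the sky area of a cone grows with its radius. This is only illustrative arithmetic using the spherical-cap formula; the function name is made up for this example and is not part of the notebook.

```python
import math


def cone_area_sq_deg(radius_deg: float) -> float:
    """Solid angle of a cone-search footprint (spherical cap) in square degrees."""
    r = math.radians(radius_deg)
    steradians = 2.0 * math.pi * (1.0 - math.cos(r))
    # Convert steradians to square degrees.
    return steradians * (180.0 / math.pi) ** 2


small = cone_area_sq_deg(0.5)  # cone radius used locally and on CircleCI
large = cone_area_sq_deg(3.0)  # cone radius used for the entire field
ratio = large / small

print(f"0.5 deg cone: {small:.3f} sq deg")
print(f"3.0 deg cone: {large:.3f} sq deg")
print(f"area ratio:   {ratio:.1f}")
```

A radius 6 times larger covers roughly 36 times the area, which makes the 3 deg timing on Fornax look even better compared with the local 0.5 deg run.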

index.md

troyraen

I'll come back and look in more detail, but here's a couple of things I noticed quickly.

Will you make the markdown text one line per sentence? We try to stick with that in these repos because it makes the reviews a lot easier.

index.md

.binder/requirements.txt

bsipocz · 2025-09-16T00:46:53Z

@jaladh-singhal - could you roll back your last commit and break the Markdown at the end of sentences instead? Breaking lines at a character length will produce much bigger diffs than necessary in the future, which is why we break the Markdown at the end of sentences. (A good rule of thumb is to write shorter sentences if a line becomes really long :) )
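
A minimal sketch of the one-sentence-per-line convention, in case it helps when reflowing text. The helper name is hypothetical and the splitting rule is deliberately naive (it will also break after abbreviations such as "e.g."), so the result still needs a manual check.

```python
import re

# Split after sentence-ending punctuation that is followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def one_sentence_per_line(paragraph: str) -> str:
    """Re-wrap a hard-wrapped paragraph so that each sentence is on its own line."""
    # Collapse existing line breaks and repeated spaces first.
    text = " ".join(paragraph.split())
    return "\n".join(_SENTENCE_END.split(text))


wrapped = "First sentence. Second one\nwraps here. Third?"
print(one_sentence_per_line(wrapped))
```

With this layout, editing one sentence changes only one line in the diff.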

jkrick

As I said in slack, I think this is a super useful notebook, so thanks for writing it. I have a few comments below, but traditionally this repo doesn't get quite the scrutiny that fornax-demo-notebooks does, so please take these as suggestions.

I think this title does work, but to be nitpicky, I would like to also see the words "Euclid" and "ZTF" in there to help users wanting those particular datasets. Some suggestions:
"Working with parquet Collections: Euclid Q1 & ZTF DR23 via lsdb" or "HATS-Based Analysis of Euclid and ZTF Catalogs" or "Using lsdb to Filter and Match Euclid and ZTF Cloud Catalogs"

Split section 5.2 into two sections: 5.2 "Execute the query" and 5.3 "Filter data frame".

My biggest comment is that the notebook feels very long to me. Sections I think could be cut:

- We don't need to see the whole euclid_schema_df, so there is no need to print it out in full in section 3.2.
- I think one filtering example on the column names would be enough, so I suggest keeping `euclid_schema_df[euclid_schema_df["name"].str.startswith("phz_") & euclid_schema_df["type"].str.contains("int")]` (to see flag type columns) and removing `euclid_schema_df[euclid_schema_df["name"].str.startswith("phz_")]` (the phz_ prefix is for PHZ catalog columns in this merged catalog).
- Section 4 can be shortened because it is really a repeat of section 3 with "Euclid" replaced by "ZTF", so I would suggest limiting it to just what we need for the rest of the notebook.
- Section 5.3: I suggest removing the parts about "check if there is any ZTF object that has observations in multiple filters" and going straight to the plots.
- I don't think the number of galaxies as a function of redshift is an important plot, since much of this information can be gleaned from the hexbin plots above.

And I put these in slack before I knew I was doing a full review, so here are the requested comments on the specific filters, copied here so we can track them:

- In section 5.2, I'm not sure why you keep those sources within the same healpix area but outside of the 75th percentile.
- I wonder whether limiting to redshifts below 1 would let you see trends in the hexbin plots as a function of redshift better. I would probably filter on redshift when you do the initial filtering with "galaxy_filters".
- What is the magnitude limit of ZTF in r band? Those data points at early times with the larger error bars appear to be pulling up the RMS mag and other metrics. I wonder if they are good data points or not. For the light curve notebook we wrote a while ago we filtered on "catflags < 32768" to remove bad flags; can we do that in this notebook as well?
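
To make the "catflags < 32768" suggestion concrete, here is a small sketch on made-up rows (the magnitudes and flag values are hypothetical, not ZTF data). Since 32768 equals 2**15, the cut keeps only rows where bit 15 and all higher bits are unset.

```python
BAD_FLAG_THRESHOLD = 32768  # value quoted in the review; equals 2**15


def keep_clean(rows):
    """Keep rows whose catflags value is below the threshold."""
    return [row for row in rows if row["catflags"] < BAD_FLAG_THRESHOLD]


# Hypothetical light-curve points, for illustration only.
rows = [
    {"mag": 18.2, "catflags": 0},
    {"mag": 18.9, "catflags": 4},
    {"mag": 20.7, "catflags": 32768},
    {"mag": 20.9, "catflags": 32772},
]

clean = keep_clean(rows)
print([row["mag"] for row in clean])
```

On a pandas data frame the same cut would typically be a boolean mask on the catflags column, assuming the column is present in the crossmatched result.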

jkrick · 2025-09-18T17:32:50Z

I just tried running this on fornax, and am getting the following AWS access error on the first cell in section 3.2 which calls pq.read_schema on the euclid path.

OSError Traceback (most recent call last)
Cell In[20], line 1
----> 1 euclid_schema = pq.read_schema(
2 f"s3://{euclid_q1_bucket}/{euclid_q1_hats_prefix}/{euclid_q1_schema_path}"
3 )
4 type(euclid_schema)

File /opt/envs/python3/lib/python3.12/site-packages/pyarrow/parquet/core.py:2389, in read_schema(where, memory_map, decryption_properties, filesystem)
2387 file_ctx = nullcontext()
2388 if filesystem is not None:
-> 2389 file_ctx = where = filesystem.open_input_file(where)
2391 with file_ctx:
2392 file = ParquetFile(
2393 where, memory_map=memory_map,
2394 decryption_properties=decryption_properties)

File /opt/envs/python3/lib/python3.12/site-packages/pyarrow/_fs.pyx:814, in pyarrow._fs.FileSystem.open_input_file()

File /opt/envs/python3/lib/python3.12/site-packages/pyarrow/error.pxi:155, in pyarrow.lib.pyarrow_internal_check_status()

File /opt/envs/python3/lib/python3.12/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()

OSError: When reading information for key 'contributed/q1/merged_objects/hats/euclid_q1_merged_objects-hats/dataset/_common_metadata' in bucket 'nasa-irsa-euclid-q1': AWS Error ACCESS_DENIED during HeadObject operation: No response body.

jaladh-singhal · 2025-09-18T17:38:40Z

> I just tried running this on fornax, and am getting the following AWS access error on the first cell in section 3.2 which calls pq.read_schema on the euclid path.

@jkrick good catch. I also ran it at Fornax and ran into this problem. Will push a fix soon to PR so that it works at Fornax as well as here in CI.

jaladh-singhal · 2025-09-18T19:01:26Z

@jkrick you can try it now - pushed an update and tested it on Fornax myself, it works! Note: you might wanna increase cone radius, adjust plot X/Y bounds, etc. on Fornax

bsipocz · 2025-09-18T19:10:11Z

> but traditionally this repo doesn't get quite the scrutiny that fornax-demo-notebooks does

@jkrick - I like to think the other way around, I'm more certain that these notebooks will work out of the box. But I see what you may mean about the content -- and for that I think we should be very open for incoming improvements, or better document that these are showcases to focus on the data rather than the scientific process. If you have any good description that makes this approach clear I think we should add that to the index page

bsipocz · 2025-09-23T19:35:11Z

Just a heads-up, I'm kind of hijacking this PR to do some caching experiment, too. --> ignore the circleCI statuses and/or if you see any strange errors for circleCI

bsipocz · 2025-09-26T18:11:32Z

This one needs a careful rebase now as we switch the system. Let me know if you'd rather have me do it.

jaladh-singhal · 2025-09-26T21:44:53Z

> This one needs a careful rebase now as we switch the system. Let me know if you'd rather have me do it.

@bsipocz sure go for it!

bsipocz · 2025-09-26T22:47:59Z

OK, so question: which new subject category should this go into?

bsipocz · 2025-09-26T23:01:31Z

Right now it's in "Special Topics" -- I feel that should not "other" the cloud and parquet/hats stuff, but probably also should not hide it underneath one of the surveys. Maybe "big data astronomy?"

bsipocz · 2025-09-27T00:00:24Z

Closed and reopened to trigger the GHA docs build, too. I do that to make sure we don't run out of resources for deployment (that is not an issue on CircleCI as a bunch of other notebooks are excluded, but I noticed that this one spiked up to 70%).

bsipocz

Rendering looks good to me, testing, too, so I would say let's go ahead and merge this.

And I have added the GHA html build job, too, to ensure this would not make that build run out of resources. IMO we should add that extra job when we add a new notebook or make significant (performance-affecting) changes.

jaladh-singhal · 2025-09-29T17:59:56Z

@bsipocz - I still need to address some of the science feedback I got from Jessica, Harry/Anhita so I'll let you know when it's ready.

Thanks for the infra updates and making it ready!

bsipocz · 2025-09-29T18:37:03Z

> I still need to address some of the science feedback I got from Jessica, Harry/Anhita so I'll let you know when it's ready.

That's all good, I don't need to sign off on this again unless you feel you made significant infra choices or want me to double check something.

bsipocz

@jaladh-singhal - I keep seeing this error in rendering: https://github.com/Caltech-IPAC/irsa-tutorials/actions/runs/18145382851/job/51645786712?pr=136#step:5:291

It's really puzzling that it doesn't show up as a test failure with pytest, could you have a look at it though?

In the meantime I'm blocking this PR from merging as we need to understand what is going on.

jaladh-singhal · 2025-09-30T23:11:49Z

@bsipocz I'll take a look - how is the "buildhtml testing" job different from the pytest ones and the CircleCI rendering? I mean in terms of its purpose, because the other jobs execute the "id search" just fine.

bsipocz · 2025-09-30T23:21:37Z

> and circle ci rendering

It's a known anti-feature that it doesn't report a failing status when there is a failing notebook in it: if you look into the build log, the traceback is there.

jaladh-singhal · 2025-10-01T04:00:24Z

> @jaladh-singhal - I keep seeing this error in rendering: https://github.com/Caltech-IPAC/irsa-tutorials/actions/runs/18145382851/job/51645786712?pr=136#step:5:291
>
> It's really puzzling that it doesn't show up as a test failure with pytest, could you have a look at it though?
>
> In the meantime I'm blocking this PR from merging as we need to understand what is going on.

The TypeError was happening because the lsdb->hats->pyarrow machinery was picking up the newest universal_pathlib (v0.3, released yesterday), which changed the object/type. Pinning it for now until they resolve the typing mismatch (opened issue upstream: astronomy-commons/lsdb#1047).

@bsipocz the buildhtml job now passes for my notebook; it's failing for an unrelated reason (some DOI is not found).
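
For anyone debugging a similar environment, a hedged sketch of how to check whether the installed universal_pathlib falls in the range described above. The helper names are hypothetical, and the assumption that every release from 0.3 onwards is affected comes only from this thread; later releases may resolve it.

```python
from importlib import metadata


def parse_release(version: str) -> tuple:
    """Turn '0.3.0rc1' into (0, 3, 0), ignoring any non-numeric suffix."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def needs_pin(version: str, first_bad=(0, 3)) -> bool:
    """True if the version is at or above the first release assumed to be affected."""
    return parse_release(version) >= first_bad


try:
    installed = metadata.version("universal_pathlib")
    print(installed, "-> pin needed:", needs_pin(installed))
except metadata.PackageNotFoundError:
    print("universal_pathlib is not installed in this environment")
```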

troyraen · 2025-10-01T04:35:39Z

I'm in the middle of reviewing this. Please don't merge yet.

bsipocz · 2025-10-01T04:41:17Z

The DOIs are not expected to fail the build, and I'm dealing with making sure tracebacks fail the job in a separate PR.

bsipocz

One more comment, then this is good to go from my end.

.binder/requirements.txt

troyraen

Thanks @jaladh-singhal!

tutorials/parquet-catalog-demos/irsa-hats-with-lsdb.md

Co-authored-by: Brigitta Sipőcz <[email protected]>

Co-authored-by: Troy Raen <[email protected]>

troyraen · 2025-10-02T21:51:11Z

Thanks @jaladh-singhal this looks great! And I just ran it on Fornax and it all went well. Merging now.

IRSA-7104: Add "Access IRSA HATS using lsdb" notebook d4cfd94

jaladh-singhal self-assigned this Sep 14, 2025

jaladh-singhal added the content Content related issues/PRs. label Sep 14, 2025

jaladh-singhal requested review from bsipocz and troyraen September 15, 2025 01:57

jaladh-singhal commented Sep 15, 2025

troyraen reviewed Sep 15, 2025

troyraen self-requested a review September 15, 2025 17:23

jaladh-singhal force-pushed the IRSA-7104-hats-lsdb branch from 8b8d306 to 8484fba Compare September 16, 2025 01:09

jkrick approved these changes Sep 18, 2025

bsipocz force-pushed the IRSA-7104-hats-lsdb branch from bed117a to a82d725 Compare September 18, 2025 22:31

bsipocz force-pushed the IRSA-7104-hats-lsdb branch from 7846e3c to ba60bd8 Compare September 23, 2025 20:21

jaladh-singhal added 3 commits September 26, 2025 15:43

Add IRSA HATS using lsdb notebook

86cec03

Add lsdb to requirements.txt

d4f645d

Add learning goals and introduction + proofreading edits of text

6dffc78

jaladh-singhal and others added 5 commits September 26, 2025 15:49

Add irsa-hats-with-lsdb tutorial to TOC

1e296e0

Break markdown text per sentence

38b437e

Add filesystem in reading schema, & change to correct version of lsdb

712826d

CI: ignore lsdb notebooks for oldestdeps testing

70eca8b

CI: more workaround to ensure we patch the requirements before trying…

d8a7dab

… to install them

bsipocz force-pushed the IRSA-7104-hats-lsdb branch from 24b6c24 to d8a7dab Compare September 26, 2025 23:00

bsipocz added the GHA buildhtml Enable extra buildhtml job on GHA label Sep 26, 2025

bsipocz closed this Sep 26, 2025

bsipocz reopened this Sep 26, 2025

bsipocz approved these changes Sep 29, 2025

Wording changes & content reduction as per @jkrick feedback

668a59a

bsipocz requested changes Sep 30, 2025

jaladh-singhal added 4 commits September 30, 2025 19:41

Remove separation distance filtering on crossmatched catalog

b88c696

Update ID search code and plotting of LCs

9a4591c

Pin universal_pathlib for build error

f52a1b4

Remove dask client from ID search

e4127c3

bsipocz approved these changes Oct 1, 2025

troyraen approved these changes Oct 1, 2025

jaladh-singhal and others added 3 commits October 1, 2025 13:01

Add comment to universal_pathlib pinning

50b039e

Co-authored-by: Brigitta Sipőcz <[email protected]>

Apply wording suggestions from code review

9fd61fa

Co-authored-by: Troy Raen <[email protected]>

Apply other feedback from @troyraen's review

7a66306

bsipocz removed the GHA buildhtml Enable extra buildhtml job on GHA label Oct 1, 2025

jaladh-singhal added 2 commits October 2, 2025 11:24

Upadate about section

6993834

Fix chart labels and layout

24b0da1

troyraen merged commit d4cfd94 into Caltech-IPAC:main Oct 2, 2025
8 checks passed

github-actions bot pushed a commit that referenced this pull request Oct 3, 2025

Merge pull request #136 from jaladh-singhal/IRSA-7104-hats-lsdb

971b56c

IRSA-7104: Add "Access IRSA HATS using lsdb" notebook d4cfd94
