@olivialynn olivialynn commented Nov 5, 2025

The second of three planned PRs for LSDB's #449.

Overall plan:

A little more detail:

  1. Setup stage:
    1. We initialize the ResumePlan with a threshold_mode. It defaults to row_count, but is set to mem_size if byte_pixel_threshold is present in the input args (and is not None).
    2. We also define two paths in ResumePlan: MEM_SIZE_HISTOGRAM_BINARY_FILE and MEM_SIZE_HISTOGRAMS_DIR.
    3. We run ResumePlan's gather_plan, which creates the histogram directory (or directories), among other setup work.
  2. Mapping stage:
    1. This is where we map input files to HEALPix pixels (via the call to map_reduce's map_to_pixels) and, in doing so, create the histogram. We pass threshold_mode to map_to_pixels; if it is mem_size, we also build the mem_size histogram.
    2. We add a memory-size calculation method, _get_mem_size_of_chunk, and its two helpers, _get_row_mem_size_data_frame and _get_row_mem_size_pa_table.
  3. Binning stage:
    1. No changes for now, except that we add an explicit which_histogram parameter to read_histogram to make clear that we're reading the row_count histogram. row_count is the default, but I wanted to include it for readability and safety.
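
To make the setup and mapping steps above concrete, here is a minimal sketch of the idea, not the actual hats-import code. The function names `choose_threshold_mode` and `build_histograms`, their signatures, and the use of precomputed pixel indices and per-row byte sizes are all assumptions made for illustration; only `threshold_mode`, `row_count`, `mem_size`, and `byte_pixel_threshold` come from the description.

```python
from typing import List, Optional, Sequence, Tuple


def choose_threshold_mode(byte_pixel_threshold: Optional[int] = None) -> str:
    """Hypothetical helper: pick the threshold mode as described in the setup stage.

    Defaults to "row_count"; switches to "mem_size" only when a byte
    threshold is supplied and is not None.
    """
    return "mem_size" if byte_pixel_threshold is not None else "row_count"


def build_histograms(
    pixels: Sequence[int],
    row_sizes: Sequence[int],
    num_pixels: int,
    threshold_mode: str,
) -> Tuple[List[int], Optional[List[int]]]:
    """Hypothetical helper: accumulate per-pixel histograms for one chunk.

    `pixels` holds an already-computed pixel index for each row and
    `row_sizes` holds an already-computed size in bytes for each row
    (both are assumed inputs; the real mapping code derives them itself).
    The row-count histogram is always built; the memory-size histogram
    is built only when threshold_mode is "mem_size".
    """
    row_count_hist = [0] * num_pixels
    mem_size_hist = [0] * num_pixels if threshold_mode == "mem_size" else None
    for pixel, size in zip(pixels, row_sizes):
        row_count_hist[pixel] += 1
        if mem_size_hist is not None:
            mem_size_hist[pixel] += size
    return row_count_hist, mem_size_hist


# Example: three rows, two of which land in pixel 2.
mode = choose_threshold_mode(byte_pixel_threshold=1_000_000)
rows, mem = build_histograms([0, 2, 2], [10, 20, 30], num_pixels=4, threshold_mode=mode)
```

In this sketch the row-count histogram is produced in both modes, which matches the note that the binning stage still reads the row_count histogram for now.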

codecov bot commented Nov 5, 2025

Codecov Report

❌ Patch coverage is 95.65217% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.64%. Comparing base (fb2e1f3) to head (250fd09).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/hats_import/catalog/map_reduce.py | 93.61% | 3 Missing ⚠️ |
| src/hats_import/catalog/resume_plan.py | 97.29% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #619      +/-   ##
==========================================
+ Coverage   92.57%   92.64%   +0.06%     
==========================================
  Files          32       32              
  Lines        1926     1998      +72     
==========================================
+ Hits         1783     1851      +68     
- Misses        143      147       +4     

@delucchi-cmu delucchi-cmu left a comment

Just two small comments. Otherwise LGTM!

@olivialynn olivialynn merged commit 87aef5e into main Nov 11, 2025
12 checks passed
@olivialynn olivialynn deleted the u/olynn/add_mem_size_hist branch November 11, 2025 17:04