Reduce peak memory in demand flex to fix OOM on ConEd/NiMo runs 15-16 (#379)

Merged
alxsmith merged 8 commits into main from
378-reduce-peak-memory-in-demand-flex-to-fix-oom-on-conednimo-runs-15-16
Mar 25, 2026

Conversation

@alxsmith
Contributor

Closes #378

This PR fixes OOM kills on ConEd and NiMo runs 15-16 (and intermittently 13-14) by reducing peak RSS during the demand-flex pipeline and adding correctness/config guardrails.

What's in this PR

Memory optimizations (committed earlier in 1348c4d, included here):

  • Vectorize process_residential_hourly_demand_response_shift: replace per-building groupby loop + pd.concat with groupby.transform + dict lookup, returning numpy arrays instead of DataFrames. This was the primary memory bottleneck for large utilities (~15k buildings).
  • Eliminate tou_df (full TOU cohort copy) and shifted_chunks in apply_runtime_tou_demand_response: extract each season slice just-in-time and write back in-place.
  • Add inplace=True mode so apply_demand_flex can skip a redundant DataFrame copy.
  • Precompute per-TOU-key original system loads as tiny 8760-row Series before copying raw_load_elec, then del raw_load_elec to free the original before shifting begins.
  • del raw_load_elec in run_scenario.py after the flex branch so the caller's reference is also freed before bs.simulate().
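The precompute-then-free pattern in the last two bullets can be sketched as follows. This is an illustration only: the column names (`tou_key`, `timestamp`, `load_kw`) and the exact grouping are invented for the example, not taken from the codebase.

```python
import numpy as np
import pandas as pd

# Hypothetical sketch: reduce each TOU cohort to an 8760-row hourly
# system-load Series *before* copying, then drop the original so only
# one full-size DataFrame is alive while the shift runs.
hours = pd.date_range("2025-01-01", periods=8760, freq="h")
raw_load_elec = pd.DataFrame({
    "tou_key": np.repeat(["hp", "nonhp"], 8760),
    "timestamp": np.tile(hours, 2),
    "load_kw": np.random.default_rng(0).random(2 * 8760),
})

# Tiny per-TOU-key originals: 8760 rows each instead of a full copy.
orig_system_loads = {
    key: grp.groupby("timestamp")["load_kw"].sum()
    for key, grp in raw_load_elec.groupby("tou_key")
}

effective_load = raw_load_elec.copy()  # the single working copy
del raw_load_elec                      # free the original before shifting
```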

Validated numerically: CenHud runs 13-16 produce zero diff vs pre-optimization gold baseline across all 8 output artifacts.

Phase 2.5 bypass (1e3b26b): Skip the per-TOU-subclass MC delta computation when run_includes_subclasses=False. Phase 2.5 scans the full effective load DataFrame for each TOU key — unnecessary for single-tariff runs (NiMo, CenHud, etc.) that don't split revenue requirements by subclass.

Config validation (8daef7d): validate_config.py now warns if run_includes_subclasses disagrees with the number of keys in path_tariffs_electric, catching YAML inconsistencies before a run starts.

ConEd TOU schedule fix (5779d9a): Correct the HP seasonal TOU peak window in coned_hp_seasonalTOU_flex.json — peak period started at hour 16 (4pm) but should start at hour 15 (3pm).

IDE memory (eb1ad7a): Set python.analysis.diagnosticMode: openFilesOnly in .vscode/settings.json to prevent Pyright from consuming 4+ GB on shared EC2 instances.

Validation tool (ad02c9c): utils/post/compare_cairo_runs.py — CLI to compare two CAIRO run directories on S3 numerically. Used throughout this branch to confirm outputs are unchanged after each optimization step.

Reviewer focus

  • The vectorized shift in process_residential_hourly_demand_response_shift (dict lookup + groupby.transform) is the highest-impact change — worth a close read to confirm the zero-sum writeback is correct.
  • The Phase 2.5 bypass is guarded by the same run_includes_subclasses flag that run_scenario.py already uses to decide whether to split revenue requirements — so the logic is consistent.

Made with Cursor

Five coordinated changes that together cut peak RSS during demand flex
by ~18 GB for large utilities (ConEd ~15k buildings):

1. Eliminate per-building loop + pd.concat in process_residential_hourly_demand_response_shift:
   replace with groupby.transform for Q_orig and a dict lookup for load_shift,
   avoiding a full merge that doubled memory for tens-of-millions-of-row slices.
   Return (shifted_net, hourly_shift, tracker) numpy arrays instead of DataFrames.

2. Eliminate tou_df (full TOU cohort copy) in apply_runtime_tou_demand_response:
   each season slice is now extracted just-in-time from the output DataFrame,
   and shifts are written back in-place rather than collected for a final concat.

3. Add inplace=True mode to apply_runtime_tou_demand_response:
   callers that already hold a copy can skip the internal copy entirely.

4. In apply_demand_flex, make one copy upfront, precompute the per-TOU-key
   original weighted system loads (tiny 8760-row Series) before copying,
   then del raw_load_elec so the original is released before the shift begins.
   Phase 2.5 uses the precomputed Series instead of the full original DataFrame.

5. In run_scenario.py, del raw_load_elec after the flex branch so the caller's
   reference is also freed before bs.simulate() runs.
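The loop-to-vectorized rewrite in change 1 can be sketched like this. The building IDs, peak window, and uniform shift fraction are invented for illustration; the point is the shape of the rewrite: a dict lookup via `.map` instead of a merge, `groupby.transform` instead of a per-building loop + `pd.concat`, and plain numpy arrays out.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_bldg, n_hr = 100, 24
df = pd.DataFrame({
    "bldg_id": np.repeat(np.arange(n_bldg), n_hr),
    "hour": np.tile(np.arange(n_hr), n_bldg),
    "load_kw": rng.random(n_bldg * n_hr),
})
shift_frac = {b: 0.1 for b in range(n_bldg)}  # dict lookup, no merge

peak = (df["hour"] >= 15) & (df["hour"] < 20)   # illustrative 3pm-8pm peak
frac = df["bldg_id"].map(shift_frac).to_numpy()

# Remove `frac` of each peak-hour load, then spread each building's total
# removal evenly over its off-peak hours (zero-sum per building).
removed = np.where(peak, df["load_kw"].to_numpy() * frac, 0.0)
df["removed"] = removed
per_bldg_total = df.groupby("bldg_id")["removed"].transform("sum").to_numpy()
n_offpeak = (~peak).sum() // n_bldg
hourly_shift = np.where(peak, -removed, per_bldg_total / n_offpeak)
shifted_net = df["load_kw"].to_numpy() + hourly_shift  # numpy arrays out
```

The zero-sum writeback is the property worth checking in review: the shift for every building sums to zero, so total energy is conserved.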

All changes validated numerically: CenHud runs 13-16 produce bit-identical
outputs (zero max abs/rel diff) vs the pre-optimization gold baseline across
all 8 artifacts (BAT, bills, elasticity tracker, metadata, tariff config).

Made-with: Cursor
Phase 2.5 computes per-TOU-subclass MC deltas for revenue requirement
splitting between HP and non-HP customer classes. This work is only
needed when a run has multiple tariff subclasses (e.g. ConEd runs 13-16
with hp/nonhp). Single-tariff runs (e.g. NiMo, CenHud) can skip it
entirely, saving memory and compute on the full effective_load_elec scan.

Pass run_includes_subclasses from ScenarioSettings through to
apply_demand_flex so it can guard the Phase 2.5 block.
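A minimal sketch of the guard, with a hypothetical signature and plain scalars standing in for the real load DataFrames:

```python
def apply_demand_flex(effective_load, orig_system_loads, run_includes_subclasses):
    """Hypothetical sketch: guard Phase 2.5 behind the same flag that
    run_scenario.py uses to decide revenue-requirement splitting."""
    mc_deltas = {}
    if run_includes_subclasses:
        # Phase 2.5: per-TOU-subclass MC deltas, multi-tariff runs only.
        for key, orig in orig_system_loads.items():
            mc_deltas[key] = effective_load[key] - orig
    return mc_deltas
```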

Made-with: Cursor
Add a pre-run cross-check to validate_config.py: for each run in the
scenario YAML, compare the explicit run_includes_subclasses flag against
whether path_tariffs_electric has more than one key (the canonical
source of truth). Print a warning to stderr if they disagree so config
mistakes are caught before CAIRO starts.
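The cross-check amounts to something like the sketch below (function name and config-dict shape are hypothetical; the real check reads the scenario YAML):

```python
import sys

def check_subclass_flag(run_name, run_cfg):
    """Hypothetical sketch: the key count of path_tariffs_electric is
    the canonical source of truth for whether a run has subclasses."""
    flag = run_cfg.get("run_includes_subclasses", False)
    n_tariffs = len(run_cfg.get("path_tariffs_electric", {}))
    if flag != (n_tariffs > 1):
        print(
            f"WARNING [{run_name}]: run_includes_subclasses={flag} but "
            f"path_tariffs_electric has {n_tariffs} key(s)",
            file=sys.stderr,
        )
        return False
    return True
```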

Made-with: Cursor
The TOU schedule had peak period (period 1/3) starting at hour index 16
(4pm); correct start is hour 15 (3pm). Shift the on-peak block back one
hour in both weekday and weekend schedules for all seasons in the flex
and flex_calibrated tariffs.
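As an illustration of the one-hour shift (the actual JSON layout in coned_hp_seasonalTOU_flex.json may differ; this assumes a 24-element array of period codes per day type, with 1 marking on-peak):

```python
# Before: on-peak block starts at hour index 16 (4pm).
before = [0] * 16 + [1] * 4 + [0] * 4
# After: the whole block shifted back one hour to start at 15 (3pm).
after = [0] * 15 + [1] * 4 + [0] * 5
```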

Made-with: Cursor
Set python.analysis.diagnosticMode to openFilesOnly so Pyright does not
index the entire workspace. On a shared EC2 instance this prevents the
language server from consuming 4+ GB of RAM that competes with CAIRO runs.
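The corresponding .vscode/settings.json entry looks like this (other existing settings in the file would sit alongside it):

```json
{
  "python.analysis.diagnosticMode": "openFilesOnly"
}
```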

Made-with: Cursor
New utils/post/compare_cairo_runs.py compares two S3 CAIRO run directories
file-by-file (BAT values, bills, elasticity tracker, metadata, tariff config)
using configurable rtol/atol tolerances. Exits non-zero if any diff exceeds
tolerance so it can be used in CI or as a manual regression check.

Used during this branch to validate that memory optimizations produced
bit-identical outputs against the CenHud gold baseline (zero max diff
across all 8 artifacts for runs 13-16).
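The core numeric check reduces to something like the sketch below (simplified: the real CLI walks two S3 run directories and reads each artifact, while this version takes arrays directly; function names here are illustrative):

```python
import sys
import numpy as np

def compare_arrays(base, chal, rtol=1e-9, atol=0.0):
    """Compare one artifact's values within rtol/atol tolerances."""
    base = np.asarray(base, dtype=float)
    chal = np.asarray(chal, dtype=float)
    if base.shape != chal.shape:
        return False, float("inf")
    max_abs = float(np.max(np.abs(base - chal))) if base.size else 0.0
    return bool(np.allclose(base, chal, rtol=rtol, atol=atol)), max_abs

def compare_runs(artifact_pairs, rtol=1e-9, atol=0.0):
    # Exit non-zero if any artifact diff exceeds tolerance (CI-friendly).
    ok = all(compare_arrays(b, c, rtol, atol)[0] for b, c in artifact_pairs)
    sys.exit(0 if ok else 1)
```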

Made-with: Cursor
@alxsmith alxsmith linked an issue Mar 25, 2026 that may be closed by this pull request

- cairo.py: cast get_level_values result to DatetimeIndex before
  accessing .month; ty stubs don't expose .month on the generic Index
  return type, suppress with type: ignore[attr-defined]
- compare_cairo_runs.py: guard df_chal.height behind an explicit
  is-not-None check; suppress overly-wide Polars .max() return type
  on float() conversion with type: ignore[arg-type]
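The cairo.py fix follows the standard pattern for this stub limitation: `get_level_values` is typed as returning a generic `Index`, which does not expose `.month`, so the result is wrapped in `DatetimeIndex` first (the index below is invented for illustration):

```python
import pandas as pd

# A MultiIndex with a datetime level, as in the hourly load frames.
idx = pd.MultiIndex.from_product(
    [pd.date_range("2025-01-01", periods=3, freq="MS"), ["hp", "nonhp"]],
    names=["timestamp", "tou_key"],
)
# Wrapping in DatetimeIndex makes .month both type-safe and explicit.
months = pd.DatetimeIndex(idx.get_level_values("timestamp")).month
```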

Made-with: Cursor
@alxsmith alxsmith merged commit ab436c5 into main Mar 25, 2026
2 checks passed