Add MVP function for electrical tariff mapping #137

alexhyunminlee · 2026-02-06T04:09:44Z

Summary

This PR adds python script and function that generates electrical tariff mapping for bldg_id's based on SB_analysis_scenario, electrical_utility, customer_class, etc.

Closes #134

alexhyunminlee · 2026-02-06T19:02:22Z

utils/electric_tariff_mapper.py

+    match SB_scenario_name:
+        # Default scenario
+        case "default_1" | "default_2" | "default_3":
+            tariff_key = "default"


tariff_key = f"{electric_utility}{SB_scenario_name}{HP or flat or default, etc.}"

jpvelez · 2026-02-08T21:52:56Z

rate_design/ny/hp_rates/data/tariff_map/Justfile

@@ -0,0 +1,13 @@
+# Resolve project root as absolute path from this Justfile's location (works for any clone)
+project_root := absolute_path(justfile_directory() / ".." / ".." / ".." / ".." / "..")
+data_base := "s3://data.sb/nrel/resstock/res_2024_amy2018_2/metadata"


Not a database, call it path_s3

Addressed in latest commit

jpvelez · 2026-02-08T21:53:52Z

rate_design/ny/hp_rates/data/tariff_map/Justfile

+# Example: just map-electric-tariff Coned default 1 00
+map-electric-tariff electric_utility SB_scenario_name SB_scenario_year upgrade_id:
+  cd {{project_root}} && uv run python {{project_root}}/utils/electric_tariff_mapper.py \
+    --metadata_path "{{data_base}}/state=NY/upgrade={{upgrade_id}}/metadata-sb.parquet" \


Pass the path_s3 only and use polars filters to select the state (which is already a param) and upgrade (which shouldn't vary?)

Addressed in the latest commit to do the following:

Only the path_s3 is passed

The python script builds the full metadata_path using the state and upgrade_id provided

Reads in LazyFrame via scan_parquet

Passes the LazyFrame directly to map_electric_tariff function

jpvelez · 2026-02-08T21:59:02Z

utils/electric_tariff_mapper.py

+    # For now, we will manually add the electric utility name column. Later on, the metadata parquet will be updated to include this column.
+    # Assign first ~1/3 to Coned, next ~1/3 to National Grid, last ~1/3 to NYSEG.
+    metadata_path = S3Path(args.metadata_path)
+    metadata_df = pl.read_parquet(io.BytesIO(metadata_path.read_bytes()))


As I've explained several times now, we need to use scan_parquet to get a pl.LazyFrame, not read_parquet to get a pl.DataFrame.

Also, you don't need to read out the bytes like this. Just pass the string.

Addressed in latest commit

jpvelez · 2026-02-08T22:04:00Z

utils/electric_tariff_mapper.py

+    temp_path = metadata_path.parent / f"{metadata_path.stem}_temp.parquet"
+    buf = io.BytesIO()
+    metadata_df.write_parquet(buf)
+    temp_path.write_bytes(buf.getvalue())


What's all this BytesIO and write_bytes stuff about? You shouldn't need it, you can just use write_parquet

More to the point, you don't need to save this to disk at all! You can refactor map_electric_tariff to accept a LazyFrame rather than a path, which is probably a better design anyway, and then you can just pass metadata_df to the function directly.

It's a better design because:

if map_electric_tariff is being used from the CLI, then electric_mapper_py can have the job of accepting the path and calling scan_parquet and passing the LazyFrame to the function.

if map_electric_tariff is being imported into another script and called directly from python code to serve a more complicated use case, then it offers more flexibility to accept a LazyFrame, rather than a path to a file on disk—who knows how the code calling this function is manufacturing a LazyFrame.

Addressed in the latest commit

jpvelez · 2026-02-08T22:05:02Z

utils/electric_tariff_mapper.py

+
+    if (
+        args.SB_scenario_name != "default"
+        and args.SB_scenario_name != "seasonal"


use args.SB_scenario_name in <list>

You should really have an LLM check your code and make suggestions for improvement before you commit.

Addressed in latest commit

jpvelez · 2026-02-08T22:05:45Z

utils/electric_tariff_mapper.py

+    ):
+        raise ValueError(f"Invalid SB scenario name: {args.SB_scenario_name}")
+
+    sb_scenario = SB_scenario(args.SB_scenario_name, args.SB_scenario_year)


If you're going to make a class for SB_scenario, you could move the validation into here

Addressed in latest commit

jpvelez · 2026-02-08T23:18:45Z

rate_design/ny/hp_rates/data/tariff_map/Justfile

@@ -0,0 +1,13 @@
+# Resolve project root as absolute path from this Justfile's location (works for any clone)
+project_root := absolute_path(justfile_directory() / ".." / ".." / ".." / ".." / "..")


Resolve it with project_root := git rev-parse --show-toplevel`

More elegant

Less brittle if you move the file around

Standard way to do this

Addressed in the latest commit

jpvelez · 2026-02-08T23:19:37Z

rate_design/ny/hp_rates/data/tariff_map/Justfile

+# Usage: just map-electric-tariff [electric_utility] [SB_scenario_type] [SB_scenario_year] [upgrade_id]
+# Example: just map-electric-tariff Coned default 1 00
+map-electric-tariff electric_utility SB_scenario_name SB_scenario_year upgrade_id:
+  cd {{project_root}} && uv run python {{project_root}}/utils/electric_tariff_mapper.py \


You don't need to cd if project_root is an absolute path.

Addressed in latest commit

jpvelez · 2026-02-08T23:30:38Z

rate_design/ri/hp_rates/data/tariff_map/Justfile

+
+# Usage: just map-electric-tariff [electric_utility] [SB_scenario_type] [SB_scenario_year] [upgrade_id]
+# Example: just map-electric-tariff Coned default 1 00
+map-electric-tariff electric_utility SB_scenario_name SB_scenario_year upgrade_id:


This parameterized task is useful, because it abstracts away from the verbose CLI invocation.

However, the job of the Justfile is to enumerate all of the different utility-scenario-year combos that we need for the analysis, so that we have version-controlled records of the tariff_maps we use, and an easy way to reproduce them. We don't want to invocations to get lost in the terminal of the user that calls this task.

Here's a way to build on your parameterized task to accomplish this:

Convenience recipes - add more as needed # These call the main recipe with specific arguments, keeping the abstraction map-coned-default: just map-electric-tariff Coned default 1 00 map-coned-seasonal: just map-electric-tariff Coned seasonal 1 00 # Add more convenience recipes here following the pattern: # map-<utility>-<scenario>: # just map-electric-tariff <utility> <scenario> <year> <upgrade_id>

jpvelez · 2026-02-08T23:32:35Z

utils/electric_tariff_mapper.py

+from pathlib import Path
+
+import polars as pl
+from cloudpathlib import S3Path


Did you forget to add this to pyproject.toml and uv.lock? Come on now. That's the last time I should catch this type of oversight.

Addressed in latest commit

jpvelez · 2026-02-08T23:44:53Z

utils/electric_tariff_mapper.py

+    metadata_df = pl.read_parquet(io.BytesIO(metadata_path.read_bytes()))
+    n = len(metadata_df)
+    metadata_df = (
+        metadata_df.with_row_index("_row_idx")


This algorithm doesn't work with LazyFrames, because len(metadata_df) requires materialization—think about it, you need to have the entire dataset in memory because you can count how long it is. If you're streaming a (potentially massive) DataFrame off disk, chunk by chunk, you need a streaming algorithm to accomplish this job:

# Instead of: n = len(metadata_df) # This materializes if LazyFrame metadata_df = ( metadata_df.with_row_index("_row_idx") .with_columns(...) ) # Use this for LazyFrames: metadata_df = metadata_df.with_columns( pl.when(pl.col("bldg_id").hash() % 3 == 0) .then(pl.lit("Coned")) .when(pl.col("bldg_id").hash() % 3 == 1) .then(pl.lit("National Grid")) .otherwise(pl.lit("NYSEG")) .alias("sb.electric_utility") )

Addressed in latest commit

jpvelez · 2026-02-08T23:45:55Z

utils/electric_tariff_mapper.py

+def main():
+    parser = argparse.ArgumentParser(description="Map electrical tariff.")
+    parser.add_argument(
+        "--metadata_path", required=True, help="Base path for resstock data"


"Absolute or s3 path to ResStock metadata"

Addressed in latest commit

jpvelez · 2026-02-08T23:48:27Z

utils/electric_tariff_mapper.py

+
+
+def map_electric_tariff(
+    SB_metadata_path: S3Path,


This should work with either an local path or an S3 path. You can accomplish this using the AnyPath class from cloudpathlib:

from cloudpathlib import AnyPath # Create an AnyPath instance for a local file local_file = AnyPath("/path/to/local/file.txt") # Create an AnyPath instance for an S3 object s3_file = AnyPath("s3://bucket/path/to/object.txt") # You can then perform operations on either, and the appropriate # class methods will be used automatically. if local_file.exists(): print(f"Local file exists: {local_file.read_text()}") s3_file.write_text("Hello, S3!")

Addressed in the latest commit to accept pl.LazyFrame directly as discussed above ^

jpvelez

There are two broad issue with this:

The I/O could be improved
This should be refactored to work with LazyFrames, not DataFrame. Not strictly necessary in this case since the data is small, but it's a standard we should enforce, and a good learning opportunity.

jpvelez · 2026-02-08T23:50:16Z

utils/electric_tariff_mapper.py

+    if not SB_metadata_path.exists():
+        raise FileNotFoundError(f"SB metadata path {SB_metadata_path} does not exist")
+
+    metadata_df = pl.read_parquet(io.BytesIO(SB_metadata_path.read_bytes()))


Again, this should always be pl.scan_parquet, and you don't need that BytesIO stuff. Though the point is moot since we'll be moving the scan out of this function anyway (see other comment).

Addressed in latest commit to:

Use pl.scan_parquet to load pl.LazyFrame outside of the function

Pass the pl.LazyFrame scanned directly to this function

No BytesIO stuff

jpvelez · 2026-02-08T23:51:00Z

utils/electric_tariff_mapper.py

+
+    #########################################################
+    # Remove the temp file that contains electric utility name column.
+    temp_path.unlink()


Get rid of this

Addressed in latest commit

jpvelez · 2026-02-08T23:51:47Z

utils/electric_tariff_mapper.py

+    utility_metadata_df = metadata_df.filter(
+        pl.col("sb.electric_utility") == electric_utility
+    )
+    if utility_metadata_df.is_empty():


The script should fail loudly if the utility is not found; you want the user to know.

LazyFrames don't have is_empty(). This will raise an AttributeError.

To make it work with LazyFrames, you need to materialize first using .collect(). But that means you stop the query plan right there, even though it's useful for the polars engine to know that it only needs to read sb.electric_utility, bldg_id, and postprocess_group.has_hp off of disk. So you can move the select() to before the .collect() and .is_empty(), like this:

# Step 1: Filter by utility (still lazy) utility_metadata_df = metadata_df.filter( pl.col("sb.electric_utility") == electric_utility ) # Step 2: Select only needed columns (still lazy, building query plan) metadata_has_hp = ( utility_metadata_df .select(pl.col("bldg_id", "postprocess_group.has_hp")) .collect() # Step 3: Materialize (executes filter + select) ) # Step 4: Check if empty (if empty, the filter found no matching utility) if metadata_has_hp.is_empty(): raise ValueError(...)

Addressed in latest commit to:

Filter by electric_utility

Select columns bldg_id and postprocess_group.has_hp

Only collect the first row of the LazyFrame and materialize it

Checks if this first row exists (which means there's at least one match in the metadata-sb.parquet for the requested electric_utility

jpvelez · 2026-02-09T00:02:31Z

utils/electric_tariff_mapper.py

+    print(electrical_tariff_mapping_df.head(20))
+
+    output_path = (
+        RATE_DESIGN_DIR


This is super brittle. That's why the CLI should accept an output path. For it's part, this function should return a DataFrame, not a CSV; the rest of the script can handle writing it to disk.

Addressed in the latest commit. The output_path input is optional for the Justfile command. If it's provided, it saves the csv to that path. If not, it saves to rate_design/<state>/hp_rates/data/tariff_map/<electric_utliyt>_<SB_scenario>.csv. The map_electric_tariff function returns the LazyFrame and the outputting is done in the rest of the script.

jpvelez · 2026-02-09T00:09:57Z

utils/electric_tariff_mapper.py

+    SB_scenario: SB_scenario,
+    electric_utility: electric_utility,
+) -> pl.DataFrame:
+    has_hp = metadata_has_hp["postprocess_group.has_hp"]


This is an anti-pattern... you're materializing has_hp early. The idiomatic polars way to do this is define_electrical_tariff_key(..., pl.col("postprocess_group.has_hp")) # Pass Expr

pl.col("column") returns a pl.Expr that represents an operation in the query plan and is not executed yet. You're building up a big formula that doesn't get actually executed until you call collect. During execution, polars will compute on the Series, but you write an Expr to keep the overall query lazy and optimizable.

Addressed in the latest commit. The metadata_has_hp is passed as a LazyFrame and is not materialized until later.

jpvelez · 2026-02-09T00:10:27Z

utils/electric_tariff_mapper.py

+def define_electrical_tariff_key(
+    SB_scenario: SB_scenario,
+    electric_utility: electric_utility,
+    has_hp: pl.Series,


Change to pl.Exp, see below.

Addressed in latest commit

jpvelez · 2026-02-09T00:12:09Z

utils/electric_tariff_mapper.py

+
+# Project root (rate-design-platform); independent of cwd or caller
+_PROJECT_ROOT = Path(__file__).resolve().parent.parent
+RATE_DESIGN_DIR = _PROJECT_ROOT / "rate_design"


This is brittle; accept output_dir as a CLI arg instead.

Addressed in the latest commit

jpvelez · 2026-02-09T00:13:07Z

utils/types.py

+from typing import Literal
+
+
+class SB_scenario:


Per PEP 8, class names should be CamelCase (PascalCase). SB_scenario should be SBScenario

Addressed in the latest commit

jpvelez · 2026-02-09T00:22:13Z

utils/electric_tariff_mapper.py

+    return
+
+
+def main():


This is not a function you'd ever want to import from a different context... consequently, there's not need for this code to be wrapped in a function. It can literally just go below the if name == main

alexhyunminlee added 4 commits February 5, 2026 19:01

Add json file to keep track of scenario names and tariff mappings

2be01eb

Add dummy sb.electric_utility columns and pass to map_electric_tariff

0e75a29

Use python function to assign tariff key instead of json

48dc749

Write tariff map to a csv

33cb52a

alexhyunminlee linked an issue Feb 6, 2026 that may be closed by this pull request

MVP function to generate electrical tariff map #134

Closed

alexhyunminlee added 2 commits February 6, 2026 04:33

Add Justfile command for electrical utility mapping

db8242e

Add Justfile command for electrical utility mapping

48d2bdb

alexhyunminlee requested review from alxsmith, jpvelez and sherryzuo February 6, 2026 04:35

alexhyunminlee commented Feb 6, 2026

View reviewed changes

alexhyunminlee added 3 commits February 8, 2026 03:45

Simplify tariff_key assigning function

eb2d45d

Simplify tariff_key assigning function

e21cb13

Clean up code and minor fixes

9a02a7d

jpvelez reviewed Feb 8, 2026

View reviewed changes

jpvelez requested changes Feb 9, 2026

View reviewed changes

jpvelez reviewed Feb 9, 2026

View reviewed changes

alexhyunminlee added 2 commits February 9, 2026 19:29

Address comments and reformat code

44da847

Resolve conflict with main

5802363

alexhyunminlee added 2 commits February 9, 2026 19:31

Resolve conflict with main

1c65248

Resolve conflict with main

659495f

alexhyunminlee merged commit 1a6f7df into main Feb 9, 2026
2 of 4 checks passed

alexhyunminlee deleted the 134-mvp-function-to-generate-electrical-tariff-map branch February 9, 2026 19:48

		@@ -0,0 +1,13 @@
		# Resolve project root as absolute path from this Justfile's location (works for any clone)
		project_root := absolute_path(justfile_directory() / ".." / ".." / ".." / ".." / "..")

Add MVP function for electrical tariff mapping #137

Add MVP function for electrical tariff mapping #137

Uh oh!

Conversation

alexhyunminlee commented Feb 6, 2026

Summary

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jpvelez Feb 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jpvelez left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jpvelez Feb 8, 2026 •

edited

Loading