420 add skill score for ensemble metrics by samlamont · Pull Request #494 · RTIInternational/teehr

samlamont · 2025-07-15T14:34:33Z

This could use a review and some additional discussion. Here are the main changes:

- Adds calculate_climatology() method to the timeseries table for calculating long-term averages of a timeseries based on some time interval (i.e., day-of-year) (resolves Investigate adding simple baselines #228)
- Adds create_reference_forecast() method to the secondary timeseries table
  - Includes a method based on a reference timeseries (i.e., "climatology") and a template forecast
  - Allows for timeseries aggregation (based on rolling average) and ffill/bfill of NaN values
- Updates transform function for probabilistic methods
- Adds HourOfYear as a row-level calculated field
- Fixes filtering by forecast_lead_time (pyspark time interval) (resolves Fix filtering by forecast_lead_time #451)
- Adds skill score as an optional metric post-processing step
- Adds load_dataframe() as a loading method to load an in-memory dataframe (resolves Add _write_pandas() method top base table #471)
- Removes redundant load_table calls
- Also sets default pyspark driver memory based on available memory (resolves Set default PySpark driver memory configurations #477)
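
For readers unfamiliar with the terms, here is a minimal sketch of three ideas from the list above: a day-of-year climatology, a reference forecast built by mapping that climatology onto a template forecast's value times, and a skill score relative to that reference. It is plain Python and not TEEHR's implementation. The function names and signatures are hypothetical, and the skill score uses the common generic form, which may differ from what this PR implements.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean


def day_of_year_climatology(timeseries):
    """Average values by day of year (hypothetical helper, not TEEHR's API).

    timeseries: iterable of (datetime, float) pairs.
    Returns a dict mapping day-of-year (1-366) to the mean value.
    """
    groups = defaultdict(list)
    for value_time, value in timeseries:
        groups[value_time.timetuple().tm_yday].append(value)
    return {day: fmean(values) for day, values in groups.items()}


def reference_forecast(template_times, climatology):
    """Assign the climatological value to each value time of a template forecast."""
    return [(t, climatology[t.timetuple().tm_yday]) for t in template_times]


def skill_score(score, reference_score, perfect_score=0.0):
    """Generic skill score: 1 is perfect, 0 matches the reference forecast.

    Assumes reference_score != perfect_score (otherwise this divides by zero).
    """
    return (score - reference_score) / (perfect_score - reference_score)


# Two years of synthetic daily data (2020 is a leap year, so 731 days).
start = datetime(2020, 1, 1)
observed = [(start + timedelta(days=i), float(i)) for i in range(731)]

climatology = day_of_year_climatology(observed)
forecast = reference_forecast(
    [datetime(2022, 1, 1), datetime(2022, 1, 2)], climatology
)
```

With an error-type metric such as CRPS (perfect score 0), a forecast scoring 0.5 against a reference scoring 1.0 gives `skill_score(0.5, 1.0) == 0.5`.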


mgdenno · 2025-08-22T12:31:03Z

src/teehr/evaluation/generate.py

+        output_dataframe : ps.DataFrame
+            The output spark DataFrame of a specified timestep and start
+            and end datetimes.
+        update_variable_table : bool


Should these boolean args have default values?

I think these classes BenchmarkForecast and SignatureTimeseries can go away to be honest. I think I added them to define the type of class that gets returned, but it seems like the method.generate() call can happen directly in the Generator class to reduce complexity.

I have the defaults of the boolean set in the method arguments.

mgdenno · 2025-08-22T12:33:32Z

src/teehr/evaluation/generate.py

+            The input timeseries model. The defines a unique timeseries
+            that will be queried from the Evaluation and used as the
+            input_dataframe. Defaults to None.
+        start_datetime : Union[str, datetime]


Should a note be added to the datetime and timedelta docstrings indicating which string formats are supported? Or, I think something like "Any string that can be parsed by XYZ library" would be fine if appropriate (I did not look to see how it is parsed).
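
To make the question concrete, here is a small sketch of how a `Union[str, datetime]` argument could be normalized and documented. The thread does not say how TEEHR parses these strings, so the use of the standard library's `datetime.fromisoformat` and the helper name `to_datetime` are assumptions for illustration only. Timedelta strings are not covered.

```python
from datetime import datetime
from typing import Union


def to_datetime(value: Union[str, datetime]) -> datetime:
    """Return a datetime from a datetime or an ISO 8601 string.

    Hypothetical helper: accepts strings such as "2025-01-01" or
    "2025-01-01T12:30:00", i.e. anything datetime.fromisoformat can parse.
    """
    if isinstance(value, datetime):
        return value
    try:
        return datetime.fromisoformat(value)
    except ValueError as err:
        # Re-raise with a message that names the supported format.
        raise ValueError(
            f"Could not parse {value!r}; expected an ISO 8601 string"
        ) from err
```

Stating the parser in the docstring, as above, tells users which formats to expect without listing every variant.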

mgdenno · 2025-08-22T12:40:35Z

src/teehr/evaluation/generate.py

+        validated_df = tbl._validate(df=self.df)
+        tbl._write_spark_df(validated_df, write_mode=write_mode)
+
+    def to_pandas(self):


Should this have to_sdf() and maybe to_geopandas() methods?

I guess you can call .df.show(), but that is not consistent with other classes... IDK

Yeah I can add to_sdf(). Maybe we can implement to_geopandas() for timeseries in a separate issue?

mgdenno · 2025-08-22T13:05:49Z

src/teehr/models/generate/timeseries_generator_models.py

+
+
+class Normals(SignatureGeneratorBaseModel, GeneratorABC):
+    """Model for generating synthetic normals timeseries."""


Does it make sense to put a doc string here so users know what fields are available to be set?

I guess because of the way this inherits from other classes, the code editor is not aware of what the properties are...

Added. I can see the docstring when hovering in VS Code, but it seems really sensitive to my cursor location.

mgdenno · 2025-08-22T13:08:49Z

src/teehr/models/generate/base.py

+    configuration_name: Optional[str] = Field(default=None)
+    variable_name: Optional[str] = Field(default=None)
+    unit_name: Optional[str] = Field(default=None)
+    table_name: TimeseriesTableNamesEnum = Field(


nit: I feel like the table should be at the top or bottom of the list.

added to top :)

samlamont · 2025-08-25T13:02:35Z

@mgdenno I took another crack at the filter issue we talked about. I created a TableFilter(), which takes a table and a filter object. This essentially just consolidates the two arguments.

Then I created a filter() method on the Evaluation class, which accepts either a TableFilter instance or table_name and filter arguments. This is just another way to filter a table (e.g., ev.primary_timeseries.filter()), but directly from the evaluation, and it includes the filter validation.

I think this is closer to what I was originally going for, but let me know what you think. Happy to discuss
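
A rough sketch of the pattern described in this comment, for discussion purposes. Everything here is hypothetical: the `Filter` fields, the `TableFilter` attributes, the `filter()` signature, and the in-memory list-of-dicts tables stand in for the real TEEHR classes and Spark tables, which may look quite different.

```python
import operator
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]

# Comparison operators supported by this sketch.
_OPERATORS = {
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


@dataclass
class Filter:
    column: str
    op: str
    value: Any


@dataclass
class TableFilter:
    """Consolidates a table name and its filters into one argument."""
    table_name: str
    filters: List[Filter] = field(default_factory=list)


class Evaluation:
    def __init__(self, tables: Dict[str, List[Row]]):
        self._tables = tables

    def filter(
        self,
        table_filter: Optional[TableFilter] = None,
        table_name: Optional[str] = None,
        filters: Optional[List[Filter]] = None,
    ) -> List[Row]:
        """Filter a table given a TableFilter, or a table_name plus filters."""
        if table_filter is None:
            if table_name is None:
                raise ValueError("Provide a TableFilter or a table_name.")
            table_filter = TableFilter(table_name, list(filters or []))
        if table_filter.table_name not in self._tables:
            raise ValueError(f"Unknown table: {table_filter.table_name}")

        rows = self._tables[table_filter.table_name]
        for f in table_filter.filters:
            # Validate each filter before applying it.
            if f.op not in _OPERATORS:
                raise ValueError(f"Unsupported operator: {f.op}")
            if rows and f.column not in rows[0]:
                raise ValueError(f"Unknown column: {f.column}")
            compare = _OPERATORS[f.op]
            rows = [row for row in rows if compare(row[f.column], f.value)]
        return rows


ev = Evaluation({
    "primary_timeseries": [
        {"location_id": "gage-A", "value": 1.0},
        {"location_id": "gage-B", "value": 5.0},
    ]
})
high_flow = Filter("value", ">", 2.0)
from_object = ev.filter(TableFilter("primary_timeseries", [high_flow]))
from_args = ev.filter(table_name="primary_timeseries", filters=[high_flow])
```

Both call styles go through the same validation path, which is the point of consolidating the arguments into one object.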

samlamont and others added 30 commits April 15, 2025 11:33

updated xml parser; hefs loading tests WIP

df859c9

update null field checks, revert previous temp changes

5ca1765

remove previous xml parsing func

4315c59

remove unneeded funcs

f69f5bf

include timezone offset

8beb68a

parallelize timeseries loading from directory; xml parsing update

599cfc1

add max_workers argument

6fda974

climatology WIP

4c5e11e

Update unique column set fields

52dbb39

Update null column checks

1a374cb

update test, add member to field check

0c80c86

update null column check

4cbec06

update max cpu count

5880a2d

Adds persist_dataframe argument for loading timeseries

6190bc8

Calculate reference forecast WIP

119ab33

Aggregate timeseries in cache according to file size

288e384

Adding calculate_climatology() and create_refernce_forecast() methods…

547d9a2

… to secondary table

Add drop_duplicates test

2388e0d

change function name

ac206d7

fix function references

f64f1e8

Update concurrent convert_timeseries logic

0e271b6

set default persist_dataframe to false

af54d27

Merge branch '424-parsing-xml-fews-timeseries-is-slow' into 420-add-s…

a7a1eaf

…kill-score-for-ensemble-metrics

WIP backup

c3c9c0a

Fix ensemble pivot preprocessor

ff33c80

Ref forecast updates

0c35190

fillna WIP

09adf68

Validate dynamic overwrite

0ace90a

WIP

a2b7641

adds spark config.

47c3fd3

mgdenno reviewed Aug 22, 2025


samlamont added 8 commits August 22, 2025 15:27

remove unnecessary classes

3630adc

Merge branch 'main' into 420-add-skill-score-for-ensemble-metrics

36e62f1

uncomment test

dd60afd

add docstrings

bd4cafb

evaluation table filters

e1dad27

check for table_name, line wraps

d4667e1

fix test

7bbe34c

fix example notebook

2383794

Merge branch 'main' into 420-add-skill-score-for-ensemble-metrics

a8f7a32

samlamont marked this pull request as draft August 25, 2025 20:16

samlamont marked this pull request as ready for review August 25, 2025 20:16

samlamont added 2 commits August 25, 2025 16:23

fix test import

fb280fb

rename workflows

8ac8607

samlamont marked this pull request as draft August 26, 2025 00:21

samlamont marked this pull request as ready for review August 26, 2025 00:21

samlamont added 4 commits August 26, 2025 09:26

fix add udfs test

dc92aca

trigger tests

3468ce1

reset workflow trigger

539cf7c

update doc strings for date string format

3c18c45

samlamont marked this pull request as draft August 27, 2025 15:31

samlamont marked this pull request as ready for review August 27, 2025 15:31

samlamont merged commit 29b8682 into main Aug 27, 2025
5 checks passed

samlamont deleted the 420-add-skill-score-for-ensemble-metrics branch August 27, 2025 16:15

samlamont mentioned this pull request Aug 27, 2025

Move image build and deployment to teehr-hub repo #476

Closed


