
420 add skill score for ensemble metrics #494

Merged
samlamont merged 115 commits into main from 420-add-skill-score-for-ensemble-metrics
Aug 27, 2025

samlamont (Collaborator) commented Jul 15, 2025

This could use a review and some additional discussion. Here are the main changes:

  • Adds a calculate_climatology() method to the timeseries table for calculating long-term averages of a timeseries based on a time interval (e.g., day-of-year) (resolves Investigate adding simple baselines #228)
  • Adds a create_reference_forecast() method to the secondary timeseries table
    • Includes a method based on a reference timeseries (e.g., a "climatology" timeseries and a template forecast)
    • Allows for timeseries aggregation (based on a rolling average) and ffill/bfill of NaN values
  • Updates the transform function for probabilistic methods
  • Adds HourOfYear as a row-level calculated field
  • Fixes filtering by forecast_lead_time (PySpark time interval) (resolves Fix filtering by forecast_lead_time #451)
  • Adds skill score as an optional metric post-processing step
  • Adds load_dataframe() as a loading method for in-memory dataframes (resolves Add _write_pandas() method top base table #471)
  • Removes redundant load_table calls
  • Sets the default PySpark driver memory based on available memory (resolves Set default PySpark driver memory configurations #477)
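For reference, here is a minimal stdlib sketch of two of the ideas above, assuming a plain list of (datetime, value) pairs rather than the Spark tables the PR actually works with: a day-of-year climatology (the mean of all values sharing a day-of-year) and a generic skill score measured against a reference forecast such as that climatology. The names day_of_year_climatology and skill_score are hypothetical and do not reflect the signatures of calculate_climatology() or the PR's skill score step.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def day_of_year_climatology(series: list[tuple[datetime, float]]) -> dict[int, float]:
    """Hypothetical helper: average all values sharing a day-of-year (1-366)."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in series:
        buckets[ts.timetuple().tm_yday].append(value)
    return {doy: mean(values) for doy, values in buckets.items()}


def skill_score(score: float, reference_score: float, perfect_score: float = 0.0) -> float:
    """Hypothetical helper for the generic skill score.

    SS = (S - S_ref) / (S_perfect - S_ref). For error metrics where a perfect
    score is 0, this reduces to 1 - S / S_ref: 1 is perfect, 0 means no better
    than the reference, and negative values are worse than the reference.
    """
    return (score - reference_score) / (perfect_score - reference_score)


obs = [
    (datetime(2020, 1, 1), 10.0),
    (datetime(2021, 1, 1), 14.0),
    (datetime(2020, 1, 2), 5.0),
]
print(day_of_year_climatology(obs))  # {1: 12.0, 2: 5.0}
print(skill_score(2.0, 4.0))  # 0.5
```

Note that plain day-of-year numbering shifts by one after Feb 28 in leap years, which a real implementation may need to account for.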

output_dataframe : ps.DataFrame
The output spark DataFrame of a specified timestep and start
and end datetimes.
update_variable_table : bool
mgdenno (Contributor) commented Aug 22, 2025

Should these boolean args have default values?

Collaborator Author

I think these classes, BenchmarkForecast and SignatureTimeseries, can go away, to be honest. I added them to define the type of class that gets returned, but the method.generate() call can happen directly in the Generator class to reduce complexity.

The boolean defaults are set in the method arguments.
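A minimal sketch of the simplification described above, with all names hypothetical (GeneratorABC, Normals, and Generator are borrowed loosely from the thread, and drop_nan is invented for illustration). The generator calls method.generate() directly instead of wrapping the result in an intermediate return-type class, and the boolean option declares its default in the method signature.

```python
from abc import ABC, abstractmethod
from statistics import mean


class GeneratorABC(ABC):
    """Hypothetical interface: every generator method exposes generate()."""

    @abstractmethod
    def generate(self, values: list[float]) -> list[float]:
        ...


class Normals(GeneratorABC):
    """Toy 'normals' method: replaces every value with the series mean."""

    def generate(self, values: list[float]) -> list[float]:
        return [mean(values)] * len(values)


class Generator:
    def run(
        self, method: GeneratorABC, values: list[float], drop_nan: bool = True
    ) -> list[float]:
        # Boolean option with its default declared in the signature.
        if drop_nan:
            values = [v for v in values if v == v]  # NaN != NaN, so NaNs are dropped
        # Call method.generate() directly rather than returning an
        # intermediate wrapper class.
        return method.generate(values)


print(Generator().run(Normals(), [1.0, 2.0, float("nan"), 3.0]))  # [2.0, 2.0, 2.0]
```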

The input timeseries model. This defines a unique timeseries
that will be queried from the Evaluation and used as the
input_dataframe. Defaults to None.
start_datetime : Union[str, datetime]
Contributor

Should a note be added to the datetime and timedelta docstrings indicating which string formats are supported? Or something like "Any string that can be parsed by XYZ library" would be fine, if appropriate (I did not check how it is parsed).

Collaborator Author

Updated.
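One way to make the supported formats explicit, assuming ISO 8601 strings are what should be accepted (the thread does not say which parser the PR uses), is to coerce the argument with datetime.fromisoformat and name that parser in the docstring. parse_datetime is a hypothetical helper, not part of the PR.

```python
from datetime import datetime
from typing import Union


def parse_datetime(value: Union[str, datetime]) -> datetime:
    """Coerce a start/end datetime argument (hypothetical helper).

    Parameters
    ----------
    value : Union[str, datetime]
        A datetime object, or any string accepted by
        ``datetime.fromisoformat`` (e.g., "2020-01-01" or
        "2020-01-01T06:00:00").
    """
    if isinstance(value, datetime):
        return value
    return datetime.fromisoformat(value)


print(parse_datetime("2020-01-01T06:00:00"))  # 2020-01-01 06:00:00
```

In Python 3.11+, fromisoformat accepts most ISO 8601 formats; earlier versions accept a narrower subset, so the docstring wording may need to reflect the supported Python range.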

validated_df = tbl._validate(df=self.df)
tbl._write_spark_df(validated_df, write_mode=write_mode)

def to_pandas(self):
Contributor

Should this have to_sdf() and maybe to_geopandas() methods?

Contributor

I guess you can use .df.show(), but that is not consistent with the other classes... IDK.

Collaborator Author

Yeah, I can add to_sdf(). Maybe we can implement to_geopandas() for timeseries in a separate issue?



class Normals(SignatureGeneratorBaseModel, GeneratorABC):
"""Model for generating synthetic normals timeseries."""
Contributor

Does it make sense to put a docstring here so users know which fields are available to be set?

Contributor

I guess because of the way this class inherits from other classes, the code editor is not aware of what the properties are...

Collaborator Author

Added. I can see the docstring when hovering in VS Code, but it seems really sensitive to my cursor location.

configuration_name: Optional[str] = Field(default=None)
variable_name: Optional[str] = Field(default=None)
unit_name: Optional[str] = Field(default=None)
table_name: TimeseriesTableNamesEnum = Field(
Contributor

nit: I feel like the table_name field should be at the top or bottom of the list.

Collaborator Author

Added to the top :)

samlamont (Collaborator Author)

@mgdenno I took another crack at the filter issue we talked about. I created a TableFilter() class, which takes a table and a filter object; it essentially just consolidates the two arguments.

Then I created a filter() method on the Evaluation class that accepts either a TableFilter instance or table_name and filter arguments. This is just another way to filter a table (e.g., ev.primary_timeseries.filter()), but directly from the evaluation, and it includes the filter validation.

I think this is closer to what I was originally going for, but let me know what you think. Happy to discuss!
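A rough sketch of the two-argument pattern described above, using hypothetical in-memory tables (lists of dicts) in place of Spark DataFrames and a made-up filter format of column/operator/value dicts. TableFilter bundles the table name with its filters, and Evaluation.filter() accepts either a TableFilter or table_name plus filters, validating both before applying them. None of this reflects the PR's actual classes or signatures.

```python
import operator
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical filter format: {"column": ..., "operator": ..., "value": ...}
OPERATORS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


@dataclass
class TableFilter:
    """Consolidates a table name and its filters into one argument."""

    table_name: str
    filters: list[dict[str, Any]] = field(default_factory=list)


class Evaluation:
    def __init__(self, tables: dict[str, list[dict[str, Any]]]):
        # In-memory stand-in for the evaluation's tables: name -> rows.
        self.tables = tables

    def filter(
        self,
        table_filter: Optional[TableFilter] = None,
        table_name: Optional[str] = None,
        filters: Optional[list[dict[str, Any]]] = None,
    ) -> list[dict[str, Any]]:
        """Filter a table given either a TableFilter or table_name + filters."""
        if table_filter is None:
            if table_name is None:
                raise ValueError("Provide a TableFilter or a table_name.")
            table_filter = TableFilter(table_name, filters or [])
        self._validate(table_filter)
        rows = self.tables[table_filter.table_name]
        for f in table_filter.filters:
            op = OPERATORS[f["operator"]]
            rows = [r for r in rows if op(r[f["column"]], f["value"])]
        return rows

    def _validate(self, table_filter: TableFilter) -> None:
        """Check the table exists and each filter uses a known column and operator."""
        if table_filter.table_name not in self.tables:
            raise ValueError(f"Unknown table: {table_filter.table_name}")
        rows = self.tables[table_filter.table_name]
        columns = set(rows[0]) if rows else set()
        for f in table_filter.filters:
            if f["column"] not in columns:
                raise ValueError(f"Unknown column: {f['column']}")
            if f["operator"] not in OPERATORS:
                raise ValueError(f"Unsupported operator: {f['operator']}")


ev = Evaluation({"primary_timeseries": [
    {"location_id": "a", "value": 1.0},
    {"location_id": "b", "value": 2.0},
]})
flt = {"column": "location_id", "operator": "=", "value": "a"}
print(ev.filter(TableFilter("primary_timeseries", [flt])))
print(ev.filter(table_name="primary_timeseries", filters=[flt]))
```

Both call forms return the same rows; the TableFilter form is mainly convenient when a table/filter pair needs to be passed around as a single object.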

@samlamont samlamont marked this pull request as draft August 25, 2025 20:16
@samlamont samlamont marked this pull request as ready for review August 25, 2025 20:16
@samlamont samlamont marked this pull request as draft August 26, 2025 00:21
@samlamont samlamont marked this pull request as ready for review August 26, 2025 00:21
@samlamont samlamont marked this pull request as draft August 27, 2025 15:31
@samlamont samlamont marked this pull request as ready for review August 27, 2025 15:31
@samlamont samlamont merged commit 29b8682 into main Aug 27, 2025
5 checks passed
@samlamont samlamont deleted the 420-add-skill-score-for-ensemble-metrics branch August 27, 2025 16:15

Labels

None yet

Projects

None yet

3 participants