
v0.5.6 (23rd December 2025)



@kim-mskw kim-mskw released this 23 Dec 13:07
b3751a0

Bug Fixes:

  • Changed action clamping: Actions are now clamped to the extreme values implied by the actor network's activation function, defined in lookup dicts, instead of to the min and max of a forward pass through the NN. Previously, the output range was incorrectly inferred from the input alone, which failed when weights were negative due to Xavier initialization.
  • Adjusted reward scaling: Reward scaling now considers the currently available power instead of the unit’s max_power, reducing reward distortion when availability limits capacity. Available power is now derived from offered_order_volume instead of unit.calculate_min_max_power: because dispatch is set before the reward calculation, the previous method reported an available power of 0 whenever the unit was dispatched.
  • Update pytest dependency: Tests now run with Pytest 9
  • Add new docs feature: The dependencies needed to build the documentation can now be installed with pip install -e .[docs]
  • Fix tests on Windows: One test consistently failed on Windows; it is now fixed so that all tests pass on all platforms.
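
The activation-based clamping described in the bug fixes above can be sketched as follows; the bounds dict and function name are illustrative assumptions, not the actual implementation:

```python
import numpy as np

# Output range guaranteed by common final activation functions.
# This mapping is illustrative; the framework defines its own dicts.
ACTIVATION_BOUNDS = {
    "tanh": (-1.0, 1.0),
    "sigmoid": (0.0, 1.0),
    "softsign": (-1.0, 1.0),
}

def clamp_actions(actions: np.ndarray, activation: str = "tanh") -> np.ndarray:
    """Clamp raw actions to the range the activation function guarantees,
    rather than to the min/max observed in a single forward pass."""
    low, high = ACTIVATION_BOUNDS[activation]
    return np.clip(actions, low, high)
```

The key point is that the bounds come from the activation function itself, so they hold regardless of the sign of the network weights.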

Improvements:

  • Application of a new naming convention for bidding strategies: [unit][market][method]_[comment] for bidding strategy keys (in snake_case) and [Unit][Market][Method][Comment]Strategy for bidding strategy classes (in PascalCase)
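
The snake_case key and the PascalCase class name map onto each other mechanically; a minimal sketch (the example key is an assumption, only the class name appears elsewhere in these notes):

```python
def strategy_class_name(key: str) -> str:
    """Derive the PascalCase strategy class name from a snake_case
    bidding strategy key, per the [unit][market][method]_[comment] convention."""
    return "".join(part.capitalize() for part in key.split("_")) + "Strategy"
```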

  • Changed SoC Definition: The state of charge (SoC) for storage units is now defined to take values between 0 and 1, instead of absolute energy content (MWh). This change ensures consistency with other models and standard definitions. The absolute energy content can still be recovered by multiplying the SoC by the unit's capacity. The previous 'max_soc' is renamed to 'capacity'. 'max_soc' and 'min_soc' can still be used to model allowed SoC ranges, but are now also defined between 0 and 1.
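
Under the new definition, converting between normalized SoC and absolute energy content is a single multiplication; a minimal sketch (function names are illustrative):

```python
def absolute_energy_content(soc: float, capacity: float) -> float:
    """Absolute energy content in MWh from a normalized SoC in [0, 1]."""
    if not 0.0 <= soc <= 1.0:
        raise ValueError("SoC must lie between 0 and 1")
    return soc * capacity

def normalize_soc(energy_mwh: float, capacity: float) -> float:
    """Migrate an old absolute SoC value (MWh) to the new [0, 1] definition."""
    return energy_mwh / capacity
```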

  • Restructured learning_role tasks: Major changes that make the application of learning more generalizable across the framework.

    • Simplified learning data flow: Removed the special learning_unit_operator that previously aggregated unit data and forwarded it to the learning role. Eliminates the single-sender dependency and avoids double bookkeeping across units and operators.
    • Direct write access: All learning-capable entities (units, unit operators, market agents) now write learning data directly to the learning role.
    • Centralized logic: Learning-related functionality is now almost always contained within the learning role, improving maintainability.
    • Automatic calculation of obs_dim: The observation dimension is now automatically calculated from foresight, num_timeseries_obs_dim and unique_obs_dim in the learning configuration. This avoids inconsistencies between the defined observation space and the actual observation dimension used in the actor network. However, this assumes the relation self.obs_dim = num_timeseries_obs_dim * foresight + unique_obs_dim; if this does not hold, the calculation of obs_dim needs to be adjusted in the learning strategy.
    • Note: Distributed learning across multiple machines is no longer supported, but this feature was not in active use.
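
The automatic obs_dim calculation follows the relation stated in the bullet above; a minimal sketch:

```python
def calc_obs_dim(foresight: int, num_timeseries_obs_dim: int, unique_obs_dim: int) -> int:
    """obs_dim = num_timeseries_obs_dim * foresight + unique_obs_dim:
    each time-series observation spans `foresight` steps,
    plus a fixed number of unit-specific scalar observations."""
    return num_timeseries_obs_dim * foresight + unique_obs_dim
```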
  • Restructured learning configuration: All learning-related configuration parameters are now contained within a single learning_config dictionary in the config.yaml file. This change simplifies configuration management and avoids ambiguous setting of defaults.

  • Note: learning_mode is moved from the top-level config to learning_config. Existing config files need to be updated accordingly.
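
A config.yaml fragment illustrating the consolidated section; the values are placeholders, and only keys named in these notes are shown:

```yaml
learning_config:
  learning_mode: true          # moved here from the top level of the config
  foresight: 24
  num_timeseries_obs_dim: 2
  unique_obs_dim: 3
```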

  • Learning_role in all cases involving DRL: The learning_role is now available in all simulations involving DRL, even if pre-trained strategies are loaded and no policy updates are performed. This change ensures consistent handling of learning configurations and simplifies the codebase by removing special cases.

  • Final DRL simulation with last policies: After training, the final simulation now uses the last trained policies instead of the best policies. This change provides a more accurate representation of the learned behavior, as the last policies reflect the most recent training state. Additionally, multi-agent simulations do not always converge to the maximum reward. For example, competing agents may underbid each other to gain market share, leading to lower overall rewards while nevertheless reaching a stable state.

New Features:

  • Unit Operator Portfolio Strategy: A new bidding strategy type that enables portfolio optimization. The default, UnitsOperatorEnergyNaiveDirectStrategy, simply passes through the bidding decisions of the individual units within a portfolio, which was also the previous default behavior. Furthermore, we added UnitsOperatorEnergyHeuristicCournotStrategy, which models the bidding behavior of a portfolio of units in a day-ahead market. The strategy calculates the optimal bid price and quantity for each unit in the portfolio, taking the markup and the production costs of the units into account. This enables users to simulate and analyze the impact of strategic portfolio bidding on market outcomes and unit profitability.
  • Nodal Market Clearing Algorithm: A new market clearing algorithm that performs electricity market clearing using an optimal power flow (OPF) approach, considering grid constraints and nodal pricing. This algorithm utilizes PyPSA to solve the OPF problem, allowing for a physics-based representation of network constraints.
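
The markup-based bidding of the Cournot portfolio heuristic can be sketched as follows; this is a simplified illustration (class and field names are assumptions), not the strategy's actual pricing logic:

```python
from dataclasses import dataclass

@dataclass
class PortfolioUnit:
    name: str
    marginal_cost: float    # production cost in EUR/MWh
    available_power: float  # offered volume in MW

def portfolio_bids(units: list[PortfolioUnit], markup: float = 0.1) -> list[dict]:
    """Toy sketch: each unit in the portfolio bids its production cost
    plus a relative markup, at its available volume."""
    return [
        {"unit": u.name,
         "price": u.marginal_cost * (1.0 + markup),
         "volume": u.available_power}
        for u in units
    ]
```

In the actual strategy, the markup would reflect the portfolio's strategic position in the day-ahead market rather than a fixed constant.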