| 1 | +# Changelog |
| 2 | + |
| 3 | +## Version 1.5.1beta5 |
| 4 | + |
| 5 | +### Major Revision of Code |
| 6 | + |
| 7 | +Major fix to `compute_rpss`: it now works for both scalar and non-scalar values. |
| 8 | + |
| 9 | +```python |
| 10 | + |
| 11 | + def compute_rpss(self, threshold, dim=None): |
| 12 | + """ |
| 13 | + Compute the Ranked Probability Skill Score (RPSS) for a given threshold. |
| 14 | + |
| 15 | + Args: |
| 16 | + threshold (float): The threshold value for binary classification. |
| 17 | + dim (str, list, or None): The dimension(s) along which to compute the RPSS. |
| 18 | + If None, compute the RPSS over the entire data. |
| 19 | + |
| 20 | + Returns: |
| 21 | + xarray.DataArray: The computed RPSS values. |
| 22 | + """ |
| 23 | + # Convert data to binary based on the threshold |
| 24 | + obs_binary = (self.obs_data >= threshold).astype(int) |
| 25 | + model_binary = (self.model_data >= threshold).astype(int) |
| 26 | + |
| 27 | + # Calculate the RPS for the model data |
| 28 | + rps_model = ((model_binary.cumsum(dim) - obs_binary.cumsum(dim)) ** 2).mean(dim=dim) |
| 29 | + |
| 30 | + # Calculate the RPS for the climatology (base rate) |
| 31 | + base_rate = obs_binary.mean(dim=dim) |
| 32 | + rps_climo = ((xr.full_like(model_binary, 0).cumsum(dim) - obs_binary.cumsum(dim)) ** 2).mean(dim=dim) |
| 33 | + rps_climo = rps_climo + base_rate * (1 - base_rate) |
| 34 | + |
| 35 | + # Calculate the RPSS |
| 36 | + rpss = 1 - rps_model / rps_climo |
| 37 | + |
| 38 | + return rpss |
| 39 | + |
| 40 | +``` |
| 41 | + |
| 42 | +The updated `compute_rpss` method will work correctly for both scalar and non-scalar `base_rate` values. |
| 43 | + |
| 44 | +In the context of xarray and dimensions/coordinates in a dataset, a scalar value refers to a single value that does not depend on any dimensions. It is a 0-dimensional value. On the other hand, a non-scalar value is an array or a DataArray that depends on one or more dimensions and has corresponding coordinates. |
| 45 | + |
| 46 | +Let's consider an example to illustrate the difference: |
| 47 | + |
| 48 | +Suppose we have a dataset with dimensions "time", "lat", and "lon". The dataset contains a variable "temperature" with corresponding coordinates for each dimension. |
| 49 | + |
| 50 | +- Scalar value: If we calculate the mean temperature over all dimensions using `temperature.mean()`, the resulting value will be a scalar. It will be a single value that does not depend on any dimensions. |
| 51 | + |
| 52 | +- Non-scalar value: If we calculate the mean temperature over a specific dimension, such as `temperature.mean(dim="time")`, the resulting value will be a non-scalar DataArray. It will have dimensions "lat" and "lon" and corresponding coordinates, but it will not depend on the "time" dimension anymore. |
| 53 | + |
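The two cases above can be sketched with a small, made-up array. This is only an illustration: the `temperature` values, grid size, and coordinate labels are hypothetical and are not taken from NWPeval.

```python
import numpy as np
import xarray as xr

# Hypothetical toy "temperature" field: 4 time steps on a 2 x 3 lat/lon grid.
temperature = xr.DataArray(
    np.arange(24, dtype=float).reshape(4, 2, 3),
    dims=("time", "lat", "lon"),
    coords={"time": np.arange(4), "lat": [10.0, 20.0], "lon": [0.0, 5.0, 10.0]},
    name="temperature",
)

# Reducing over every dimension leaves a 0-dimensional (scalar) DataArray.
scalar_mean = temperature.mean()

# Reducing over "time" only leaves a DataArray that still depends on lat/lon.
time_mean = temperature.mean(dim="time")

print(scalar_mean.dims, scalar_mean.shape)
print(time_mean.dims, time_mean.shape)
```

Inspecting `.dims` is a quick way to tell which case a given `dim` argument produces.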
| 54 | +In the updated `compute_rpss` method, the line `base_rate = obs_binary.mean(dim=dim)` averages `obs_binary` over the dimensions given by `dim`. If `dim` is `None`, the mean is taken over all dimensions and the result is a scalar. If `dim` names one dimension or a list of dimensions, the mean is taken over those only, and the result is a non-scalar DataArray over whatever dimensions remain (or a scalar, if `dim` happens to list every dimension). |
| 55 | + |
| 56 | +The subsequent lines of code in the `compute_rpss` method handle both cases correctly: |
| 57 | + |
| 58 | +```python |
| 59 | +rps_climo = ((xr.full_like(model_binary, 0).cumsum(dim) - obs_binary.cumsum(dim)) ** 2).mean(dim=dim) |
| 60 | +rps_climo = rps_climo + base_rate * (1 - base_rate) |
| 61 | +``` |
| 62 | + |
| 63 | +If `base_rate` is a scalar, xarray broadcasts it against `rps_climo` and the calculation is performed element-wise. If `base_rate` is a non-scalar DataArray, xarray aligns it with `rps_climo` on their shared dimensions before performing the element-wise calculation. Because both quantities are reduced over the same `dim`, they end up with the same remaining dimensions. |
| 64 | + |
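As a minimal sketch of that behaviour, the snippet below computes the `base_rate * (1 - base_rate)` term for both cases and adds it to an array. The helper `climo_term`, the sample observations, and the zero-filled `field` are assumptions made for illustration; they are not part of the package.

```python
import numpy as np
import xarray as xr

# Hypothetical binary observations on a (time, lat) grid.
obs_binary = xr.DataArray(
    np.array([[1, 0], [1, 1], [0, 0], [1, 0]]),
    dims=("time", "lat"),
    coords={"time": np.arange(4), "lat": [10.0, 20.0]},
)


def climo_term(obs, dim=None):
    """Illustrative helper: base_rate * (1 - base_rate), reduced over `dim`."""
    base_rate = obs.mean(dim=dim)
    return base_rate * (1 - base_rate)


scalar_term = climo_term(obs_binary)            # dim=None gives a 0-d result
per_lat_term = climo_term(obs_binary, "time")   # one value per latitude

# A scalar term broadcasts against the whole array; a per-latitude term is
# matched on the shared "lat" dimension and repeated along "time".
field = xr.zeros_like(obs_binary, dtype=float)
broadcast_sum = field + scalar_term
aligned_sum = field + per_lat_term

print(scalar_term.dims, per_lat_term.dims)
print(broadcast_sum.dims, aligned_sum.dims)
```

The same arithmetic expression serves both cases, which is why the method needs no branch on the type of `base_rate`.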
| 65 | +Will this work with data on different coordinates? The updated `compute_rpss` method should work correctly as long as the dimensions and coordinates of `obs_binary` and `model_binary` are compatible. The method relies on xarray's broadcasting and alignment rules to handle data with different coordinates. |
| 66 | + |
| 67 | +However, if the coordinates of `obs_binary` and `model_binary` differ, xarray's default inner-join alignment keeps only the coordinate labels the two arrays share, so the result may cover fewer points than expected or be empty. Dimension names that do not match are broadcast against each other rather than compared. In such cases, align or regrid the inputs (for example with `xr.align` or `interp_like`) before calling `compute_rpss`. |
| 68 | + |
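The following sketch shows how xarray treats two series whose coordinates only partly overlap. The `obs` and `model` arrays and their time labels are invented for the example; the point is the alignment behaviour, not the values.

```python
import xarray as xr

# Hypothetical observation and model series on partly overlapping time axes.
obs = xr.DataArray(
    [1.0, 2.0, 3.0, 4.0], dims="time", coords={"time": [0, 1, 2, 3]}
)
model = xr.DataArray(
    [10.0, 20.0, 30.0, 40.0], dims="time", coords={"time": [2, 3, 4, 5]}
)

# Arithmetic aligns with an inner join by default, so only the shared
# labels (time=2 and time=3) survive in the result.
diff = model - obs

# Aligning explicitly shows what will actually be compared.
obs_aligned, model_aligned = xr.align(obs, model, join="inner")

# join="exact" raises instead of silently dropping labels.
try:
    xr.align(obs, model, join="exact")
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

print(diff["time"].values, mismatch_detected)
```

Checking with `join="exact"` before scoring is one way to catch mismatched grids early, at the cost of requiring identical coordinates.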
| 69 | +In summary, the updated `compute_rpss` method should work correctly for both scalar and non-scalar `base_rate` values, and it should handle data with different coordinates as long as the dimensions and coordinates are compatible between `obs_binary` and `model_binary`. |
| 70 | + |
| 71 | +### Bug Fixes |
| 72 | + |
| 73 | +- Fixed minor bugs and improved code stability. |
| 74 | + |
| 75 | +### Other Changes |
| 76 | + |
| 77 | +- The package has been moved from the 3-Alpha stage to the 4-Beta stage in development, indicating that it has undergone further testing and refinement. |
| 78 | + |
| 79 | +Please note that this is a beta release (version 1.5.1beta5), and while it includes significant enhancements and bug fixes, it may still have some known limitations or issues. We encourage users to provide feedback and report any bugs they encounter. |
| 80 | + |
| 81 | +We appreciate your interest in the NWPeval package and thank you for your support! |