Description
I have been facing issues when using hazard_series_from_dataset from the rf_glofas module, caused by changes in the datasets' coordinate names introduced by the recent GloFAS update. In particular, when generating a hazard series from a reanalysis dataset (using system_version="version_3_1"), the code fails because there is no "time" coordinate:
from climada_petals.hazard.rf_glofas import RiverFloodInundation
from climada_petals.hazard.rf_glofas import hazard_series_from_dataset

rf = RiverFloodInundation()
rf.download_reanalysis(
    countries="Kenya",
    year=2024,
    num_proc=8,
    system_version="version_3_1",
    format="netcdf",
)
ds_flood = rf.compute()

haz_type = "RF"
haz_rea = hazard_series_from_dataset(
    ds_flood, intensity="flood_depth", event_dim="valid_time"
)
resulting in: KeyError: "No variable named 'time'. Variables on the dataset include ['surface', 'step', 'flood_depth', 'flood_depth_flopros', 'event', 'valid_time', 'lat_lon', 'latitude', 'longitude']"
For reference, the extracted dataset ds_flood looks as follows:
<xarray.Dataset> Size: 2GB
Dimensions: (valid_time: 182, longitude: 948, latitude: 1152,
time: 1)
Coordinates:
* valid_time (valid_time) datetime64[ns] 1kB 2024-01-02 ... 2024-...
surface float64 8B ...
step timedelta64[ns] 8B ...
* longitude (longitude) float64 8kB 33.95 33.96 ... 41.84 41.85
* latitude (latitude) float64 9kB 4.946 4.938 ... -4.637 -4.646
Dimensions without coordinates: time
Data variables:
flood_depth (latitude, longitude, time, valid_time) float32 795MB dask.array<chunksize=(246, 270, 1, 182), meta=np.ndarray>
flood_depth_flopros (latitude, longitude, time, valid_time) float32 795MB dask.array<chunksize=(246, 270, 1, 182), meta=np.ndarray>
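Until this is fixed, the dataset can also be adapted on the caller side. A minimal sketch, assuming hazard_series_from_dataset only needs the event dates exposed under the name "time" (the tiny dataset below is a stand-in for the real GloFAS reanalysis output, which has the same dimension layout):

```python
import numpy as np
import xarray as xr

# Stand-in for the GloFAS reanalysis output: event dates live in
# "valid_time", and a stray size-1 "time" dimension carries no coordinate.
ds = xr.Dataset(
    {
        "flood_depth": (
            ("latitude", "longitude", "time", "valid_time"),
            np.zeros((2, 2, 1, 3), dtype="float32"),
        )
    },
    coords={
        "valid_time": np.array(
            ["2024-01-02", "2024-01-03", "2024-01-04"], dtype="datetime64[ns]"
        ),
        "latitude": [4.946, 4.938],
        "longitude": [33.95, 33.96],
    },
)

# Drop the empty "time" dimension, then expose the event dates under the
# name the current implementation hard-codes.
ds_fixed = ds.squeeze("time", drop=True).rename({"valid_time": "time"})
```

With such a rename, one could presumably call hazard_series_from_dataset(ds_fixed, intensity="flood_depth", event_dim="time") without patching the library, though I have only verified the rename itself, not the downstream call.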
Similarly, when downloading forecasts:
rf = RiverFloodInundation()
rf.download_forecast(
    countries="Kenya",
    forecast_date="30-03-2024",
    lead_time_days=10,
    preprocess=lambda x: x.mean(dim="number"),
    num_proc=8,
    system_version="version_3_1",
    format="netcdf",
)
ds_flood = rf.compute()

haz_type = "RF"
haz_fcst = hazard_series_from_dataset(
    ds_flood, intensity="flood_depth", event_dim="forecast_period"
)
results in KeyError: "No variable named 'time'. Variables on the dataset include ['forecast_reference_time', 'surface', 'valid_time', 'step', 'flood_depth', ..., 'event', 'forecast_period', 'lat_lon', 'latitude', 'longitude']".
The extracted ds_flood looks as follows:
<xarray.Dataset> Size: 87MB
Dimensions: (forecast_period: 10, longitude: 948,
latitude: 1152, time: 1)
Coordinates:
* forecast_period (forecast_period) timedelta64[ns] 80B 1 days ......
forecast_reference_time datetime64[ns] 8B ...
surface float64 8B ...
valid_time (forecast_period) datetime64[ns] 80B dask.array<chunksize=(10,), meta=np.ndarray>
step timedelta64[ns] 8B ...
* longitude (longitude) float64 8kB 33.95 33.96 ... 41.84 41.85
* latitude (latitude) float64 9kB 4.946 4.938 ... -4.646
Dimensions without coordinates: time
Data variables:
flood_depth (latitude, longitude, time, forecast_period) float32 44MB dask.array<chunksize=(246, 270, 1, 10), meta=np.ndarray>
flood_depth_flopros (latitude, longitude, time, forecast_period) float32 44MB dask.array<chunksize=(246, 270, 1, 10), meta=np.ndarray>
The issue can be bypassed by changing the hard-coded data_vars=dict(date="time") to data_vars=dict(date="valid_time") in the hazard_series_from_dataset function.
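Rather than swapping one hard-coded name for another, the date variable could be made configurable so both the old and new GloFAS products work. A sketch of the idea with a hypothetical helper (pick_event_dates and its date_var parameter are my invention, not the library's API; the real fix would thread the name into the data_vars mapping mentioned above):

```python
import numpy as np
import xarray as xr

def pick_event_dates(data: xr.Dataset, date_var: str = "valid_time") -> np.ndarray:
    """Read the event dates from a configurable variable instead of the
    hard-coded "time", falling back to a clear error if it is absent."""
    if date_var not in data.variables:
        raise KeyError(f"No variable named {date_var!r}")
    return data[date_var].values

# Stand-in for the updated reanalysis output: dates live in "valid_time".
ds = xr.Dataset(
    coords={
        "valid_time": np.array(
            ["2024-01-02", "2024-01-03"], dtype="datetime64[ns]"
        )
    }
)
dates = pick_event_dates(ds)  # works with the new coordinate name
```

Callers with pre-update datasets could still pass date_var="time", so the change would be backward compatible.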