Commit c3e653b

Issue #1768 pandas3 support (#1771)
Fixes #1768

# Description

Fixes broken pieces of the code when updating to pandas 3, which [has the following major changes](https://pandas.pydata.org/community/blog/pandas-3.0.html) that also affect iMOD Python to some extent:

- _Dedicated string data type by default: string columns are now inferred as the new str dtype instead of object, providing better performance and type safety_
- _Consistent copy/view behaviour with Copy-on-Write (CoW) (a.k.a. getting rid of the SettingWithCopyWarning): more predictable and consistent behavior for all operations, with improved performance through avoiding unnecessary copies_
- _New default resolution for datetime-like data: no longer defaulting to nanoseconds, but generally microseconds (or the resolution of the input), when constructing datetime or timedelta data (avoiding out-of-bounds errors for dates with a year before 1678 or after 2262)_

Because of these changes, some parts of the code had to be modified slightly to get the tests to work. I think the last change lets us simplify the code base considerably, as all logic for datetimes after the year 2262 or before 1678 would no longer be necessary. This was quite a headache in the past, and I think the choice for microseconds by default will make our lives significantly easier. [I created an issue for this](#1773).

In detail this PR alters the following:

- Pin pandas version to 3.* in pixi.toml
- Update setup and asserts in a number of tests to work with pandas 3.0. Most of these tests will no longer work with pandas 2.0, as we now assert pandas 3.0 behavior
- Update examples to work with pandas 3.0
- Fix statements where a view was altered (e.g. altering the ``.values`` attribute); this is now forbidden by pandas
- Update to new string handling. Sometimes I had to enforce object dtype internally to get the old behavior. In other cases we could no longer check for strings with ``dtype=="object"``. Luckily there is ``pd.api.types.is_string_dtype``, which works for both pandas 2 and pandas 3.
- Add unit test for ``to_pandas_datetime_series`` to check whether pandas behavior is similar in this regard.

# Checklist

<!--- Before requesting review, please go through this checklist: -->

- [x] Links to correct issue
- [x] Update changelog, if changes affect users
- [x] PR title starts with ``Issue #nr``, e.g. ``Issue #737``
- [x] Unit tests were added
- [ ] **If feature added**: Added/extended example
- [ ] **If feature added**: Added feature to API documentation
- [ ] **If pixi.lock was changed**: Ran `pixi run generate-sbom` and committed changes
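The year limits quoted in the description follow from storing a timestamp as a signed 64-bit count of nanoseconds since 1970. A minimal sketch of that arithmetic using NumPy only; the variable names and the `year_of` helper are illustrative and not part of iMOD Python:

```python
import numpy as np

int64_max = np.iinfo(np.int64).max
# The smallest int64 value is reserved for NaT, so the earliest valid
# timestamp is one tick above it.
int64_min_valid = np.iinfo(np.int64).min + 1


def year_of(value: np.datetime64) -> int:
    """Calendar year of a datetime64 scalar."""
    return int(value.astype("datetime64[Y]").astype(int)) + 1970


max_year_ns = year_of(np.datetime64(int64_max, "ns"))
min_year_ns = year_of(np.datetime64(int64_min_valid, "ns"))
max_year_us = year_of(np.datetime64(int64_max, "us"))

print(min_year_ns, max_year_ns, max_year_us)
```

With microsecond resolution the same 64-bit integer covers a range that is a thousand times wider, which is why the out-of-bounds handling around 1678 and 2262 may become unnecessary.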
1 parent eb64237 commit c3e653b

28 files changed: +299 -236 lines changed

docs/api/changelog.rst

Lines changed: 2 additions & 0 deletions
@@ -44,6 +44,8 @@ Fixed
 - Fixed bug where :func:`imod.evaluate.convert_pointwaterhead_freshwaterhead`
   produced incorrect results when point water heads were below elevation levels
   for unstructured grids.
+- Support pandas 3.0.
+

 Changed
 ~~~~~~~

examples/imod-wq/Henry-wq.py

Lines changed: 1 addition & 1 deletion
@@ -102,7 +102,7 @@
 )
 m["oc"] = imod.wq.OutputControl(save_head_idf=True, save_concentration_idf=True)
 m.create_time_discretization(
-    additional_times=pd.date_range("2000-01-01", "2001-01-01", freq="M")
+    additional_times=pd.date_range("2000-01-01", "2001-01-01", freq="ME")
 )

 # %%

examples/mf6/circle_transport.py

Lines changed: 1 addition & 1 deletion
@@ -194,7 +194,7 @@
 # stress period is a year long. We can then use the "last" keyword in the output
 # control to save the output.

-simtimes = pd.date_range(start="2000-01-01", end="2030-01-01", freq="As")
+simtimes = pd.date_range(start="2000-01-01", end="2030-01-01", freq="YS")
 simulation.create_time_discretization(additional_times=simtimes)

 # %%
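The alias change in this example script can be checked in isolation. A minimal sketch, assuming only that pandas is installed; `simtimes` mirrors the name in the script, the other names are illustrative, and nothing here depends on iMOD Python:

```python
import pandas as pd

# "YS" is the year-start frequency alias used in the updated example.
simtimes = pd.date_range(start="2000-01-01", end="2030-01-01", freq="YS")

n_times = len(simtimes)
first, last = simtimes[0], simtimes[-1]
print(n_times, first.date(), last.date())
```

Both endpoints fall on 1 January, so they are included and the range holds one timestamp per year from 2000 through 2030.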

examples/mf6/hondsrug.py

Lines changed: 1 addition & 1 deletion
@@ -336,7 +336,7 @@ def outer_edge(da):
 # resampled to a yearly step by using the xarray function
 # `Dataset.resample <http://xarray.pydata.org/en/stable/generated/xarray.Dataset.resample.html#xarray.Dataset.resample>`_.

-rch_trans_yr = rch_trans.resample(time="A", label="left").mean()
+rch_trans_yr = rch_trans.resample(time="YS", label="left").mean()
 rch_trans_yr

 # %%

imod/evaluate/constraints.py

Lines changed: 7 additions & 4 deletions
@@ -126,7 +126,8 @@ def stability_constraint_advection(front, lower, right, top_bot, porosity=0.3, R
     dt = 1.0 / (1.0 / dt_x + 1.0 / dt_y + 1.0 / dt_z)

     dt_xyz = xr.concat(
-        (dt_x, dt_y, dt_z), dim=pd.Index(["x", "y", "z"], name="direction")
+        (dt_x, dt_y, dt_z),
+        dim=pd.Index(["x", "y", "z"], name="direction", dtype="object"),
     )
     return dt, dt_xyz

@@ -254,9 +255,11 @@ def _get_stage_name(sid):
         if not drop_allnan or not dt.isnull().all():
             results.append(dt)
             resultids.append(comb)
-    dt_all = xr.concat(
-        results, pd.Index(resultids, name="combination"), coords="minimal"
-    )
+    # Set index to object dtype to work around xarray concat issue where
+    # StringDtype could not be interpreted as a data type with pandas 3.0 (as
+    # np.dtype is called.)
+    id_index = pd.Index(resultids, name="combination", dtype="object")
+    dt_all = xr.concat(results, id_index, coords="minimal")

     # overall dt
     dt_min = dt_all.min(dim="combination")
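Both hunks request `dtype="object"` when building the index that labels the concatenated dimension. A minimal sketch of just the index construction, without xarray; the labels are illustrative, and the block only shows that an object-dtype index converts cleanly to a NumPy dtype, which is the property the code comment in the diff relies on:

```python
import numpy as np
import pandas as pd

resultids = ["ghb-drn", "riv_0-drn"]  # illustrative combination labels

# Requesting object dtype explicitly keeps the index on a NumPy-backed dtype
# instead of letting pandas infer a dedicated string dtype.
id_index = pd.Index(resultids, name="combination", dtype="object")

# np.dtype() accepts the index dtype without raising.
as_numpy_dtype = np.dtype(id_index.dtype)
print(id_index.dtype, as_numpy_dtype)
```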

imod/formats/gen/gen.py

Lines changed: 1 addition & 1 deletion
@@ -267,7 +267,7 @@ def read_binary(path: Union[str, Path]) -> "geopandas.GeoDataFrame": # type: ig
     else:
         df = pd.DataFrame()
     df["feature_type"] = feature_type
-    df["feature_type"] = df["feature_type"].replace(GENTYPE_TO_NAME)
+    df["feature_type"] = df["feature_type"].replace(GENTYPE_TO_NAME).astype(str)

     geometry = []
     for ftype, geom in zip(feature_type, xy):

imod/formats/ipf.py

Lines changed: 4 additions & 2 deletions
@@ -458,7 +458,8 @@ def write_assoc(path, df, itype=1, nodata=1.0e20, assoc_columns=None):
     # The reason is that datetime columns are converted to string as well
     # and then quoted. This causes trouble with some iMOD(batch) functions.
     for column in df.columns:
-        if df.loc[:, column].dtype == np.dtype("O"):
+        # Test for strings compatible with pandas 2 and 3
+        if pd.api.types.is_string_dtype(df[column].dtype):
             df.loc[:, column] = df.loc[:, column].astype(str)
             df.loc[:, column] = '"' + df.loc[:, column] + '"'

@@ -509,7 +510,8 @@ def write(path, df, indexcolumn=0, assoc_ext="txt", nodata=1.0e20):
     # The reason is that datetime columns are converted to string as well
     # and then quoted. This causes trouble with some iMOD(batch) functions.
     for column in df.columns:
-        if df.loc[:, column].dtype == np.dtype("O"):
+        # Test for strings compatible with pandas 2 and 3
+        if pd.api.types.is_string_dtype(df[column].dtype):
             df.loc[:, column] = df.loc[:, column].astype(str)
             df.loc[:, column] = '"' + df.loc[:, column] + '"'
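The dtype check used in both writers can be tried on a small frame. A minimal sketch, assuming a toy DataFrame with made-up column names; it selects string-like columns with `pd.api.types.is_string_dtype` and quotes only those, leaving numeric and datetime columns alone:

```python
import pandas as pd
from pandas.api.types import is_string_dtype

df = pd.DataFrame(
    {
        "name": pd.Series(["a", "b"], dtype="object"),
        "label": pd.Series(["c", "d"], dtype="string"),
        "value": [1.0, 2.0],
        "time": pd.to_datetime(["2000-01-01", "2000-01-02"]),
    }
)

# Passing the dtype (not the column) matches the call in the diff: both the
# object dtype and the dedicated string dtype are reported as string-like.
string_columns = [c for c in df.columns if is_string_dtype(df[c].dtype)]

quoted = df.copy()
for column in string_columns:
    quoted[column] = '"' + quoted[column].astype(str) + '"'

print(string_columns)
```

Because the check receives a dtype rather than the data, an object column counts as string-like regardless of what it holds, which keeps the behavior of the previous `== np.dtype("O")` test.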

imod/mf6/multimodel/exchange_creator.py

Lines changed: 6 additions & 4 deletions
@@ -290,10 +290,12 @@ def rearrange_connected_cells(self):

         label_decreasing = df["cell_label1"] > df["cell_label2"]

-        colnames = ["cell_idx1", "cell_idx2", "cell_label1", "cell_label2"]
-        colnames_reversed = ["cell_idx2", "cell_idx1", "cell_label2", "cell_label1"]
+        if label_decreasing.any():
+            colnames = ["cell_idx1", "cell_idx2", "cell_label1", "cell_label2"]
+            colnames_reversed = ["cell_idx2", "cell_idx1", "cell_label2", "cell_label1"]

-        decreasing_connections = df.loc[label_decreasing, colnames].values
-        df.loc[label_decreasing, colnames_reversed] = decreasing_connections
+            df_decreasing = df.loc[label_decreasing, colnames]
+            df_decreasing.columns = colnames_reversed
+            df.loc[label_decreasing, colnames_reversed] = df_decreasing

         self._connected_cells = df
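The new swap avoids `.values` and instead relies on label alignment: the selected rows are renamed to the reversed column order and assigned back, so each value lands in its partner column. A minimal sketch on a toy frame; the data are invented, and only the column names follow the diff:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "cell_idx1": [0, 5, 2],
        "cell_idx2": [1, 3, 4],
        "cell_label1": [1, 2, 1],
        "cell_label2": [2, 1, 2],
    }
)

# Rows where the first label is larger than the second need swapping.
label_decreasing = df["cell_label1"] > df["cell_label2"]

if label_decreasing.any():
    colnames = ["cell_idx1", "cell_idx2", "cell_label1", "cell_label2"]
    colnames_reversed = ["cell_idx2", "cell_idx1", "cell_label2", "cell_label1"]

    # Rename the selected columns to their partners, then let .loc align on
    # row index and column labels during assignment.
    df_decreasing = df.loc[label_decreasing, colnames]
    df_decreasing.columns = colnames_reversed
    df.loc[label_decreasing, colnames_reversed] = df_decreasing

print(df)
```

Only the middle row has decreasing labels, so only that row has its index and label pairs exchanged.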

imod/testing.py

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ def assert_frame_equal(left: pd.DataFrame, right: pd.DataFrame, **kwargs):
     def always_int64(df):
         df = df.copy()
         for column, dtype in df.dtypes.items():
-            if np.issubdtype(dtype, np.integer):
+            if pd.api.types.is_integer_dtype(dtype):
                 df[column] = df[column].astype(np.int64)
         return df
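A likely reason for this change is that `np.issubdtype` expects NumPy dtypes, while a frame under pandas 3 can also carry extension dtypes such as the string dtype; `pd.api.types.is_integer_dtype` accepts both kinds. A minimal standalone sketch that re-creates the helper from the diff as a top-level function and applies it to an invented frame:

```python
import numpy as np
import pandas as pd


def always_int64(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every integer column cast to int64."""
    df = df.copy()
    for column, dtype in df.dtypes.items():
        # Works for NumPy dtypes and pandas extension dtypes alike.
        if pd.api.types.is_integer_dtype(dtype):
            df[column] = df[column].astype(np.int64)
    return df


frame = pd.DataFrame(
    {
        "id": np.array([1, 2, 3], dtype=np.int32),
        "name": pd.Series(["a", "b", "c"], dtype="string"),
        "head": [0.5, 1.5, 2.5],
    }
)
converted = always_int64(frame)
print(converted.dtypes)
```

The input frame is left unchanged because the helper works on a copy; only the `id` column changes dtype in the result.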

imod/tests/test_evaluate/test_constraints.py

Lines changed: 5 additions & 5 deletions
@@ -94,13 +94,13 @@ def test_intra_cell_boundary_conditions(test_da1):
     riv1drn = test_da1 * (0.3 * 1.0) / min(100.0, 150) * (1.0 - 0.0)
     dt_min_ref = np.minimum(ghbdrn, riv1drn)

-    assert dt_min_ref.equals(dt_min)
-    assert dt_all.equals(
-        xr.concat(
-            (ghbdrn, riv1drn), pd.Index(["ghb-drn", "riv_0-drn"], name="combination")
-        )
+    expected_index = pd.Index(
+        ["ghb-drn", "riv_0-drn"], name="combination", dtype="object"
     )

+    assert dt_min_ref.equals(dt_min)
+    assert dt_all.equals(xr.concat((ghbdrn, riv1drn), expected_index))
+

 def test_intra_cell_boundary_conditions_thickness_zero(test_da1):
     top_bot = xr.Dataset({"top": test_da1 * -1.0, "bot": test_da1 * -1.0})
