Commit 19e8fb8

Merge branch 'main' into daydst2

2 parents 89e7527 + e0d6051

94 files changed: +399 −343 lines


doc/source/development/contributing_codebase.rst
Lines changed: 1 addition & 1 deletion

@@ -540,7 +540,7 @@ xfail during the testing phase. To do so, use the ``request`` fixture:

     def test_xfail(request):
         mark = pytest.mark.xfail(raises=TypeError, reason="Indicate why here")
-        request.node.add_marker(mark)
+        request.applymarker(mark)

 xfail is not to be used for tests involving failure due to invalid user arguments.
 For these tests, we need to verify the correct exception type and error message
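For illustration, a minimal, self-contained sketch of the pattern the updated contributing guide describes: applying an xfail marker through the ``request`` fixture so a test is only expected to fail when a runtime condition holds. The test name and the platform check are illustrative, not part of this commit.

    import sys

    import pytest

    def test_example(request):
        # Hypothetical condition: only expect failure on Windows.
        if sys.platform == "win32":
            mark = pytest.mark.xfail(reason="known platform-specific failure")
            # request.applymarker(...) is the spelling the guide now recommends
            # in place of request.node.add_marker(...).
            request.applymarker(mark)
        assert 1 + 1 == 2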

doc/source/user_guide/copy_on_write.rst
Lines changed: 75 additions & 55 deletions

@@ -7,8 +7,8 @@ Copy-on-Write (CoW)
 *******************

 Copy-on-Write was first introduced in version 1.5.0. Starting from version 2.0 most of the
-optimizations that become possible through CoW are implemented and supported. A complete list
-can be found at :ref:`Copy-on-Write optimizations <copy_on_write.optimizations>`.
+optimizations that become possible through CoW are implemented and supported. All possible
+optimizations are supported starting from pandas 2.1.

 We expect that CoW will be enabled by default in version 3.0.

@@ -154,66 +154,86 @@ With copy on write this can be done by using ``loc``.

     df.loc[df["bar"] > 5, "foo"] = 100

+Read-only NumPy arrays
+----------------------
+
+Accessing the underlying NumPy array of a DataFrame will return a read-only array if the array
+shares data with the initial DataFrame:
+
+The array is a copy if the initial DataFrame consists of more than one array:
+
+.. ipython:: python
+
+    df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})
+    df.to_numpy()
+
+The array shares data with the DataFrame if the DataFrame consists of only one NumPy array:
+
+.. ipython:: python
+
+    df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
+    df.to_numpy()
+
+This array is read-only, which means that it can't be modified inplace:
+
+.. ipython:: python
+    :okexcept:
+
+    arr = df.to_numpy()
+    arr[0, 0] = 100
+
+The same holds true for a Series, since a Series always consists of a single array.
+
+There are two potential solutions to this:
+
+- Trigger a copy manually if you want to avoid updating DataFrames that share memory with your array.
+- Make the array writeable. This is a more performant solution but circumvents Copy-on-Write rules, so
+  it should be used with caution.
+
+.. ipython:: python
+
+    arr = df.to_numpy()
+    arr.flags.writeable = True
+    arr[0, 0] = 100
+    arr
+
+Patterns to avoid
+-----------------
+
+No defensive copy will be performed if two objects share the same data while
+you are modifying one object inplace.
+
+.. ipython:: python
+
+    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
+    df2 = df.reset_index()
+    df2.iloc[0, 0] = 100
+
+This creates two objects that share data and thus the setitem operation will trigger a
+copy. This is not necessary if the initial object ``df`` isn't needed anymore.
+Simply reassigning to the same variable will invalidate the reference that is
+held by the object.
+
+.. ipython:: python
+
+    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
+    df = df.reset_index()
+    df.iloc[0, 0] = 100
+
+No copy is necessary in this example.
+Creating multiple references keeps unnecessary references alive
+and thus will hurt performance with Copy-on-Write.
+
 .. _copy_on_write.optimizations:

 Copy-on-Write optimizations
 ---------------------------

 A new lazy copy mechanism that defers the copy until the object in question is modified
 and only if this object shares data with another object. This mechanism was added to
-following methods:
-
-- :meth:`DataFrame.reset_index` / :meth:`Series.reset_index`
-- :meth:`DataFrame.set_index`
-- :meth:`DataFrame.set_axis` / :meth:`Series.set_axis`
-- :meth:`DataFrame.set_flags` / :meth:`Series.set_flags`
-- :meth:`DataFrame.rename_axis` / :meth:`Series.rename_axis`
-- :meth:`DataFrame.reindex` / :meth:`Series.reindex`
-- :meth:`DataFrame.reindex_like` / :meth:`Series.reindex_like`
-- :meth:`DataFrame.assign`
-- :meth:`DataFrame.drop`
-- :meth:`DataFrame.dropna` / :meth:`Series.dropna`
-- :meth:`DataFrame.select_dtypes`
-- :meth:`DataFrame.align` / :meth:`Series.align`
-- :meth:`Series.to_frame`
-- :meth:`DataFrame.rename` / :meth:`Series.rename`
-- :meth:`DataFrame.add_prefix` / :meth:`Series.add_prefix`
-- :meth:`DataFrame.add_suffix` / :meth:`Series.add_suffix`
-- :meth:`DataFrame.drop_duplicates` / :meth:`Series.drop_duplicates`
-- :meth:`DataFrame.droplevel` / :meth:`Series.droplevel`
-- :meth:`DataFrame.reorder_levels` / :meth:`Series.reorder_levels`
-- :meth:`DataFrame.between_time` / :meth:`Series.between_time`
-- :meth:`DataFrame.filter` / :meth:`Series.filter`
-- :meth:`DataFrame.head` / :meth:`Series.head`
-- :meth:`DataFrame.tail` / :meth:`Series.tail`
-- :meth:`DataFrame.isetitem`
-- :meth:`DataFrame.pipe` / :meth:`Series.pipe`
-- :meth:`DataFrame.pop` / :meth:`Series.pop`
-- :meth:`DataFrame.replace` / :meth:`Series.replace`
-- :meth:`DataFrame.shift` / :meth:`Series.shift`
-- :meth:`DataFrame.sort_index` / :meth:`Series.sort_index`
-- :meth:`DataFrame.sort_values` / :meth:`Series.sort_values`
-- :meth:`DataFrame.squeeze` / :meth:`Series.squeeze`
-- :meth:`DataFrame.swapaxes`
-- :meth:`DataFrame.swaplevel` / :meth:`Series.swaplevel`
-- :meth:`DataFrame.take` / :meth:`Series.take`
-- :meth:`DataFrame.to_timestamp` / :meth:`Series.to_timestamp`
-- :meth:`DataFrame.to_period` / :meth:`Series.to_period`
-- :meth:`DataFrame.truncate`
-- :meth:`DataFrame.iterrows`
-- :meth:`DataFrame.tz_convert` / :meth:`Series.tz_localize`
-- :meth:`DataFrame.fillna` / :meth:`Series.fillna`
-- :meth:`DataFrame.interpolate` / :meth:`Series.interpolate`
-- :meth:`DataFrame.ffill` / :meth:`Series.ffill`
-- :meth:`DataFrame.bfill` / :meth:`Series.bfill`
-- :meth:`DataFrame.where` / :meth:`Series.where`
-- :meth:`DataFrame.infer_objects` / :meth:`Series.infer_objects`
-- :meth:`DataFrame.astype` / :meth:`Series.astype`
-- :meth:`DataFrame.convert_dtypes` / :meth:`Series.convert_dtypes`
-- :meth:`DataFrame.join`
-- :meth:`DataFrame.eval`
-- :func:`concat`
-- :func:`merge`
+methods that don't require a copy of the underlying data. Popular examples are :meth:`DataFrame.drop` for ``axis=1``
+and :meth:`DataFrame.rename`.

 These methods return views when Copy-on-Write is enabled, which provides a significant
 performance improvement compared to the regular execution.
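The read-only array behaviour added to the guide above can be exercised directly. A hedged sketch, assuming pandas 2.x with Copy-on-Write enabled through the ``mode.copy_on_write`` option:

    import pandas as pd

    pd.set_option("mode.copy_on_write", True)

    df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})  # single integer block
    arr = df.to_numpy()          # shares memory with df, so it comes back read-only
    print(arr.flags.writeable)   # expected: False

    try:
        arr[0, 0] = 100
    except ValueError as exc:    # "assignment destination is read-only"
        print(exc)

    # Option 1: copy explicitly, then modify the copy without touching df.
    arr_copy = arr.copy()
    arr_copy[0, 0] = 100

    # Option 2: opt out of the protection (circumvents CoW rules, use with care).
    arr.flags.writeable = True
    arr[0, 0] = 100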

doc/source/user_guide/timeseries.rst
Lines changed: 8 additions & 8 deletions

@@ -461,7 +461,7 @@ of those specified will not be generated:

 .. ipython:: python

-   pd.date_range(start, end, freq="BM")
+   pd.date_range(start, end, freq="BME")

    pd.date_range(start, end, freq="W")

@@ -557,7 +557,7 @@ intelligent functionality like selection, slicing, etc.

 .. ipython:: python

-   rng = pd.date_range(start, end, freq="BM")
+   rng = pd.date_range(start, end, freq="BME")
    ts = pd.Series(np.random.randn(len(rng)), index=rng)
    ts.index
    ts[:5].index

@@ -884,9 +884,9 @@ into ``freq`` keyword arguments. The available date offsets and associated frequ
     :class:`~pandas.tseries.offsets.LastWeekOfMonth`, ``'LWOM'``, "the x-th day of the last week of each month"
     :class:`~pandas.tseries.offsets.MonthEnd`, ``'ME'``, "calendar month end"
     :class:`~pandas.tseries.offsets.MonthBegin`, ``'MS'``, "calendar month begin"
-    :class:`~pandas.tseries.offsets.BMonthEnd` or :class:`~pandas.tseries.offsets.BusinessMonthEnd`, ``'BM'``, "business month end"
+    :class:`~pandas.tseries.offsets.BMonthEnd` or :class:`~pandas.tseries.offsets.BusinessMonthEnd`, ``'BME'``, "business month end"
     :class:`~pandas.tseries.offsets.BMonthBegin` or :class:`~pandas.tseries.offsets.BusinessMonthBegin`, ``'BMS'``, "business month begin"
-    :class:`~pandas.tseries.offsets.CBMonthEnd` or :class:`~pandas.tseries.offsets.CustomBusinessMonthEnd`, ``'CBM'``, "custom business month end"
+    :class:`~pandas.tseries.offsets.CBMonthEnd` or :class:`~pandas.tseries.offsets.CustomBusinessMonthEnd`, ``'CBME'``, "custom business month end"
     :class:`~pandas.tseries.offsets.CBMonthBegin` or :class:`~pandas.tseries.offsets.CustomBusinessMonthBegin`, ``'CBMS'``, "custom business month begin"
     :class:`~pandas.tseries.offsets.SemiMonthEnd`, ``'SM'``, "15th (or other day_of_month) and calendar month end"
     :class:`~pandas.tseries.offsets.SemiMonthBegin`, ``'SMS'``, "15th (or other day_of_month) and calendar month begin"

@@ -1248,8 +1248,8 @@ frequencies. We will refer to these aliases as *offset aliases*.
     "W", "weekly frequency"
     "ME", "month end frequency"
     "SM", "semi-month end frequency (15th and end of month)"
-    "BM", "business month end frequency"
-    "CBM", "custom business month end frequency"
+    "BME", "business month end frequency"
+    "CBME", "custom business month end frequency"
     "MS", "month start frequency"
     "SMS", "semi-month start frequency (1st and 15th)"
     "BMS", "business month start frequency"

@@ -1586,7 +1586,7 @@ rather than changing the alignment of the data and the index:

    ts.shift(5, freq="D")
    ts.shift(5, freq=pd.offsets.BDay())
-   ts.shift(5, freq="BM")
+   ts.shift(5, freq="BME")

 Note that when ``freq`` is specified, the leading entry is no longer NaN
 because the data is not being realigned.

@@ -1692,7 +1692,7 @@ the end of the interval.
 .. warning::

     The default values for ``label`` and ``closed`` are '**left**' for all
-    frequency offsets except for 'ME', 'Y', 'Q', 'BM', 'BY', 'BQ', and 'W'
+    frequency offsets except for 'ME', 'Y', 'Q', 'BME', 'BY', 'BQ', and 'W'
     which all have a default of 'right'.

     This might unintentionally lead to looking ahead, where the value for a later

doc/source/whatsnew/v2.2.0.rst
Lines changed: 5 additions & 1 deletion

@@ -253,7 +253,11 @@ Other Deprecations
 - Deprecated downcasting behavior in :meth:`Series.where`, :meth:`DataFrame.where`, :meth:`Series.mask`, :meth:`DataFrame.mask`, :meth:`Series.clip`, :meth:`DataFrame.clip`; in a future version these will not infer object-dtype columns to non-object dtype, or all-round floats to integer dtype. Call ``result.infer_objects(copy=False)`` on the result for object inference, or explicitly cast floats to ints. To opt in to the future version, use ``pd.set_option("future.no_silent_downcasting", True)`` (:issue:`53656`)
 - Deprecated including the groups in computations when using :meth:`DataFrameGroupBy.apply` and :meth:`DataFrameGroupBy.resample`; pass ``include_groups=False`` to exclude the groups (:issue:`7155`)
 - Deprecated not passing a tuple to :class:`DataFrameGroupBy.get_group` or :class:`SeriesGroupBy.get_group` when grouping by a length-1 list-like (:issue:`25971`)
-- Deprecated string ``A`` denoting frequency in :class:`YearEnd` and strings ``A-DEC``, ``A-JAN``, etc. denoting annual frequencies with various fiscal year ends (:issue:`52536`)
+- Deprecated string ``AS`` denoting frequency in :class:`YearBegin` and strings ``AS-DEC``, ``AS-JAN``, etc. denoting annual frequencies with various fiscal year starts (:issue:`54275`)
+- Deprecated string ``A`` denoting frequency in :class:`YearEnd` and strings ``A-DEC``, ``A-JAN``, etc. denoting annual frequencies with various fiscal year ends (:issue:`54275`)
+- Deprecated string ``BAS`` denoting frequency in :class:`BYearBegin` and strings ``BAS-DEC``, ``BAS-JAN``, etc. denoting annual frequencies with various fiscal year starts (:issue:`54275`)
+- Deprecated string ``BA`` denoting frequency in :class:`BYearEnd` and strings ``BA-DEC``, ``BA-JAN``, etc. denoting annual frequencies with various fiscal year ends (:issue:`54275`)
+- Deprecated strings ``BM`` and ``CBM`` denoting frequencies in :class:`BusinessMonthEnd`, :class:`CustomBusinessMonthEnd` (:issue:`52064`)
 - Deprecated strings ``H``, ``BH``, and ``CBH`` denoting frequencies in :class:`Hour`, :class:`BusinessHour`, :class:`CustomBusinessHour` (:issue:`52536`)
 - Deprecated strings ``H``, ``S``, ``U``, and ``N`` denoting units in :func:`to_timedelta` (:issue:`52536`)
 - Deprecated strings ``H``, ``T``, ``S``, ``L``, ``U``, and ``N`` denoting units in :class:`Timedelta` (:issue:`52536`)
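The deprecations listed above surface as ``FutureWarning`` when the legacy spellings are used. A hedged sketch of how that can be observed (the exact warning text is pandas' own and not reproduced here):

    import warnings

    import pandas as pd

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Legacy alias: expected to warn and suggest "BME" on pandas 2.2+.
        pd.date_range("2023-01-01", periods=3, freq="BM")

    for w in caught:
        print(w.category.__name__, w.message)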

pandas/_libs/tslibs/dtypes.pyx
Lines changed: 3 additions & 1 deletion

@@ -188,7 +188,7 @@ cdef dict _abbrev_to_attrnames = {v: k for k, v in attrname_to_abbrevs.items()}
 OFFSET_TO_PERIOD_FREQSTR: dict = {
     "WEEKDAY": "D",
     "EOM": "M",
-    "BM": "M",
+    "BME": "M",
     "BQS": "Q",
     "QS": "Q",
     "BQ": "Q",

@@ -280,6 +280,8 @@ DEPR_ABBREVS: dict[str, str]= {
     "BAS-SEP": "BYS-SEP",
     "BAS-OCT": "BYS-OCT",
     "BAS-NOV": "BYS-NOV",
+    "BM": "BME",
+    "CBM": "CBME",
     "H": "h",
     "BH": "bh",
     "CBH": "cbh",

pandas/_libs/tslibs/offsets.pyx
Lines changed: 5 additions & 5 deletions

@@ -2965,7 +2965,7 @@ cdef class BusinessMonthEnd(MonthOffset):
     >>> pd.offsets.BMonthEnd().rollforward(ts)
     Timestamp('2022-11-30 00:00:00')
     """
-    _prefix = "BM"
+    _prefix = "BME"
     _day_opt = "business_end"


@@ -4495,10 +4495,10 @@ cdef class CustomBusinessMonthEnd(_CustomBusinessMonth):
     >>> freq = pd.offsets.CustomBusinessMonthEnd(calendar=bdc)
     >>> pd.date_range(dt.datetime(2022, 7, 10), dt.datetime(2022, 11, 10), freq=freq)
     DatetimeIndex(['2022-07-29', '2022-08-31', '2022-09-29', '2022-10-28'],
-                   dtype='datetime64[ns]', freq='CBM')
+                   dtype='datetime64[ns]', freq='CBME')
     """

-    _prefix = "CBM"
+    _prefix = "CBME"


 cdef class CustomBusinessMonthBegin(_CustomBusinessMonth):

@@ -4581,12 +4581,12 @@ prefix_mapping = {
     BYearEnd,  # 'BY'
     BusinessDay,  # 'B'
     BusinessMonthBegin,  # 'BMS'
-    BusinessMonthEnd,  # 'BM'
+    BusinessMonthEnd,  # 'BME'
     BQuarterEnd,  # 'BQ'
     BQuarterBegin,  # 'BQS'
     BusinessHour,  # 'bh'
     CustomBusinessDay,  # 'C'
-    CustomBusinessMonthEnd,  # 'CBM'
+    CustomBusinessMonthEnd,  # 'CBME'
     CustomBusinessMonthBegin,  # 'CBMS'
     CustomBusinessHour,  # 'cbh'
     MonthEnd,  # 'ME'
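Because ``_prefix`` feeds the offset's frequency string and the ``prefix_mapping`` used by ``to_offset``, the rename should round-trip. A hedged check, assuming pandas 2.2+:

    import pandas as pd
    from pandas.tseries.frequencies import to_offset

    offset = pd.offsets.BMonthEnd()
    print(offset.freqstr)                 # expected: "BME"
    print(to_offset("BME") == offset)     # expected: True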

pandas/core/arrays/arrow/array.py
Lines changed: 10 additions & 1 deletion

@@ -1627,6 +1627,15 @@ def _reduce(
         ------
         TypeError : subclass does not define reductions
         """
+        result = self._reduce_calc(name, skipna=skipna, keepdims=keepdims, **kwargs)
+        if isinstance(result, pa.Array):
+            return type(self)(result)
+        else:
+            return result
+
+    def _reduce_calc(
+        self, name: str, *, skipna: bool = True, keepdims: bool = False, **kwargs
+    ):
         pa_result = self._reduce_pyarrow(name, skipna=skipna, **kwargs)

         if keepdims:

@@ -1637,7 +1646,7 @@ def _reduce(
                 [pa_result],
                 type=to_pyarrow_type(infer_dtype_from_scalar(pa_result)[0]),
             )
-            return type(self)(result)
+            return result

         if pc.is_null(pa_result).as_py():
             return self.dtype.na_value
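The refactor above splits the raw reduction (``_reduce_calc``, which can hand back a length-1 ``pa.Array`` when ``keepdims=True``) from the boxing step in ``_reduce``, so subclasses can reuse the computation and choose their own wrapping. A standalone sketch of that split, with illustrative names rather than the pandas internals:

    import pyarrow as pa
    import pyarrow.compute as pc

    class BaseArray:
        """Stand-in for the pattern: compute first, then decide how to box."""

        def __init__(self, data: pa.Array):
            self._pa_array = data

        def _reduce_calc(self, name: str, *, keepdims: bool = False):
            scalar = getattr(pc, name)(self._pa_array)   # e.g. pc.sum, pc.min
            if keepdims:
                # Raw Arrow result; the caller decides how to wrap it.
                return pa.array([scalar.as_py()], type=self._pa_array.type)
            return scalar.as_py()

        def _reduce(self, name: str, *, keepdims: bool = False):
            result = self._reduce_calc(name, keepdims=keepdims)
            if isinstance(result, pa.Array):
                return type(self)(result)   # re-wrap length-1 arrays in the array class
            return result                   # plain scalars pass through

    arr = BaseArray(pa.array([1, 2, 3]))
    print(arr._reduce("sum"))                  # 6
    print(arr._reduce("sum", keepdims=True))   # BaseArray wrapping a length-1 array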

pandas/core/arrays/string_arrow.py
Lines changed: 11 additions & 0 deletions

@@ -502,6 +502,17 @@ def _str_find(self, sub: str, start: int = 0, end: int | None = None):
     def _convert_int_dtype(self, result):
         return Int64Dtype().__from_arrow__(result)

+    def _reduce(
+        self, name: str, *, skipna: bool = True, keepdims: bool = False, **kwargs
+    ):
+        result = self._reduce_calc(name, skipna=skipna, keepdims=keepdims, **kwargs)
+        if name in ("argmin", "argmax") and isinstance(result, pa.Array):
+            return self._convert_int_dtype(result)
+        elif isinstance(result, pa.Array):
+            return type(self)(result)
+        else:
+            return result
+
     def _rank(
         self,
         *,
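The override above exists so that positional reductions such as ``argmin``/``argmax`` on Arrow-backed strings come back as integers rather than being re-wrapped as string data. A hedged usage sketch (requires pyarrow; support for these reductions on ``string[pyarrow]`` is assumed for recent pandas):

    import pandas as pd

    s = pd.Series(["apple", "banana", "cherry"], dtype="string[pyarrow]")
    s.max()      # "cherry", an ordinary string reduction
    s.argmax()   # expected: 2, the integer position of "cherry"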

pandas/core/generic.py
Lines changed: 2 additions & 2 deletions

@@ -9188,11 +9188,11 @@ def resample(
             Use frame.T.resample(...) instead.
         closed : {{'right', 'left'}}, default None
             Which side of bin interval is closed. The default is 'left'
-            for all frequency offsets except for 'ME', 'Y', 'Q', 'BM',
+            for all frequency offsets except for 'ME', 'Y', 'Q', 'BME',
             'BA', 'BQ', and 'W' which all have a default of 'right'.
         label : {{'right', 'left'}}, default None
             Which bin edge label to label bucket with. The default is 'left'
-            for all frequency offsets except for 'ME', 'Y', 'Q', 'BM',
+            for all frequency offsets except for 'ME', 'Y', 'Q', 'BME',
             'BA', 'BQ', and 'W' which all have a default of 'right'.
         convention : {{'start', 'end', 's', 'e'}}, default 'start'
             For `PeriodIndex` only, controls whether to use the start or

pandas/core/resample.py
Lines changed: 2 additions & 2 deletions

@@ -2114,7 +2114,7 @@ def __init__(
         else:
            freq = to_offset(freq)

-        end_types = {"ME", "Y", "Q", "BM", "BY", "BQ", "W"}
+        end_types = {"ME", "Y", "Q", "BME", "BY", "BQ", "W"}
         rule = freq.rule_code
         if rule in end_types or ("-" in rule and rule[: rule.find("-")] in end_types):
             if closed is None:

@@ -2310,7 +2310,7 @@ def _adjust_bin_edges(
     ) -> tuple[DatetimeIndex, npt.NDArray[np.int64]]:
         # Some hacks for > daily data, see #1471, #1458, #1483

-        if self.freq.name in ("BM", "ME", "W") or self.freq.name.split("-")[0] in (
+        if self.freq.name in ("BME", "ME", "W") or self.freq.name.split("-")[0] in (
             "BQ",
             "BY",
             "Q",
