Commit d7e1ab9

Merge branch 'main' into bug-60228
2 parents 8821c15 + 179258f commit d7e1ab9

21 files changed, 402 additions and 83 deletions

doc/source/user_guide/scale.rst

Lines changed: 23 additions & 18 deletions
@@ -164,35 +164,35 @@ files. Each file in the directory represents a different year of the entire data
 .. ipython:: python
    :okwarning:
 
-   import pathlib
+   import glob
+   import tempfile
 
    N = 12
    starts = [f"20{i:>02d}-01-01" for i in range(N)]
    ends = [f"20{i:>02d}-12-13" for i in range(N)]
 
-   pathlib.Path("data/timeseries").mkdir(exist_ok=True)
+   tmpdir = tempfile.TemporaryDirectory(ignore_cleanup_errors=True)
 
    for i, (start, end) in enumerate(zip(starts, ends)):
        ts = make_timeseries(start=start, end=end, freq="1min", seed=i)
-       ts.to_parquet(f"data/timeseries/ts-{i:0>2d}.parquet")
+       ts.to_parquet(f"{tmpdir.name}/ts-{i:0>2d}.parquet")
 
 
 ::
 
-    data
-    └── timeseries
-        ├── ts-00.parquet
-        ├── ts-01.parquet
-        ├── ts-02.parquet
-        ├── ts-03.parquet
-        ├── ts-04.parquet
-        ├── ts-05.parquet
-        ├── ts-06.parquet
-        ├── ts-07.parquet
-        ├── ts-08.parquet
-        ├── ts-09.parquet
-        ├── ts-10.parquet
-        └── ts-11.parquet
+    tmpdir
+    ├── ts-00.parquet
+    ├── ts-01.parquet
+    ├── ts-02.parquet
+    ├── ts-03.parquet
+    ├── ts-04.parquet
+    ├── ts-05.parquet
+    ├── ts-06.parquet
+    ├── ts-07.parquet
+    ├── ts-08.parquet
+    ├── ts-09.parquet
+    ├── ts-10.parquet
+    └── ts-11.parquet
 
 Now we'll implement an out-of-core :meth:`pandas.Series.value_counts`. The peak memory usage of this
 workflow is the single largest chunk, plus a small series storing the unique value
@@ -202,13 +202,18 @@ work for arbitrary-sized datasets.
 .. ipython:: python
 
    %%time
-   files = pathlib.Path("data/timeseries/").glob("ts*.parquet")
+   files = glob.iglob(f"{tmpdir.name}/ts*.parquet")
    counts = pd.Series(dtype=int)
    for path in files:
        df = pd.read_parquet(path)
        counts = counts.add(df["name"].value_counts(), fill_value=0)
    counts.astype(int)
 
+.. ipython:: python
+   :suppress:
+
+   tmpdir.cleanup()
+
 Some readers, like :meth:`pandas.read_csv`, offer parameters to control the
 ``chunksize`` when reading a single file.

doc/source/whatsnew/v2.3.2.rst

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,8 @@ become the default string dtype in pandas 3.0. See
 
 Bug fixes
 ^^^^^^^^^
+- Fix :meth:`~Series.str.isdigit` to correctly recognize unicode superscript
+  characters as digits for :class:`StringDtype` backed by PyArrow (:issue:`61466`)
 - Fix :meth:`~DataFrame.to_json` with ``orient="table"`` to correctly use the
   "string" type in the JSON Table Schema for :class:`StringDtype` columns
   (:issue:`61889`)

doc/source/whatsnew/v3.0.0.rst

Lines changed: 2 additions & 0 deletions
@@ -666,6 +666,7 @@ Other Deprecations
 - Deprecated using ``epoch`` date format in :meth:`DataFrame.to_json` and :meth:`Series.to_json`, use ``iso`` instead. (:issue:`57063`)
 - Deprecated allowing ``fill_value`` that cannot be held in the original dtype (excepting NA values for integer and bool dtypes) in :meth:`Series.unstack` and :meth:`DataFrame.unstack` (:issue:`12189`, :issue:`53868`)
 - Deprecated allowing ``fill_value`` that cannot be held in the original dtype (excepting NA values for integer and bool dtypes) in :meth:`Series.shift` and :meth:`DataFrame.shift` (:issue:`53802`)
+- Deprecated slicing on a :class:`Series` or :class:`DataFrame` with a :class:`DatetimeIndex` using a ``datetime.date`` object, explicitly cast to :class:`Timestamp` instead (:issue:`35830`)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_300.prior_deprecations:
@@ -998,6 +999,7 @@ MultiIndex
 - Bug in :class:`DataFrame` arithmetic operations in case of unaligned MultiIndex columns (:issue:`60498`)
 - Bug in :class:`DataFrame` arithmetic operations with :class:`Series` in case of unaligned MultiIndex (:issue:`61009`)
 - Bug in :meth:`MultiIndex.from_tuples` causing wrong output with input of type tuples having NaN values (:issue:`60695`, :issue:`60988`)
+- Bug in :meth:`DataFrame.__setitem__` where column alignment logic would reindex the assigned value with an empty index, incorrectly setting all values to ``NaN``.(:issue:`61841`)
 - Bug in :meth:`DataFrame.reindex` and :meth:`Series.reindex` where reindexing :class:`Index` to a :class:`MultiIndex` would incorrectly set all values to ``NaN``.(:issue:`60923`)
 
 I/O

pandas/core/arrays/_arrow_string_mixins.py

Lines changed: 7 additions & 0 deletions
@@ -15,6 +15,7 @@
 from pandas.compat import (
     HAS_PYARROW,
     pa_version_under17p0,
+    pa_version_under21p0,
 )
 
 if HAS_PYARROW:
@@ -267,6 +268,12 @@ def _str_isdecimal(self):
         return self._convert_bool_result(result)
 
     def _str_isdigit(self):
+        if pa_version_under21p0:
+            # https://github.com/pandas-dev/pandas/issues/61466
+            res_list = self._apply_elementwise(str.isdigit)
+            return self._convert_bool_result(
+                pa.chunked_array(res_list, type=pa.bool_())
+            )
         result = pc.utf8_is_digit(self._pa_array)
         return self._convert_bool_result(result)

pandas/core/frame.py

Lines changed: 7 additions & 2 deletions
@@ -2132,12 +2132,12 @@ def from_records(
         """
         Convert structured or record ndarray to DataFrame.
 
-        Creates a DataFrame object from a structured ndarray, or sequence of
+        Creates a DataFrame object from a structured ndarray, or iterable of
         tuples or dicts.
 
         Parameters
         ----------
-        data : structured ndarray, sequence of tuples or dicts
+        data : structured ndarray, iterable of tuples or dicts
             Structured input data.
         index : str, list of fields, array-like
             Field of array to use as the index, alternately a specific set of
@@ -4452,6 +4452,11 @@ def _set_item_frame_value(self, key, value: DataFrame) -> None:
                 loc, (slice, Series, np.ndarray, Index)
             ):
                 cols_droplevel = maybe_droplevels(cols, key)
+                if (
+                    not isinstance(cols_droplevel, MultiIndex)
+                    and not cols_droplevel.any()
+                ):
+                    return
                 if len(cols_droplevel) and not cols_droplevel.equals(value.columns):
                     value = value.reindex(cols_droplevel, axis=1)

pandas/core/indexes/datetimes.py

Lines changed: 9 additions & 0 deletions
@@ -26,11 +26,13 @@
     to_offset,
 )
 from pandas._libs.tslibs.offsets import prefix_mapping
+from pandas.errors import Pandas4Warning
 from pandas.util._decorators import (
     cache_readonly,
     doc,
     set_module,
 )
+from pandas.util._exceptions import find_stack_level
 
 from pandas.core.dtypes.common import is_scalar
 from pandas.core.dtypes.dtypes import (
@@ -645,6 +647,13 @@ def _maybe_cast_slice_bound(self, label, side: str):
             # Pandas supports slicing with dates, treated as datetimes at midnight.
             # https://github.com/pandas-dev/pandas/issues/31501
             label = Timestamp(label).to_pydatetime()
+            warnings.warn(
+                # GH#35830 deprecate last remaining inconsistent date treatment
+                "Slicing with a datetime.date object is deprecated. "
+                "Explicitly cast to Timestamp instead.",
+                Pandas4Warning,
+                stacklevel=find_stack_level(),
+            )
 
         label = super()._maybe_cast_slice_bound(label, side)
         self._data._assert_tzawareness_compat(label)
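
Note: the deprecation added in this hunk follows a common pattern: detect a plain ``datetime.date`` bound, keep the old behaviour (treat it as midnight), and emit a warning. A minimal standard-library sketch of that pattern follows. ``SketchDeprecationWarning`` and ``cast_slice_bound`` are hypothetical names used only for illustration; pandas itself uses ``Pandas4Warning``, ``Timestamp`` and ``find_stack_level()``, none of which are reproduced here.

```python
import warnings
from datetime import date, datetime


class SketchDeprecationWarning(DeprecationWarning):
    """Hypothetical stand-in for the warning class used in the diff."""


def cast_slice_bound(label):
    """Treat a plain date as midnight of that day, warning that this is deprecated."""
    # datetime is a subclass of date, so it has to be excluded explicitly
    if isinstance(label, date) and not isinstance(label, datetime):
        warnings.warn(
            "Slicing with a datetime.date object is deprecated. "
            "Explicitly cast to Timestamp instead.",
            SketchDeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this helper
        )
        label = datetime(label.year, label.month, label.day)
    return label


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from_date = cast_slice_bound(date(2020, 1, 1))              # warns
    from_datetime = cast_slice_bound(datetime(2020, 1, 1, 12))  # does not warn

print(from_date, from_datetime, len(caught))
```

On the user side, the changelog entry suggests the fix is to cast the bound to a ``Timestamp`` explicitly before slicing.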

pandas/core/reshape/melt.py

Lines changed: 5 additions & 0 deletions
@@ -5,6 +5,8 @@
 
 import numpy as np
 
+from pandas.util._decorators import set_module
+
 from pandas.core.dtypes.common import (
     is_iterator,
     is_list_like,
@@ -39,6 +41,7 @@ def ensure_list_vars(arg_vars, variable: str, columns) -> list:
         return []
 
 
+@set_module("pandas")
 def melt(
     frame: DataFrame,
     id_vars=None,
@@ -275,6 +278,7 @@ def melt(
     return result
 
 
+@set_module("pandas")
 def lreshape(data: DataFrame, groups: dict, dropna: bool = True) -> DataFrame:
     """
     Reshape wide-format data to long. Generalized inverse of DataFrame.pivot.
@@ -361,6 +365,7 @@ def lreshape(data: DataFrame, groups: dict, dropna: bool = True) -> DataFrame:
     return data._constructor(mdata, columns=id_cols + pivot_cols)
 
 
+@set_module("pandas")
 def wide_to_long(
     df: DataFrame, stubnames, i, j, sep: str = "", suffix: str = r"\d+"
 ) -> DataFrame:
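
Note: the ``@set_module("pandas")`` decorator applied throughout this commit makes public functions report ``pandas`` as their module rather than the private path where they are defined. The sketch below shows how such a decorator can work. It is an assumption about the general technique, not a copy of ``pandas.util._decorators.set_module``; ``set_module_sketch`` and ``melt_like`` are hypothetical names.

```python
def set_module_sketch(module):
    """Return a decorator that overrides ``__module__`` on the decorated object."""

    def decorator(obj):
        if module is not None:
            obj.__module__ = module
        return obj

    return decorator


@set_module_sketch("pandas")
def melt_like(frame):
    """Hypothetical function standing in for a public reshaping function."""
    return frame


# The function behaves as before; only its reported module changes,
# which is what shows up in reprs and generated documentation.
print(melt_like.__module__)
```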

pandas/core/reshape/pivot.py

Lines changed: 4 additions & 0 deletions
@@ -10,6 +10,7 @@
 import numpy as np
 
 from pandas._libs import lib
+from pandas.util._decorators import set_module
 
 from pandas.core.dtypes.cast import maybe_downcast_to_dtype
 from pandas.core.dtypes.common import (
@@ -50,6 +51,7 @@
     from pandas import DataFrame
 
 
+@set_module("pandas")
 def pivot_table(
     data: DataFrame,
     values=None,
@@ -699,6 +701,7 @@ def _convert_by(by):
     return by
 
 
+@set_module("pandas")
 def pivot(
     data: DataFrame,
     *,
@@ -917,6 +920,7 @@ def pivot(
     return result
 
 
+@set_module("pandas")
 def crosstab(
     index,
     columns,
pandas/core/reshape/tile.py

Lines changed: 3 additions & 0 deletions
@@ -17,6 +17,7 @@
     Timestamp,
     lib,
 )
+from pandas.util._decorators import set_module
 
 from pandas.core.dtypes.common import (
     ensure_platform_int,
@@ -51,6 +52,7 @@
     )
 
 
+@set_module("pandas")
 def cut(
     x,
     bins,
@@ -287,6 +289,7 @@ def cut(
     return _postprocess_for_cut(fac, bins, retbins, original)
 
 
+@set_module("pandas")
 def qcut(
     x,
     q,
pandas/core/strings/accessor.py

Lines changed: 14 additions & 4 deletions
@@ -3602,16 +3602,26 @@ def casefold(self):
     Series.str.isupper : Check whether all characters are uppercase.
     Series.str.istitle : Check whether all characters are titlecase.
 
-    Examples
-    --------
+    Notes
+    -----
     Similar to ``str.isdecimal`` but also includes special digits, like
     superscripted and subscripted digits in unicode.
 
+    The exact behavior of this method, i.e. which unicode characters are
+    considered as digits, depends on the backend used for string operations,
+    and there can be small differences.
+    For example, Python considers the ³ superscript character as a digit, but
+    not the ⅕ fraction character, while PyArrow considers both as digits. For
+    simple (ascii) decimal numbers, the behaviour is consistent.
+
+    Examples
+    --------
+
     >>> s3 = pd.Series(['23', '³', '⅕', ''])
     >>> s3.str.isdigit()
     0     True
-    1    False
-    2    False
+    1     True
+    2     True
     3    False
     dtype: bool
     """

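Note: the Python-side behaviour described in the new Notes section can be checked with built-in ``str`` methods alone. The sketch below covers only Python's own classification of the docstring's example strings; the PyArrow results are not reproduced, since the docstring says they differ by backend.

```python
values = ["23", "³", "⅕", ""]

# str.isdecimal: only characters usable to form base-10 numbers
is_decimal = [s.isdecimal() for s in values]

# str.isdigit: additionally accepts digit characters such as superscripts
is_digit = [s.isdigit() for s in values]

# str.isnumeric: additionally accepts numeric characters such as fractions
is_numeric = [s.isnumeric() for s in values]

for value, dec, dig, num in zip(values, is_decimal, is_digit, is_numeric):
    print(repr(value), dec, dig, num)
```

In Python, ``"³"`` passes ``isdigit`` but not ``isdecimal``, while ``"⅕"`` passes only ``isnumeric``; the empty string fails all three.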