Commit cef7027

Merge branch 'main' into fix_categorical_arrow
2 parents 2206e5c + bef88ef commit cef7027

33 files changed: 241 additions, 344 deletions
asv_bench/benchmarks/tslibs/timedelta.py

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ def time_from_int(self):
         Timedelta(123456789)

     def time_from_unit(self):
-        Timedelta(1, unit="d")
+        Timedelta(1, unit="D")

     def time_from_components(self):
         Timedelta(
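The benchmark now passes the uppercase unit string. The diff does not state the motivation, so the following is only a minimal sketch of the updated call, assuming a pandas install in which `"D"` is the accepted spelling of the day unit:

```python
import pandas as pd

# Uppercase "D" is the day unit the benchmark now passes.
one_day = pd.Timedelta(1, unit="D")

# The same duration spelled with keyword components and as a string.
assert one_day == pd.Timedelta(days=1)
assert one_day == pd.Timedelta("1 day")

print(one_day)  # 1 days 00:00:00
```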

ci/code_checks.sh

Lines changed: 1 addition & 2 deletions
@@ -162,7 +162,7 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         -i "pandas.Series.ne SA01" \
         -i "pandas.Series.pad PR01,SA01" \
         -i "pandas.Series.plot PR02" \
-        -i "pandas.Series.pop RT03,SA01" \
+        -i "pandas.Series.pop SA01" \
         -i "pandas.Series.prod RT03" \
         -i "pandas.Series.product RT03" \
         -i "pandas.Series.reorder_levels RT03,SA01" \
@@ -260,7 +260,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         -i "pandas.Timestamp.to_period PR01,SA01" \
         -i "pandas.Timestamp.today SA01" \
         -i "pandas.Timestamp.toordinal SA01" \
-        -i "pandas.Timestamp.tz SA01" \
         -i "pandas.Timestamp.tz_localize SA01" \
         -i "pandas.Timestamp.tzinfo GL08" \
         -i "pandas.Timestamp.tzname SA01" \

doc/source/whatsnew/v3.0.0.rst

Lines changed: 6 additions & 0 deletions
@@ -39,6 +39,7 @@ Other enhancements
 - Users can globally disable any ``PerformanceWarning`` by setting the option ``mode.performance_warnings`` to ``False`` (:issue:`56920`)
 - :meth:`Styler.format_index_names` can now be used to format the index and column names (:issue:`48936` and :issue:`47489`)
 - :class:`.errors.DtypeWarning` improved to include column names when mixed data types are detected (:issue:`58174`)
+- :func:`DataFrame.to_excel` argument ``merge_cells`` now accepts a value of ``"columns"`` to only merge :class:`MultiIndex` column header header cells (:issue:`35384`)
 - :meth:`DataFrame.corrwith` now accepts ``min_periods`` as optional arguments, as in :meth:`DataFrame.corr` and :meth:`Series.corr` (:issue:`9490`)
 - :meth:`DataFrame.cummin`, :meth:`DataFrame.cummax`, :meth:`DataFrame.cumprod` and :meth:`DataFrame.cumsum` methods now have a ``numeric_only`` parameter (:issue:`53072`)
 - :meth:`DataFrame.fillna` and :meth:`Series.fillna` can now accept ``value=None``; for non-object dtype the corresponding NA value will be used (:issue:`57723`)
@@ -380,6 +381,8 @@ Other Removals
 - Enforced deprecation of strings ``T``, ``L``, ``U``, and ``N`` denoting frequencies in :class:`Minute`, :class:`Milli`, :class:`Micro`, :class:`Nano` (:issue:`57627`)
 - Enforced deprecation of strings ``T``, ``L``, ``U``, and ``N`` denoting units in :class:`Timedelta` (:issue:`57627`)
 - Enforced deprecation of the behavior of :func:`concat` when ``len(keys) != len(objs)`` would truncate to the shorter of the two. Now this raises a ``ValueError`` (:issue:`43485`)
+- Enforced deprecation of the behavior of :meth:`DataFrame.replace` and :meth:`Series.replace` with :class:`CategoricalDtype` that would introduce new categories. (:issue:`58270`)
+- Enforced deprecation of the behavior of :meth:`Series.argsort` in the presence of NA values (:issue:`58232`)
 - Enforced deprecation of values "pad", "ffill", "bfill", and "backfill" for :meth:`Series.interpolate` and :meth:`DataFrame.interpolate` (:issue:`57869`)
 - Enforced deprecation removing :meth:`Categorical.to_list`, use ``obj.tolist()`` instead (:issue:`51254`)
 - Enforced silent-downcasting deprecation for :ref:`all relevant methods <whatsnew_220.silent_downcasting>` (:issue:`54710`)
@@ -544,6 +547,7 @@ MultiIndex
 ^^^^^^^^^^
 - :func:`DataFrame.loc` with ``axis=0`` and :class:`MultiIndex` when setting a value adds extra columns (:issue:`58116`)
 - :meth:`DataFrame.melt` would not accept multiple names in ``var_name`` when the columns were a :class:`MultiIndex` (:issue:`58033`)
+- :meth:`MultiIndex.insert` would not insert NA value correctly at unified location of index -1 (:issue:`59003`)
 -

 I/O
@@ -556,7 +560,9 @@ I/O
 - Bug in :meth:`HDFStore.get` was failing to save data of dtype datetime64[s] correctly (:issue:`59004`)
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``index_col`` is specified and ``na_values`` is a dict containing the key ``None``. (:issue:`57547`)
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``nrows`` and ``iterator`` are specified without specifying a ``chunksize``. (:issue:`59079`)
+- Bug in :meth:`read_excel` raising ``ValueError`` when passing array of boolean values when ``dtype="boolean"``. (:issue:`58159`)
 - Bug in :meth:`read_stata` raising ``KeyError`` when input file is stored in big-endian format and contains strL data. (:issue:`58638`)
+-

 Period
 ^^^^^^

pandas/_libs/tslibs/timestamps.pyx

Lines changed: 11 additions & 0 deletions
@@ -2368,6 +2368,17 @@ timedelta}, default 'raise'
         """
         Alias for tzinfo.

+        The `tz` property provides a simple and direct way to retrieve the timezone
+        information of a `Timestamp` object. It is particularly useful when working
+        with time series data that includes timezone information, allowing for easy
+        access and manipulation of the timezone context.
+
+        See Also
+        --------
+        Timestamp.tzinfo : Returns the timezone information of the Timestamp.
+        Timestamp.tz_convert : Convert timezone-aware Timestamp to another time zone.
+        Timestamp.tz_localize : Localize the Timestamp to a timezone.
+
         Examples
         --------
         >>> ts = pd.Timestamp(1584226800, unit='s', tz='Europe/Stockholm')
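The expanded docstring relates `tz` to `tzinfo`, `tz_convert` and `tz_localize`. The sketch below is not part of the commit; it only illustrates how those members might be used together, reusing the timestamp from the docstring example. The variable names are chosen for illustration.

```python
import pandas as pd

ts = pd.Timestamp(1584226800, unit="s", tz="Europe/Stockholm")

# tz is documented as an alias for tzinfo, so both report the same zone.
assert ts.tz == ts.tzinfo
tz_name = str(ts.tz)

# A naive Timestamp has no timezone until it is localized.
naive = pd.Timestamp("2020-03-14 15:32:52")
assert naive.tz is None
localized = naive.tz_localize("Europe/Stockholm")

# tz_convert changes the zone but not the instant in time.
in_utc = ts.tz_convert("UTC")
assert in_utc == ts
```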

pandas/_typing.py

Lines changed: 1 addition & 0 deletions
@@ -510,6 +510,7 @@ def closed(self) -> bool:

 # ExcelWriter
 ExcelWriterIfSheetExists = Literal["error", "new", "replace", "overlay"]
+ExcelWriterMergeCells = Union[bool, Literal["columns"]]

 # Offsets
 OffsetCalendar = Union[np.busdaycalendar, "AbstractHolidayCalendar"]
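The new alias admits either a boolean or the literal string `"columns"`. As a rough illustration of what that type allows at runtime, here is a standalone sketch; `check_merge_cells` is a hypothetical helper written for this note and is not part of pandas.

```python
from typing import Literal, Union

# Mirrors the alias added to pandas/_typing.py in this commit.
ExcelWriterMergeCells = Union[bool, Literal["columns"]]


def check_merge_cells(merge_cells: ExcelWriterMergeCells) -> ExcelWriterMergeCells:
    """Hypothetical validator; not part of pandas."""
    # True and False pass as booleans; the only accepted string is "columns".
    if isinstance(merge_cells, bool) or merge_cells == "columns":
        return merge_cells
    raise ValueError(f"Unexpected value for merge_cells: {merge_cells!r}")


accepted = [check_merge_cells(v) for v in (True, False, "columns")]
```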

pandas/core/arrays/boolean.py

Lines changed: 7 additions & 1 deletion
@@ -329,15 +329,21 @@ def _from_sequence_of_strings(
         copy: bool = False,
         true_values: list[str] | None = None,
         false_values: list[str] | None = None,
+        none_values: list[str] | None = None,
     ) -> BooleanArray:
         true_values_union = cls._TRUE_VALUES.union(true_values or [])
         false_values_union = cls._FALSE_VALUES.union(false_values or [])

-        def map_string(s) -> bool:
+        if none_values is None:
+            none_values = []
+
+        def map_string(s) -> bool | None:
             if s in true_values_union:
                 return True
             elif s in false_values_union:
                 return False
+            elif s in none_values:
+                return None
             else:
                 raise ValueError(f"{s} cannot be cast to bool")
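The hunk above adds a third branch to the string mapping so that designated strings become missing values. A standalone sketch of that logic follows. `TRUE_VALUES` and `FALSE_VALUES` are assumed stand-ins for `cls._TRUE_VALUES` and `cls._FALSE_VALUES` (their contents are not shown in the diff), and `map_strings` is a hypothetical wrapper, not the pandas method.

```python
from typing import List, Optional

# Assumed stand-ins for cls._TRUE_VALUES / cls._FALSE_VALUES (not shown in the diff).
TRUE_VALUES = {"True", "TRUE", "true", "1", "1.0"}
FALSE_VALUES = {"False", "FALSE", "false", "0", "0.0"}


def map_strings(
    values: List[str],
    true_values: Optional[List[str]] = None,
    false_values: Optional[List[str]] = None,
    none_values: Optional[List[str]] = None,
) -> List[Optional[bool]]:
    """Hypothetical wrapper around the mapping logic shown in the diff."""
    true_union = TRUE_VALUES.union(true_values or [])
    false_union = FALSE_VALUES.union(false_values or [])
    if none_values is None:
        none_values = []

    def map_string(s: str) -> Optional[bool]:
        if s in true_union:
            return True
        elif s in false_union:
            return False
        elif s in none_values:
            # New branch: strings listed in none_values map to a missing value.
            return None
        else:
            raise ValueError(f"{s} cannot be cast to bool")

    return [map_string(s) for s in values]


result = map_strings(["true", "0", "NA"], none_values=["NA"])
```

Without `none_values`, the string `"NA"` would fall through to the final branch and raise `ValueError`, which matches the pre-change behavior of the mapping.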

pandas/core/arrays/categorical.py

Lines changed: 0 additions & 58 deletions
@@ -10,7 +10,6 @@
     cast,
     overload,
 )
-import warnings

 import numpy as np

@@ -23,7 +22,6 @@
 )
 from pandas._libs.arrays import NDArrayBacked
 from pandas.compat.numpy import function as nv
-from pandas.util._exceptions import find_stack_level
 from pandas.util._validators import validate_bool_kwarg

 from pandas.core.dtypes.cast import (
@@ -2673,62 +2671,6 @@ def isin(self, values: ArrayLike) -> npt.NDArray[np.bool_]:
         code_values = code_values[null_mask | (code_values >= 0)]
         return algorithms.isin(self.codes, code_values)

-    @overload
-    def _replace(self, *, to_replace, value, inplace: Literal[False] = ...) -> Self: ...
-
-    @overload
-    def _replace(self, *, to_replace, value, inplace: Literal[True]) -> None: ...
-
-    def _replace(self, *, to_replace, value, inplace: bool = False) -> Self | None:
-        from pandas import Index
-
-        orig_dtype = self.dtype
-
-        inplace = validate_bool_kwarg(inplace, "inplace")
-        cat = self if inplace else self.copy()
-
-        mask = isna(np.asarray(value))
-        if mask.any():
-            removals = np.asarray(to_replace)[mask]
-            removals = cat.categories[cat.categories.isin(removals)]
-            new_cat = cat.remove_categories(removals)
-            NDArrayBacked.__init__(cat, new_cat.codes, new_cat.dtype)
-
-        ser = cat.categories.to_series()
-        ser = ser.replace(to_replace=to_replace, value=value)
-
-        all_values = Index(ser)
-
-        # GH51016: maintain order of existing categories
-        idxr = cat.categories.get_indexer_for(all_values)
-        locs = np.arange(len(ser))
-        locs = np.where(idxr == -1, locs, idxr)
-        locs = locs.argsort()
-
-        new_categories = ser.take(locs)
-        new_categories = new_categories.drop_duplicates(keep="first")
-        index_categories = Index(new_categories)
-        new_codes = recode_for_categories(
-            cat._codes, all_values, index_categories, copy=False
-        )
-        new_dtype = CategoricalDtype(index_categories, ordered=self.dtype.ordered)
-        NDArrayBacked.__init__(cat, new_codes, new_dtype)
-
-        if new_dtype != orig_dtype:
-            warnings.warn(
-                # GH#55147
-                "The behavior of Series.replace (and DataFrame.replace) with "
-                "CategoricalDtype is deprecated. In a future version, replace "
-                "will only be used for cases that preserve the categories. "
-                "To change the categories, use ser.cat.rename_categories "
-                "instead.",
-                FutureWarning,
-                stacklevel=find_stack_level(),
-            )
-        if not inplace:
-            return cat
-        return None
-
     # ------------------------------------------------------------------------
     # String methods interface
     def _str_map(
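The removed deprecation message pointed users to `ser.cat.rename_categories` for changing categories. A minimal sketch of that alternative is shown below; the sample data is made up for illustration.

```python
import pandas as pd

ser = pd.Series(["a", "b", "a"], dtype="category")

# Relabel category "a" as "z"; the underlying codes are left as they were.
renamed = ser.cat.rename_categories({"a": "z"})

# The original Series keeps its categories; rename_categories returns a new object.
original_categories = list(ser.cat.categories)
new_categories = list(renamed.cat.categories)
```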

pandas/core/arrays/sparse/accessor.py

Lines changed: 6 additions & 6 deletions
@@ -98,8 +98,8 @@ def from_coo(cls, A, dense_index: bool = False) -> Series:
         ... ([3.0, 1.0, 2.0], ([1, 0, 0], [0, 2, 3])), shape=(3, 4)
         ... )
         >>> A
-        <3x4 sparse matrix of type '<class 'numpy.float64'>'
-        with 3 stored elements in COOrdinate format>
+        <COOrdinate sparse matrix of dtype 'float64'
+        with 3 stored elements and shape (3, 4)>

         >>> A.todense()
         matrix([[0., 0., 1., 2.],
@@ -186,8 +186,8 @@ def to_coo(
         ... row_levels=["A", "B"], column_levels=["C", "D"], sort_labels=True
         ... )
         >>> A
-        <3x4 sparse matrix of type '<class 'numpy.float64'>'
-        with 3 stored elements in COOrdinate format>
+        <COOrdinate sparse matrix of dtype 'float64'
+        with 3 stored elements and shape (3, 4)>
         >>> A.todense()
         matrix([[0., 0., 1., 3.],
                 [3., 0., 0., 0.],
@@ -380,8 +380,8 @@ def to_coo(self) -> spmatrix:
         --------
         >>> df = pd.DataFrame({"A": pd.arrays.SparseArray([0, 1, 0, 1])})
         >>> df.sparse.to_coo()
-        <4x1 sparse matrix of type '<class 'numpy.int64'>'
-        with 2 stored elements in COOrdinate format>
+        <COOrdinate sparse matrix of dtype 'int64'
+        with 2 stored elements and shape (4, 1)>
         """
         import_optional_dependency("scipy")
         from scipy.sparse import coo_matrix

pandas/core/generic.py

Lines changed: 2 additions & 1 deletion
@@ -5731,7 +5731,7 @@ def sample(
         replace : bool, default False
             Allow or disallow sampling of the same row more than once.
         weights : str or ndarray-like, optional
-            Default 'None' results in equal probability weighting.
+            Default ``None`` results in equal probability weighting.
             If passed a Series, will align with target object on index. Index
             values in weights not found in sampled object will be ignored and
             index values in sampled object not in weights will be assigned
@@ -5746,6 +5746,7 @@ def sample(
         random_state : int, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional
             If int, array-like, or BitGenerator, seed for random number generator.
             If np.random.RandomState or np.random.Generator, use as given.
+            Default ``None`` results in sampling with the current state of np.random.

             .. versionchanged:: 1.4.0
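The docstring describes two behaviors that are easy to check directly: an integer `random_state` seeds the draw, and a `weights` Series is aligned on the index with missing entries treated as zero weight. The sketch below illustrates both with made-up data; it is not taken from the commit.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})

# An int random_state seeds the generator, so repeated calls agree.
# weights is left at its default of None, i.e. equal probability per row.
first = df.sample(n=3, random_state=42)
second = df.sample(n=3, random_state=42)
assert first.equals(second)

# A weights Series is aligned on the index; rows absent from it get weight 0,
# so only the row labelled 9 can be drawn here.
weights = pd.Series([1.0], index=[9])
only_nine = df.sample(n=1, weights=weights)
```

Leaving `random_state` at `None` would instead draw from the current state of `np.random`, so repeated calls would generally differ.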

pandas/core/groupby/groupby.py

Lines changed: 2 additions & 0 deletions
@@ -5389,6 +5389,7 @@ def sample(
         random_state : int, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional
             If int, array-like, or BitGenerator, seed for random number generator.
             If np.random.RandomState or np.random.Generator, use as given.
+            Default ``None`` results in sampling with the current state of np.random.

             .. versionchanged:: 1.4.0

@@ -5403,6 +5404,7 @@ def sample(
         See Also
         --------
         DataFrame.sample: Generate random samples from a DataFrame object.
+        Series.sample: Generate random samples from a Series object.
         numpy.random.choice: Generate a random sample from a given 1-D numpy
             array.
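The groupby docstring now cross-references `Series.sample` alongside `DataFrame.sample`. As a small illustration of how the grouped and ungrouped variants relate, assuming made-up data and column names:

```python
import pandas as pd

df = pd.DataFrame(
    {"group": ["a", "a", "a", "b", "b", "b"], "value": range(6)}
)

# GroupBy.sample draws n rows from each group; a fixed seed makes it repeatable.
picked = df.groupby("group").sample(n=1, random_state=0)
again = df.groupby("group").sample(n=1, random_state=0)
assert picked.equals(again)

# Series.sample draws from the whole Series, ignoring any grouping.
two_values = df["value"].sample(n=2, random_state=0)
```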
