Commit 3ceadab

Merge remote-tracking branch 'upstream/main' into ci/tests
2 parents 59869f2 + 2cc9b21 commit 3ceadab

26 files changed

+609
-166
lines changed

doc/source/user_guide/cookbook.rst

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -874,7 +874,7 @@ Timeseries
874874
<https://stackoverflow.com/questions/13893227/vectorized-look-up-of-values-in-pandas-dataframe>`__
875875

876876
`Aggregation and plotting time series
877-
<https://nipunbatra.github.io/blog/visualisation/2013/05/01/aggregation-timeseries.html>`__
877+
<https://nipunbatra.github.io/blog/posts/2013-05-01-aggregation-timeseries.html>`__
878878

879879
Turn a matrix with hours in columns and days in rows into a continuous row sequence in the form of a time series.
880880
`How to rearrange a Python pandas DataFrame?

doc/source/whatsnew/v3.0.0.rst

Lines changed: 4 additions & 0 deletions
@@ -35,6 +35,7 @@ Other enhancements
3535
- :class:`pandas.api.typing.NoDefault` is available for typing ``no_default``
3636
- :func:`DataFrame.to_excel` now raises a ``UserWarning`` when the character count in a cell exceeds Excel's limitation of 32767 characters (:issue:`56954`)
3737
- :func:`pandas.merge` now validates the ``how`` parameter input (merge type) (:issue:`59435`)
38+
- :func:`pandas.merge`, :meth:`DataFrame.merge` and :meth:`DataFrame.join` now support anti joins (``left_anti`` and ``right_anti``) in the ``how`` parameter (:issue:`42916`)
3839
- :func:`read_spss` now supports kwargs to be passed to pyreadstat (:issue:`56356`)
3940
- :func:`read_stata` now returns ``datetime64`` resolutions better matching those natively stored in the stata format (:issue:`55642`)
4041
- :meth:`DataFrame.agg` called with ``axis=1`` and a ``func`` which relabels the result index now raises a ``NotImplementedError`` (:issue:`58807`).
@@ -68,6 +69,7 @@ Other enhancements
6869
- :meth:`Series.map` can now accept kwargs to pass on to func (:issue:`59814`)
6970
- :meth:`Series.str.get_dummies` now accepts a ``dtype`` parameter to specify the dtype of the resulting DataFrame (:issue:`47872`)
7071
- :meth:`pandas.concat` will raise a ``ValueError`` when ``ignore_index=True`` and ``keys`` is not ``None`` (:issue:`59274`)
72+
- :py:class:`frozenset` elements in pandas objects are now natively printed (:issue:`60690`)
7173
- Implemented :meth:`Series.str.isascii` and :meth:`Series.str.isascii` (:issue:`59091`)
7274
- Multiplying two :class:`DateOffset` objects will now raise a ``TypeError`` instead of a ``RecursionError`` (:issue:`59442`)
7375
- Restore support for reading Stata 104-format and enable reading 103-format dta files (:issue:`58554`)
@@ -631,6 +633,7 @@ Datetimelike
631633
- Bug in :func:`date_range` where using a negative frequency value would not include all points between the start and end values (:issue:`56147`)
632634
- Bug in :func:`tseries.api.guess_datetime_format` would fail to infer time format when "%Y" == "%H%M" (:issue:`57452`)
633635
- Bug in :func:`tseries.frequencies.to_offset` would fail to parse frequency strings starting with "LWOM" (:issue:`59218`)
636+
- Bug in :meth:`DataFrame.min` and :meth:`DataFrame.max` casting ``datetime64`` and ``timedelta64`` columns to ``float64`` and losing precision (:issue:`60850`)
634637
- Bug in :meth:`DataFrame.agg` with df with missing values resulting in IndexError (:issue:`58810`)
635638
- Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` does not raise on Custom business days frequencies bigger than "1C" (:issue:`58664`)
636639
- Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` returning ``False`` on double-digit frequencies (:issue:`58523`)
@@ -766,6 +769,7 @@ Reshaping
766769
- Bug in :meth:`DataFrame.unstack` producing incorrect results when ``sort=False`` (:issue:`54987`, :issue:`55516`)
767770
- Bug in :meth:`DataFrame.merge` when merging two :class:`DataFrame` on ``intc`` or ``uintc`` types on Windows (:issue:`60091`, :issue:`58713`)
768771
- Bug in :meth:`DataFrame.pivot_table` incorrectly subaggregating results when called without an ``index`` argument (:issue:`58722`)
772+
- Bug in :meth:`DataFrame.stack` with the new implementation where ``ValueError`` is raised when ``level=[]`` (:issue:`60740`)
769773
- Bug in :meth:`DataFrame.unstack` producing incorrect results when manipulating empty :class:`DataFrame` with an :class:`ExtensionDtype` (:issue:`59123`)
770774

771775
Sparse

pandas/_libs/interval.pyx

Lines changed: 6 additions & 0 deletions
@@ -209,6 +209,12 @@ cdef class IntervalMixin:
209209
"""
210210
Indicates if an interval is empty, meaning it contains no points.
211211
212+
An interval is considered empty if its `left` and `right` endpoints
213+
are equal, and it is not closed on both sides. This means that the
214+
interval does not include any real points. In the case of an
215+
:class:`pandas.arrays.IntervalArray` or :class:`IntervalIndex`, the
216+
property returns a boolean array indicating the emptiness of each interval.
217+
212218
Returns
213219
-------
214220
bool or ndarray
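The rule described in the added docstring lines (equal endpoints and not closed on both sides) can be checked with a short sketch. The variable names below are illustrative only and are not part of the commit.

```python
import pandas as pd

# Equal endpoints, closed on one side only: no x satisfies 0 < x <= 0.
half_open = pd.Interval(0, 0, closed="right")
# Equal endpoints, closed on both sides: contains exactly the point 0.
degenerate = pd.Interval(0, 0, closed="both")
# Distinct endpoints: not empty regardless of closedness.
regular = pd.Interval(0, 1, closed="neither")

# For an IntervalArray the property is evaluated element-wise.
arr = pd.arrays.IntervalArray.from_tuples([(0, 0), (1, 2)], closed="neither")
flags = arr.is_empty

print(half_open.is_empty, degenerate.is_empty, regular.is_empty)
print(flags)
```

The array case returns one boolean per interval, which matches the "bool or ndarray" return type in the docstring.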

pandas/_libs/tslibs/period.pyx

Lines changed: 6 additions & 0 deletions
@@ -2140,6 +2140,12 @@ cdef class _Period(PeriodMixin):
21402140
"""
21412141
Get day of the month that a Period falls on.
21422142

2143+
The `day` property provides a simple way to access the day component
2144+
of a `Period` object, which represents time spans in various frequencies
2145+
(e.g., daily, hourly, monthly). If the period's frequency does not include
2146+
a day component (e.g., yearly or quarterly periods), the returned day
2147+
corresponds to the first day of that period.
2148+
21432149
Returns
21442150
-------
21452151
int
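A small sketch of the `day` property on periods of different frequencies follows. For the monthly period it prints `day` next to the days of `start_time` and `end_time`, so that the docstring's statement about frequencies without a day component can be compared against actual behaviour on the installed pandas version. The variable names are illustrative.

```python
import pandas as pd

# Frequencies that carry a day component: `day` is the calendar day.
daily = pd.Period("2024-03-15", freq="D")
minutely = pd.Period("2024-03-15 13:45", freq="min")
print(daily.day, minutely.day)

# A monthly period has no day component of its own. Print `day` together
# with the boundary days of the span to see which one it corresponds to.
monthly = pd.Period("2024-03", freq="M")
print(monthly.day, monthly.start_time.day, monthly.end_time.day)
```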

pandas/_typing.py

Lines changed: 3 additions & 1 deletion
@@ -442,7 +442,9 @@ def closed(self) -> bool:
442442
AnyAll = Literal["any", "all"]
443443

444444
# merge
445-
MergeHow = Literal["left", "right", "inner", "outer", "cross"]
445+
MergeHow = Literal[
446+
"left", "right", "inner", "outer", "cross", "left_anti", "right_anti"
447+
]
446448
MergeValidate = Literal[
447449
"one_to_one",
448450
"1:1",

pandas/core/frame.py

Lines changed: 23 additions & 5 deletions
@@ -315,7 +315,8 @@
315315
----------%s
316316
right : DataFrame or named Series
317317
Object to merge with.
318-
how : {'left', 'right', 'outer', 'inner', 'cross'}, default 'inner'
318+
how : {'left', 'right', 'outer', 'inner', 'cross', 'left_anti', 'right_anti'},
319+
default 'inner'
319320
Type of merge to be performed.
320321
321322
* left: use only keys from left frame, similar to a SQL left outer join;
@@ -328,6 +329,10 @@
328329
join; preserve the order of the left keys.
329330
* cross: creates the cartesian product from both frames, preserves the order
330331
of the left keys.
332+
* left_anti: use only keys from left frame that are not in right frame, similar
333+
to SQL left anti join; preserve key order.
334+
* right_anti: use only keys from right frame that are not in left frame, similar
335+
to SQL right anti join; preserve key order.
331336
on : label or list
332337
Column or index level names to join on. These must be found in both
333338
DataFrames. If `on` is None and not merging on indexes then this defaults
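The `left_anti` option only exists in pandas versions that include this change. The sketch below shows the same row selection ("keys from left that are not in right") using the long-standing `indicator=True` argument, so it should also run on older versions. `left_anti_emulated` is a hypothetical helper, and the column layout of the built-in anti join may differ from what it returns.

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3, 4], "lval": ["a", "b", "c", "d"]})
right = pd.DataFrame({"key": [2, 4, 5], "rval": ["x", "y", "z"]})


def left_anti_emulated(left: pd.DataFrame, right: pd.DataFrame, on: str) -> pd.DataFrame:
    """Hypothetical helper: rows of `left` whose key is absent from `right`."""
    # Only the key column of `right` is needed; dropping duplicates avoids
    # multiplying left rows during the merge.
    keys = right[[on]].drop_duplicates()
    merged = left.merge(keys, on=on, how="left", indicator=True)
    # "_merge" is "left_only" for rows with no match in `right`.
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")


result = left_anti_emulated(left, right, "key")
print(result)
```

A right anti join follows by swapping the two frames.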
@@ -3205,9 +3210,13 @@ def to_html(
32053210
Convert the characters <, >, and & to HTML-safe sequences.
32063211
notebook : {True, False}, default False
32073212
Whether the generated HTML is for IPython Notebook.
3208-
border : int
3209-
A ``border=border`` attribute is included in the opening
3210-
`<table>` tag. Default ``pd.options.display.html.border``.
3213+
border : int or bool
3214+
When an integer value is provided, it sets the border attribute in
3215+
the opening tag, specifying the thickness of the border.
3216+
If ``False`` or ``0`` is passed, the border attribute will not
3217+
be present in the ``<table>`` tag.
3218+
The default value for this parameter is governed by
3219+
``pd.options.display.html.border``.
32113220
table_id : str, optional
32123221
A css id is included in the opening `<table>` tag if specified.
32133222
render_links : bool, default False
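To see the effect of `border` on the generated markup, the following sketch prints the opening `<table>` tag for two values. Whether `border=0` omits the attribute entirely depends on the installed pandas version including this change, so no output is asserted for that case here.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

# The first line of the output is the opening <table> tag.
opening_thick = df.to_html(border=2).splitlines()[0]
opening_zero = df.to_html(border=0).splitlines()[0]

print(opening_thick)
print(opening_zero)
```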
@@ -4789,6 +4798,10 @@ def select_dtypes(self, include=None, exclude=None) -> DataFrame:
47894798
"""
47904799
Return a subset of the DataFrame's columns based on the column dtypes.
47914800
4801+
This method allows for filtering columns based on their data types.
4802+
It is useful when working with heterogeneous DataFrames where operations
4803+
need to be performed on a specific subset of data types.
4804+
47924805
Parameters
47934806
----------
47944807
include, exclude : scalar or list-like
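A brief sketch of filtering a heterogeneous frame by dtype, as the added docstring paragraph describes. The frame and its column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "count": [1, 2],            # integer column
        "price": [1.5, 2.5],        # float column
        "in_stock": [True, False],  # boolean column
        "label": ["x", "y"],        # string column
    }
)

# "number" selects integer and float columns; booleans are not included.
numeric = df.select_dtypes(include="number")
flags = df.select_dtypes(include="bool")
non_numeric = df.select_dtypes(exclude="number")

print(list(numeric.columns))
print(list(flags.columns))
print(list(non_numeric.columns))
```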
@@ -10605,7 +10618,8 @@ def join(
1060510618
values given, the `other` DataFrame must have a MultiIndex. Can
1060610619
pass an array as the join key if it is not already contained in
1060710620
the calling DataFrame. Like an Excel VLOOKUP operation.
10608-
how : {'left', 'right', 'outer', 'inner', 'cross'}, default 'left'
10621+
how : {'left', 'right', 'outer', 'inner', 'cross', 'left_anti', 'right_anti'},
10622+
default 'left'
1060910623
How to handle the operation of the two objects.
1061010624
1061110625
* left: use calling frame's index (or column if on is specified)
@@ -10617,6 +10631,10 @@ def join(
1061710631
of the calling's one.
1061810632
* cross: creates the cartesian product from both frames, preserves the order
1061910633
of the left keys.
10634+
* left_anti: use set difference of calling frame's index and `other`'s
10635+
index.
10636+
* right_anti: use set difference of `other`'s index and calling frame's
10637+
index.
1062010638
lsuffix : str, default ''
1062110639
Suffix to use from left frame's overlapping columns.
1062210640
rsuffix : str, default ''
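For `DataFrame.join`, the anti options are described as set differences of the two indexes. The sketch below reproduces that selection with boolean indexing on the index, which should work on versions that predate the new `how` values. It only illustrates the row selection; the built-in join may return additional columns from the other frame.

```python
import pandas as pd

left = pd.DataFrame({"lval": [10, 20, 30]}, index=["a", "b", "c"])
other = pd.DataFrame({"rval": [1, 2]}, index=["b", "d"])

# Index labels of the calling frame that do not appear in `other`.
left_only = left.loc[~left.index.isin(other.index)]
# Index labels of `other` that do not appear in the calling frame.
right_only = other.loc[~other.index.isin(left.index)]

print(left_only)
print(right_only)
```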

pandas/core/groupby/groupby.py

Lines changed: 9 additions & 18 deletions
@@ -2170,8 +2170,7 @@ def mean(
21702170
numeric_only no longer accepts ``None`` and defaults to ``False``.
21712171
21722172
skipna : bool, default True
2173-
Exclude NA/null values. If an entire row/column is NA, the result
2174-
will be NA.
2173+
Exclude NA/null values. If an entire group is NA, the result will be NA.
21752174
21762175
.. versionadded:: 3.0.0
21772176
@@ -2271,8 +2270,7 @@ def median(self, numeric_only: bool = False, skipna: bool = True) -> NDFrameT:
22712270
numeric_only no longer accepts ``None`` and defaults to False.
22722271
22732272
skipna : bool, default True
2274-
Exclude NA/null values. If an entire row/column is NA, the result
2275-
will be NA.
2273+
Exclude NA/null values. If an entire group is NA, the result will be NA.
22762274
22772275
.. versionadded:: 3.0.0
22782276
@@ -2405,8 +2403,7 @@ def std(
24052403
numeric_only now defaults to ``False``.
24062404
24072405
skipna : bool, default True
2408-
Exclude NA/null values. If an entire row/column is NA, the result
2409-
will be NA.
2406+
Exclude NA/null values. If an entire group is NA, the result will be NA.
24102407
24112408
.. versionadded:: 3.0.0
24122409
@@ -2524,8 +2521,7 @@ def var(
25242521
numeric_only now defaults to ``False``.
25252522
25262523
skipna : bool, default True
2527-
Exclude NA/null values. If an entire row/column is NA, the result
2528-
will be NA.
2524+
Exclude NA/null values. If an entire group is NA, the result will be NA.
25292525
25302526
.. versionadded:: 3.0.0
25312527
@@ -2742,8 +2738,7 @@ def sem(
27422738
numeric_only now defaults to ``False``.
27432739
27442740
skipna : bool, default True
2745-
Exclude NA/null values. If an entire row/column is NA, the result
2746-
will be NA.
2741+
Exclude NA/null values. If an entire group is NA, the result will be NA.
27472742
27482743
.. versionadded:: 3.0.0
27492744
@@ -3021,8 +3016,7 @@ def prod(
30213016
than ``min_count`` non-NA values are present the result will be NA.
30223017
30233018
skipna : bool, default True
3024-
Exclude NA/null values. If an entire row/column is NA, the result
3025-
will be NA.
3019+
Exclude NA/null values. If an entire group is NA, the result will be NA.
30263020
30273021
.. versionadded:: 3.0.0
30283022
@@ -3242,8 +3236,7 @@ def first(
32423236
The required number of valid values to perform the operation. If fewer
32433237
than ``min_count`` valid values are present the result will be NA.
32443238
skipna : bool, default True
3245-
Exclude NA/null values. If an entire row/column is NA, the result
3246-
will be NA.
3239+
Exclude NA/null values. If an entire group is NA, the result will be NA.
32473240
32483241
.. versionadded:: 2.2.1
32493242
@@ -3329,8 +3322,7 @@ def last(
33293322
The required number of valid values to perform the operation. If fewer
33303323
than ``min_count`` valid values are present the result will be NA.
33313324
skipna : bool, default True
3332-
Exclude NA/null values. If an entire row/column is NA, the result
3333-
will be NA.
3325+
Exclude NA/null values. If an entire group is NA, the result will be NA.
33343326
33353327
.. versionadded:: 2.2.1
33363328
@@ -5530,8 +5522,7 @@ def _idxmax_idxmin(
55305522
numeric_only : bool, default False
55315523
Include only float, int, boolean columns.
55325524
skipna : bool, default True
5533-
Exclude NA/null values. If an entire row/column is NA, the result
5534-
will be NA.
5525+
Exclude NA/null values. If an entire group is NA, the result will be NA.
55355526
ignore_unobserved : bool, default False
55365527
When True and an unobserved group is encountered, do not raise. This is used
55375528
for transform, where unobserved groups do not affect the result.
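The reworded `skipna` description ("if an entire group is NA, the result will be NA") can be illustrated with `mean` under its default behaviour, which skips NA values. The sketch avoids passing `skipna` explicitly because that keyword is marked as added in 3.0.0 for most of these methods. The data is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "g": ["a", "a", "a", "b", "b"],
        "v": [1.0, np.nan, 2.0, np.nan, np.nan],
    }
)

# Group "a" has two valid values; group "b" consists only of NA.
means = df.groupby("g")["v"].mean()
print(means)
```

Group "a" averages its two valid values, while group "b" has nothing left after skipping NA and therefore yields NA.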

pandas/core/nanops.py

Lines changed: 9 additions & 2 deletions
@@ -1093,11 +1093,14 @@ def reduction(
10931093
if values.size == 0:
10941094
return _na_for_min_count(values, axis)
10951095

1096+
dtype = values.dtype
10961097
values, mask = _get_values(
10971098
values, skipna, fill_value_typ=fill_value_typ, mask=mask
10981099
)
10991100
result = getattr(values, meth)(axis)
1100-
result = _maybe_null_out(result, axis, mask, values.shape)
1101+
result = _maybe_null_out(
1102+
result, axis, mask, values.shape, datetimelike=dtype.kind in "mM"
1103+
)
11011104
return result
11021105

11031106
return reduction
@@ -1499,6 +1502,7 @@ def _maybe_null_out(
14991502
mask: npt.NDArray[np.bool_] | None,
15001503
shape: tuple[int, ...],
15011504
min_count: int = 1,
1505+
datetimelike: bool = False,
15021506
) -> np.ndarray | float | NaTType:
15031507
"""
15041508
Returns
@@ -1520,7 +1524,10 @@ def _maybe_null_out(
15201524
null_mask = np.broadcast_to(below_count, new_shape)
15211525

15221526
if np.any(null_mask):
1523-
if is_numeric_dtype(result):
1527+
if datetimelike:
1528+
# GH#60646 For datetimelike, no need to cast to float
1529+
result[null_mask] = iNaT
1530+
elif is_numeric_dtype(result):
15241531
if np.iscomplexobj(result):
15251532
result = result.astype("c16")
15261533
elif not is_float_dtype(result):
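The motivation for keeping datetimelike results out of the float path can be sketched with NumPy alone: nanosecond timestamps are stored as 64-bit integers, and `float64` cannot represent every integer of that magnitude, so a round trip through float can change the value. The sketch also shows that the missing-value marker for `datetime64` is itself an integer sentinel (the minimum `int64`), which is why an integer result can hold it without casting. This is an illustration of the precision issue, not the pandas code path.

```python
import numpy as np

# A timestamp one nanosecond past midnight, viewed as its int64 payload.
ts = np.array(["2024-01-01T00:00:00.000000001"], dtype="datetime64[ns]")
as_int = ts.view("int64")[0]

# Round trip through float64, as a float-based null-out would do.
round_trip = np.int64(np.float64(as_int))
print(as_int, round_trip, as_int - round_trip)

# NaT is stored as the smallest int64, so no float cast is needed to mark
# a datetimelike result as missing.
nat_payload = np.array(["NaT"], dtype="datetime64[ns]").view("int64")[0]
print(nat_payload == np.iinfo(np.int64).min)
```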
