Commit 23bdbd9

author
GitHub Actions
committed
Merge remote-tracking branch 'upstream/main' into fix-issue-63071-bug-dataframe-loc-returns-object-type-instead-of-f
2 parents 75edf05 + 7bf6660 commit 23bdbd9

136 files changed: +1213 -487 lines changed

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,3 +3,4 @@
33
- [ ] All [code checks passed](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#pre-commit).
44
- [ ] Added [type annotations](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#type-hints) to new arguments/methods/functions.
55
- [ ] Added an entry in the latest `doc/source/whatsnew/vX.X.X.rst` file if fixing a bug or adding a new feature.
6+
- [ ] If I used AI to develop this pull request, I prompted it to follow `AGENTS.md`.

AGENTS.md

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
1+
# pandas Agent Instructions
2+
3+
## Project Overview
4+
`pandas` is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
5+
6+
## Purpose
7+
- Assist contributors by suggesting code changes, tests, and documentation edits for the pandas repository while preserving stability and compatibility.
8+
9+
## Persona & Tone
10+
- Concise, neutral, code-focused. Prioritize correctness, readability, and tests.
11+
12+
## Project Guidelines
13+
- Be sure to follow all guidelines for contributing to the codebase specified at https://pandas.pydata.org/docs/development/contributing_codebase.html
14+
- These guidelines are also available in the following local files, which should be loaded into context and adhered to:
15+
  - doc/source/development/contributing_codebase.rst
16+
  - doc/source/development/contributing_docstring.rst
17+
  - doc/source/development/contributing_documentation.rst
18+
  - doc/source/development/contributing.rst
19+
20+
## Decision heuristics
21+
- Favor small, backward-compatible changes with tests.
22+
- If a change would be breaking, propose it behind a deprecation path and document the rationale.
23+
- Prefer readability over micro-optimizations unless benchmarks are requested.
24+
- Add tests for behavioral changes; update docs only after code change is final.
25+
26+
## Type hints guidance (summary)
27+
- Prefer PEP 484 style and types in pandas._typing when appropriate.
28+
- Avoid unnecessary use of typing.cast; prefer refactors that convey types to type-checkers.
29+
- Use builtin generics (list, dict) when possible.
30+
31+
## Docstring guidance (summary)
32+
- Follow NumPy / numpydoc conventions used across the repo: short summary, extended summary, Parameters, Returns/Yields, See Also, Notes, Examples.
33+
- Ensure examples are deterministic, import numpy/pandas as documented, and pass doctest rules used by docs validation.
34+
- Preserve formatting rules: triple double-quotes, no blank line before/after docstring, parameter formatting ("name : type, default ..."), types and examples conventions.
35+
36+
## Pull Requests (summary)
37+
- Pull request titles should be descriptive and include one of the following prefixes:
38+
  - ENH: Enhancement, new functionality
39+
  - BUG: Bug fix
40+
  - DOC: Additions/updates to documentation
41+
  - TST: Additions/updates to tests
42+
  - BLD: Updates to the build process/scripts
43+
  - PERF: Performance improvement
44+
  - TYP: Type annotations
45+
  - CLN: Code cleanup
46+
- Pull request descriptions should follow the template and **succinctly** describe the change being made. A few sentences are usually sufficient.
47+
- Pull requests that resolve an existing GitHub issue should include a link to the issue in the PR description.
48+
- Do not add summaries or additional comments to individual commit messages. The single PR description is sufficient.
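
As an illustration of the type-hint and docstring guidance above, here is a minimal sketch of a numpydoc-style docstring. The function `clip_values` is hypothetical and not part of pandas; it exists only to show the section order, the "name : type, default ..." parameter format, and the use of builtin generics.

```python
def clip_values(
    values: list[float], lower: float = 0.0, upper: float = 1.0
) -> list[float]:
    """
    Clip each value to a closed interval.

    Values below ``lower`` are replaced by ``lower`` and values above
    ``upper`` are replaced by ``upper``. The input list is not modified.

    Parameters
    ----------
    values : list of float
        Numbers to clip.
    lower : float, default 0.0
        Smallest value allowed in the result.
    upper : float, default 1.0
        Largest value allowed in the result.

    Returns
    -------
    list of float
        New list with every element inside ``[lower, upper]``.

    See Also
    --------
    numpy.clip : Equivalent operation on arrays.

    Examples
    --------
    >>> clip_values([-0.5, 0.25, 3.0])
    [0.0, 0.25, 1.0]
    """
    # Clamp from below first, then from above.
    return [min(max(value, lower), upper) for value in values]
```

The example in the docstring is deterministic, so it can be checked by a doctest run.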

doc/source/user_guide/io.rst

Lines changed: 2 additions & 1 deletion
@@ -343,7 +343,7 @@ on_bad_lines : {{'error', 'warn', 'skip'}}, default 'error'
343343
Specifies what to do upon encountering a bad line (a line with too many fields).
344344
Allowed values are :
345345

346-
- 'error', raise an ParserError when a bad line is encountered.
346+
- 'error', raise a ParserError when a bad line is encountered.
347347
- 'warn', print a warning when a bad line is encountered and skip that line.
348348
- 'skip', skip bad lines without raising or warning when they are encountered.
349349

@@ -3717,6 +3717,7 @@ The look and feel of Excel worksheets created from pandas can be modified using
37173717

37183718
* ``float_format`` : Format string for floating point numbers (default ``None``).
37193719
* ``freeze_panes`` : A tuple of two integers representing the bottommost row and rightmost column to freeze. Each of these parameters is one-based, so (1, 1) will freeze the first row and first column (default ``None``).
3720+
* ``autofilter`` : A boolean indicating whether to add automatic filters to all columns (default ``False``).
37203721

37213722
.. note::
37223723
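
The `on_bad_lines` options documented in the first hunk of this file can be sketched as follows. This is a small illustration rather than text from the user guide: `CSV_TEXT` and `read_with_policy` are names made up for the example, and it assumes the default C parser.

```python
import io

import pandas as pd

# The second data row has four fields although the header declares three.
CSV_TEXT = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"


def read_with_policy(policy):
    """Parse CSV_TEXT using the given ``on_bad_lines`` policy."""
    return pd.read_csv(io.StringIO(CSV_TEXT), on_bad_lines=policy)


# 'skip' silently drops the row with too many fields.
skipped = read_with_policy("skip")

# 'error' (the default) raises a ParserError instead.
try:
    read_with_policy("error")
    raised = False
except pd.errors.ParserError:
    raised = True
```

With `"warn"`, the same row is dropped but a warning is emitted as well.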

doc/source/user_guide/timeseries.rst

Lines changed: 23 additions & 0 deletions
@@ -241,6 +241,19 @@ inferred frequency upon creation:
241241
242242
pd.DatetimeIndex(["2018-01-01", "2018-01-03", "2018-01-05"], freq="infer")
243243
244+
In most cases, parsing strings to datetimes (with any of :func:`to_datetime`, :class:`DatetimeIndex`, or :class:`Timestamp`) will produce objects with microsecond ("us") unit. The exception to this rule is if your strings have nanosecond precision, in which case the result will have "ns" unit:
245+
246+
.. ipython:: python
247+
248+
pd.to_datetime(["2016-01-01 02:03:04"]).unit
249+
pd.to_datetime(["2016-01-01 02:03:04.123"]).unit
250+
pd.to_datetime(["2016-01-01 02:03:04.123456"]).unit
251+
pd.to_datetime(["2016-01-01 02:03:04.123456789"]).unit
252+
253+
.. versionchanged:: 3.0.0
254+
255+
Previously, :func:`to_datetime` and :class:`DatetimeIndex` would always parse strings to "ns" unit. During pandas 2.x, :class:`Timestamp` could give any of "s", "ms", "us", or "ns" depending on the specificity of the input string.
256+
244257
.. _timeseries.converting.format:
245258

246259
Providing a format argument
@@ -379,6 +392,16 @@ We subtract the epoch (midnight at January 1, 1970 UTC) and then floor divide by
379392
380393
(stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
381394
395+
Another common way to perform this conversion is to convert directly to an integer dtype. Note that the exact integers this produces will depend on the specific unit
396+
or resolution of the datetime64 dtype:
397+
398+
.. ipython:: python
399+
400+
stamps.astype(np.int64)
401+
stamps.astype("datetime64[s]").astype(np.int64)
402+
stamps.astype("datetime64[ms]").astype(np.int64)
403+
404+
382405
.. _timeseries.origin:
383406

384407
Using the ``origin`` parameter
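
Because the integers produced by `astype(np.int64)` depend on the unit, code that needs a stable epoch value may want to fix the unit explicitly. The following is a minimal sketch of two unit-independent conversions built from the operations shown in the hunks above; the helper names `epoch_seconds` and `epoch_nanoseconds` are hypothetical.

```python
import numpy as np
import pandas as pd


def epoch_seconds(values):
    """Whole seconds since the Unix epoch, whatever the datetime64 unit is."""
    idx = pd.DatetimeIndex(values)
    # Subtracting the epoch and floor-dividing by one second does not
    # depend on the resolution in which the timestamps are stored.
    return np.asarray((idx - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s"))


def epoch_nanoseconds(values):
    """Nanoseconds since the Unix epoch, after casting to an explicit unit."""
    idx = pd.DatetimeIndex(values)
    # Cast to nanoseconds first so the integers do not depend on the
    # unit that string parsing happened to infer.
    return np.asarray(idx.astype("datetime64[ns]").astype(np.int64))


stamps = pd.to_datetime(["2016-01-01 02:03:04"])
seconds = epoch_seconds(stamps)
nanoseconds = epoch_nanoseconds(stamps)
```

Both helpers should return the same integers regardless of whether parsing produced "us" or "ns" data.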

doc/source/whatsnew/v3.0.0.rst

Lines changed: 12 additions & 3 deletions
@@ -202,6 +202,7 @@ Other enhancements
202202
- :class:`Holiday` has gained the constructor argument and field ``exclude_dates`` to exclude specific datetimes from a custom holiday calendar (:issue:`54382`)
203203
- :class:`Rolling` and :class:`Expanding` now support ``nunique`` (:issue:`26958`)
204204
- :class:`Rolling` and :class:`Expanding` now support aggregations ``first`` and ``last`` (:issue:`33155`)
205+
- :func:`DataFrame.to_excel` has a new ``autofilter`` parameter to add automatic filters to all columns (:issue:`61194`)
205206
- :func:`read_parquet` accepts ``to_pandas_kwargs`` which are forwarded to :meth:`pyarrow.Table.to_pandas` which enables passing additional keywords to customize the conversion to pandas, such as ``maps_as_pydicts`` to read the Parquet map data type as python dictionaries (:issue:`56842`)
206207
- :func:`to_numeric` on big integers converts to ``object`` datatype with python integers when not coercing. (:issue:`51295`)
207208
- :meth:`.DataFrameGroupBy.transform`, :meth:`.SeriesGroupBy.transform`, :meth:`.DataFrameGroupBy.agg`, :meth:`.SeriesGroupBy.agg`, :meth:`.SeriesGroupBy.apply`, :meth:`.DataFrameGroupBy.apply` now support ``kurt`` (:issue:`40139`)
@@ -232,7 +233,6 @@ Other enhancements
232233
- Support reading Stata 102-format (Stata 1) dta files (:issue:`58978`)
233234
- Support reading Stata 110-format (Stata 7) dta files (:issue:`47176`)
234235
- Switched wheel upload to **PyPI Trusted Publishing** (OIDC) for release-tag pushes in ``wheels.yml``. (:issue:`61718`)
235-
-
236236

237237
.. ---------------------------------------------------------------------------
238238
.. _whatsnew_300.notable_bug_fixes:
@@ -358,7 +358,7 @@ When passing strings, the resolution will depend on the precision of the string,
358358
In [5]: pd.to_datetime(["2024-03-22 11:43:01.002003004"]).dtype
359359
Out[5]: dtype('<M8[ns]')
360360
361-
The inferred resolution now matches that of the input strings:
361+
The inferred resolution now matches that of the input strings for nanosecond-precision strings, otherwise defaulting to microseconds:
362362

363363
.. ipython:: python
364364
@@ -367,13 +367,17 @@ The inferred resolution now matches that of the input strings:
367367
In [4]: pd.to_datetime(["2024-03-22 11:43:01.002003"]).dtype
368368
In [5]: pd.to_datetime(["2024-03-22 11:43:01.002003004"]).dtype
369369
370+
This also changes the :class:`Timestamp` constructor with a string input, which in version 2.x.y could return second or millisecond unit, a behavior users generally disliked (:issue:`52653`).
371+
370372
In cases with mixed-resolution inputs, the highest resolution is used:
371373

372374
.. code-block:: ipython
373375
374376
In [2]: pd.to_datetime([pd.Timestamp("2024-03-22 11:43:01"), "2024-03-22 11:43:01.002"]).dtype
375377
Out[2]: dtype('<M8[ns]')
376378
379+
.. warning:: Many users will now get "M8[us]" dtype data in cases when they used to get "M8[ns]". For most use cases they should not notice a difference. One big exception is converting to integers, which will give integers 1000x smaller.
380+
377381
.. _whatsnew_300.api_breaking.concat_datetime_sorting:
378382

379383
:func:`concat` no longer ignores ``sort`` when all objects have a :class:`DatetimeIndex`
@@ -1032,13 +1036,13 @@ Bug fixes
10321036
Categorical
10331037
^^^^^^^^^^^
10341038
- Bug in :class:`Categorical` where constructing from a pandas :class:`Series` or :class:`Index` with ``dtype='object'`` did not preserve the categories' dtype as ``object``; now the ``categories.dtype`` is preserved as ``object`` for these cases, while numpy arrays and Python sequences with ``dtype='object'`` continue to infer the most specific dtype (for example, ``str`` if all elements are strings) (:issue:`61778`)
1039+
- Bug in :class:`pandas.Categorical` displaying string categories without quotes when using "string" dtype (:issue:`63045`)
10351040
- Bug in :func:`Series.apply` where ``nan`` was ignored for :class:`CategoricalDtype` (:issue:`59938`)
10361041
- Bug in :func:`bdate_range` raising ``ValueError`` with frequency ``freq="cbh"`` (:issue:`62849`)
10371042
- Bug in :func:`testing.assert_index_equal` raising ``TypeError`` instead of ``AssertionError`` for incomparable ``CategoricalIndex`` when ``check_categorical=True`` and ``exact=False`` (:issue:`61935`)
10381043
- Bug in :meth:`Categorical.astype` where ``copy=False`` would still trigger a copy of the codes (:issue:`62000`)
10391044
- Bug in :meth:`DataFrame.pivot` and :meth:`DataFrame.set_index` raising an ``ArrowNotImplementedError`` for columns with pyarrow dictionary dtype (:issue:`53051`)
10401045
- Bug in :meth:`Series.convert_dtypes` with ``dtype_backend="pyarrow"`` where empty :class:`CategoricalDtype` :class:`Series` raised an error or got converted to ``null[pyarrow]`` (:issue:`59934`)
1041-
-
10421046

10431047
Datetimelike
10441048
^^^^^^^^^^^^
@@ -1111,12 +1115,14 @@ Conversion
11111115
- Bug in :meth:`DataFrame.astype` not casting ``values`` for Arrow-based dictionary dtype correctly (:issue:`58479`)
11121116
- Bug in :meth:`DataFrame.update` bool dtype being converted to object (:issue:`55509`)
11131117
- Bug in :meth:`Series.astype` might modify read-only array inplace when casting to a string dtype (:issue:`57212`)
1118+
- Bug in :meth:`Series.convert_dtypes` and :meth:`DataFrame.convert_dtypes` raising ``TypeError`` when called on data with complex dtype (:issue:`60129`)
11141119
- Bug in :meth:`Series.convert_dtypes` and :meth:`DataFrame.convert_dtypes` removing timezone information for objects with :class:`ArrowDtype` (:issue:`60237`)
11151120
- Bug in :meth:`Series.reindex` not maintaining ``float32`` type when a ``reindex`` introduces a missing value (:issue:`45857`)
11161121
- Bug in :meth:`to_datetime` and :meth:`to_timedelta` with input ``None`` returning ``None`` instead of ``NaT``, inconsistent with other conversion methods (:issue:`23055`)
11171122

11181123
Strings
11191124
^^^^^^^
1125+
- Bug in :meth:`Series.str.match` failing to raise when given a compiled ``re.Pattern`` object and conflicting ``case`` or ``flags`` arguments (:issue:`62240`)
11201126
- Bug in :meth:`Series.str.replace` raising an error on valid group references (``\1``, ``\2``, etc.) on series converted to PyArrow backend dtype (:issue:`62653`)
11211127
- Bug in :meth:`Series.str.zfill` raising ``AttributeError`` for :class:`ArrowDtype` (:issue:`61485`)
11221128
- Bug in :meth:`Series.value_counts` would not respect ``sort=False`` for series having ``string`` dtype (:issue:`55224`)
@@ -1127,6 +1133,7 @@ Interval
11271133
- :meth:`Index.is_monotonic_decreasing`, :meth:`Index.is_monotonic_increasing`, and :meth:`Index.is_unique` could incorrectly be ``False`` for an ``Index`` created from a slice of another ``Index``. (:issue:`57911`)
11281134
- Bug in :class:`Index`, :class:`Series`, :class:`DataFrame` constructors when given a sequence of :class:`Interval` subclass objects casting them to :class:`Interval` (:issue:`46945`)
11291135
- Bug in :func:`interval_range` where start and end numeric types were always cast to 64 bit (:issue:`57268`)
1136+
- Bug in :func:`pandas.interval_range` incorrectly inferring ``int64`` dtype when ``np.float32`` and ``int`` are used for ``start`` and ``freq`` (:issue:`58964`)
11301137
- Bug in :meth:`IntervalIndex.get_indexer` and :meth:`IntervalIndex.drop` when one of the sides of the index is non-unique (:issue:`52245`)
11311138
- Construction of :class:`IntervalArray` and :class:`IntervalIndex` from arrays with mismatched signed/unsigned integer dtypes (e.g., ``int64`` and ``uint64``) now raises a :exc:`TypeError` instead of proceeding silently. (:issue:`55715`)
11321139

@@ -1260,6 +1267,7 @@ Groupby/resample/rolling
12601267
- Bug in :meth:`Series.resample` could raise when the date range ended shortly before a non-existent time. (:issue:`58380`)
12611268
- Bug in :meth:`Series.resample` raising error when resampling non-nanosecond resolutions out of bounds for nanosecond precision (:issue:`57427`)
12621269
- Bug in :meth:`Series.rolling.var` and :meth:`Series.rolling.std` computing incorrect results due to numerical instability. (:issue:`47721`, :issue:`52407`, :issue:`54518`, :issue:`55343`)
1270+
- Bug in :meth:`DataFrame.groupby` methods when operating on NumPy-nullable data failing when the NA mask was not C-contiguous (:issue:`61031`)
12631271

12641272
Reshaping
12651273
^^^^^^^^^
@@ -1299,6 +1307,7 @@ ExtensionArray
12991307
- Bug in :class:`Categorical` when constructing with an :class:`Index` with :class:`ArrowDtype` (:issue:`60563`)
13001308
- Bug in :meth:`.arrays.ArrowExtensionArray.__setitem__` which caused wrong behavior when using an integer array with repeated values as a key (:issue:`58530`)
13011309
- Bug in :meth:`ArrowExtensionArray.factorize` where NA values were dropped when input was dictionary-encoded even when dropna was set to False (:issue:`60567`)
1310+
- Bug in :meth:`NDArrayBackedExtensionArray.take` which produced arrays whose dtypes didn't match their underlying data, when called with integer arrays (:issue:`62448`)
13021311
- Bug in :meth:`api.types.is_datetime64_any_dtype` where a custom :class:`ExtensionDtype` would return ``False`` for array-likes (:issue:`57055`)
13031312
- Bug in comparison between object with :class:`ArrowDtype` and incompatible-dtyped (e.g. string vs bool) incorrectly raising instead of returning all-``False`` (for ``==``) or all-``True`` (for ``!=``) (:issue:`59505`)
13041313
- Bug in constructing pandas data structures when passing into ``dtype`` a string of the type followed by ``[pyarrow]`` while PyArrow is not installed would raise ``NameError`` rather than ``ImportError`` (:issue:`57928`)

pandas/_libs/groupby.pyx

Lines changed: 13 additions & 13 deletions
@@ -819,7 +819,7 @@ def group_prod(
819819
int64_t[::1] counts,
820820
ndarray[int64float_t, ndim=2] values,
821821
const intp_t[::1] labels,
822-
const uint8_t[:, ::1] mask,
822+
const uint8_t[:, :] mask,
823823
uint8_t[:, ::1] result_mask=None,
824824
Py_ssize_t min_count=0,
825825
bint skipna=True,
@@ -893,7 +893,7 @@ def group_var(
893893
const intp_t[::1] labels,
894894
Py_ssize_t min_count=-1,
895895
int64_t ddof=1,
896-
const uint8_t[:, ::1] mask=None,
896+
const uint8_t[:, :] mask=None,
897897
uint8_t[:, ::1] result_mask=None,
898898
bint is_datetimelike=False,
899899
str name="var",
@@ -998,7 +998,7 @@ def group_skew(
998998
int64_t[::1] counts,
999999
ndarray[float64_t, ndim=2] values,
10001000
const intp_t[::1] labels,
1001-
const uint8_t[:, ::1] mask=None,
1001+
const uint8_t[:, :] mask=None,
10021002
uint8_t[:, ::1] result_mask=None,
10031003
bint skipna=True,
10041004
) -> None:
@@ -1086,7 +1086,7 @@ def group_kurt(
10861086
int64_t[::1] counts,
10871087
ndarray[float64_t, ndim=2] values,
10881088
const intp_t[::1] labels,
1089-
const uint8_t[:, ::1] mask=None,
1089+
const uint8_t[:, :] mask=None,
10901090
uint8_t[:, ::1] result_mask=None,
10911091
bint skipna=True,
10921092
) -> None:
@@ -1180,7 +1180,7 @@ def group_mean(
11801180
const intp_t[::1] labels,
11811181
Py_ssize_t min_count=-1,
11821182
bint is_datetimelike=False,
1183-
const uint8_t[:, ::1] mask=None,
1183+
const uint8_t[:, :] mask=None,
11841184
uint8_t[:, ::1] result_mask=None,
11851185
bint skipna=True,
11861186
) -> None:
@@ -1324,7 +1324,7 @@ def group_ohlc(
13241324
ndarray[int64float_t, ndim=2] values,
13251325
const intp_t[::1] labels,
13261326
Py_ssize_t min_count=-1,
1327-
const uint8_t[:, ::1] mask=None,
1327+
const uint8_t[:, :] mask=None,
13281328
uint8_t[:, ::1] result_mask=None,
13291329
) -> None:
13301330
"""
@@ -1870,7 +1870,7 @@ cdef group_min_max(
18701870
Py_ssize_t min_count=-1,
18711871
bint is_datetimelike=False,
18721872
bint compute_max=True,
1873-
const uint8_t[:, ::1] mask=None,
1873+
const uint8_t[:, :] mask=None,
18741874
uint8_t[:, ::1] result_mask=None,
18751875
bint skipna=True,
18761876
):
@@ -1983,7 +1983,7 @@ def group_idxmin_idxmax(
19831983
const intp_t[::1] labels,
19841984
Py_ssize_t min_count=-1,
19851985
bint is_datetimelike=False,
1986-
const uint8_t[:, ::1] mask=None,
1986+
const uint8_t[:, :] mask=None,
19871987
str name="idxmin",
19881988
bint skipna=True,
19891989
uint8_t[:, ::1] result_mask=None,
@@ -2096,7 +2096,7 @@ def group_max(
20962096
const intp_t[::1] labels,
20972097
Py_ssize_t min_count=-1,
20982098
bint is_datetimelike=False,
2099-
const uint8_t[:, ::1] mask=None,
2099+
const uint8_t[:, :] mask=None,
21002100
uint8_t[:, ::1] result_mask=None,
21012101
bint skipna=True,
21022102
) -> None:
@@ -2124,7 +2124,7 @@ def group_min(
21242124
const intp_t[::1] labels,
21252125
Py_ssize_t min_count=-1,
21262126
bint is_datetimelike=False,
2127-
const uint8_t[:, ::1] mask=None,
2127+
const uint8_t[:, :] mask=None,
21282128
uint8_t[:, ::1] result_mask=None,
21292129
bint skipna=True,
21302130
) -> None:
@@ -2148,7 +2148,7 @@ def group_min(
21482148
cdef group_cummin_max(
21492149
numeric_t[:, ::1] out,
21502150
ndarray[numeric_t, ndim=2] values,
2151-
const uint8_t[:, ::1] mask,
2151+
const uint8_t[:, :] mask,
21522152
uint8_t[:, ::1] result_mask,
21532153
const intp_t[::1] labels,
21542154
int ngroups,
@@ -2264,7 +2264,7 @@ def group_cummin(
22642264
const intp_t[::1] labels,
22652265
int ngroups,
22662266
bint is_datetimelike,
2267-
const uint8_t[:, ::1] mask=None,
2267+
const uint8_t[:, :] mask=None,
22682268
uint8_t[:, ::1] result_mask=None,
22692269
bint skipna=True,
22702270
) -> None:
@@ -2290,7 +2290,7 @@ def group_cummax(
22902290
const intp_t[::1] labels,
22912291
int ngroups,
22922292
bint is_datetimelike,
2293-
const uint8_t[:, ::1] mask=None,
2293+
const uint8_t[:, :] mask=None,
22942294
uint8_t[:, ::1] result_mask=None,
22952295
bint skipna=True,
22962296
) -> None:
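
The signature changes above relax the `mask` argument from `const uint8_t[:, ::1]`, a Cython typed memoryview that must be C-contiguous, to `const uint8_t[:, :]`, which accepts any strided 2-D buffer. The NumPy-only sketch below does not call the pandas internals; it only shows how ordinary slicing can produce a mask that is not C-contiguous. All variable names are illustrative.

```python
import numpy as np

# A freshly allocated 2-D array is C-contiguous: the last axis has unit stride.
base = np.zeros((4, 6), dtype=np.uint8)

# Taking every second column yields a view whose last axis has stride 2,
# so it no longer satisfies a ``[:, ::1]`` (C-contiguous) declaration.
column_slice = base[:, ::2]

# Copying into a fresh buffer restores C-contiguity at the cost of a copy.
compacted = np.ascontiguousarray(column_slice)

layouts = {
    "base": (base.flags.c_contiguous, base.strides),
    "column_slice": (column_slice.flags.c_contiguous, column_slice.strides),
    "compacted": (compacted.flags.c_contiguous, compacted.strides),
}
```

Accepting `[:, :]` in the kernels avoids the need for such a copy on the caller's side, which appears to be the approach taken for the groupby fix listed in the whatsnew entry (:issue:`61031`).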

pandas/_libs/internals.pyi

Lines changed: 4 additions & 0 deletions
@@ -94,3 +94,7 @@ class BlockValuesRefs:
9494
def add_reference(self, blk: Block) -> None: ...
9595
def add_index_reference(self, index: Index) -> None: ...
9696
def has_reference(self) -> bool: ...
97+
98+
class SetitemMixin:
99+
def __setitem__(self, key, value) -> None: ...
100+
def __delitem__(self, key) -> None: ...
