Commit ec15ba1

Merge branch 'main' into daydst3
2 parents 1325c34 + eb489f2 commit ec15ba1

92 files changed (+903 -361 lines changed)


.github/workflows/unit-tests.yml

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@ jobs:
             # It will be temporarily activated during tests with locale.setlocale
             extra_loc: "zh_CN"
             platform: ubuntu-24.04
-          - name: "Past no infer strings"
+          - name: "PANDAS_FUTURE_INFER_STRING=0"
             env_file: actions-312.yaml
             pandas_future_infer_string: "0"
             platform: ubuntu-24.04

.github/workflows/wheels.yml

Lines changed: 1 addition & 1 deletion
@@ -162,7 +162,7 @@ jobs:
         run: echo "sdist_name=$(cd ./dist && ls -d */)" >> "$GITHUB_ENV"
 
       - name: Build wheels
-        uses: pypa/cibuildwheel@v3.1.1
+        uses: pypa/cibuildwheel@v3.1.3
         with:
           package-dir: ./dist/${{ startsWith(matrix.buildplat[1], 'macosx') && env.sdist_name || needs.build_sdist.outputs.sdist_file }}
         env:

.pre-commit-config.yaml

Lines changed: 3 additions & 3 deletions
@@ -19,7 +19,7 @@ ci:
     skip: [pyright, mypy]
 repos:
 - repo: https://github.com/astral-sh/ruff-pre-commit
-  rev: v0.12.2
+  rev: v0.12.7
   hooks:
   - id: ruff
     args: [--exit-non-zero-on-fix]
@@ -95,14 +95,14 @@ repos:
   - id: sphinx-lint
     args: ["--enable", "all", "--disable", "line-too-long"]
 - repo: https://github.com/pre-commit/mirrors-clang-format
-  rev: v20.1.7
+  rev: v20.1.8
   hooks:
   - id: clang-format
     files: ^pandas/_libs/src|^pandas/_libs/include
     args: [-i]
     types_or: [c, c++]
 - repo: https://github.com/trim21/pre-commit-mirror-meson
-  rev: v1.8.2
+  rev: v1.8.3
   hooks:
   - id: meson-fmt
     args: ['--inplace']

doc/source/development/contributing_documentation.rst

Lines changed: 5 additions & 0 deletions
@@ -157,6 +157,11 @@ If you want to do a full clean build, do::
     python make.py clean
     python make.py html
 
+.. tip::
+   If ``python make.py html`` exits with an error status,
+   try running the command ``python make.py html --num-jobs=1``
+   to identify the cause of the error.
+
 You can tell ``make.py`` to compile only a single section of the docs, greatly
 reducing the turn-around time for checking your changes.
 

doc/source/user_guide/indexing.rst

Lines changed: 46 additions & 0 deletions
@@ -1732,3 +1732,49 @@ Why does assignment fail when using chained indexing?
 This means that chained indexing will never work.
 See :ref:`this section <copy_on_write_chained_assignment>`
 for more context.
+
+.. _indexing.series_assignment:
+
+Series Assignment and Index Alignment
+-------------------------------------
+
+When assigning a Series to a DataFrame column, pandas performs automatic alignment
+based on index labels. This is a fundamental behavior that can be surprising to
+new users who might expect positional assignment.
+
+Key Points:
+~~~~~~~~~~~
+
+* Series values are matched to DataFrame rows by index label
+* Position/order in the Series doesn't matter
+* Missing index labels result in NaN values
+* This behavior is consistent across df[col] = series and df.loc[:, col] = series
+
+Examples:
+.. ipython:: python
+
+    import pandas as pd
+
+    # Create a DataFrame
+    df = pd.DataFrame({'values': [1, 2, 3]}, index=['x', 'y', 'z'])
+
+    # Series with matching indices (different order)
+    s1 = pd.Series([10, 20, 30], index=['z', 'x', 'y'])
+    df['aligned'] = s1 # Aligns by index, not position
+    print(df)
+
+    # Series with partial index match
+    s2 = pd.Series([100, 200], index=['x', 'z'])
+    df['partial'] = s2 # Missing 'y' gets NaN
+    print(df)
+
+    # Series with non-matching indices
+    s3 = pd.Series([1000, 2000], index=['a', 'b'])
+    df['nomatch'] = s3 # All values become NaN
+    print(df)
+
+
+    #Avoiding Confusion:
+    #If you want positional assignment instead of index alignment:
+    # reset the Series index to match DataFrame index
+    df['s1_values'] = s1.reindex(df.index)
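
The alignment behaviour documented in the added section can be checked with a small standalone sketch. The variable names mirror the diff, and the column names `reindexed` and `positional` are illustrative only. One point worth noting: `s1.reindex(df.index)` is itself a label-based operation, so it should produce the same column as plain assignment. Passing the underlying array with `Series.to_numpy()` is one way (an assumption here, not something the commit states) to get positional assignment instead.

```python
import pandas as pd

df = pd.DataFrame({"values": [1, 2, 3]}, index=["x", "y", "z"])
s1 = pd.Series([10, 20, 30], index=["z", "x", "y"])

# Plain assignment aligns on index labels: x -> 20, y -> 30, z -> 10.
df["aligned"] = s1

# reindex() also looks values up by label, so the result matches "aligned".
df["reindexed"] = s1.reindex(df.index)

# A bare NumPy array carries no labels, so values are placed by position.
df["positional"] = s1.to_numpy()

# A Series missing the label 'y' leaves NaN in that row.
s2 = pd.Series([100, 200], index=["x", "z"])
df["partial"] = s2

print(df)
```

The `positional` column keeps the Series order (10, 20, 30), while `aligned` and `reindexed` follow the DataFrame's labels.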

doc/source/whatsnew/v2.3.2.rst

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ Bug fixes
 - Fix :meth:`~DataFrame.to_json` with ``orient="table"`` to correctly use the
   "string" type in the JSON Table Schema for :class:`StringDtype` columns
   (:issue:`61889`)
-
+- Boolean operations (``|``, ``&``, ``^``) with bool-dtype objects on the left and :class:`StringDtype` objects on the right now cast the string to bool, with a deprecation warning (:issue:`60234`)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_232.contributors:

doc/source/whatsnew/v3.0.0.rst

Lines changed: 6 additions & 1 deletion
@@ -81,6 +81,7 @@ Other enhancements
 - :meth:`Rolling.agg`, :meth:`Expanding.agg` and :meth:`ExponentialMovingWindow.agg` now accept :class:`NamedAgg` aggregations through ``**kwargs`` (:issue:`28333`)
 - :meth:`Series.map` can now accept kwargs to pass on to func (:issue:`59814`)
 - :meth:`Series.map` now accepts an ``engine`` parameter to allow execution with a third-party execution engine (:issue:`61125`)
+- :meth:`Series.rank` and :meth:`DataFrame.rank` with numpy-nullable dtypes preserve ``NA`` values and return ``UInt64`` dtype where appropriate instead of casting ``NA`` to ``NaN`` with ``float64`` dtype (:issue:`62043`)
 - :meth:`Series.str.get_dummies` now accepts a ``dtype`` parameter to specify the dtype of the resulting DataFrame (:issue:`47872`)
 - :meth:`pandas.concat` will raise a ``ValueError`` when ``ignore_index=True`` and ``keys`` is not ``None`` (:issue:`59274`)
 - :py:class:`frozenset` elements in pandas objects are now natively printed (:issue:`60690`)
@@ -89,12 +90,14 @@ Other enhancements
 - Added support to read and write from and to Apache Iceberg tables with the new :func:`read_iceberg` and :meth:`DataFrame.to_iceberg` functions (:issue:`61383`)
 - Errors occurring during SQL I/O will now throw a generic :class:`.DatabaseError` instead of the raw Exception type from the underlying driver manager library (:issue:`60748`)
 - Implemented :meth:`Series.str.isascii` and :meth:`Series.str.isascii` (:issue:`59091`)
+- Improve the resulting dtypes in :meth:`DataFrame.where` and :meth:`DataFrame.mask` with :class:`ExtensionDtype` ``other`` (:issue:`62038`)
 - Improved deprecation message for offset aliases (:issue:`60820`)
 - Multiplying two :class:`DateOffset` objects will now raise a ``TypeError`` instead of a ``RecursionError`` (:issue:`59442`)
 - Restore support for reading Stata 104-format and enable reading 103-format dta files (:issue:`58554`)
 - Support passing a :class:`Iterable[Hashable]` input to :meth:`DataFrame.drop_duplicates` (:issue:`59237`)
 - Support reading Stata 102-format (Stata 1) dta files (:issue:`58978`)
 - Support reading Stata 110-format (Stata 7) dta files (:issue:`47176`)
+-
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_300.notable_bug_fixes:
@@ -539,7 +542,7 @@ Renamed the following offset aliases (:issue:`57986`):
 
 Other Removals
 ^^^^^^^^^^^^^^
-- :class:`.DataFrameGroupBy.idxmin`, :class:`.DataFrameGroupBy.idxmax`, :class:`.SeriesGroupBy.idxmin`, and :class:`.SeriesGroupBy.idxmax` will now raise a ``ValueError`` when used with ``skipna=False`` and an NA value is encountered (:issue:`10694`)
+- :class:`.DataFrameGroupBy.idxmin`, :class:`.DataFrameGroupBy.idxmax`, :class:`.SeriesGroupBy.idxmin`, and :class:`.SeriesGroupBy.idxmax` will now raise a ``ValueError`` when a group has all NA values, or when used with ``skipna=False`` and any NA value is encountered (:issue:`10694`, :issue:`57745`)
 - :func:`concat` no longer ignores empty objects when determining output dtypes (:issue:`39122`)
 - :func:`concat` with all-NA entries no longer ignores the dtype of those entries when determining the result dtype (:issue:`40893`)
 - :func:`read_excel`, :func:`read_json`, :func:`read_html`, and :func:`read_xml` no longer accept raw string or byte representation of the data. That type of data must be wrapped in a :py:class:`StringIO` or :py:class:`BytesIO` (:issue:`53767`)
@@ -722,6 +725,7 @@ Bug fixes
 Categorical
 ^^^^^^^^^^^
 - Bug in :func:`Series.apply` where ``nan`` was ignored for :class:`CategoricalDtype` (:issue:`59938`)
+- Bug in :meth:`Categorical.astype` where ``copy=False`` would still trigger a copy of the codes (:issue:`62000`)
 - Bug in :meth:`DataFrame.pivot` and :meth:`DataFrame.set_index` raising an ``ArrowNotImplementedError`` for columns with pyarrow dictionary dtype (:issue:`53051`)
 - Bug in :meth:`Series.convert_dtypes` with ``dtype_backend="pyarrow"`` where empty :class:`CategoricalDtype` :class:`Series` raised an error or got converted to ``null[pyarrow]`` (:issue:`59934`)
 -
@@ -887,6 +891,7 @@ Groupby/resample/rolling
 - Bug in :meth:`DataFrame.ewm` and :meth:`Series.ewm` when passed ``times`` and aggregation functions other than mean (:issue:`51695`)
 - Bug in :meth:`DataFrame.resample` and :meth:`Series.resample` were not keeping the index name when the index had :class:`ArrowDtype` timestamp dtype (:issue:`61222`)
 - Bug in :meth:`DataFrame.resample` changing index type to :class:`MultiIndex` when the dataframe is empty and using an upsample method (:issue:`55572`)
+- Bug in :meth:`DataFrameGroupBy.agg` and :meth:`SeriesGroupBy.agg` that was returning numpy dtype values when input values are pyarrow dtype values, instead of returning pyarrow dtype values. (:issue:`53030`)
 - Bug in :meth:`DataFrameGroupBy.agg` that raises ``AttributeError`` when there is dictionary input and duplicated columns, instead of returning a DataFrame with the aggregation of all duplicate columns. (:issue:`55041`)
 - Bug in :meth:`DataFrameGroupBy.agg` where applying a user-defined function to an empty DataFrame returned a Series instead of an empty DataFrame. (:issue:`61503`)
 - Bug in :meth:`DataFrameGroupBy.apply` and :meth:`SeriesGroupBy.apply` for empty data frame with ``group_keys=False`` still creating output index using group keys. (:issue:`60471`)

pandas/_libs/groupby.pyx

Lines changed: 2 additions & 3 deletions
@@ -2048,9 +2048,8 @@ def group_idxmin_idxmax(
     group_min_or_max = np.empty_like(out, dtype=values.dtype)
     seen = np.zeros_like(out, dtype=np.uint8)
 
-    # When using transform, we need a valid value for take in the case
-    # a category is not observed; these values will be dropped
-    out[:] = 0
+    # Sentinel for no valid values.
+    out[:] = -1
 
     with nogil(numeric_object_t is not object):
         for i in range(N):

pandas/_libs/index.pyx

Lines changed: 1 addition & 1 deletion
@@ -803,7 +803,7 @@ cdef class BaseMultiIndexCodesEngine:
         int_keys : 1-dimensional array of dtype uint64 or object
             Integers representing one combination each
         """
-        level_codes = list(target._recode_for_new_levels(self.levels))
+        level_codes = list(target._recode_for_new_levels(self.levels, copy=True))
         for i, codes in enumerate(level_codes):
             if self.levels[i].hasnans:
                 na_index = self.levels[i].isna().nonzero()[0][0]

pandas/conftest.py

Lines changed: 10 additions & 16 deletions
@@ -176,25 +176,19 @@ def pytest_collection_modifyitems(items, config) -> None:
                 ignore_doctest_warning(item, path, message)
 
 
-hypothesis_health_checks = [
-    hypothesis.HealthCheck.too_slow,
-    hypothesis.HealthCheck.differing_executors,
-]
-
-# Hypothesis
+# Similar to "ci" config in
+# https://hypothesis.readthedocs.io/en/latest/reference/api.html#built-in-profiles
 hypothesis.settings.register_profile(
-    "ci",
-    # Hypothesis timing checks are tuned for scalars by default, so we bump
-    # them from 200ms to 500ms per test case as the global default. If this
-    # is too short for a specific test, (a) try to make it faster, and (b)
-    # if it really is slow add `@settings(deadline=...)` with a working value,
-    # or `deadline=None` to entirely disable timeouts for that test.
-    # 2022-02-09: Changed deadline from 500 -> None. Deadline leads to
-    # non-actionable, flaky CI failures (# GH 24641, 44969, 45118, 44969)
+    "pandas_ci",
+    database=None,
     deadline=None,
-    suppress_health_check=tuple(hypothesis_health_checks),
+    max_examples=15,
+    suppress_health_check=(
+        hypothesis.HealthCheck.too_slow,
+        hypothesis.HealthCheck.differing_executors,
+    ),
 )
-hypothesis.settings.load_profile("ci")
+hypothesis.settings.load_profile("pandas_ci")
 
 # Registering these strategies makes them globally available via st.from_type,
 # which is use for offsets in tests/tseries/offsets/test_offsets_properties.py
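
The register-then-load pattern used in this hunk can be tried in isolation with the minimal sketch below, assuming `hypothesis` is installed. The profile name `example_ci` is hypothetical, chosen so it does not collide with the `pandas_ci` profile from the diff. Only `HealthCheck.too_slow` is suppressed here, to keep the sketch independent of the Hypothesis version.

```python
import hypothesis

# Register a named profile; nothing changes until it is loaded.
hypothesis.settings.register_profile(
    "example_ci",  # hypothetical name, for illustration only
    database=None,  # do not persist examples between runs
    deadline=None,  # disable per-example time limits
    max_examples=15,  # cap the number of generated examples per test
    suppress_health_check=(hypothesis.HealthCheck.too_slow,),
)

# Make the profile the default for settings objects created afterwards.
hypothesis.settings.load_profile("example_ci")

# A fresh settings object inherits from the currently loaded profile.
active = hypothesis.settings()
print(active.max_examples, active.deadline, active.database)
```

Using a project-specific profile name, as the commit does with `pandas_ci`, avoids shadowing any profile of the same name that Hypothesis itself provides.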
