
Commit 7dc3ac1

Merge remote-tracking branch 'upstream/main' into bug_boolean_series_with_logical_indexer

2 parents: c4f32d7 + dc1e367

5 files changed: +99 −77 lines

AUTHORS.md

Lines changed: 6 additions & 6 deletions

--- a/AUTHORS.md
+++ b/AUTHORS.md
@@ -7,12 +7,12 @@ About the Copyright Holders
   led by Wes McKinney. AQR released the source under this license in 2009.
 * Copyright (c) 2011-2012, Lambda Foundry, Inc.
 
-  Wes is now an employee of Lambda Foundry, and remains the pandas project
+  Wes became an employee of Lambda Foundry, and remained the pandas project
   lead.
 * Copyright (c) 2011-2012, PyData Development Team
 
   The PyData Development Team is the collection of developers of the PyData
-  project. This includes all of the PyData sub-projects, including pandas. The
+  project. This includes all of the PyData sub-projects, such as pandas. The
   core team that coordinates development on GitHub can be found here:
   https://github.com/pydata.
 
@@ -23,11 +23,11 @@ Our Copyright Policy
 
 PyData uses a shared copyright model. Each contributor maintains copyright
 over their contributions to PyData. However, it is important to note that
-these contributions are typically only changes to the repositories. Thus,
+these contributions are typically limited to changes to the repositories. Thus,
 the PyData source code, in its entirety, is not the copyright of any single
 person or institution. Instead, it is the collective copyright of the
 entire PyData Development Team. If individual contributors want to maintain
-a record of what changes/contributions they have specific copyright on,
+a record of the specific changes or contributions they hold copyright to,
 they should indicate their copyright in the commit message of the change
 when they commit the change to one of the PyData repositories.
 
@@ -50,7 +50,7 @@ Other licenses can be found in the LICENSES directory.
 License
 =======
 
-pandas is distributed under a 3-clause ("Simplified" or "New") BSD
+pandas is distributed under the 3-clause ("Simplified" or "New") BSD
 license. Parts of NumPy, SciPy, numpydoc, bottleneck, which all have
-BSD-compatible licenses, are included. Their licenses follow the pandas
+BSD-compatible licenses, are included. Their licenses are compatible with the pandas
 license.

doc/source/reference/indexing.rst

Lines changed: 1 addition & 0 deletions

--- a/doc/source/reference/indexing.rst
+++ b/doc/source/reference/indexing.rst
@@ -98,6 +98,7 @@ Conversion
    :toctree: api/
 
    Index.astype
+   Index.infer_objects
    Index.item
    Index.map
    Index.ravel
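
For readers scanning this section: ``Index.infer_objects``, the entry added above, soft-converts an object-dtype Index to a more specific dtype, mirroring its Series/DataFrame counterparts. A minimal sketch, assuming a pandas 2.x install where the method is available:

    import pandas as pd

    # An object-dtype Index that actually holds integers.
    idx = pd.Index([1, 2, 3], dtype=object)
    print(idx.dtype)                  # object

    # infer_objects attempts a soft conversion to a better-fitting dtype.
    print(idx.infer_objects().dtype)  # int64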

doc/source/whatsnew/v2.3.0.rst

Lines changed: 0 additions & 35 deletions

--- a/doc/source/whatsnew/v2.3.0.rst
+++ b/doc/source/whatsnew/v2.3.0.rst
@@ -31,39 +31,6 @@ Other enhancements
 - The :meth:`~Series.cumsum`, :meth:`~Series.cummin`, and :meth:`~Series.cummax` reductions are now implemented for :class:`StringDtype` columns (:issue:`60633`)
 - The :meth:`~Series.sum` reduction is now implemented for :class:`StringDtype` columns (:issue:`59853`)
 
-.. ---------------------------------------------------------------------------
-.. _whatsnew_230.notable_bug_fixes:
-
-Notable bug fixes
-~~~~~~~~~~~~~~~~~
-
-These are bug fixes that might have notable behavior changes.
-
-.. _whatsnew_230.notable_bug_fixes.string_comparisons:
-
-Comparisons between different string dtypes
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-In previous versions, comparing :class:`Series` of different string dtypes (e.g. ``pd.StringDtype("pyarrow", na_value=pd.NA)`` against ``pd.StringDtype("python", na_value=np.nan)``) would result in inconsistent resulting dtype or incorrectly raise. pandas will now use the hierarchy
-
-object < (python, NaN) < (pyarrow, NaN) < (python, NA) < (pyarrow, NA)
-
-in determining the result dtype when there are different string dtypes compared. Some examples:
-
-- When ``pd.StringDtype("pyarrow", na_value=pd.NA)`` is compared against any other string dtype, the result will always be ``boolean[pyarrow]``.
-- When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("pyarrow", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.
-- When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("python", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.
-
-.. _whatsnew_230.api_changes:
-
-API changes
-~~~~~~~~~~~
-
-- When enabling the ``future.infer_string`` option, :class:`Index` set operations (like
-  union or intersection) will now ignore the dtype of an empty :class:`RangeIndex` or
-  empty :class:`Index` with ``object`` dtype when determining the dtype of the resulting
-  Index (:issue:`60797`)
-
 .. ---------------------------------------------------------------------------
 .. _whatsnew_230.deprecations:
 
@@ -85,8 +52,6 @@ Numeric
 
 Strings
 ^^^^^^^
-- Bug in :meth:`.DataFrameGroupBy.min`, :meth:`.DataFrameGroupBy.max`, :meth:`.Resampler.min`, :meth:`.Resampler.max` where all NA values of string dtype would return float instead of string dtype (:issue:`60810`)
-- Bug in :meth:`DataFrame.sum` with ``axis=1``, :meth:`.DataFrameGroupBy.sum` or :meth:`.SeriesGroupBy.sum` with ``skipna=True``, and :meth:`.Resampler.sum` with all NA values of :class:`StringDtype` resulted in ``0`` instead of the empty string ``""`` (:issue:`60229`)
 - Bug in :meth:`Series.__pos__` and :meth:`DataFrame.__pos__` where an ``Exception`` was not raised for :class:`StringDtype` with ``storage="pyarrow"`` (:issue:`60710`)
 - Bug in :meth:`Series.rank` for :class:`StringDtype` with ``storage="pyarrow"`` that incorrectly returned integer results with ``method="average"`` and raised an error if it would truncate results (:issue:`59768`)
 - Bug in :meth:`Series.replace` with :class:`StringDtype` when replacing with a non-string value was not upcasting to ``object`` dtype (:issue:`60282`)

doc/source/whatsnew/v2.3.1.rst

Lines changed: 51 additions & 5 deletions

--- a/doc/source/whatsnew/v2.3.1.rst
+++ b/doc/source/whatsnew/v2.3.1.rst
@@ -9,11 +9,57 @@ including other versions of pandas.
 {{ header }}
 
 .. ---------------------------------------------------------------------------
-.. _whatsnew_231.enhancements:
+.. _whatsnew_231.string_fixes:
+
+Improvements and fixes for the StringDtype
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. _whatsnew_231.string_fixes.string_comparisons:
+
+Comparisons between different string dtypes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In previous versions, comparing :class:`Series` of different string dtypes (e.g. ``pd.StringDtype("pyarrow", na_value=pd.NA)`` against ``pd.StringDtype("python", na_value=np.nan)``) would result in inconsistent resulting dtype or incorrectly raise. pandas will now use the hierarchy
+
+object < (python, NaN) < (pyarrow, NaN) < (python, NA) < (pyarrow, NA)
+
+in determining the result dtype when there are different string dtypes compared. Some examples:
+
+- When ``pd.StringDtype("pyarrow", na_value=pd.NA)`` is compared against any other string dtype, the result will always be ``boolean[pyarrow]``.
+- When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("pyarrow", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.
+- When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("python", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.
+
+.. _whatsnew_231.string_fixes.ignore_empty:
+
+Index set operations ignore empty RangeIndex and object dtype Index
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When enabling the ``future.infer_string`` option, :class:`Index` set operations (like
+union or intersection) will now ignore the dtype of an empty :class:`RangeIndex` or
+empty :class:`Index` with ``object`` dtype when determining the dtype of the resulting
+Index (:issue:`60797`).
+
+This ensures that combining such empty Index with strings will infer the string dtype
+correctly, rather than defaulting to ``object`` dtype. For example:
+
+.. code-block:: python
+
+    >>> pd.options.mode.infer_string = True
+    >>> df = pd.DataFrame()
+    >>> df.columns.dtype
+    dtype('int64')  # default RangeIndex for empty columns
+    >>> df["a"] = [1, 2, 3]
+    >>> df.columns.dtype
+    <StringDtype(na_value=nan)>  # new columns use string dtype instead of object dtype
+
+.. _whatsnew_231.string_fixes.bugs:
+
+Bug fixes
+^^^^^^^^^
+- Bug in :meth:`.DataFrameGroupBy.min`, :meth:`.DataFrameGroupBy.max`, :meth:`.Resampler.min`, :meth:`.Resampler.max` where all NA values of string dtype would return float instead of string dtype (:issue:`60810`)
+- Bug in :meth:`DataFrame.sum` with ``axis=1``, :meth:`.DataFrameGroupBy.sum` or :meth:`.SeriesGroupBy.sum` with ``skipna=True``, and :meth:`.Resampler.sum` with all NA values of :class:`StringDtype` resulted in ``0`` instead of the empty string ``""`` (:issue:`60229`)
+- Fixed bug in :meth:`DataFrame.explode` and :meth:`Series.explode` where methods would fail with ``dtype="str"`` (:issue:`61623`)
 
-Enhancements
-~~~~~~~~~~~~
--
 
 .. _whatsnew_231.regressions:
 
@@ -26,7 +72,7 @@ Fixed regressions
 
 Bug fixes
 ~~~~~~~~~
-- Fixed bug in :meth:`DataFrame.explode` and :meth:`Series.explode` where methods would fail with ``dtype="str"`` (:issue:`61623`)
+-
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_231.other:
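
To make the comparison hierarchy in the new whatsnew section concrete, here is an illustrative sketch. It is not part of the commit and assumes pandas 2.3+ with pyarrow installed, where ``pd.StringDtype`` accepts the ``na_value`` argument:

    import numpy as np
    import pandas as pd

    # (pyarrow, NA) outranks (python, NaN) in the hierarchy, so the
    # comparison result should use the pyarrow-backed boolean dtype.
    left = pd.Series(["a", "b"], dtype=pd.StringDtype("pyarrow", na_value=pd.NA))
    right = pd.Series(["a", "x"], dtype=pd.StringDtype("python", na_value=np.nan))
    print((left == right).dtype)  # expected: boolean[pyarrow]

    # Two python-backed dtypes: the NA variant outranks the NaN variant,
    # giving the NumPy-backed nullable boolean dtype.
    left = pd.Series(["a", "b"], dtype=pd.StringDtype("python", na_value=pd.NA))
    right = pd.Series(["a", "x"], dtype=pd.StringDtype("python", na_value=np.nan))
    print((left == right).dtype)  # expected: boolean

Exact reprs may differ across versions; the dtype names follow the bullet list in the diff above.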

scripts/validate_docstrings.py

Lines changed: 41 additions & 31 deletions

--- a/scripts/validate_docstrings.py
+++ b/scripts/validate_docstrings.py
@@ -69,8 +69,10 @@
 }
 ALL_ERRORS = set(NUMPYDOC_ERROR_MSGS).union(set(ERROR_MSGS))
 duplicated_errors = set(NUMPYDOC_ERROR_MSGS).intersection(set(ERROR_MSGS))
-assert not duplicated_errors, (f"Errors {duplicated_errors} exist in both pandas "
-                               "and numpydoc, should they be removed from pandas?")
+assert not duplicated_errors, (
+    f"Errors {duplicated_errors} exist in both pandas "
+    "and numpydoc, should they be removed from pandas?"
+)
 
 
 def pandas_error(code, **kwargs):
@@ -245,7 +247,15 @@ def pandas_validate(func_name: str):
     # Some objects are instances, e.g. IndexSlice, which numpydoc can't validate
     doc_obj = get_doc_object(func_obj, doc=func_obj.__doc__)
     doc = PandasDocstring(func_name, doc_obj)
-    result = validate(doc_obj)
+    if func_obj.__doc__ is not None:
+        result = validate(doc_obj)
+    else:
+        result = {
+            "docstring": "",
+            "file": None,
+            "file_line": None,
+            "errors": [("GL08", "The object does not have a docstring")],
+        }
     mentioned_errs = doc.mentioned_private_classes
     if mentioned_errs:
         result["errors"].append(
@@ -257,7 +267,7 @@ def pandas_validate(func_name: str):
             pandas_error(
                 "SA05",
                 reference_name=rel_name,
-                right_reference=rel_name[len("pandas."):],
+                right_reference=rel_name[len("pandas.") :],
             )
             for rel_name in doc.see_also
             if rel_name.startswith("pandas.")
@@ -365,12 +375,13 @@ def print_validate_all_results(
     for func_name, res in result.items():
         error_messages = dict(res["errors"])
         actual_failures = set(error_messages)
-        expected_failures = (ignore_errors.get(func_name, set())
-                             | ignore_errors.get(None, set()))
+        expected_failures = ignore_errors.get(func_name, set()) | ignore_errors.get(
+            None, set()
+        )
         for err_code in actual_failures - expected_failures:
             sys.stdout.write(
                 f'{prefix}{res["file"]}:{res["file_line"]}:'
-                f'{err_code}:{func_name}:{error_messages[err_code]}\n'
+                f"{err_code}:{func_name}:{error_messages[err_code]}\n"
             )
             exit_status += 1
         for err_code in ignore_errors.get(func_name, set()) - actual_failures:
@@ -384,8 +395,9 @@ def print_validate_all_results(
     return exit_status
 
 
-def print_validate_one_results(func_name: str,
-                               ignore_errors: dict[str, set[str]]) -> int:
+def print_validate_one_results(
+    func_name: str, ignore_errors: dict[str, set[str]]
+) -> int:
     def header(title, width=80, char="#") -> str:
         full_line = char * width
         side_len = (width - len(title) - 2) // 2
@@ -396,8 +408,11 @@ def header(title, width=80, char="#") -> str:
 
     result = pandas_validate(func_name)
 
-    result["errors"] = [(code, message) for code, message in result["errors"]
-                        if code not in ignore_errors.get(None, set())]
+    result["errors"] = [
+        (code, message)
+        for code, message in result["errors"]
+        if code not in ignore_errors.get(None, set())
+    ]
 
     sys.stderr.write(header(f"Docstring ({func_name})"))
     sys.stderr.write(f"{result['docstring']}\n")
@@ -431,14 +446,16 @@ def _format_ignore_errors(raw_ignore_errors):
                     raise ValueError(
                         f"Object `{obj_name}` is present in more than one "
                         "--ignore_errors argument. Please use it once and specify "
-                        "the errors separated by commas.")
+                        "the errors separated by commas."
+                    )
                 ignore_errors[obj_name] = set(error_codes.split(","))
 
                 unknown_errors = ignore_errors[obj_name] - ALL_ERRORS
                 if unknown_errors:
                     raise ValueError(
                         f"Object `{obj_name}` is ignoring errors {unknown_errors} "
-                        f"which are not known. Known errors are: {ALL_ERRORS}")
+                        f"which are not known. Known errors are: {ALL_ERRORS}"
+                    )
 
             # global errors "PR02,ES01"
             else:
@@ -448,27 +465,19 @@ def _format_ignore_errors(raw_ignore_errors):
         if unknown_errors:
             raise ValueError(
                 f"Unknown errors {unknown_errors} specified using --ignore_errors "
-                "Known errors are: {ALL_ERRORS}")
+                "Known errors are: {ALL_ERRORS}"
+            )
 
     return ignore_errors
 
 
-def main(
-    func_name,
-    output_format,
-    prefix,
-    ignore_deprecated,
-    ignore_errors
-):
+def main(func_name, output_format, prefix, ignore_deprecated, ignore_errors):
     """
     Main entry point. Call the validation for one or for all docstrings.
     """
     if func_name is None:
         return print_validate_all_results(
-            output_format,
-            prefix,
-            ignore_deprecated,
-            ignore_errors
+            output_format, prefix, ignore_deprecated, ignore_errors
         )
     else:
         return print_validate_one_results(func_name, ignore_errors)
@@ -524,10 +533,11 @@ def main(
    args = argparser.parse_args(sys.argv[1:])
 
     sys.exit(
-        main(args.function,
-             args.format,
-             args.prefix,
-             args.ignore_deprecated,
-             _format_ignore_errors(args.ignore_errors),
-             )
+        main(
+            args.function,
+            args.format,
+            args.prefix,
+            args.ignore_deprecated,
+            _format_ignore_errors(args.ignore_errors),
+        )
     )
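
The substantive change in this file is the fallback in ``pandas_validate``: when an object has no docstring, the script no longer hands it to numpydoc's ``validate`` (which cannot handle a ``None`` docstring) and instead synthesizes a GL08 error result with the same dict shape. A standalone sketch of the same pattern; ``validate_or_gl08`` and ``undocumented`` are hypothetical names for illustration, not part of the script:

    def validate_or_gl08(obj, validate_fn):
        """Return a numpydoc-style result dict, tolerating a missing docstring."""
        if obj.__doc__ is not None:
            return validate_fn(obj)
        # Mirror the dict shape the rest of the script expects.
        return {
            "docstring": "",
            "file": None,
            "file_line": None,
            "errors": [("GL08", "The object does not have a docstring")],
        }

    def undocumented():
        pass

    result = validate_or_gl08(undocumented, lambda obj: {"errors": []})
    print(result["errors"])  # [('GL08', 'The object does not have a docstring')]

This keeps downstream consumers, which index into result["file"] and result["errors"], working without special-casing objects that lack docstrings.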
