Merge remote-tracking branch 'upstream/main' into read-csv-from-directory

fangchenli · fangchenli · commit e7fee01ed090 · 2025-08-19T22:52:45.000-07:00
diff --git a/README.md b/README.md
@@ -19,9 +19,9 @@
 **pandas** is a Python package that provides fast, flexible, and expressive data
 structures designed to make working with "relational" or "labeled" data both
 easy and intuitive. It aims to be the fundamental high-level building block for
-doing practical, **real world** data analysis in Python. Additionally, it has
-the broader goal of becoming **the most powerful and flexible open source data
-analysis / manipulation tool available in any language**. It is already well on
+doing practical, **real-world** data analysis in Python. Additionally, it has
+the broader goal of becoming **the most powerful and flexible open-source data
+analysis/manipulation tool available in any language**. It is already well on
 its way towards this goal.
 
 ## Table of Contents
@@ -64,7 +64,7 @@ Here are just a few of the things that pandas does well:
     data sets
   - [**Hierarchical**][mi] labeling of axes (possible to have multiple
     labels per tick)
-  - Robust IO tools for loading data from [**flat files**][flat-files]
+  - Robust I/O tools for loading data from [**flat files**][flat-files]
     (CSV and delimited), [**Excel files**][excel], [**databases**][db],
     and saving/loading data from the ultrafast [**HDF5 format**][hdfstore]
   - [**Time series**][timeseries]-specific functionality: date range
@@ -138,7 +138,7 @@ or for installing in [development mode](https://pip.pypa.io/en/latest/cli/pip_in
 
 
 ```sh
-python -m pip install -ve . --no-build-isolation -Ceditable-verbose=true
+python -m pip install -ve . --no-build-isolation --config-settings editable-verbose=true
 ```
 
 See the full instructions for [installing from source](https://pandas.pydata.org/docs/dev/development/contributing_environment.html).
@@ -155,7 +155,7 @@ has been under active development since then.
 
 ## Getting Help
 
-For usage questions, the best place to go to is [StackOverflow](https://stackoverflow.com/questions/tagged/pandas).
+For usage questions, the best place to go to is [Stack Overflow](https://stackoverflow.com/questions/tagged/pandas).
 Further, general questions and discussions can also take place on the [pydata mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata).
 
 ## Discussion and Development
diff --git a/doc/source/user_guide/migration-3-strings.rst b/doc/source/user_guide/migration-3-strings.rst
@@ -188,6 +188,14 @@ let pandas do the inference. But if you want to be specific, you can specify the
 This is actually compatible with pandas 2.x as well, since in pandas < 3,
 ``dtype="str"`` was essentially treated as an alias for object dtype.
 
+.. attention::
+
+   While using ``dtype="str"`` in constructors is compatible with pandas 2.x,
+   specifying it as the dtype in :meth:`~Series.astype` runs into the issue
+   of also stringifying missing values in pandas 2.x. See the section
+   :ref:`string_migration_guide-astype_str` for more details.
+
+
 The missing value sentinel is now always NaN
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -310,52 +318,69 @@ case.
 Notable bug fixes
 ~~~~~~~~~~~~~~~~~
 
+.. _string_migration_guide-astype_str:
+
 ``astype(str)`` preserving missing values
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-This is a long standing "bug" or misfeature, as discussed in https://github.com/pandas-dev/pandas/issues/25353.
+The stringifying of missing values is a long-standing "bug" or misfeature, as
+discussed in https://github.com/pandas-dev/pandas/issues/25353, but fixing it
+introduces a significant behaviour change.
 
-With pandas < 3, when using ``astype(str)`` (using the built-in :func:`str`, not
-``astype("str")``!), the operation would convert every element to a string,
-including the missing values:
+With pandas < 3, when using ``astype(str)`` or ``astype("str")``, the operation
+would convert every element to a string, including the missing values:
 
 .. code-block:: python
 
    # OLD behavior in pandas < 3
-   >>> ser = pd.Series(["a", np.nan], dtype=object)
+   >>> ser = pd.Series([1.5, np.nan])
    >>> ser
-   0      a
+   0    1.5
    1    NaN
-   dtype: object
-   >>> ser.astype(str)
-   0      a
+   dtype: float64
+   >>> ser.astype("str")
+   0    1.5
    1    nan
    dtype: object
-   >>> ser.astype(str).to_numpy()
-   array(['a', 'nan'], dtype=object)
+   >>> ser.astype("str").to_numpy()
+   array(['1.5', 'nan'], dtype=object)
 
 Note how ``NaN`` (``np.nan``) was converted to the string ``"nan"``. This was
 not the intended behavior, and it was inconsistent with how other dtypes handled
 missing values.
 
-With pandas 3, this behavior has been fixed, and now ``astype(str)`` is an alias
-for ``astype("str")``, i.e. casting to the new string dtype, which will preserve
-the missing values:
+With pandas 3, this behavior has been fixed, and now ``astype("str")`` will cast
+to the new string dtype, which preserves the missing values:
 
 .. code-block:: python
 
    # NEW behavior in pandas 3
    >>> pd.options.future.infer_string = True
-   >>> ser = pd.Series(["a", np.nan], dtype=object)
-   >>> ser.astype(str)
-   0      a
+   >>> ser = pd.Series([1.5, np.nan])
+   >>> ser.astype("str")
+   0    1.5
    1    NaN
    dtype: str
-   >>> ser.astype(str).values
-   array(['a', nan], dtype=object)
+   >>> ser.astype("str").to_numpy()
+   array(['1.5', nan], dtype=object)
 
 If you want to preserve the old behaviour of converting every object to a
-string, you can use ``ser.map(str)`` instead.
+string, you can use ``ser.map(str)`` instead. If you want to do such a conversion
+while preserving the missing values in a way that works with both pandas 2.x and
+3.x, you can use ``ser.map(str, na_action="ignore")`` (for pandas 3.x only, you
+can do ``ser.astype("str")``).
+
+If you want to convert to object or string dtype for pandas 2.x and 3.x,
+respectively, without needing to stringify each individual element, you will
+have to use a conditional check on the pandas version.
+For example, to convert a categorical Series with string categories to its
+dense non-categorical version with object or string dtype:
+
+.. code-block:: python
+
+   >>> import pandas as pd
+   >>> ser = pd.Series(["a", np.nan], dtype="category")
+   >>> ser.astype(object if pd.__version__ < "3" else "str")
 
 
 ``prod()`` raising for string data
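
Note on the hunk above: the three conversion options it describes can be put side by side in a short sketch. This is an illustration, not part of the commit. `PANDAS_GE_3` is a hypothetical helper name (not a pandas API), used here in place of the `pd.__version__ < "3"` string comparison, since comparing version strings lexicographically would misorder a hypothetical future version such as "10.0".

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.5, np.nan])

# Stringify every element, missing values included
# (what astype(str) did in pandas < 3).
stringified = ser.map(str)

# Stringify only the non-missing elements; works on pandas 2.x and 3.x.
preserved = ser.map(str, na_action="ignore")

# Hypothetical flag (not a pandas API): compare the major version as an
# integer instead of comparing version strings.
PANDAS_GE_3 = int(pd.__version__.split(".")[0]) >= 3

# Convert a categorical Series to its dense form without stringifying
# each element: string dtype on pandas 3.x, object dtype on pandas 2.x.
cat = pd.Series(["a", np.nan], dtype="category")
dense = cat.astype("str" if PANDAS_GE_3 else object)
```

In both pandas versions, `preserved` and `dense` should keep the missing value as a missing value, while `stringified` holds the literal string `"nan"`.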
diff --git a/pandas/_config/config.py b/pandas/_config/config.py
@@ -693,8 +693,8 @@ def _get_registered_option(key: str):
 
 def _translate_key(key: str) -> str:
     """
-    if key id deprecated and a replacement key defined, will return the
-    replacement key, otherwise returns `key` as - is
+    if `key` is deprecated and a replacement key is defined, will return the
+    replacement key, otherwise returns `key` as-is
     """
     d = _get_deprecated_option(key)
     if d:
diff --git a/pandas/_version.py b/pandas/_version.py
@@ -581,7 +581,7 @@ def render_git_describe(pieces):
 def render_git_describe_long(pieces):
     """TAG-DISTANCE-gHEX[-dirty].
 
-    Like 'git describe --tags --dirty --always -long'.
+    Like 'git describe --tags --dirty --always --long'.
     The distance/hash is unconditional.
 
     Exceptions:
diff --git a/pandas/core/accessor.py b/pandas/core/accessor.py
@@ -88,7 +88,7 @@ def _add_delegate_accessors(
         cls
             Class to add the methods/properties to.
         delegate
-            Class to get methods/properties and doc-strings.
+            Class to get methods/properties and docstrings.
         accessors : list of str
             List of accessors to add.
         typ : {'property', 'method'}
@@ -159,7 +159,7 @@ def delegate_names(
     Parameters
     ----------
     delegate : object
-        The class to get methods/properties & doc-strings.
+        The class to get methods/properties & docstrings.
     accessors : Sequence[str]
         List of accessor to add.
     typ : {'property', 'method'}
diff --git a/pandas/core/base.py b/pandas/core/base.py
@@ -90,7 +90,7 @@
 
 class PandasObject(DirNamesMixin):
     """
-    Baseclass for various pandas objects.
+    Base class for various pandas objects.
     """
 
     # results from calls to methods decorated with cache_readonly get added to _cache
diff --git a/pandas/core/generic.py b/pandas/core/generic.py
@@ -10216,6 +10216,7 @@ def shift(
         suffix : str, optional
             If str and periods is an iterable, this is added after the column
             name and before the shift value for each shifted column name.
+            For `Series` this parameter is unused and defaults to `None`.
 
         Returns
         -------
diff --git a/pandas/core/indexing.py b/pandas/core/indexing.py
@@ -1926,7 +1926,7 @@ def _setitem_with_indexer(self, indexer, value, name: str = "iloc") -> None:
                     labels = index.insert(len(index), key)
 
                     # We are expanding the Series/DataFrame values to match
-                    #  the length of thenew index `labels`.  GH#40096 ensure
+                    #  the length of the new index `labels`.  GH#40096 ensure
                     #  this is valid even if the index has duplicates.
                     taker = np.arange(len(index) + 1, dtype=np.intp)
                     taker[-1] = -1
diff --git a/pandas/io/api.py b/pandas/io/api.py
@@ -1,5 +1,5 @@
 """
-Data IO api
+Data I/O API
 """
 
 from pandas.io.clipboards import read_clipboard
diff --git a/pandas/io/common.py b/pandas/io/common.py
@@ -1,4 +1,4 @@
-"""Common IO api utilities"""
+"""Common I/O API utilities"""
 
 from __future__ import annotations
 
