Commit e7fee01

committed
Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
2 parents f792a15 + 7cc093f commit e7fee01

10 files changed: +61 -35 lines changed

README.md

Lines changed: 6 additions & 6 deletions
@@ -19,9 +19,9 @@
 **pandas** is a Python package that provides fast, flexible, and expressive data
 structures designed to make working with "relational" or "labeled" data both
 easy and intuitive. It aims to be the fundamental high-level building block for
-doing practical, **real world** data analysis in Python. Additionally, it has
-the broader goal of becoming **the most powerful and flexible open source data
-analysis / manipulation tool available in any language**. It is already well on
+doing practical, **real-world** data analysis in Python. Additionally, it has
+the broader goal of becoming **the most powerful and flexible open-source data
+analysis/manipulation tool available in any language**. It is already well on
 its way towards this goal.
 
 ## Table of Contents
@@ -64,7 +64,7 @@ Here are just a few of the things that pandas does well:
 data sets
 - [**Hierarchical**][mi] labeling of axes (possible to have multiple
 labels per tick)
-- Robust IO tools for loading data from [**flat files**][flat-files]
+- Robust I/O tools for loading data from [**flat files**][flat-files]
 (CSV and delimited), [**Excel files**][excel], [**databases**][db],
 and saving/loading data from the ultrafast [**HDF5 format**][hdfstore]
 - [**Time series**][timeseries]-specific functionality: date range
@@ -138,7 +138,7 @@ or for installing in [development mode](https://pip.pypa.io/en/latest/cli/pip_in
 
 
 ```sh
-python -m pip install -ve . --no-build-isolation -Ceditable-verbose=true
+python -m pip install -ve . --no-build-isolation --config-settings editable-verbose=true
 ```
 
 See the full instructions for [installing from source](https://pandas.pydata.org/docs/dev/development/contributing_environment.html).
@@ -155,7 +155,7 @@ has been under active development since then.
 
 ## Getting Help
 
-For usage questions, the best place to go to is [StackOverflow](https://stackoverflow.com/questions/tagged/pandas).
+For usage questions, the best place to go to is [Stack Overflow](https://stackoverflow.com/questions/tagged/pandas).
 Further, general questions and discussions can also take place on the [pydata mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata).
 
 ## Discussion and Development

doc/source/user_guide/migration-3-strings.rst

Lines changed: 45 additions & 20 deletions
@@ -188,6 +188,14 @@ let pandas do the inference. But if you want to be specific, you can specify the
 This is actually compatible with pandas 2.x as well, since in pandas < 3,
 ``dtype="str"`` was essentially treated as an alias for object dtype.
 
+.. attention::
+
+While using ``dtype="str"`` in constructors is compatible with pandas 2.x,
+specifying it as the dtype in :meth:`~Series.astype` runs into the issue
+of also stringifying missing values in pandas 2.x. See the section
+:ref:`string_migration_guide-astype_str` for more details.
+
+
 The missing value sentinel is now always NaN
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -310,52 +318,69 @@ case.
 Notable bug fixes
 ~~~~~~~~~~~~~~~~~
 
+.. _string_migration_guide-astype_str:
+
 ``astype(str)`` preserving missing values
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-This is a long standing "bug" or misfeature, as discussed in https://github.com/pandas-dev/pandas/issues/25353.
+The stringifying of missing values is a long standing "bug" or misfeature, as
+discussed in https://github.com/pandas-dev/pandas/issues/25353, but fixing it
+introduces a significant behaviour change.
 
-With pandas < 3, when using ``astype(str)`` (using the built-in :func:`str`, not
-``astype("str")``!), the operation would convert every element to a string,
-including the missing values:
+With pandas < 3, when using ``astype(str)`` or ``astype("str")``, the operation
+would convert every element to a string, including the missing values:
 
 .. code-block:: python
 
 # OLD behavior in pandas < 3
->>> ser = pd.Series(["a", np.nan], dtype=object)
+>>> ser = pd.Series([1.5, np.nan])
 >>> ser
-0 a
+0 1.5
 1 NaN
-dtype: object
->>> ser.astype(str)
-0 a
+dtype: float64
+>>> ser.astype("str")
+0 1.5
 1 nan
 dtype: object
->>> ser.astype(str).to_numpy()
-array(['a', 'nan'], dtype=object)
+>>> ser.astype("str").to_numpy()
+array(['1.5', 'nan'], dtype=object)
 
 Note how ``NaN`` (``np.nan``) was converted to the string ``"nan"``. This was
 not the intended behavior, and it was inconsistent with how other dtypes handled
 missing values.
 
-With pandas 3, this behavior has been fixed, and now ``astype(str)`` is an alias
-for ``astype("str")``, i.e. casting to the new string dtype, which will preserve
-the missing values:
+With pandas 3, this behavior has been fixed, and now ``astype("str")`` will cast
+to the new string dtype, which preserves the missing values:
 
 .. code-block:: python
 
 # NEW behavior in pandas 3
 >>> pd.options.future.infer_string = True
->>> ser = pd.Series(["a", np.nan], dtype=object)
->>> ser.astype(str)
-0 a
+>>> ser = pd.Series([1.5, np.nan])
+>>> ser.astype("str")
+0 1.5
 1 NaN
 dtype: str
->>> ser.astype(str).values
-array(['a', nan], dtype=object)
+>>> ser.astype("str").to_numpy()
+array(['1.5', nan], dtype=object)
 
 If you want to preserve the old behaviour of converting every object to a
-string, you can use ``ser.map(str)`` instead.
+string, you can use ``ser.map(str)`` instead. If you want do such conversion
+while preserving the missing values in a way that works with both pandas 2.x and
+3.x, you can use ``ser.map(str, na_action="ignore")`` (for pandas 3.x only, you
+can do ``ser.astype("str")``).
+
+If you want to convert to object or string dtype for pandas 2.x and 3.x,
+respectively, without needing to stringify each individual element, you will
+have to use a conditional check on the pandas version.
+For example, to convert a categorical Series with string categories to its
+dense non-categorical version with object or string dtype:
+
+.. code-block:: python
+
+>>> import pandas as pd
+>>> ser = pd.Series(["a", np.nan], dtype="category")
+>>> ser.astype(object if pd.__version__ < "3" else "str")
 
 
 ``prod()`` raising for string data
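The migration guide text in this hunk contrasts three ways of turning values into strings. A minimal sketch of those options side by side, assuming only that pandas and NumPy are importable: `ser.map(str)` stringifies everything (NaN becomes `"nan"`), `ser.map(str, na_action="ignore")` keeps NaN missing on both 2.x and 3.x, and the categorical example picks object or `"str"` dtype by version. The helper `pandas_major_version` is hypothetical; it parses the major version as an integer instead of comparing version strings lexicographically as the guide's one-liner does, which happens to work for 2.x vs 3.x but is fragile in general.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.5, np.nan])

# Old-style stringification: every element, including NaN, goes through str(),
# so the missing value becomes the literal string "nan".
everything_stringified = ser.map(str)

# na_action="ignore" skips missing values, so NaN is propagated unchanged
# while the non-missing values are stringified (pandas 2.x and 3.x).
missing_preserved = ser.map(str, na_action="ignore")


def pandas_major_version() -> int:
    # Hypothetical helper. Assumes a release-style version string such as
    # "2.2.3" or "3.0.0.dev0+...", whose first dot-separated field is an int.
    return int(pd.__version__.split(".")[0])


# Object dtype on pandas 2.x, the new string dtype on pandas 3.x, without
# calling str() on each individual element.
target_dtype = object if pandas_major_version() < 3 else "str"
cat = pd.Series(["a", np.nan], dtype="category")
dense = cat.astype(target_dtype)

print(everything_stringified.tolist())
print(missing_preserved.tolist())
print(dense.dtype)
```

The printed dtype of `dense` depends on the installed pandas version, which is the point of the conditional.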

pandas/_config/config.py

Lines changed: 2 additions & 2 deletions
@@ -693,8 +693,8 @@ def _get_registered_option(key: str):
 
 def _translate_key(key: str) -> str:
 """
-if key id deprecated and a replacement key defined, will return the
-replacement key, otherwise returns `key` as - is
+if `key` is deprecated and a replacement key defined, will return the
+replacement key, otherwise returns `key` as-is
 """
 d = _get_deprecated_option(key)
 if d:
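The corrected docstring describes a simple lookup: return the replacement for a deprecated key when one is defined, otherwise return the key unchanged. A minimal, self-contained sketch of that contract follows. It is not pandas' implementation. The plain dict registry, the function name `translate_key`, and the option names `display.old_width` and `display.retired_flag` are all hypothetical stand-ins for whatever `_get_deprecated_option` consults.

```python
from typing import Dict, Optional

# Hypothetical registry: deprecated option key -> replacement key, or None when
# the option is deprecated without a replacement. Illustration only.
_DEPRECATED_KEYS: Dict[str, Optional[str]] = {
    "display.old_width": "display.width",
    "display.retired_flag": None,
}


def translate_key(key: str) -> str:
    """Return the replacement for a deprecated `key`, otherwise `key` as-is."""
    replacement = _DEPRECATED_KEYS.get(key)
    # Falls back to `key` both for keys that are not deprecated and for
    # deprecated keys that have no replacement defined.
    return replacement or key


print(translate_key("display.old_width"))
print(translate_key("display.retired_flag"))
print(translate_key("display.max_rows"))
```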

pandas/_version.py

Lines changed: 1 addition & 1 deletion
@@ -581,7 +581,7 @@ def render_git_describe(pieces):
 def render_git_describe_long(pieces):
 """TAG-DISTANCE-gHEX[-dirty].
 
-Like 'git describe --tags --dirty --always -long'.
+Like 'git describe --tags --dirty --always --long'.
 The distance/hash is unconditional.
 
 Exceptions:

pandas/core/accessor.py

Lines changed: 2 additions & 2 deletions
@@ -88,7 +88,7 @@ def _add_delegate_accessors(
 cls
 Class to add the methods/properties to.
 delegate
-Class to get methods/properties and doc-strings.
+Class to get methods/properties and docstrings.
 accessors : list of str
 List of accessors to add.
 typ : {'property', 'method'}
@@ -159,7 +159,7 @@ def delegate_names(
 Parameters
 ----------
 delegate : object
-The class to get methods/properties & doc-strings.
+The class to get methods/properties & docstrings.
 accessors : Sequence[str]
 List of accessor to add.
 typ : {'property', 'method'}

pandas/core/base.py

Lines changed: 1 addition & 1 deletion
@@ -90,7 +90,7 @@
 
 class PandasObject(DirNamesMixin):
 """
-Baseclass for various pandas objects.
+Base class for various pandas objects.
 """
 
 # results from calls to methods decorated with cache_readonly get added to _cache

pandas/core/generic.py

Lines changed: 1 addition & 0 deletions
@@ -10216,6 +10216,7 @@ def shift(
 suffix : str, optional
 If str and periods is an iterable, this is added after the column
 name and before the shift value for each shifted column name.
+For `Series` this parameter is unused and defaults to `None`.
 
 Returns
 -------

pandas/core/indexing.py

Lines changed: 1 addition & 1 deletion
@@ -1926,7 +1926,7 @@ def _setitem_with_indexer(self, indexer, value, name: str = "iloc") -> None:
 labels = index.insert(len(index), key)
 
 # We are expanding the Series/DataFrame values to match
-# the length of thenew index `labels`. GH#40096 ensure
-# this is valid even if the index has duplicates.
+# the length of the new index `labels`. GH#40096 ensure
+# this is valid even if the index has duplicates.
 taker = np.arange(len(index) + 1, dtype=np.intp)
 taker[-1] = -1
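The comment in this hunk explains why the expansion uses a positional `taker` (every existing position, then `-1` for the new row) rather than a label lookup: positions stay unambiguous even when labels repeat. The hunk does not show how `taker` is consumed. The sketch below uses the public `pandas.api.extensions.take` as a stand-in, where `-1` combined with `allow_fill=True` marks a slot to fill with the missing value. The index labels, the new key `"z"` and the values are made up for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import take

# An index with duplicate labels, extended by one new label the same way the
# hunk does it: insert `key` at position len(index).
index = pd.Index(["x", "x", "y"])
labels = index.insert(len(index), "z")

values = np.array([1.0, 2.0, 3.0])

# Positional taker: keep every existing row in order and use -1 for the
# appended row; with allow_fill=True, -1 means "fill with the missing value".
taker = np.arange(len(index) + 1, dtype=np.intp)
taker[-1] = -1
expanded = take(values, taker, allow_fill=True)

# Because the taker is positional, the duplicate "x" labels are never looked
# up, so the expansion stays well defined.
result = pd.Series(expanded, index=labels)
print(result)
```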

pandas/io/api.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 """
-Data IO api
+Data I/O API
 """
 
 from pandas.io.clipboards import read_clipboard

pandas/io/common.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-"""Common IO api utilities"""
+"""Common I/O API utilities"""
 
 from __future__ import annotations
 