Commit 30e6c8d

DOC: add section about upcoming pandas 3.0 changes (string dtype, CoW) to 2.3 whatsnew notes
1 parent e5a1c10 commit 30e6c8d

1 file changed: +94 -0 lines changed


doc/source/whatsnew/v2.3.0.rst

Lines changed: 94 additions & 0 deletions
@@ -10,6 +10,100 @@ including other versions of pandas.

.. ---------------------------------------------------------------------------

.. _whatsnew_230.upcoming_changes:

Upcoming changes in pandas 3.0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

pandas 3.0 will bring two major changes to the default behavior of pandas.

Dedicated string data type by default
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Historically, pandas represented string columns with the NumPy ``object`` data type.
This representation has numerous problems: it is not specific to strings (any
Python object can be stored in an ``object``-dtype array, not just strings) and
it is often inefficient, both in performance and in memory usage.

Starting with the upcoming pandas 3.0 release, a dedicated string data type will
be enabled by default (backed by PyArrow under the hood if it is installed, otherwise
falling back to NumPy). This means that pandas will start inferring columns
containing string data as the new ``str`` data type when creating pandas
objects, such as in constructors or IO functions.

Old behavior:

.. code-block:: python

    >>> ser = pd.Series(["a", "b"])
    >>> ser
    0    a
    1    b
    dtype: object

New behavior:

.. code-block:: python

    >>> ser = pd.Series(["a", "b"])
    >>> ser
    0    a
    1    b
    dtype: str

The string data type that is used in these scenarios will mostly behave as NumPy
``object`` dtype would, including missing value semantics and general operations
on these columns.

However, the introduction of a new default dtype will also have some breaking
consequences for your code (for example, when checking whether the ``.dtype`` is
object dtype). To allow testing it in advance of the pandas 3.0 release, this
future dtype inference logic can be enabled in pandas 2.3 with:

.. code-block:: ipython

    pd.options.future.infer_string = True

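As a rough illustration of the dtype-check issue mentioned above, the sketch
below uses a hypothetical helper, ``is_text_column`` (not part of pandas), that
relies on ``pandas.api.types.is_string_dtype`` instead of comparing ``.dtype``
against ``object``. The intent is that the same check passes both with the
current ``object`` inference and with the new ``str`` dtype:

.. code-block:: python

    import pandas as pd


    def is_text_column(ser: pd.Series) -> bool:
        """Hypothetical helper (not a pandas API): does ``ser`` hold strings?"""
        # A check such as ``ser.dtype == object`` is True today but becomes
        # False once string columns are inferred as ``str``. Passing the
        # Series to is_string_dtype is intended to cover both cases.
        return pd.api.types.is_string_dtype(ser)


    names = pd.Series(["a", "b"])
    counts = pd.Series([1, 2])

    print(is_text_column(names))   # string data
    print(is_text_column(counts))  # integer data
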
TODO add link to migration guide

Copy-on-Write
^^^^^^^^^^^^^

The currently optional Copy-on-Write mode will be enabled by default in pandas 3.0. There
won't be an option to keep the current behavior enabled.

In summary, the new Copy-on-Write behavior changes how pandas operates with
respect to copies and views:

1. The result of *any* indexing operation (subsetting a DataFrame or Series in any way,
   including accessing a DataFrame column as a Series) or of any method returning a
   new DataFrame or Series always *behaves as if* it were a copy in terms of the user
   API.
2. As a consequence, if you want to modify an object (DataFrame or Series), the only way
   to do this is to modify that object directly.

Because every indexing step now behaves as a copy, "chained assignment"
(updating a DataFrame with multiple setitem steps) will stop working. Because
it will then consistently never work, the ``SettingWithCopyWarning`` will be
removed.

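A minimal sketch of what this means in practice, using a made-up DataFrame: the
two-step (chained) form is left as a comment because its outcome depends on
whether Copy-on-Write is active, while a single ``.loc`` assignment on the
DataFrame itself updates it in either mode:

.. code-block:: python

    import pandas as pd

    # Illustrative data only.
    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

    # Chained assignment: df["a"] is indexed first, then assigned into.
    # Under Copy-on-Write the intermediate result behaves as a copy, so
    # this form would not update df and should be avoided:
    # df["a"][df["b"] > 4] = 0

    # Single setitem step on the object itself: updates df directly.
    df.loc[df["b"] > 4, "a"] = 0

    print(df["a"].tolist())

Here the rows where ``b`` is greater than 4 have ``a`` set to 0, and the
remaining rows are left unchanged.
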
The new behavioral semantics are explained in more detail in the
:ref:`user guide about Copy-on-Write <copy_on_write>`.

The new behavior has been available since pandas 2.0 and can be enabled with
the following option:

.. code-block:: ipython

    pd.options.mode.copy_on_write = True

Some of the behavior changes allow a clear deprecation, like the changes in
chained assignment. Other changes are more subtle, so their warnings are
hidden behind an option that has been available since pandas 2.2:

.. code-block:: ipython

    pd.options.mode.copy_on_write = "warn"

This mode will warn in many different scenarios that aren't actually relevant to
most code. We recommend exploring this mode, but it is not necessary to get rid
of all of these warnings. The :ref:`migration guide <copy_on_write.migration_guide>`
explains the upgrade process in more detail.

.. _whatsnew_230.enhancements:

Enhancements
