@@ -14,10 +14,57 @@ including other versions of pandas.
Enhancements
~~~~~~~~~~~~

- .. _whatsnew_300.enhancements.enhancement1:
+ .. _whatsnew_300.enhancements.string_dtype:
+
+ Dedicated string data type by default
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ Historically, pandas represented string columns with the NumPy ``object`` data type.
+ This representation has several problems: it is not specific to strings (any
+ Python object can be stored in an ``object``-dtype array, not just strings), and
+ it is often inefficient in both performance and memory usage.
+
+ Starting with pandas 3.0, a dedicated string data type is enabled by default
+ (backed by PyArrow if it is installed, otherwise falling back to
+ NumPy). This means that pandas will infer columns containing string
+ data as the new ``str`` data type when creating pandas objects, such as in
+ constructors or IO functions.
+
+ Old behavior:
+
+ .. code-block:: python
+
+     >>> ser = pd.Series(["a", "b"])
+     >>> ser
+     0    a
+     1    b
+     dtype: object
+
+ New behavior:
+
+ .. code-block:: python
+
+     >>> ser = pd.Series(["a", "b"])
+     >>> ser
+     0    a
+     1    b
+     dtype: str
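
As a minimal sketch of the IO case mentioned above, the snippet below reads a small in-memory CSV and inspects the inferred dtypes. The column names ``name`` and ``n`` are made up for illustration. Per the description above, the text column should report ``str`` on pandas 3.0 and ``object`` on earlier versions, so the check uses ``pd.api.types.is_string_dtype``, which is assumed here to accept both (pandas 2.0 or later).

```python
import io

import pandas as pd

# Hypothetical two-column CSV held in memory (names are illustrative only).
csv_text = "name,n\na,1\nb,2\n"
df = pd.read_csv(io.StringIO(csv_text))

# Per the text above: "str" for `name` on pandas 3.0, "object" on earlier
# versions; `n` is inferred as an integer column either way.
print(df.dtypes)

# Version-tolerant check that the text column holds string data.
name_is_string = pd.api.types.is_string_dtype(df["name"])
print(name_is_string)
```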
+
+ The string data type used in these scenarios mostly behaves as NumPy
+ ``object`` dtype would, including missing value semantics and general operations on these
+ columns.
+
+ The main characteristics of the new string data type:
+
+ - Inferred by default for string data (instead of ``object`` dtype).
+ - The ``str`` dtype can only hold strings (or missing values), in contrast to
+   ``object`` dtype (setting a non-string value fails).
+ - The missing value sentinel is always ``NaN`` (``np.nan``) and follows the same
+   missing value semantics as the other default dtypes.
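
The characteristics listed above can be probed with a short script. This is only a sketch: it is written to run on both older pandas and pandas 3.0, so it prints what it observes instead of assuming a particular version, and the exception types caught for the rejected assignment are an assumption.

```python
import pandas as pd

ser = pd.Series(["a", "b", None])

# "str" on pandas 3.0 per the list above; "object" on earlier versions.
print(ser.dtype)

# The stored missing value: NaN under the new dtype, None under object dtype.
print(repr(ser.iloc[2]))

# Try to store a non-string value. The new dtype is described as rejecting
# this; object dtype accepts it silently.
try:
    ser.iloc[0] = 1
    accepted = True
except (TypeError, ValueError) as exc:  # assumed exception types
    accepted = False
    print(f"rejected: {exc}")

print(accepted)
```

On a version where the dedicated string dtype is the default, ``accepted`` is expected to be ``False``.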
+
+ These intentional changes can have breaking consequences, for example when checking
+ whether the ``.dtype`` is ``object`` dtype or checking for the exact missing value sentinel.
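
One way to make such checks tolerant of both representations is sketched below. It assumes pandas 2.0 or later and is an illustration, not an official migration recipe: ``pd.api.types.is_string_dtype`` replaces the comparison against ``object``, and ``Series.isna`` replaces tests for one specific sentinel.

```python
import pandas as pd

ser = pd.Series(["a", "b"])

# Fragile: True only while string columns are stored as object dtype.
looks_like_object = ser.dtype == object

# More portable: accepts both the object-dtype and the dedicated
# string-dtype representations of string data.
is_string = pd.api.types.is_string_dtype(ser)

# Fragile: `value is None` depends on the exact sentinel in use.
# More portable: isna() treats None and NaN alike.
with_missing = pd.Series(["a", None])
missing_mask = with_missing.isna().tolist()

print(is_string, missing_mask)
```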
+
+ TODO add link to migration guide for more details

- Enhancement1
- ^^^^^^^^^^^^

.. _whatsnew_300.enhancements.enhancement2:
