@@ -11,13 +11,17 @@ Working with text data
 Text data types
 ---------------

-There are two ways to store text data in pandas:
+There are three ways to store text data in pandas:

 1. ``object``-dtype NumPy array.
 2. :class:`StringDtype` extension type.
 3. ``str``-dtype (default from pandas 3.0).

+For historical context, see the
+`PDEP for string dtype <https://pandas.pydata.org/pdeps/0014-string-dtype.html>`_.
+
 We recommend using the ``str`` dtype or :class:`StringDtype` to store text data.
+Avoid using the ``object`` dtype for text data.

 Prior to pandas 1.0, ``object`` dtype was the only option. This was unfortunate
 for many reasons:
@@ -35,8 +39,9 @@ default for string data. You may still encounter ``object`` dtype for legacy dat
 pandas versions. Use ``.astype("str")`` to explicitly convert these to ``str`` dtype or specify ``dtype="str"``
 when creating new data structures.

-Use the nullable :class:`StringDtype` (``"string"``) when handling NA values in your string data. It offers
-additional flexibility for missing values while maintaining compatibility with pandas' nullable types.
+Use the nullable :class:`StringDtype` (``"string"``) or ``str`` dtype when handling NA values in your string data.
+Note that ``StringDtype`` uses ``pd.NA`` for missing values, whereas ``str`` dtype uses ``np.nan``. ``StringDtype``
+offers additional flexibility for missing values while maintaining compatibility with pandas' nullable types.

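As a minimal sketch of the missing-value behavior described in the paragraph above, the snippet below checks which sentinel each dtype reports. The variable names are illustrative only, and the ``str`` branch assumes pandas 3.0 or later, so it is skipped on older versions.

```python
import pandas as pd

# Nullable StringDtype: the missing entry is represented by pd.NA.
nullable = pd.Series(["a", None, "b"], dtype="string")
print(nullable[1] is pd.NA)
print(nullable.isna().tolist())

# Assumption: the NaN-backed ``str`` dtype is only available from pandas 3.0,
# so this branch does nothing on earlier releases.
if int(pd.__version__.split(".")[0]) >= 3:
    default = pd.Series(["a", None, "b"], dtype="str")
    print(default.dtype)
    print(default.isna().tolist())
```

Either way, ``Series.isna`` reports the missing entry, so code that only tests for missingness does not need to know which sentinel is in use.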
 Currently, the performance of ``str`` dtype, ``object`` dtype arrays of strings, and
 :class:`arrays.StringArray` are about the same. We expect future enhancements
@@ -125,7 +130,7 @@ Behavior differences
 ^^^^^^^^^^^^^^^^^^^^

 These are places where the behavior of ``StringDtype`` or ``str`` objects differ from
-``object`` dtype:
+``object`` dtype.

 1. For ``StringDtype`` and ``str``, :ref:`string accessor methods<api.series.str>`
    that return **numeric** output will always return a nullable integer dtype,
@@ -157,8 +162,8 @@ These are places where the behavior of ``StringDtype`` or ``str`` objects differ

 2. Some string methods, like :meth:`Series.str.decode`, are not available
    on ``StringArray`` or ``str`` because they only hold strings, not bytes.
-3. In comparison operations, :class:`arrays.StringArray`, ``Series`` backed
-   by a ``StringArray``, and ``str`` dtype will return an object with :class:`BooleanDtype`,
+3. In comparison operations, :class:`arrays.StringArray` and ``Series`` backed
+   by a ``StringArray`` will return an object with :class:`BooleanDtype`,
    rather than a ``bool`` dtype object. Missing values in these types will propagate
    in comparison operations, rather than always comparing unequal like :attr:`numpy.nan`.

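The behavior differences listed above can be illustrated with a short comparison of the nullable ``"string"`` dtype against ``object`` dtype. This is a sketch with made-up sample data, and it deliberately covers only :class:`StringDtype`, since the diff removes ``str`` dtype from the comparison item.

```python
import pandas as pd

values = ["a", None, "bb"]
nullable = pd.Series(values, dtype="string")
legacy = pd.Series(values, dtype=object)

# Numeric output: nullable integer for StringDtype, float (with NaN) for object.
print(nullable.str.len().dtype)
print(legacy.str.len().dtype)

# Comparisons: StringDtype yields BooleanDtype and propagates the missing value,
# while object dtype yields plain bool and treats the missing value as unequal.
eq_nullable = nullable == "a"
eq_legacy = legacy == "a"
print(eq_nullable.dtype, eq_nullable.isna().tolist())
print(eq_legacy.dtype, eq_legacy.tolist())
```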
@@ -431,7 +436,7 @@ Missing values on either side will result in missing values in the result as wel
 Concatenating a Series and something array-like into a Series
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-The parameter ``others`` can also be two-dimensional. In this case, the number or rows must match the length of the calling ``Series`` (or ``Index``).
+The parameter ``others`` can also be two-dimensional. In this case, the number of rows must match the length of the calling ``Series`` (or ``Index``).

 .. ipython:: python
