Commit 7a670d4

Updated documentation as per review
1 parent 1f778dc commit 7a670d4

1 file changed (+12, -7 lines)

doc/source/user_guide/text.rst

Lines changed: 12 additions & 7 deletions
@@ -11,13 +11,17 @@ Working with text data
 Text data types
 ---------------

-There are two ways to store text data in pandas:
+There are three ways to store text data in pandas:

 1. ``object`` -dtype NumPy array.
 2. :class:`StringDtype` extension type.
 3. ``str`` -dtype (default from pandas 3.0).

+For historical context, see the
+`PDEP for string dtype <https://pandas.pydata.org/pdeps/0014-string-dtype.html>`_.
+
 We recommend using the ``str`` dtype or :class:`StringDtype` to store text data.
+Avoid using the ``object`` dtype for text data.

 Prior to pandas 1.0, ``object`` dtype was the only option. This was unfortunate
 for many reasons:
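The three storage options listed above can be compared with a short sketch. It assumes pandas >= 1.0 is installed; the sample data and variable names are illustrative only. What ``dtype="str"`` resolves to depends on the installed release (the dedicated ``str`` dtype on pandas 3.0, object dtype on earlier versions), so the sketch only prints it.

```python
import pandas as pd

# Illustrative sample data with one missing entry
data = ["apple", None, "cherry"]

# 1. object dtype: a NumPy object array; the missing entry stays as None
s_obj = pd.Series(data, dtype=object)

# 2. StringDtype ("string"): nullable extension type; missing entries become pd.NA
s_string = pd.Series(data, dtype="string")

# 3. "str": the default string dtype on pandas 3.0; on older releases
#    (without the future string option) it resolves to object dtype
s_str = pd.Series(data, dtype="str")

for name, s in [("object", s_obj), ("string", s_string), ("str", s_str)]:
    print(name, s.dtype, s.isna().tolist())
```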
@@ -35,8 +39,9 @@ default for string data. You may still encounter ``object`` dtype for legacy dat
 pandas versions. Use ``.astype("str")`` to explicitly convert these to ``str`` dtype or specify ``dtype="str"``
 when creating new data structures.

-Use the nullable :class:`StringDtype` (``"string"``) when handling NA values in your string data. It offers
-additional flexibility for missing values while maintaining compatibility with pandas' nullable types.
+Use the nullable :class:`StringDtype` (``"string"``) or ``str`` dtype when handling NA values in your string data.
+Note that ``StringDtype`` uses ``pd.NA`` for missing values, whereas ``str`` dtype uses ``np.nan``. ``StringDtype``
+offers additional flexibility for missing values while maintaining compatibility with pandas' nullable types.

 Currently, the performance of ``str`` dtype, ``object`` dtype arrays of strings, and
 :class:`arrays.StringArray` is about the same. We expect future enhancements
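The different missing-value markers can be checked with a minimal sketch, assuming pandas >= 1.0 and NumPy are installed and using hypothetical legacy data. It converts with ``astype("string")`` because that conversion behaves the same across recent releases; ``astype("str")`` only produces the ``str`` dtype (with ``np.nan`` as its missing value) on pandas 3.0.

```python
import numpy as np
import pandas as pd

# Hypothetical "legacy" data stored with object dtype, using np.nan as the missing marker
legacy = pd.Series(["a", np.nan, "c"], dtype=object)

# Convert to the nullable StringDtype: the missing entry becomes pd.NA
converted = legacy.astype("string")

print(converted.dtype)
print(converted[1] is pd.NA)  # missing values are represented by pd.NA

# isna() and fillna() treat pd.NA as missing
print(converted.isna().tolist())
print(converted.fillna("missing").tolist())
```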
@@ -125,7 +130,7 @@ Behavior differences
 ^^^^^^^^^^^^^^^^^^^^

 These are places where the behavior of ``StringDtype`` or ``str`` objects differs from
-``object`` dtype:
+``object`` dtype.

 1. For ``StringDtype`` and ``str``, :ref:`string accessor methods<api.series.str>`
 that return **numeric** output will always return a nullable integer dtype,
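A minimal sketch of the first behavior difference, assuming pandas >= 1.3 (for the ``storage`` argument) and illustrative data: with a missing entry, ``str.len()`` on object dtype falls back to ``float64``, while a ``StringDtype`` Series returns the nullable ``Int64`` dtype. The sketch pins the ``"python"`` storage so the result does not depend on whether PyArrow is installed; the ``str`` variant is not exercised because its behavior depends on the pandas version.

```python
import pandas as pd

values = ["a", None, "abc"]

# object dtype: the missing entry forces the numeric result to float64 (NaN)
lengths_obj = pd.Series(values, dtype=object).str.len()

# StringDtype: numeric results use the nullable Int64 dtype, with pd.NA for missing
lengths_str = pd.Series(values, dtype=pd.StringDtype("python")).str.len()

print(lengths_obj.dtype)  # float64
print(lengths_str.dtype)  # Int64
```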
@@ -157,8 +162,8 @@ These are places where the behavior of ``StringDtype`` or ``str`` objects differs

 2. Some string methods, like :meth:`Series.str.decode`, are not available
 on ``StringArray`` or ``str`` because they only hold strings, not bytes.
-3. In comparison operations, :class:`arrays.StringArray`, ``Series`` backed
-by a ``StringArray``, and ``str`` dtype will return an object with :class:`BooleanDtype`,
+3. In comparison operations, :class:`arrays.StringArray` and ``Series`` backed
+by a ``StringArray`` will return an object with :class:`BooleanDtype`,
 rather than a ``bool`` dtype object. Missing values in these types will propagate
 in comparison operations, rather than always comparing unequal like :attr:`numpy.nan`.

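The comparison behavior in item 3 can be sketched as follows, assuming pandas >= 1.3 and illustrative data. The sketch uses a ``StringArray``-backed Series only, matching the updated wording, and pins ``pd.StringDtype("python")``; with PyArrow-backed storage the result dtype may differ.

```python
import pandas as pd

values = ["a", None, "c"]

# object dtype: comparison returns plain bool; the missing entry compares unequal (False)
eq_obj = pd.Series(values, dtype=object) == "a"

# StringArray-backed Series: comparison returns the nullable "boolean" dtype,
# and the missing entry propagates as pd.NA
eq_str = pd.Series(values, dtype=pd.StringDtype("python")) == "a"

print(eq_obj.dtype, eq_obj.tolist())
print(eq_str.dtype, eq_str.tolist())
```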
@@ -431,7 +436,7 @@ Missing values on either side will result in missing values in the result as wel
 Concatenating a Series and something array-like into a Series
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-The parameter ``others`` can also be two-dimensional. In this case, the number or rows must match the length of the calling ``Series`` (or ``Index``).
+The parameter ``others`` can also be two-dimensional. In this case, the number of rows must match the length of the calling ``Series`` (or ``Index``).

 .. ipython:: python

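A sketch of a two-dimensional ``others`` in ``Series.str.cat``, assuming pandas >= 1.0 and NumPy; the DataFrame columns ``x`` and ``y`` are hypothetical. For a 2-D NumPy array, a row count that differs from the calling ``Series`` raises ``ValueError``. A DataFrame is aligned on its index instead (``join="left"`` by default), so the sketch gives it an index matching ``s``.

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", "c"])

# Hypothetical two-dimensional `others`: one row per element of `s`
others = pd.DataFrame({"x": ["1", "2", "3"], "y": ["A", "B", "C"]})

# Row i of `others` is appended, column by column, to element i of `s`
result = s.str.cat(others)
print(result.tolist())  # ['a1A', 'b2B', 'c3C']

# A 2-D NumPy array with 2 rows does not match len(s) == 3 and is rejected
bad = np.array([["1", "A"], ["2", "B"]])
try:
    s.str.cat(bad)
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
print(type(mismatch_error).__name__)  # ValueError
```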