Commit e4be568

Add doc for counting categorical dtype
1 parent ebc60f2 commit e4be568

1 file changed: +16 -0 lines changed

doc/source/user_guide/categorical.rst

Lines changed: 16 additions & 0 deletions
@@ -240,6 +240,8 @@ expects a ``dtype``. For example :func:`pandas.read_csv`,
 array. In other words, ``dtype='category'`` is equivalent to
 ``dtype=CategoricalDtype()``.

+.. _categorical.equalitysemantics:
+
 Equality semantics
 ~~~~~~~~~~~~~~~~~~

@@ -1178,3 +1180,17 @@ Use ``copy=True`` to prevent such a behaviour or simply don't reuse ``Categorica
 This also happens in some cases when you supply a NumPy array instead of a ``Categorical``:
 using an int array (e.g. ``np.array([1,2,3,4])``) will exhibit the same behavior, while using
 a string array (e.g. ``np.array(["a","b","c","a"])``) will not.
+
+Counting CategoricalDtype
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As mentioned in :ref:`Equality Semantics <categorical.equalitysemantics>`, two instances of :class:`~pandas.api.types.CategoricalDtype` compare equal
+whenever they have the same categories and order. Therefore, when counting data types, multiple instances of :class:`~pandas.api.types.CategoricalDtype` are counted as one group if they have the same categories and order.
+In the example below, even though ``a``, ``c``, and ``d`` all have the ``category`` data type, they are not all counted as one group: ``c`` and ``d`` share the same categories and are counted together, while ``a`` has different categories and is counted separately.
+
+.. ipython:: python
+
+    df = pd.DataFrame({'a': [1], 'b': ['2'], 'c': [3], 'd': [3]}).astype({'a': 'category', 'c': 'category', 'd': 'category'})
+    df
+    df.dtypes
+    df.dtypes.value_counts()
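
The grouping rule described in the added section can be checked outside the docs build with a short standalone script. This is only a sketch: it builds the dtypes explicitly with ``CategoricalDtype(...)`` rather than with ``astype('category')`` as the commit does, and the variable names (``first``, ``second``, ``other``, ``shared_columns``) are illustrative and not part of the commit.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Three separately constructed dtype instances (illustrative names).
first = CategoricalDtype(categories=[3], ordered=False)
second = CategoricalDtype(categories=[3], ordered=False)
other = CategoricalDtype(categories=[1], ordered=False)

# Equality depends on categories and order, not on object identity.
same_dtype = first == second
different_dtype = first == other

# Same data as the commit's example, with the dtypes spelled out per column.
df = pd.DataFrame({"a": [1], "b": ["2"], "c": [3], "d": [3]}).astype(
    {"a": other, "c": first, "d": second}
)

# Equal dtypes collapse into a single group when counted.
counts = df.dtypes.value_counts()

# Columns whose dtype compares equal to ``first``.
shared_columns = [col for col, dt in df.dtypes.items() if first == dt]

print(counts)
print(shared_columns)
```

Under these assumptions, ``c`` and ``d`` land in the same group, while ``a`` and ``b`` each form a group of their own.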
