Closed
29 changes: 28 additions & 1 deletion doc/source/10min.rst
@@ -66,7 +66,8 @@ Creating a ``DataFrame`` by passing a dict of objects that can be converted to s
'B' : pd.Timestamp('20130102'),
'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
'D' : np.array([3] * 4,dtype='int32'),
'E' : 'foo' })
'E' : pd.Categorical(["test","train","test","train"]),
'F' : 'foo' })
df2

Having specific :ref:`dtypes <basics.dtypes>`
@@ -635,6 +636,32 @@ the quarter end:
ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9
ts.head()

Categoricals
------------

Since version 0.15, pandas can include categorical data in a ``DataFrame``. For full docs, see the
:ref:`Categorical introduction <categorical>` and the :ref:`API documentation <api.categorical>`.

.. ipython:: python

df = pd.DataFrame({"id":[1,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']})

# convert the raw grades to a categorical
df["grade"] = pd.Categorical(df["raw_grade"])

# Alternative: df["grade"] = df["raw_grade"].astype("category")
df["grade"]

# Rename the levels
df["grade"].cat.levels = ["very good", "good", "very bad"]

# Reorder the levels and simultaneously add the missing levels
df["grade"].cat.reorder_levels(["very bad", "bad", "medium", "good", "very good"])
df["grade"]
df.sort("grade")
df.groupby("grade").size()
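
For readers on a released pandas, the same walkthrough can be sketched with the names that replaced the ones in this diff: ``levels`` was renamed to ``categories`` before 0.15.0 shipped, and ``df.sort`` later gave way to ``sort_values``. This is a minimal sketch under that assumption, not the text proposed in this PR; the ``observed=False`` argument additionally assumes pandas 0.23 or newer.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                   "raw_grade": ["a", "b", "b", "a", "a", "e"]})

# Convert the raw grades to a categorical column.
df["grade"] = df["raw_grade"].astype("category")

# Rename the categories ("a", "b", "e") to more meaningful names.
df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])

# Reorder the categories and add the missing ones in a single step.
df["grade"] = df["grade"].cat.set_categories(
    ["very bad", "bad", "medium", "good", "very good"])

# Sorting follows the category order, not the lexical order of the labels.
by_grade = df.sort_values(by="grade")

# Grouping with observed=False also reports the empty categories.
sizes = df.groupby("grade", observed=False).size()

print(by_grade)
print(sizes)
```

Unlike the in-place assignment to ``cat.levels`` shown above, these accessor methods return a new ``Series``, which is why each result is assigned back to ``df["grade"]``.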



Plotting
--------
11 changes: 8 additions & 3 deletions doc/source/api.rst
@@ -528,11 +528,17 @@ and has the following usable methods and properties (all available as
:toctree: generated/

Categorical
Categorical.from_codes
Categorical.levels
Categorical.ordered
Categorical.reorder_levels
Categorical.remove_unused_levels

The following methods are considered API when using ``Categorical`` directly:

.. autosummary::
:toctree: generated/

Categorical.from_codes
Categorical.min
Categorical.max
Categorical.mode
@@ -547,7 +553,7 @@ the Categorical back to a numpy array, so levels and order information is not pr
Categorical.__array__

To create compatibility with `pandas.Series` and `numpy` arrays, the following (non-API) methods
are also introduced.
are also introduced and available when ``Categorical`` is used directly.

.. autosummary::
:toctree: generated/
@@ -564,7 +570,6 @@ are also introduced.
Categorical.argsort
Categorical.fillna


Plotting
~~~~~~~~
.. currentmodule:: pandas
45 changes: 43 additions & 2 deletions doc/source/categorical.rst
@@ -90,6 +90,7 @@ By using some special functions:
df['group'] = pd.cut(df.value, range(0, 105, 10), right=False, labels=labels)
df.head(10)

See :ref:`documentation <reshaping.tile.cut>` for :func:`~pandas.cut`.

`Categoricals` have a specific ``category`` :ref:`dtype <basics.dtypes>`:

@@ -331,6 +332,45 @@ Operations

The following operations are possible with categorical data:

Comparing `Categoricals` with other objects is possible in two cases:

* comparing a `Categorical` to another `Categorical`, when `levels` and `ordered` are the same, or
* comparing a `Categorical` to a scalar.

All other comparisons will raise a ``TypeError``.

.. ipython:: python

cat = pd.Series(pd.Categorical([1,2,3], levels=[3,2,1]))
cat_base = pd.Series(pd.Categorical([2,2,2], levels=[3,2,1]))
Contributor

show the cats after they are created

Contributor Author

done

cat_base2 = pd.Series(pd.Categorical([2,2,2]))

cat > cat_base

# This doesn't work because the levels are not the same
try:
Contributor

there is a way to do this in the docs (showing an exception); can also do a code block

Contributor Author

I haven't found a way to do that. Just letting the exception happen results in long stacktraces and I don't like codeblocks, where the exception message has to be manually inserted (and maintained). Maybe that would be a nice PR for the ipython directive....

Contributor

yep that's fine (I bet there is a way with :okexcept: though)

Contributor Author

I looked at the sphinx extension source and don't think there is a way without modifying it. `:okexcept:` basically only prevents sphinx from writing the exception to stdout.

A :nostacktrace: option would be nice...

Contributor

maybe can create a small function and put in utils for this purpose (basically what u r doing)

Contributor Author

something like

with no_stacktrace():
   a < cat
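
A helper along the lines of the ``no_stacktrace()`` idea in this thread might look like the sketch below. The name, the signature, and the choice to print ``Type: message`` are all hypothetical and taken only from the suggestion above; nothing in this discussion establishes that pandas or the IPython directive ships such a helper.

```python
from contextlib import contextmanager


@contextmanager
def no_stacktrace(*exc_types):
    """Run a block; on failure print 'Type: message' instead of a traceback.

    Hypothetical helper: with no arguments it catches any Exception.
    """
    exc_types = exc_types or (Exception,)
    try:
        yield
    except exc_types as e:
        # Only the exception type and message are shown, so the docs stay
        # short and the message never has to be maintained by hand.
        print("{}: {}".format(type(e).__name__, e))


# Stand-in for an unsupported comparison: int vs. str raises TypeError
# on Python 3.
with no_stacktrace(TypeError):
    1 < "a"
```

Exceptions of other types still propagate, so a block that fails for an unexpected reason would not be silently hidden in the rendered docs.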

    cat > cat_base2
except TypeError as e:
    print("TypeError: " + str(e))

cat > 2
Contributor

put a comment above (eg comparison vs scalar)

Contributor Author

done


.. note::

Comparisons with `Series`, `np.array` or a `Categorical` with different levels or ordering
will raise a `TypeError` because custom level ordering would result in two valid results:
one taking the ordering into account and one ignoring it. If you want to compare a `Categorical`
with such a type, you need to be explicit and convert the `Categorical` to values:

.. ipython:: python

base = np.array([1,2,3])

try:
    cat > base
except TypeError as e:
    print("TypeError: " + str(e))

np.asarray(cat) > base
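
To make the "two valid results" concrete, the sketch below compares the same data once by category order and once by plain value. It assumes a released pandas, where the ``levels`` argument used in this diff is spelled ``categories`` and ordering is requested with ``ordered=True``; the variable names ``by_order`` and ``by_value`` are illustrative only.

```python
import numpy as np
import pandas as pd

# Categories are deliberately listed in descending order, so 3 < 2 < 1.
cat = pd.Series(pd.Categorical([1, 2, 3], categories=[3, 2, 1], ordered=True))
cat_base = pd.Series(pd.Categorical([2, 2, 2], categories=[3, 2, 1], ordered=True))

# Same categories and ordering: the comparison uses the category order.
by_order = cat > cat_base

# Explicit conversion to values: the comparison uses the integers themselves.
by_value = np.asarray(cat) > np.asarray(cat_base)

print(by_order.tolist())
print(by_value.tolist())
```

The two results disagree on the first and last elements, which is the ambiguity that the ``TypeError`` forces the caller to resolve explicitly.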

Getting the minimum and maximum, if the categorical is ordered:

.. ipython:: python
@@ -509,7 +549,8 @@ The same applies to ``df.append(df)``.
Getting Data In/Out
-------------------

Writing data (`Series`, `Frames`) to a HDF store that contains a ``category`` dtype will currently raise ``NotImplementedError``.
Writing data (`Series`, `Frames`) that contains a ``category`` dtype to an HDF store will currently
raise ``NotImplementedError``.

Writing to a CSV file will convert the data, effectively removing any information about the
`Categorical` (levels and ordering). So if you read back the CSV file you have to convert the
@@ -579,7 +620,7 @@ object and not as a low level `numpy` array dtype. This leads to some problems.
try:
np.dtype("category")
except TypeError as e:
print("TypeError: " + str(e))
print("TypeError: " + str(e))

dtype = pd.Categorical(["a"]).dtype
try:
7 changes: 7 additions & 0 deletions doc/source/reshaping.rst
@@ -503,3 +503,10 @@ handling of NaN:

pd.factorize(x, sort=True)
np.unique(x, return_inverse=True)[::-1]

.. note::
If you just want to handle one column as a categorical variable (like R's factor),
you can use ``df["cat_col"] = pd.Categorical(df["col"])`` or
``df["cat_col"] = df["col"].astype("category")``. For full docs on :class:`~pandas.Categorical`,
see the :ref:`Categorical introduction <categorical>` and the
:ref:`API documentation <api.categorical>`. This feature was introduced in version 0.15.
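
As a rough illustration of how the categorical route relates to ``pd.factorize``, the sketch below converts one column and compares the integer codes. The frame and the column names ``col`` and ``cat_col`` are made up for the example, and it assumes a released pandas where the ``.cat`` accessor exposes ``codes`` and ``categories``.

```python
import pandas as pd

df = pd.DataFrame({"col": ["b", "a", "b", None]})

# Treat one column as a categorical variable (similar to R's factor).
df["cat_col"] = df["col"].astype("category")

# factorize with sort=True yields the same integer codes; missing
# values are encoded as -1 in both cases.
codes, uniques = pd.factorize(df["col"], sort=True)

print(df["cat_col"].dtype)
print(list(df["cat_col"].cat.codes))
print(list(codes))
```

The categorical column keeps the codes and their labels together in the ``DataFrame``, whereas ``factorize`` hands them back as two separate objects.
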
3 changes: 2 additions & 1 deletion doc/source/v0.15.0.txt
@@ -225,7 +225,8 @@ Categoricals in Series/DataFrame
methods to manipulate. Thanks to Jan Schultz for much of this API/implementation. (:issue:`3943`, :issue:`5313`, :issue:`5314`,
:issue:`7444`, :issue:`7839`, :issue:`7848`, :issue:`7864`, :issue:`7914`).

For full docs, see the :ref:`Categorical introduction <categorical>` and the :ref:`API documentation <api.categorical>`.
For full docs, see the :ref:`Categorical introduction <categorical>` and the
:ref:`API documentation <api.categorical>`.

.. ipython:: python
