Commit e148685

Adding examples to all methods
1 parent 2e1b427 commit e148685

1 file changed

+120
-84
lines changed

doc/source/user_guide/user_defined_functions.rst

Lines changed: 120 additions & 84 deletions
@@ -26,20 +26,6 @@ Here’s a simple example to illustrate a UDF applied to a Series:
2626
    # Apply the function element-wise using .map
2727
    s.map(add_one)
2828
29-
You can also apply UDFs to an entire DataFrame. For example:
30-
31-
.. ipython:: python
32-
33-
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})
34-
35-
# UDF that takes a row and returns the sum of columns A and B
36-
def sum_row(row):
37-
return row["A"] + row["B"]
38-
39-
# Apply the function row-wise (axis=1 means apply across columns per row)
40-
df.apply(sum_row, axis=1)
41-
42-
4329
Why Not To Use User-Defined Functions
4430
-------------------------------------
4531

@@ -87,25 +73,25 @@ Methods that support User-Defined Functions
8773

8874
User-Defined Functions can be applied across various pandas methods:
8975

90-
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
91-
| Method | Function Input | Function Output | Description |
76+
+-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
77+
| Method | Function Input | Function Output | Description |
9278
+============================+========================+==========================+==============================================================================================================================================+
93-
| :meth:`map` | Scalar | Scalar | Apply a function to each element |
94-
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
95-
| :meth:`apply` (axis=0) | Column (Series) | Column (Series) | Apply a function to each column |
96-
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
97-
| :meth:`apply` (axis=1) | Row (Series) | Row (Series) | Apply a function to each row |
98-
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
99-
| :meth:`pipe` | Series or DataFrame | Series or DataFrame | Chain functions together to apply to Series or Dataframe |
100-
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
101-
| :meth:`filter` | Series or DataFrame | Boolean | Only accepts UDFs in group by. Function is called for each group, and the group is removed from the result if the function returns ``False`` |
102-
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
103-
| :meth:`agg` | Series or DataFrame | Scalar or Series | Aggregate and summarizes values, e.g., sum or custom reducer |
104-
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
105-
| :meth:`transform` (axis=0) | Column (Series) | Column (Series) | Same as :meth:`apply` with (axis=0), but it raises an exception if the function changes the shape of the data |
106-
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
107-
| :meth:`transform` (axis=1) | Row (Series) | Row (Series) | Same as :meth:`apply` with (axis=1), but it raises an exception if the function changes the shape of the data |
108-
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
79+
| :ref:`udf.map` | Scalar | Scalar | Apply a function to each element |
80+
+-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
81+
| :ref:`udf.apply` (axis=0) | Column (Series) | Column (Series) | Apply a function to each column |
82+
+-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
83+
| :ref:`udf.apply` (axis=1) | Row (Series) | Row (Series) | Apply a function to each row |
84+
+-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
85+
| :ref:`udf.pipe` | Series or DataFrame | Series or DataFrame | Chain functions together to apply to Series or DataFrame |
86+
+-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
87+
| :ref:`udf.filter` | Series or DataFrame | Boolean | Only accepts UDFs in group by. Function is called for each group, and the group is removed from the result if the function returns ``False`` |
88+
+-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
89+
| :ref:`udf.agg` | Series or DataFrame | Scalar or Series | Aggregates and summarizes values, e.g., sum or custom reducer |
90+
+-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
91+
| :ref:`udf.transform` (axis=0) | Column (Series) | Column (Series) | Same as :meth:`apply` with (axis=0), but it raises an exception if the function changes the shape of the data |
92+
+-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
93+
| :ref:`udf.transform` (axis=1) | Row (Series) | Row (Series) | Same as :meth:`apply` with (axis=1), but it raises an exception if the function changes the shape of the data |
94+
+-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
10995

11096
When applying UDFs in pandas, it is essential to select the appropriate method based
11197
on your specific task. Each method has its strengths and is designed for different use
@@ -118,6 +104,8 @@ decisions, ensuring more efficient and maintainable code.
118104
and :ref:`ewm()<window>` for details.
119105

120106

107+
.. _udf.map:
108+
121109
:meth:`Series.map` and :meth:`DataFrame.map`
122110
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
123111

@@ -147,6 +135,8 @@ working with medium or large data.
147135

148136
When to use: Use :meth:`map` for applying element-wise UDFs to DataFrames or Series.
149137

138+
.. _udf.apply:
139+
150140
:meth:`Series.apply` and :meth:`DataFrame.apply`
151141
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
152142

@@ -213,84 +203,130 @@ about ``apply`` in groupby operations :ref:`groupby.apply`.
213203
When to use: :meth:`apply` is suitable when no alternative vectorized method or UDF method is available,
214204
but consider optimizing performance with vectorized operations wherever possible.
215205

216-
:meth:`DataFrame.pipe`
217-
~~~~~~~~~~~~~~~~~~~~~~
206+
.. _udf.pipe:
218207

219-
The :meth:`pipe` method is useful for chaining operations together into a clean and readable pipeline.
220-
It is a helpful tool for organizing complex data processing workflows.
208+
:meth:`Series.pipe` and :meth:`DataFrame.pipe`
209+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
221210

222-
When to use: Use :meth:`pipe` when you need to create a pipeline of operations and want to keep the code readable and maintainable.
211+
The ``pipe`` method is similar to ``map`` and ``apply``, but the function receives the whole ``Series``
212+
or ``DataFrame`` it is called on.
213+
214+
.. ipython:: python
215+
216+
    temperature = pd.DataFrame({
217+
        "NYC": [14, 21, 23],
218+
        "Los Angeles": [22, 28, 31],
219+
    })
223220
224-
:meth:`DataFrame.filter`
225-
~~~~~~~~~~~~~~~~~~~~~~~~
221+
    def normalize(df):
222+
        return df / df.mean().mean()
226223
227-
The :meth:`filter` method is used to select subsets of the DataFrame’s
228-
columns or row. It is useful when you want to extract specific columns or rows that
229-
match particular conditions.
224+
    temperature.pipe(normalize)
230225
231-
When to use: Use :meth:`filter` when you want to use a UDF to create a subset of a DataFrame or Series
226+
This is equivalent to calling the ``normalize`` function with the ``DataFrame`` as the parameter.
232227

233-
.. note::
234-
:meth:`DataFrame.filter` does not accept UDFs, but can accept
235-
list comprehensions that have UDFs applied to them.
228+
.. ipython:: python
229+
230+
    normalize(temperature)
231+
232+
The main advantage of using ``pipe`` is readability. It allows method chaining and clearer code when
233+
calling multiple functions.
236234

237235
.. ipython:: python
238236
239-
# Sample DataFrame
240-
df = pd.DataFrame({
241-
'AA': [1, 2, 3],
242-
'BB': [4, 5, 6],
243-
'C': [7, 8, 9],
244-
'D': [10, 11, 12]
237+
    temperature_celsius = pd.DataFrame({
238+
        "NYC": [14, 21, 23],
239+
        "Los Angeles": [22, 28, 31],
245240
    })
246241
247-
# Function that filters out columns where the name is longer than 1 character
248-
def is_long_name(column_name):
249-
return len(column_name) > 1
242+
    def multiply_by_9(value):
243+
        return value * 9
250244
251-
df_filtered = df.filter(items=[col for col in df.columns if is_long_name(col)])
252-
print(df_filtered)
245+
    def divide_by_5(value):
246+
        return value / 5
253247
254-
Since filter does not directly accept a UDF, you have to apply the UDF indirectly,
255-
for example, by using list comprehensions.
248+
    def add_32(value):
249+
        return value + 32
256250
257-
:meth:`DataFrame.agg`
258-
~~~~~~~~~~~~~~~~~~~~~
251+
    # Without `pipe`:
252+
    fahrenheit = add_32(divide_by_5(multiply_by_9(temperature_celsius)))
253+
254+
    # With `pipe`:
255+
    fahrenheit = (temperature_celsius.pipe(multiply_by_9)
256+
                                     .pipe(divide_by_5)
257+
                                     .pipe(add_32))
258+
259+
``pipe`` is also available for :meth:`SeriesGroupBy.pipe`, :meth:`DataFrameGroupBy.pipe` and
260+
:meth:`Resampler.pipe`. You can read more about ``pipe`` in groupby operations in :ref:`groupby.pipe`.
261+
262+
When to use: Use :meth:`pipe` when you need to create a pipeline of operations and want to keep the code readable and maintainable.
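
``pipe`` also forwards any extra positional and keyword arguments to the function, which helps when the steps of a pipeline take parameters. The following is a minimal sketch; the ``scale`` and ``shift`` helpers are hypothetical names chosen for illustration and are not part of this commit.

```python
import pandas as pd

temperature_celsius = pd.DataFrame({
    "NYC": [14, 21, 23],
    "Los Angeles": [22, 28, 31],
})

# Hypothetical helpers: each receives the whole DataFrame first,
# followed by the extra arguments given to ``pipe``.
def scale(df, factor):
    return df * factor

def shift(df, offset):
    return df + offset

# Extra arguments can be passed positionally or by keyword.
fahrenheit = temperature_celsius.pipe(scale, 9 / 5).pipe(shift, offset=32)
```

Each call is equivalent to ``shift(scale(temperature_celsius, 9 / 5), offset=32)``, but the steps read in the order in which they run.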
263+
264+
.. _udf.filter:
265+
266+
:meth:`Series.filter` and :meth:`DataFrame.filter`
267+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
268+
269+
The ``filter`` method is used to select a subset of rows that match certain criteria.
270+
:meth:`Series.filter` and :meth:`DataFrame.filter` do not support user-defined functions,
271+
but :meth:`SeriesGroupBy.filter` and :meth:`DataFrameGroupBy.filter` do. You can read more
272+
about ``filter`` in groupby operations in :ref:`groupby.filter`.
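
As an illustration of the groupby case, the sketch below keeps only the groups for which a UDF returns ``True``. The ``readings`` data and the ``is_warm_city`` function are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical long-format data: one row per city and measurement.
readings = pd.DataFrame({
    "city": ["NYC", "NYC", "NYC", "Los Angeles", "Los Angeles", "Los Angeles"],
    "temperature": [14, 21, 23, 22, 28, 31],
})

def is_warm_city(group):
    # Called once per group; must return a single boolean.
    return group["temperature"].mean() > 25

# Rows of groups where the function returns False are dropped.
warm = readings.groupby("city").filter(is_warm_city)
```

Here only the Los Angeles rows remain, because the NYC mean is below the threshold.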
273+
274+
.. _udf.agg:
275+
276+
:meth:`Series.agg` and :meth:`DataFrame.agg`
277+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
278+
279+
The ``agg`` method is used to aggregate a set of data points into a single one.
280+
The most common aggregation functions such as ``min``, ``max``, ``mean``, ``sum``, etc.
281+
are already implemented in pandas. ``agg`` lets you implement other custom aggregate
282+
functions.
283+
284+
.. ipython:: python
285+
286+
    temperature = pd.DataFrame({
287+
        "NYC": [14, 21, 23],
288+
        "Los Angeles": [22, 28, 31],
289+
    })
290+
291+
    def highest_jump(column):
292+
        return column.pct_change().max()
293+
294+
    temperature.agg(highest_jump)
259295
260-
If you need to aggregate data, :meth:`agg` is a better choice than apply because it is
261-
specifically designed for aggregation operations.
262296
263297
When to use: Use :meth:`agg` for performing custom aggregations, where the operation returns
264298
a scalar value on each input.
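
As a further sketch of a custom reducer, the hypothetical ``temperature_range`` function below (not part of this commit) collapses each column into one scalar, so the result has one value per column.

```python
import pandas as pd

temperature = pd.DataFrame({
    "NYC": [14, 21, 23],
    "Los Angeles": [22, 28, 31],
})

# Hypothetical reducer: receives one column (a Series) and returns a scalar.
def temperature_range(column):
    return column.max() - column.min()

# The result is a Series indexed by the original column names.
ranges = temperature.agg(temperature_range)
```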
265299

266-
:meth:`DataFrame.transform`
267-
~~~~~~~~~~~~~~~~~~~~~~~~~~~
300+
.. _udf.transform:
268301

269-
The :meth:`transform` method is ideal for performing element-wise transformations while preserving the shape of the original DataFrame.
270-
It is generally faster than apply because it can take advantage of pandas' internal optimizations.
302+
:meth:`Series.transform` and :meth:`DataFrame.transform`
303+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
271304

272-
When to use: When you need to perform element-wise transformations that retain the original structure of the DataFrame.
305+
The ``transform`` method is similar to an aggregation, with the difference that the result is broadcast
306+
to the original data.
273307

274-
.. code-block:: python
308+
.. ipython:: python
275309
276-
from sklearn.linear_model import LinearRegression
310+
    temperature = pd.DataFrame({
311+
        "NYC": [14, 21, 23],
312+
        "Los Angeles": [22, 28, 31]},
313+
        index=pd.date_range("2000-01-01", "2000-01-03"))
277314
278-
df = pd.DataFrame({
279-
'group': ['A', 'A', 'A', 'B', 'B', 'B'],
280-
'x': [1, 2, 3, 1, 2, 3],
281-
'y': [2, 4, 6, 1, 2, 1.5]
282-
}).set_index("x")
315+
    def warm_up_all_days(column):
316+
        return pd.Series(column.max(), index=column.index)
283317
284-
# Function to fit a model to each group
285-
def fit_model(group):
286-
x = group.index.to_frame()
287-
y = group
288-
model = LinearRegression()
289-
model.fit(x, y)
290-
pred = model.predict(x)
291-
return pred
318+
    temperature.transform(warm_up_all_days)
319+
320+
In the example, the ``warm_up_all_days`` function computes the ``max`` like an aggregation, but instead
321+
of returning just the maximum value, it returns a ``Series`` with the same shape as the original column,
322+
with the values of each day replaced by the maximum temperature of the city.
323+
324+
``transform`` is also available for :meth:`SeriesGroupBy.transform`, :meth:`DataFrameGroupBy.transform` and
325+
:meth:`Resampler.transform`, where it's more common. You can read more about ``transform`` in groupby
326+
operations in :ref:`groupby.transform`.
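
To illustrate the groupby case, the sketch below computes a per-group statistic and broadcasts it back so that the result aligns with the original rows. The ``readings`` data and the ``subtract_group_mean`` function are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical long-format data: one row per city and measurement.
readings = pd.DataFrame({
    "city": ["NYC", "NYC", "NYC", "Los Angeles", "Los Angeles", "Los Angeles"],
    "temperature": [14, 21, 23, 22, 28, 31],
})

def subtract_group_mean(values):
    # Called once per group; returns a Series of the same length as the group.
    return values - values.mean()

# The result keeps the index and length of the original column.
deviation = readings.groupby("city")["temperature"].transform(subtract_group_mean)
```

Because the output has the same index as the input, it can be assigned directly as a new column of ``readings``.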
292327

293-
result = df.groupby('group').transform(fit_model)
328+
When to use: When you need to perform an aggregation that will be returned in the original structure of
329+
the DataFrame.
294330

295331

296332
Performance
