Adding examples to all methods

datapythonista · datapythonista · commit e148685296a2 · 2025-05-21T18:46:23.000+02:00
diff --git a/doc/source/user_guide/user_defined_functions.rst b/doc/source/user_guide/user_defined_functions.rst
@@ -26,20 +26,6 @@ Here’s a simple example to illustrate a UDF applied to a Series:
     # Apply the function element-wise using .map
     s.map(add_one)
 
-You can also apply UDFs to an entire DataFrame. For example:
-
-.. ipython:: python
-
-    df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})
-
-    # UDF that takes a row and returns the sum of columns A and B
-    def sum_row(row):
-        return row["A"] + row["B"]
-
-    # Apply the function row-wise (axis=1 means apply across columns per row)
-    df.apply(sum_row, axis=1)
-
-
 Why Not To Use User-Defined Functions
 -------------------------------------
 
@@ -87,25 +73,25 @@ Methods that support User-Defined Functions
 
 User-Defined Functions can be applied across various pandas methods:
 
-+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| Method                     | Function Input         | Function Output          | Description                                                                                                                                  |
++-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+| Method                        | Function Input         | Function Output          | Description                                                                                                                                  |
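-+============================+========================+==========================+==============================================================================================================================================+
++===============================+========================+==========================+==============================================================================================================================================+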
-| :meth:`map`                | Scalar                 | Scalar                   | Apply a function to each element                                                                                                             |
-+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| :meth:`apply` (axis=0)     | Column (Series)        | Column (Series)          | Apply a function to each column                                                                                                              |
-+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| :meth:`apply` (axis=1)     | Row (Series)           | Row (Series)             | Apply a function to each row                                                                                                                 |
-+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| :meth:`pipe`               | Series or DataFrame    | Series or DataFrame      | Chain functions together to apply to Series or Dataframe                                                                                     |
-+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| :meth:`filter`             | Series or DataFrame    | Boolean                  | Only accepts UDFs in group by. Function is called for each group, and the group is removed from the result if the function returns ``False`` |
-+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| :meth:`agg`                | Series or DataFrame    | Scalar or Series         | Aggregate and summarizes values, e.g., sum or custom reducer                                                                                 |
-+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| :meth:`transform` (axis=0) | Column (Series)        | Column (Series)          | Same as :meth:`apply` with (axis=0), but it raises an exception if the function changes the shape of the data                                |
-+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| :meth:`transform` (axis=1) | Row (Series)           | Row (Series)             | Same as :meth:`apply` with (axis=1), but it raises an exception if the function changes the shape of the data                                |
-+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+| :ref:`udf.map`                | Scalar                 | Scalar                   | Apply a function to each element                                                                                                             |
++-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+| :ref:`udf.apply` (axis=0)     | Column (Series)        | Column (Series)          | Apply a function to each column                                                                                                              |
++-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+| :ref:`udf.apply` (axis=1)     | Row (Series)           | Row (Series)             | Apply a function to each row                                                                                                                 |
++-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+| :ref:`udf.pipe`               | Series or DataFrame    | Series or DataFrame      | Chain functions together to apply to Series or DataFrame                                                                                     |
++-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+| :ref:`udf.filter`             | Series or DataFrame    | Boolean                  | Only accepts UDFs in group by. Function is called for each group, and the group is removed from the result if the function returns ``False`` |
++-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+| :ref:`udf.agg`                | Series or DataFrame    | Scalar or Series         | Aggregate and summarize values, e.g., sum or custom reducers                                                                                 |
++-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+| :ref:`udf.transform` (axis=0) | Column (Series)        | Column (Series)          | Same as :meth:`apply` with (axis=0), but it raises an exception if the function changes the shape of the data                                |
++-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+| :ref:`udf.transform` (axis=1) | Row (Series)           | Row (Series)             | Same as :meth:`apply` with (axis=1), but it raises an exception if the function changes the shape of the data                                |
++-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
 
 When applying UDFs in pandas, it is essential to select the appropriate method based
 on your specific task. Each method has its strengths and is designed for different use
@@ -118,6 +104,8 @@ decisions, ensuring more efficient and maintainable code.
     and :ref:`ewm()<window>` for details.
 
 
+.. _udf.map:
+
 :meth:`Series.map` and :meth:`DataFrame.map`
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -147,6 +135,8 @@ working with medium or large data.
 
 When to use: Use :meth:`map` for applying element-wise UDFs to DataFrames or Series.
 
+.. _udf.apply:
+
 :meth:`Series.apply` and :meth:`DataFrame.apply`
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -213,84 +203,130 @@ about ``apply`` in groupby operations :ref:`groupby.apply`.
 When to use: :meth:`apply` is suitable when no alternative vectorized method or UDF method is available,
 but consider optimizing performance with vectorized operations wherever possible.
 
-:meth:`DataFrame.pipe`
-~~~~~~~~~~~~~~~~~~~~~~
+.. _udf.pipe:
 
-The :meth:`pipe` method is useful for chaining operations together into a clean and readable pipeline.
-It is a helpful tool for organizing complex data processing workflows.
+:meth:`Series.pipe` and :meth:`DataFrame.pipe`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-When to use: Use :meth:`pipe` when you need to create a pipeline of operations and want to keep the code readable and maintainable.
+The ``pipe`` method is similar to ``map`` and ``apply``, but the function receives the whole ``Series``
+or ``DataFrame`` it is called on.
+
+.. ipython:: python
+
+    temperature = pd.DataFrame({
+        "NYC": [14, 21, 23],
+        "Los Angeles": [22, 28, 31],
+    })
 
-:meth:`DataFrame.filter`
-~~~~~~~~~~~~~~~~~~~~~~~~
+    def normalize(df):
+        return df / df.mean().mean()
 
-The :meth:`filter` method is used to select subsets of the DataFrame’s
-columns or row. It is useful when you want to extract specific columns or rows that
-match particular conditions.
+    temperature.pipe(normalize)
 
-When to use: Use :meth:`filter` when you want to use a UDF to create a subset of a DataFrame or Series
+This is equivalent to calling the ``normalize`` function with the ``DataFrame`` as the parameter.
 
-.. note::
-    :meth:`DataFrame.filter` does not accept UDFs, but can accept
-    list comprehensions that have UDFs applied to them.
+.. ipython:: python
+
+    normalize(temperature)
+
+The main advantage of using ``pipe`` is readability. It enables method chaining, which keeps the code clearer
+when several functions are applied one after another.
 
 .. ipython:: python
 
-    # Sample DataFrame
-    df = pd.DataFrame({
-        'AA': [1, 2, 3],
-        'BB': [4, 5, 6],
-        'C': [7, 8, 9],
-        'D': [10, 11, 12]
+    temperature_celsius = pd.DataFrame({
+        "NYC": [14, 21, 23],
+        "Los Angeles": [22, 28, 31],
     })
 
-    # Function that filters out columns where the name is longer than 1 character
-    def is_long_name(column_name):
-        return len(column_name) > 1
+    def multiply_by_9(value):
+        return value * 9
 
-    df_filtered = df.filter(items=[col for col in df.columns if is_long_name(col)])
-    print(df_filtered)
+    def divide_by_5(value):
+        return value / 5
 
-Since filter does not directly accept a UDF, you have to apply the UDF indirectly,
-for example, by using list comprehensions.
+    def add_32(value):
+        return value + 32
 
-:meth:`DataFrame.agg`
-~~~~~~~~~~~~~~~~~~~~~
+    # Without `pipe`:
+    fahrenheit = add_32(divide_by_5(multiply_by_9(temperature_celsius)))
+
+    # With `pipe`:
+    fahrenheit = (temperature_celsius.pipe(multiply_by_9)
+                                     .pipe(divide_by_5)
+                                     .pipe(add_32))
+
+``pipe`` is also available for :meth:`SeriesGroupBy.pipe`, :meth:`DataFrameGroupBy.pipe` and
+:meth:`Resampler.pipe`. You can read more about ``pipe`` in groupby operations in :ref:`groupby.pipe`.
+
+When to use: Use :meth:`pipe` when you need to create a pipeline of operations and want to keep the code readable and maintainable.
+
+.. _udf.filter:
+
+:meth:`Series.filter` and :meth:`DataFrame.filter`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``filter`` method is used to select a subset of rows or columns based on their labels.
+:meth:`Series.filter` and :meth:`DataFrame.filter` do not support user-defined functions,
+but :meth:`SeriesGroupBy.filter` and :meth:`DataFrameGroupBy.filter` do. You can read more
+about ``filter`` in groupby operations in :ref:`groupby.filter`.
+
+.. _udf.agg:
+
+:meth:`Series.agg` and :meth:`DataFrame.agg`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``agg`` method is used to aggregate a set of data points into a single value.
+The most common aggregation functions, such as ``min``, ``max``, ``mean`` and ``sum``,
+are already implemented in pandas. ``agg`` makes it possible to implement other custom aggregation
+functions.
+
+.. ipython:: python
+
+    temperature = pd.DataFrame({
+        "NYC": [14, 21, 23],
+        "Los Angeles": [22, 28, 31],
+    })
+
+    def highest_jump(column):
+        return column.pct_change().max()
+
+    temperature.agg(highest_jump)
 
-If you need to aggregate data, :meth:`agg` is a better choice than apply because it is
-specifically designed for aggregation operations.
 
 When to use: Use :meth:`agg` for performing custom aggregations, where the operation returns
 a scalar value on each input.
 
-:meth:`DataFrame.transform`
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. _udf.transform:
 
-The :meth:`transform` method is ideal for performing element-wise transformations while preserving the shape of the original DataFrame.
-It is generally faster than apply because it can take advantage of pandas' internal optimizations.
+:meth:`Series.transform` and :meth:`DataFrame.transform`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-When to use: When you need to perform element-wise transformations that retain the original structure of the DataFrame.
+The ``transform`` method is similar to an aggregation, with the difference that the result is broadcast
+back to the shape of the original data.
 
-.. code-block:: python
+.. ipython:: python
 
-    from sklearn.linear_model import LinearRegression
+    temperature = pd.DataFrame({
+        "NYC": [14, 21, 23],
+        "Los Angeles": [22, 28, 31],
+    }, index=pd.date_range("2000-01-01", "2000-01-03"))
 
-    df = pd.DataFrame({
-        'group': ['A', 'A', 'A', 'B', 'B', 'B'],
-        'x': [1, 2, 3, 1, 2, 3],
-        'y': [2, 4, 6, 1, 2, 1.5]
-    }).set_index("x")
+    def warm_up_all_days(column):
+        return pd.Series(column.max(), index=column.index)
 
-    # Function to fit a model to each group
-    def fit_model(group):
-        x = group.index.to_frame()
-        y = group
-        model = LinearRegression()
-        model.fit(x, y)
-        pred = model.predict(x)
-        return pred
+    temperature.transform(warm_up_all_days)
+
+In the example, the ``warm_up_all_days`` function computes the ``max`` like an aggregation, but instead
+of returning just the maximum value, it returns a ``Series`` with the same index as the input column, so the
+result has the same shape as the original ``DataFrame``, with each day's value replaced by the city's maximum temperature.
+
+``transform`` is also available for :meth:`SeriesGroupBy.transform`, :meth:`DataFrameGroupBy.transform` and
+:meth:`Resampler.transform`, where it's more common. You can read more about ``transform`` in groupby
+operations in :ref:`groupby.transform`.
 
-    result = df.groupby('group').transform(fit_model)
+When to use: Use :meth:`transform` when you need to perform an aggregation whose result is returned in the original
+structure of the DataFrame.
 
 
 Performance
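The ``pipe``, ``agg`` and ``transform`` examples in this diff can be sanity-checked outside of a Sphinx build by running them as a plain script. The sketch below reuses the same data and helper functions. The assertions are illustrative additions, not part of the pandas test suite, and the aggregation step calls ``agg`` because that is the method the ``udf.agg`` section documents.

```python
import pandas as pd

# Same data as the pipe and agg examples in the diff.
temperature = pd.DataFrame({
    "NYC": [14, 21, 23],
    "Los Angeles": [22, 28, 31],
})

# Same data with a daily index, as in the transform example.
temperature_by_day = pd.DataFrame({
    "NYC": [14, 21, 23],
    "Los Angeles": [22, 28, 31],
}, index=pd.date_range("2000-01-01", "2000-01-03"))


def normalize(df):
    return df / df.mean().mean()


def multiply_by_9(value):
    return value * 9


def divide_by_5(value):
    return value / 5


def add_32(value):
    return value + 32


def highest_jump(column):
    return column.pct_change().max()


def warm_up_all_days(column):
    return pd.Series(column.max(), index=column.index)


# pipe: going through pipe must give the same result as calling the function directly.
pd.testing.assert_frame_equal(temperature.pipe(normalize), normalize(temperature))

# Chained pipe calls must match the nested function calls.
nested = add_32(divide_by_5(multiply_by_9(temperature)))
chained = temperature.pipe(multiply_by_9).pipe(divide_by_5).pipe(add_32)
pd.testing.assert_frame_equal(nested, chained)

# agg: the custom reducer yields one scalar per column.
jumps = temperature.agg(highest_jump)

# transform: the output keeps the input's shape and index,
# with every value in a column replaced by that column's maximum.
warmest = temperature_by_day.transform(warm_up_all_days)

print(jumps)
print(warmest)
```

``pd.testing.assert_frame_equal`` raises an ``AssertionError`` if the piped and direct calls ever diverge, so the script fails loudly instead of printing a misleading table.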