@@ -26,20 +26,6 @@ Here’s a simple example to illustrate a UDF applied to a Series:
2626 # Apply the function element-wise using .map
2727 s.map(add_one)
2828
29- You can also apply UDFs to an entire DataFrame. For example:
30-
31- .. ipython :: python
32-
33- df = pd.DataFrame({" A" : [1 , 2 , 3 ], " B" : [10 , 20 , 30 ]})
34-
35- # UDF that takes a row and returns the sum of columns A and B
36- def sum_row (row ):
37- return row[" A" ] + row[" B" ]
38-
39- # Apply the function row-wise (axis=1 means apply across columns per row)
40- df.apply(sum_row, axis = 1 )
41-
42-
4329 Why Not To Use User-Defined Functions
4430-------------------------------------
4531
@@ -87,25 +73,25 @@ Methods that support User-Defined Functions
8773
8874User-Defined Functions can be applied across various pandas methods:
8975
90- +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
91- | Method | Function Input | Function Output | Description |
76+ +------------------------------- +------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
77+ | Method | Function Input | Function Output | Description |
9278+============================+========================+==========================+==============================================================================================================================================+
93- | :meth: ` map ` | Scalar | Scalar | Apply a function to each element |
94- +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
95- | :meth: ` apply ` (axis=0) | Column (Series) | Column (Series) | Apply a function to each column |
96- +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
97- | :meth: ` apply ` (axis=1) | Row (Series) | Row (Series) | Apply a function to each row |
98- +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
99- | :meth: ` pipe ` | Series or DataFrame | Series or DataFrame | Chain functions together to apply to Series or Dataframe |
100- +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
101- | :meth: ` filter ` | Series or DataFrame | Boolean | Only accepts UDFs in group by. Function is called for each group, and the group is removed from the result if the function returns ``False `` |
102- +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
103- | :meth: ` agg ` | Series or DataFrame | Scalar or Series | Aggregate and summarizes values, e.g., sum or custom reducer |
104- +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
105- | :meth: ` transform ` (axis=0) | Column (Series) | Column (Series) | Same as :meth: `apply ` with (axis=0), but it raises an exception if the function changes the shape of the data |
106- +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
107- | :meth: ` transform ` (axis=1) | Row (Series) | Row (Series) | Same as :meth: `apply ` with (axis=1), but it raises an exception if the function changes the shape of the data |
108- +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
79+ | :ref: ` udf. map ` | Scalar | Scalar | Apply a function to each element |
80+ +------------------------------- +------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
81+ | :ref: ` udf. apply ` (axis=0) | Column (Series) | Column (Series) | Apply a function to each column |
82+ +------------------------------- +------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
83+ | :ref: ` udf. apply ` (axis=1) | Row (Series) | Row (Series) | Apply a function to each row |
84+ +------------------------------- +------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
85+ | :ref: ` udf. pipe ` | Series or DataFrame | Series or DataFrame | Chain functions together to apply to Series or DataFrame |
86+ +------------------------------- +------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
87+ | :ref: ` udf. filter ` | Series or DataFrame | Boolean | Only accepts UDFs in group by. Function is called for each group, and the group is removed from the result if the function returns ``False `` |
88+ +------------------------------- +------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
89+ | :ref: ` udf. agg ` | Series or DataFrame | Scalar or Series | Aggregate and summarize values, e.g., sum or custom reducer |
90+ +------------------------------- +------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
91+ | :ref: ` udf. transform ` (axis=0) | Column (Series) | Column (Series) | Same as :meth: `apply ` with (axis=0), but it raises an exception if the function changes the shape of the data |
92+ +------------------------------- +------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
93+ | :ref: ` udf. transform ` (axis=1) | Row (Series) | Row (Series) | Same as :meth: `apply ` with (axis=1), but it raises an exception if the function changes the shape of the data |
94+ +------------------------------- +------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
10995
11096When applying UDFs in pandas, it is essential to select the appropriate method based
11197on your specific task. Each method has its strengths and is designed for different use
@@ -118,6 +104,8 @@ decisions, ensuring more efficient and maintainable code.
118104 and :ref: `ewm()<window> ` for details.
119105
120106
107+ .. _udf.map :
108+
121109:meth: `Series.map ` and :meth: `DataFrame.map `
122110~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
123111
@@ -147,6 +135,8 @@ working with medium or large data.
147135
148136When to use: Use :meth: `map ` for applying element-wise UDFs to DataFrames or Series.
149137
138+ .. _udf.apply :
139+
150140:meth: `Series.apply ` and :meth: `DataFrame.apply `
151141~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
152142
@@ -213,84 +203,130 @@ about ``apply`` in groupby operations :ref:`groupby.apply`.
213203When to use: :meth: `apply ` is suitable when no alternative vectorized method or UDF method is available,
214204but consider optimizing performance with vectorized operations wherever possible.
215205
216- :meth: `DataFrame.pipe `
217- ~~~~~~~~~~~~~~~~~~~~~~
206+ .. _udf.pipe :
218207
219- The :meth: `pipe ` method is useful for chaining operations together into a clean and readable pipeline.
220- It is a helpful tool for organizing complex data processing workflows.
208+ :meth: `Series. pipe ` and :meth: ` DataFrame.pipe `
209+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
221210
222- When to use: Use :meth: `pipe ` when you need to create a pipeline of operations and want to keep the code readable and maintainable.
211+ The ``pipe `` method is similar to ``map `` and ``apply ``, but the function receives the whole ``Series ``
212+ or ``DataFrame `` it is called on.
213+
214+ .. ipython :: python
215+
216+ temperature = pd.DataFrame({
217+ " NYC" : [14 , 21 , 23 ],
218+ " Los Angeles" : [22 , 28 , 31 ],
219+ })
223220
224- :meth: ` DataFrame.filter `
225- ~~~~~~~~~~~~~~~~~~~~~~~~
221+ def normalize ( df ):
222+ return df / df.mean().mean()
226223
227- The :meth: `filter ` method is used to select subsets of the DataFrame’s
228- columns or row. It is useful when you want to extract specific columns or rows that
229- match particular conditions.
224+ temperature.pipe(normalize)
230225
231- When to use: Use :meth: ` filter ` when you want to use a UDF to create a subset of a DataFrame or Series
226+ This is equivalent to calling the `` normalize `` function with the `` DataFrame `` as the parameter.
232227
233- .. note ::
234- :meth: `DataFrame.filter ` does not accept UDFs, but can accept
235- list comprehensions that have UDFs applied to them.
228+ .. ipython :: python
229+
230+ normalize(temperature)
231+
232+ The main advantage of using ``pipe `` is readability. It allows method chaining and clearer code when
233+ calling multiple functions.
236234
237235.. ipython :: python
238236
239- # Sample DataFrame
240- df = pd.DataFrame({
241- ' AA' : [1 , 2 , 3 ],
242- ' BB' : [4 , 5 , 6 ],
243- ' C' : [7 , 8 , 9 ],
244- ' D' : [10 , 11 , 12 ]
237+ temperature_celsius = pd.DataFrame({
238+ " NYC" : [14 , 21 , 23 ],
239+ " Los Angeles" : [22 , 28 , 31 ],
245240 })
246241
247- # Function that filters out columns where the name is longer than 1 character
248- def is_long_name (column_name ):
249- return len (column_name) > 1
242+ def multiply_by_9 (value ):
243+ return value * 9
250244
251- df_filtered = df.filter( items = [col for col in df.columns if is_long_name(col)])
252- print (df_filtered)
245+ def divide_by_5 ( value ):
246+ return value / 5
253247
254- Since filter does not directly accept a UDF, you have to apply the UDF indirectly,
255- for example, by using list comprehensions.
248+ def add_32 ( value ):
249+ return value + 32
256250
257- :meth: `DataFrame.agg `
258- ~~~~~~~~~~~~~~~~~~~~~
251+ # Without `pipe`:
252+ fahrenheit = add_32(divide_by_5(multiply_by_9(temperature_celsius)))
253+
254+ # With `pipe`:
255+ fahrenheit = (temperature_celsius.pipe(multiply_by_9)
256+ .pipe(divide_by_5)
257+ .pipe(add_32))
258+
259+ ``pipe `` is also available for :meth: `SeriesGroupBy.pipe `, :meth: `DataFrameGroupBy.pipe ` and
260+ :meth: `Resampler.pipe `. You can read more about ``pipe `` in groupby operations in :ref: `groupby.pipe `.
261+
262+ When to use: Use :meth: `pipe ` when you need to create a pipeline of operations and want to keep the code readable and maintainable.
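
As a minimal sketch of how ``pipe`` forwards extra arguments: any positional or keyword arguments given after the function are passed on to it. The helper ``scale_and_shift`` and its parameters ``factor`` and ``offset`` are hypothetical names used only for illustration; they are not part of pandas.

```python
import pandas as pd

temperature_celsius = pd.DataFrame({
    "NYC": [14, 21, 23],
    "Los Angeles": [22, 28, 31],
})

# Hypothetical helper (not a pandas function): rescale every value linearly.
def scale_and_shift(df, factor, offset=0):
    return df * factor + offset

# Arguments after the function are forwarded to it, so this call is
# the same as scale_and_shift(temperature_celsius, 9 / 5, offset=32).
fahrenheit = temperature_celsius.pipe(scale_and_shift, 9 / 5, offset=32)
```

This keeps a parameterized step inside a method chain without wrapping it in a ``lambda``.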
263+
264+ .. _udf.filter :
265+
266+ :meth: `Series.filter ` and :meth: `DataFrame.filter `
267+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
268+
269+ The ``filter `` method is used to select a subset of rows that match certain criteria.
270+ :meth:`Series.filter` and :meth:`DataFrame.filter` do not support user-defined functions,
271+ but :meth: `SeriesGroupBy.filter ` and :meth: `DataFrameGroupBy.filter ` do. You can read more
272+ about ``filter `` in groupby operations in :ref: `groupby.filter `.
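
A short sketch of the groupby case may help. The function is called once per group and must return a boolean; groups for which it returns ``False`` are dropped from the result. The ``readings`` data, the ``is_warm_city`` function and the threshold of 20 are assumptions made up for this illustration.

```python
import pandas as pd

readings = pd.DataFrame({
    "city": ["NYC", "NYC", "NYC",
             "Los Angeles", "Los Angeles", "Los Angeles"],
    "temperature": [14, 21, 23, 22, 28, 31],
})

# Hypothetical UDF: receives the rows of one group as a DataFrame and
# returns True to keep the whole group, False to drop it.
def is_warm_city(group):
    return group["temperature"].mean() > 20

# NYC averages below 20 and is removed; the Los Angeles rows are kept
# with their original index labels.
warm = readings.groupby("city").filter(is_warm_city)
```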
273+
274+ .. _udf.agg :
275+
276+ :meth: `Series.agg ` and :meth: `DataFrame.agg `
277+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
278+
279+ The ``agg `` method is used to aggregate a set of data points into a single one.
280+ The most common aggregation functions such as ``min ``, ``max ``, ``mean ``, ``sum ``, etc.
281+ are already implemented in pandas. ``agg`` allows you to implement other custom aggregate
282+ functions.
283+
284+ .. ipython :: python
285+
286+ temperature = pd.DataFrame({
287+ " NYC" : [14 , 21 , 23 ],
288+ " Los Angeles" : [22 , 28 , 31 ],
289+ })
290+
291+ def highest_jump (column ):
292+ return column.pct_change().max()
293+
294+ temperature.agg(highest_jump)
259295
260- If you need to aggregate data, :meth: `agg ` is a better choice than apply because it is
261- specifically designed for aggregation operations.
262296
263297 When to use: Use :meth: `agg ` for performing custom aggregations, where the operation returns
264298a scalar value on each input.
265299
266- :meth: `DataFrame.transform `
267- ~~~~~~~~~~~~~~~~~~~~~~~~~~~
300+ .. _udf.transform :
268301
269- The :meth: `transform ` method is ideal for performing element-wise transformations while preserving the shape of the original DataFrame.
270- It is generally faster than apply because it can take advantage of pandas' internal optimizations.
302+ :meth: `Series. transform ` and :meth: ` DataFrame.transform `
303+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
271304
272- When to use: When you need to perform element-wise transformations that retain the original structure of the DataFrame.
305+ The ``transform`` method is similar to an aggregation, with the difference that the result is broadcast
306+ to the original data.
273307
274- .. code-block :: python
308+ .. ipython :: python
275309
276- from sklearn.linear_model import LinearRegression
310+ temperature = pd.DataFrame({
311+ " NYC" : [14 , 21 , 23 ],
312+ " Los Angeles" : [22 , 28 , 31 ]},
313+ index = pd.date_range(" 2000-01-01" , " 2000-01-03" ))
277314
278- df = pd.DataFrame({
279- ' group' : [' A' , ' A' , ' A' , ' B' , ' B' , ' B' ],
280- ' x' : [1 , 2 , 3 , 1 , 2 , 3 ],
281- ' y' : [2 , 4 , 6 , 1 , 2 , 1.5 ]
282- }).set_index(" x" )
315+ def warm_up_all_days (column ):
316+ return pd.Series(column.max(), index = column.index)
283317
284- # Function to fit a model to each group
285- def fit_model (group ):
286- x = group.index.to_frame()
287- y = group
288- model = LinearRegression()
289- model.fit(x, y)
290- pred = model.predict(x)
291- return pred
318+ temperature.transform(warm_up_all_days)
319+
320+ In the example, the ``warm_up_all_days`` function computes the ``max`` like an aggregation, but instead
321+ of returning just the maximum value, it returns a ``Series`` with the same index as the original column,
322+ so the result is a ``DataFrame`` with the values of each day replaced by the maximum temperature of the city.
323+
324+ ``transform `` is also available for :meth: `SeriesGroupBy.transform `, :meth: `DataFrameGroupBy.transform ` and
325+ :meth: `Resampler.transform `, where it's more common. You can read more about ``transform `` in groupby
326+ operations in :ref: `groupby.transform `.
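
The groupby variant can be sketched as follows, assuming a long-format table with one row per reading. Here the UDF returns a scalar for each group, and ``transform`` broadcasts it back to every row of that group. The ``readings`` data and the ``city_median`` function are hypothetical names chosen for this example.

```python
import pandas as pd

readings = pd.DataFrame({
    "city": ["NYC", "NYC", "NYC",
             "Los Angeles", "Los Angeles", "Los Angeles"],
    "temperature": [14, 21, 23, 22, 28, 31],
})

# Hypothetical UDF: reduces the values of one group to a single number.
def city_median(column):
    return column.median()

# The per-group scalar is repeated for each row of the group, so the
# result has the same index as ``readings``.
median_per_row = readings.groupby("city")["temperature"].transform(city_median)
```

Because the result is aligned with the original rows, it can be assigned back as a new column or compared directly against the ``temperature`` column.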
292327
293- result = df.groupby(' group' ).transform(fit_model)
328+ When to use: Use :meth:`transform` when you need to perform an aggregation whose result is returned
329+ in the original structure of the DataFrame.
294330
295331
296332Performance