@@ -26,20 +26,6 @@ Here’s a simple example to illustrate a UDF applied to a Series:
     # Apply the function element-wise using .map
     s.map(add_one)

-You can also apply UDFs to an entire DataFrame. For example:
-
-.. ipython:: python
-
-    df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})
-
-    # UDF that takes a row and returns the sum of columns A and B
-    def sum_row(row):
-        return row["A"] + row["B"]
-
-    # Apply the function row-wise (axis=1 means apply across columns per row)
-    df.apply(sum_row, axis=1)
-
-
 Why Not To Use User-Defined Functions
 -------------------------------------

@@ -87,25 +73,25 @@ Methods that support User-Defined Functions

 User-Defined Functions can be applied across various pandas methods:

-+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| Method                     | Function Input         | Function Output          | Description                                                                                                                                  |
-+============================+========================+==========================+==============================================================================================================================================+
-| :meth:`map`                | Scalar                 | Scalar                   | Apply a function to each element                                                                                                             |
-+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| :meth:`apply` (axis=0)     | Column (Series)        | Column (Series)          | Apply a function to each column                                                                                                              |
-+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| :meth:`apply` (axis=1)     | Row (Series)           | Row (Series)             | Apply a function to each row                                                                                                                 |
-+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| :meth:`pipe`               | Series or DataFrame    | Series or DataFrame      | Chain functions together to apply to Series or Dataframe                                                                                     |
-+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| :meth:`filter`             | Series or DataFrame    | Boolean                  | Only accepts UDFs in group by. Function is called for each group, and the group is removed from the result if the function returns ``False`` |
-+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| :meth:`agg`                | Series or DataFrame    | Scalar or Series         | Aggregate and summarizes values, e.g., sum or custom reducer                                                                                 |
-+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| :meth:`transform` (axis=0) | Column (Series)        | Column (Series)          | Same as :meth:`apply` with (axis=0), but it raises an exception if the function changes the shape of the data                                |
-+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| :meth:`transform` (axis=1) | Row (Series)           | Row (Series)             | Same as :meth:`apply` with (axis=1), but it raises an exception if the function changes the shape of the data                                |
-+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
++-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+| Method                        | Function Input         | Function Output          | Description                                                                                                                                  |
++===============================+========================+==========================+==============================================================================================================================================+
+| :ref:`udf.map`                | Scalar                 | Scalar                   | Apply a function to each element                                                                                                             |
++-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+| :ref:`udf.apply` (axis=0)     | Column (Series)        | Column (Series)          | Apply a function to each column                                                                                                              |
++-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+| :ref:`udf.apply` (axis=1)     | Row (Series)           | Row (Series)             | Apply a function to each row                                                                                                                 |
++-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+| :ref:`udf.pipe`               | Series or DataFrame    | Series or DataFrame      | Chain functions together to apply to Series or DataFrame                                                                                     |
++-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+| :ref:`udf.filter`             | Series or DataFrame    | Boolean                  | Only accepts UDFs in groupby. Function is called for each group, and the group is removed from the result if the function returns ``False``  |
++-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+| :ref:`udf.agg`                | Series or DataFrame    | Scalar or Series         | Aggregate and summarize values, e.g., sum or custom reducer                                                                                  |
++-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+| :ref:`udf.transform` (axis=0) | Column (Series)        | Column (Series)          | Same as :meth:`apply` with (axis=0), but it raises an exception if the function changes the shape of the data                                |
++-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+| :ref:`udf.transform` (axis=1) | Row (Series)           | Row (Series)             | Same as :meth:`apply` with (axis=1), but it raises an exception if the function changes the shape of the data                                |
++-------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+

 When applying UDFs in pandas, it is essential to select the appropriate method based
 on your specific task. Each method has its strengths and is designed for different use
@@ -118,6 +104,8 @@ decisions, ensuring more efficient and maintainable code.
 and :ref:`ewm()<window>` for details.


+.. _udf.map:
+
 :meth:`Series.map` and :meth:`DataFrame.map`
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -147,6 +135,8 @@ working with medium or large data.

 When to use: Use :meth:`map` for applying element-wise UDFs to DataFrames or Series.

+.. _udf.apply:
+
 :meth:`Series.apply` and :meth:`DataFrame.apply`
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -213,84 +203,130 @@ about ``apply`` in groupby operations :ref:`groupby.apply`.
 When to use: :meth:`apply` is suitable when no alternative vectorized method or UDF method is available,
 but consider optimizing performance with vectorized operations wherever possible.

-:meth:`DataFrame.pipe`
-~~~~~~~~~~~~~~~~~~~~~~
+.. _udf.pipe:

-The :meth:`pipe` method is useful for chaining operations together into a clean and readable pipeline.
-It is a helpful tool for organizing complex data processing workflows.
+:meth:`Series.pipe` and :meth:`DataFrame.pipe`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-When to use: Use :meth:`pipe` when you need to create a pipeline of operations and want to keep the code readable and maintainable.
+The ``pipe`` method is similar to ``map`` and ``apply``, but the function is called only once, with the whole ``Series``
+or ``DataFrame`` as its argument.
+
+.. ipython:: python
+
+    temperature = pd.DataFrame({
+        "NYC": [14, 21, 23],
+        "Los Angeles": [22, 28, 31],
+    })

-:meth:`DataFrame.filter`
-~~~~~~~~~~~~~~~~~~~~~~~~
+    def normalize(df):
+        return df / df.mean().mean()

-The :meth:`filter` method is used to select subsets of the DataFrame’s
-columns or row. It is useful when you want to extract specific columns or rows that
-match particular conditions.
+    temperature.pipe(normalize)

-When to use: Use :meth:`filter` when you want to use a UDF to create a subset of a DataFrame or Series
+This is equivalent to calling the ``normalize`` function with the ``DataFrame`` as the argument.

-.. note::
-    :meth:`DataFrame.filter` does not accept UDFs, but can accept
-    list comprehensions that have UDFs applied to them.
+.. ipython:: python
+
+    normalize(temperature)
+
+The main advantage of using ``pipe`` is readability. It allows method chaining and clearer code when
+calling multiple functions.

 .. ipython:: python

-    # Sample DataFrame
-    df = pd.DataFrame({
-        'AA': [1, 2, 3],
-        'BB': [4, 5, 6],
-        'C': [7, 8, 9],
-        'D': [10, 11, 12]
+    temperature_celsius = pd.DataFrame({
+        "NYC": [14, 21, 23],
+        "Los Angeles": [22, 28, 31],
     })

-    # Function that filters out columns where the name is longer than 1 character
-    def is_long_name(column_name):
-        return len(column_name) > 1
+    def multiply_by_9(value):
+        return value * 9

-    df_filtered = df.filter(items=[col for col in df.columns if is_long_name(col)])
-    print(df_filtered)
+    def divide_by_5(value):
+        return value / 5

-Since filter does not directly accept a UDF, you have to apply the UDF indirectly,
-for example, by using list comprehensions.
+    def add_32(value):
+        return value + 32

-:meth:`DataFrame.agg`
-~~~~~~~~~~~~~~~~~~~~~
+    # Without `pipe`:
+    fahrenheit = add_32(divide_by_5(multiply_by_9(temperature_celsius)))
+
+    # With `pipe`:
+    fahrenheit = (temperature_celsius.pipe(multiply_by_9)
+                                     .pipe(divide_by_5)
+                                     .pipe(add_32))
+
+``pipe`` is also available for :meth:`SeriesGroupBy.pipe`, :meth:`DataFrameGroupBy.pipe` and
+:meth:`Resampler.pipe`. You can read more about ``pipe`` in groupby operations in :ref:`groupby.pipe`.
+
+When to use: Use :meth:`pipe` when you need to create a pipeline of operations and want to keep the code readable and maintainable.
+
+.. _udf.filter:
+
+:meth:`Series.filter` and :meth:`DataFrame.filter`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``filter`` method is used to select a subset of the data. :meth:`Series.filter` and
+:meth:`DataFrame.filter` select rows or columns by their labels and do not support user-defined functions,
+but :meth:`SeriesGroupBy.filter` and :meth:`DataFrameGroupBy.filter` do. You can read more
+about ``filter`` in groupby operations in :ref:`groupby.filter`.
+
+.. _udf.agg:
+
+:meth:`Series.agg` and :meth:`DataFrame.agg`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``agg`` method is used to aggregate a set of data points into a single one.
+The most common aggregation functions such as ``min``, ``max``, ``mean``, ``sum``, etc.
+are already implemented in pandas. ``agg`` lets you implement other custom aggregate
+functions.
+
+.. ipython:: python
+
+    temperature = pd.DataFrame({
+        "NYC": [14, 21, 23],
+        "Los Angeles": [22, 28, 31],
+    })
+
+    def highest_jump(column):
+        return column.pct_change().max()
+
+    temperature.agg(highest_jump)

-If you need to aggregate data, :meth:`agg` is a better choice than apply because it is
-specifically designed for aggregation operations.

 When to use: Use :meth:`agg` for performing custom aggregations, where the operation returns
 a scalar value on each input.

-:meth:`DataFrame.transform`
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. _udf.transform:

-The :meth:`transform` method is ideal for performing element-wise transformations while preserving the shape of the original DataFrame.
-It is generally faster than apply because it can take advantage of pandas' internal optimizations.
+:meth:`Series.transform` and :meth:`DataFrame.transform`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-When to use: When you need to perform element-wise transformations that retain the original structure of the DataFrame.
+The ``transform`` method is similar to an aggregation, with the difference that the result is broadcast
+back to the shape of the original data.

-.. code-block:: python
+.. ipython:: python

-    from sklearn.linear_model import LinearRegression
+    temperature = pd.DataFrame({
+        "NYC": [14, 21, 23],
+        "Los Angeles": [22, 28, 31]},
+        index=pd.date_range("2000-01-01", "2000-01-03"))

-    df = pd.DataFrame({
-        'group': ['A', 'A', 'A', 'B', 'B', 'B'],
-        'x': [1, 2, 3, 1, 2, 3],
-        'y': [2, 4, 6, 1, 2, 1.5]
-    }).set_index("x")
+    def warm_up_all_days(column):
+        return pd.Series(column.max(), index=column.index)

-    # Function to fit a model to each group
-    def fit_model(group):
-        x = group.index.to_frame()
-        y = group
-        model = LinearRegression()
-        model.fit(x, y)
-        pred = model.predict(x)
-        return pred
+    temperature.transform(warm_up_all_days)
+
+In the example, the ``warm_up_all_days`` function computes the ``max`` like an aggregation, but instead
+of returning just the maximum value, it returns a ``Series`` with the same index as the column, so the
+result has the same shape as the original ``DataFrame``, with the value of each day replaced by the city's maximum temperature.
+
+``transform`` is also available for :meth:`SeriesGroupBy.transform`, :meth:`DataFrameGroupBy.transform` and
+:meth:`Resampler.transform`, where it's more common. You can read more about ``transform`` in groupby
+operations in :ref:`groupby.transform`.

-    result = df.groupby('group').transform(fit_model)
+When to use: Use :meth:`transform` when you need to perform an aggregation whose result is returned in the
+original structure of the DataFrame.


 Performance
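A minimal sketch, assuming pandas 2.x, that re-runs the examples from the new `pipe`, `agg` and `transform` sections on the same sample temperatures and checks what the text describes: `pipe` calls each function once with the whole `DataFrame`, so the chained and nested spellings give identical results; `agg` with a UDF returns one value per column; and `transform` returns data with the original shape. The `scale` helper and its `factor` argument are hypothetical, added only to show that `pipe` forwards extra arguments to the function.

```python
import pandas as pd

# Same sample data as the examples in the diff above
temperature = pd.DataFrame({
    "NYC": [14, 21, 23],
    "Los Angeles": [22, 28, 31],
})


def multiply_by_9(value):
    return value * 9


def divide_by_5(value):
    return value / 5


def add_32(value):
    return value + 32


# pipe calls each function once with the whole DataFrame,
# so chaining with pipe gives the same result as nesting the calls
nested = add_32(divide_by_5(multiply_by_9(temperature)))
piped = (temperature.pipe(multiply_by_9)
                    .pipe(divide_by_5)
                    .pipe(add_32))


# Hypothetical helper: extra arguments passed to pipe are forwarded to the function
def scale(df, factor):
    return df * factor


doubled = temperature.pipe(scale, factor=2)


# agg with a UDF reduces each column to a single value (one entry per column)
def highest_jump(column):
    return column.pct_change().max()


jumps = temperature.agg(highest_jump)


# transform must return data with the original shape; here every day
# is replaced by the maximum temperature of its city
def warm_up_all_days(column):
    return pd.Series(column.max(), index=column.index)


warmest = temperature.transform(warm_up_all_days)

print(piped)
print(jumps)
print(warmest)
```

Because `highest_jump` is a plain Python function, `DataFrame.agg` ends up calling it on each column, which is why `jumps` is a `Series` indexed by the column names.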
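The new `filter` and `transform` sections point readers to the groupby variants of these methods. A minimal sketch of that distinction, assuming pandas 2.x and a hypothetical long-format version of the temperature data (`readings`, `is_warm_city` and `city_max` are invented for illustration): `DataFrameGroupBy.filter` calls the UDF once per group and keeps or drops the whole group, `SeriesGroupBy.transform` broadcasts a per-group result back to every row, and plain `DataFrame.filter` only selects by label.

```python
import pandas as pd

# Hypothetical long-format version of the temperature data: one row per reading
readings = pd.DataFrame({
    "city": ["NYC", "NYC", "NYC", "Los Angeles", "Los Angeles", "Los Angeles"],
    "temperature": [14, 21, 23, 22, 28, 31],
})


# DataFrameGroupBy.filter calls the UDF once per group (a DataFrame) and keeps
# the group's original rows only if the function returns True
def is_warm_city(group):
    return group["temperature"].mean() > 20


warm = readings.groupby("city").filter(is_warm_city)


# SeriesGroupBy.transform broadcasts a per-group scalar back to every row of the group
def city_max(temperatures):
    return temperatures.max()


readings["city_max"] = readings.groupby("city")["temperature"].transform(city_max)

# DataFrame.filter selects by label (items, like, regex), not with a UDF
labels_only = readings.filter(items=["city", "temperature"])

print(warm)
print(readings)
```

Here NYC averages about 19.3 degrees, so `filter` drops all three of its rows, while `transform` keeps every row and fills in each city's maximum.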