@@ -11,13 +11,13 @@ functionality by allowing users to apply custom computations to their data. Whil
pandas comes with a set of built-in functions for data manipulation, UDFs offer
flexibility when built-in methods are not sufficient. These functions can be
applied at different levels: element-wise, row-wise, column-wise, or group-wise,
- depending on the method used.
+ and they change the data in different ways depending on the method used.

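As a rough sketch of these levels, the example below passes small lambdas to a few common pandas entry points. The DataFrame, its column names, and the lambdas are made up for illustration. :meth:`Series.map` stands in for element-wise use here; :meth:`DataFrame.map` does the same for every value of a DataFrame.

```python
import pandas as pd

# Small, made-up DataFrame used only for illustration
df = pd.DataFrame({
    "team": ["x", "x", "y", "y"],
    "points": [1, 2, 3, 4],
    "bonus": [10, 20, 30, 40],
})

# Element-wise: the function is called once per value of the Series
doubled = df["points"].map(lambda v: v * 2)

# Row-wise: with axis=1, apply passes each row to the function as a Series
row_totals = df[["points", "bonus"]].apply(
    lambda row: row["points"] + row["bonus"], axis=1
)

# Column-wise: with axis=0 (the default), apply passes each column as a Series
col_ranges = df[["points", "bonus"]].apply(lambda col: col.max() - col.min())

# Group-wise: transform calls the function on each group's values and
# returns a result aligned with the original rows
group_share = df.groupby("team")["points"].transform(lambda s: s / s.sum())

print(doubled, row_totals, col_ranges, group_share, sep="\n\n")
```

The group-wise result keeps one value per original row because ``transform`` aligns its output with the input rows.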
Why Use User-Defined Functions?
-------------------------------

pandas is designed for high-performance data processing, but sometimes your specific
- needs go beyond standard aggregation, transformation, or filtering. UDFs allow you to:
+ needs go beyond standard aggregation, transformation, or filtering. User-defined functions allow you to:

* **Customize Computations**: Implement logic tailored to your dataset, such as complex
  transformations, domain-specific calculations, or conditional modifications.
@@ -32,7 +32,7 @@ needs go beyond standard aggregation, transformation, or filtering. UDFs allow y
Which methods support User-Defined Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- User-Defined Functions can be applied across various pandas methods that work with Series and DataFrames:
+ User-Defined Functions can be applied across various pandas methods:

* :meth:`DataFrame.apply` - A flexible method that allows applying a function to Series,
  DataFrames, or groups of data.
@@ -46,7 +46,7 @@ User-Defined Functions can be applied across various pandas methods that work wi
* :meth:`DataFrame.pipe` - Allows chaining custom functions to process entire DataFrames or
  Series in a clean, readable manner.

- Each of these methods can be used with both Series and DataFrame objects, providing versatile
+ All of these pandas methods can be used with both Series and DataFrame objects, providing versatile
ways to apply user-defined functions across different pandas data structures.


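A minimal sketch of one user-defined function used with both data structures follows. The helper `min_max_scale` and the sample data are hypothetical, chosen only to show that the same callable can be passed to the Series and DataFrame versions of :meth:`DataFrame.pipe` and :meth:`DataFrame.apply`.

```python
import pandas as pd

# Hypothetical UDF: scale numeric data to the 0-1 range
def min_max_scale(data):
    return (data - data.min()) / (data.max() - data.min())

s = pd.Series([2.0, 4.0, 6.0])
df = pd.DataFrame({"a": [2.0, 4.0, 6.0], "b": [10.0, 20.0, 40.0]})

# The same function can be passed to pipe on a Series or a DataFrame;
# on a DataFrame, min() and max() are computed per column
s_scaled = s.pipe(min_max_scale)
df_scaled = df.pipe(min_max_scale)

# apply also exists on both: Series.apply calls the function on each element,
# DataFrame.apply calls it on each column (axis=0 by default)
s_squared = s.apply(lambda v: v ** 2)
df_col_sums = df.apply(lambda col: col.sum())

print(s_scaled, df_scaled, s_squared, df_col_sums, sep="\n\n")
```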
@@ -184,10 +184,13 @@ values being broadcasted to the original dimensions:
------------------------

The :meth:`DataFrame.filter` method is used to select subsets of the DataFrame's
- columns or rows and accepts user-defined functions. Specifically, these functions
- return boolean values to filter columns or rows. It is useful when you want to
+ columns or rows according to their labels. It is useful when you want to
extract specific columns or rows that match particular conditions.

+ .. note::
+     :meth:`DataFrame.filter` does not accept a user-defined function directly. To filter
+     with one, apply a function that returns a boolean to each label (for example, in a
+     list comprehension) and pass the selected labels to the ``items`` parameter.
+
.. ipython:: python

    # Sample DataFrame
    df = pd.DataFrame({
@@ -204,17 +207,85 @@ extract specific columns or rows that match particular conditions.
Unlike the methods discussed earlier, the function used for filtering must return a boolean,
so functions that return other values, such as `mean` or `sum`, cannot be used.

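Because :meth:`DataFrame.filter` selects by label, one way to combine it with a function that returns a boolean is to evaluate that function on each label first. The sketch below assumes made-up column names and a hypothetical `is_long_name` helper; it shows the label list being passed to ``items`` and, for comparison, the same boolean results used with ``.loc``.

```python
import pandas as pd

df = pd.DataFrame({
    "AA": [1, 2, 3],
    "BB": [4, 5, 6],
    "C": [7, 8, 9],
})

# Hypothetical boolean UDF: keep labels longer than one character
def is_long_name(label):
    return len(label) > 1

# filter selects by label, so the UDF is used to build the list of labels to keep
selected = df.filter(items=[col for col in df.columns if is_long_name(col)])

# The same boolean results can also index the columns directly
selected_loc = df.loc[:, [is_long_name(col) for col in df.columns]]

print(selected)
```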
+ :meth:`DataFrame.map`
+ ---------------------
+
+ The :meth:`DataFrame.map` method applies a function element-wise to every value of a DataFrame;
+ :meth:`Series.map` is the equivalent for a Series. It is particularly useful for substituting values or transforming data.
+
+ .. ipython:: python
+
+     # Sample DataFrame
+     df = pd.DataFrame({'A': ['cat', 'dog', 'bird'], 'B': ['pig', 'cow', 'lamb']})
+
+     # Using map with a user-defined function
+     def animal_to_length(animal):
+         return len(animal)
+
+     df_mapped = df.map(animal_to_length)
+     print(df_mapped)
+
+     # This works with lambda functions too
+     df_lambda = df.map(lambda x: x.upper())
+     print(df_lambda)
+
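A related detail, sketched under the assumption that the data may contain missing values: a function such as `animal_to_length` would receive the missing value itself, and ``len`` raises a ``TypeError`` on a float ``NaN``. :meth:`Series.map` (and :meth:`DataFrame.map`) accept ``na_action="ignore"``, which leaves missing values untouched instead of passing them to the function. The Series below is made up for illustration.

```python
import numpy as np
import pandas as pd

animals = pd.Series(["cat", np.nan, "bird"])

# Hypothetical UDF; len() would raise a TypeError if called on the float NaN
def animal_to_length(animal):
    return len(animal)

# na_action="ignore" propagates missing values without calling the function on them
lengths = animals.map(animal_to_length, na_action="ignore")
print(lengths)
```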
+ :meth:`DataFrame.pipe`
+ ----------------------
+
+ The :meth:`DataFrame.pipe` method allows you to apply a function or a sequence of functions to a
+ DataFrame in a clean and readable way. This is especially useful for building data processing pipelines.
+
+ .. ipython:: python
+
+     # Sample DataFrame
+     df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
+
+     # User-defined functions for transformation
+     def add_one(df):
+         return df + 1
+
+     def square(df):
+         return df ** 2
+
+     # Applying functions using pipe
+     df_piped = df.pipe(add_one).pipe(square)
+     print(df_piped)
+
+ The advantage of using :meth:`DataFrame.pipe` is that it allows you to chain together functions
+ without nested calls, promoting a cleaner and more readable code style.
+
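To make the "without nested calls" point concrete, the sketch below writes the same chain both ways. It reuses `add_one` and `square` from the example and adds a hypothetical `scale` helper to show that extra positional or keyword arguments given to ``pipe`` are forwarded to the function after the data.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

def add_one(df):
    return df + 1

def square(df):
    return df ** 2

# Hypothetical UDF that takes an extra argument besides the data
def scale(df, factor):
    return df * factor

# Nested calls read inside-out ...
nested = scale(square(add_one(df)), factor=10)

# ... while pipe reads in the order the steps run; factor=10 is forwarded to scale
piped = df.pipe(add_one).pipe(square).pipe(scale, factor=10)
print(piped)
```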

Performance Considerations
--------------------------

- While UDFs provide flexibility, their use is currently discouraged as they can introduce performance issues, especially when
- written in pure Python. To improve efficiency:
-
- * Use **vectorized operations** (`NumPy` or `pandas` built-ins) when possible.
- * Leverage **Cython or Numba** to speed up computations.
- * Consider using **pandas' built-in methods** instead of UDFs for common operations.
+ While UDFs provide flexibility, their use is currently discouraged as they can introduce
+ performance issues, especially when written in pure Python. To improve efficiency,
+ consider using built-in `NumPy` or `pandas` functions instead of user-defined functions
+ for common operations.
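As a small illustration of that advice, the sketch below computes the same results twice: once through a user-defined function and once with a pandas built-in or a vectorized expression. The data is made up; only the pairing of each UDF with its built-in equivalent matters here.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# UDF version: a Python function is called once per column
udf_sums = df.apply(lambda col: sum(col))
# Built-in version: the same result from pandas' own reduction
builtin_sums = df.sum()

# UDF version of an element-wise operation: one Python call per value
udf_doubled = df["A"].map(lambda v: v * 2)
# Vectorized equivalent operating on the whole column at once
vec_doubled = df["A"] * 2

print(builtin_sums, vec_doubled, sep="\n\n")
```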

.. note::
-     If performance is critical, explore **pandas' vectorized functions** before resorting
-     to UDFs.
+     If performance is critical, explore **vectorized operations** before resorting
+     to user-defined functions.
+
+ Vectorized Operations
+ ~~~~~~~~~~~~~~~~~~~~~
+
+ Below is a comparison of a vectorized operation and an equivalent user-defined function:
+
+ .. code-block:: python
+
+     # Vectorized operation:
+     df["new_col"] = 100 * (df["one"] / df["two"])
+
+     # User-defined function
+     def calc_ratio(row):
+         return 100 * (row["one"] / row["two"])
+
+     df["new_col2"] = df.apply(calc_ratio, axis=1)
+
+ Measuring how long each operation takes:
+
+ .. code-block:: text
+
+     Vectorized:               0.0043 secs
+     User-defined function:    5.6435 secs
+
+ The difference arises because :meth:`DataFrame.apply` with ``axis=1`` calls the Python function
+ once for every row, while the vectorized operation runs on the underlying `NumPy` arrays in
+ compiled code, avoiding that per-row Python overhead.
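To run this kind of comparison yourself, a minimal sketch could time both approaches with ``time.perf_counter``. The DataFrame size, the random seed, and the offset added to ``two`` are assumptions chosen for illustration, not the setup behind the timings shown; absolute numbers will vary by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

# Assumed size and random data for illustration only
rng = np.random.default_rng(0)
n_rows = 20_000
df = pd.DataFrame({
    "one": rng.random(n_rows),
    "two": rng.random(n_rows) + 1.0,  # offset keeps the divisor away from zero
})

def calc_ratio(row):
    return 100 * (row["one"] / row["two"])

# Time the vectorized expression on whole columns
start = time.perf_counter()
vectorized = 100 * (df["one"] / df["two"])
vectorized_secs = time.perf_counter() - start

# Time the row-wise user-defined function
start = time.perf_counter()
row_wise = df.apply(calc_ratio, axis=1)
udf_secs = time.perf_counter() - start

print(f"Vectorized: {vectorized_secs:.4f} secs")
print(f"User-defined function: {udf_secs:.4f} secs")
```

Both approaches produce the same values; only the way the work is carried out differs.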