@@ -40,7 +40,7 @@ User-Defined Functions can be applied across various pandas methods:
    aggregation functions.
  * :meth:`DataFrame.transform` - Applies a function to groups while preserving the shape of
    the original data.
- * :meth:`DataFrame.filter` - Filters groups based on a function returning a Boolean condition.
+ * :meth:`DataFrame.filter` - Filters groups based on a list of Boolean conditions.
  * :meth:`DataFrame.map` - Applies an element-wise function to a Series, useful for
    transforming individual values.
  * :meth:`DataFrame.pipe` - Allows chaining custom functions to process entire DataFrames or
@@ -56,6 +56,7 @@ ways to apply user-defined functions across different pandas data structures.
  The :meth:`DataFrame.apply` allows applying a user-defined functions along either axis (rows or columns):

  .. ipython:: python
+
      import pandas as pd

      # Sample DataFrame
@@ -77,6 +78,7 @@ The :meth:`DataFrame.apply` allows applying a user-defined functions along eithe
  :meth:`DataFrame.apply` also accepts dictionaries of multiple user-defined functions:

  .. ipython:: python
+
      # Sample DataFrame
      df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3]})

@@ -98,6 +100,7 @@ The :meth:`DataFrame.apply` allows applying a user-defined functions along eithe
  :meth:`DataFrame.apply` works with Series objects as well:

  .. ipython:: python
+
      # Sample Series
      s = pd.Series([1, 2, 3])

@@ -119,6 +122,7 @@ The :meth:`DataFrame.apply` allows applying a user-defined functions along eithe
  The :meth:`DataFrame.agg` allows aggregation with a user-defined function along either axis (rows or columns):

  .. ipython:: python
+
      # Sample DataFrame
      df = pd.DataFrame({
          'Category': ['A', 'A', 'B', 'B'],
@@ -146,6 +150,7 @@ The :meth:`DataFrame.transform` allows transforms a Dataframe, Series or Grouped
  while preserving the original shape of the object.

  .. ipython:: python
+
      # Sample DataFrame
      df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

@@ -165,6 +170,7 @@ Attempting to use common aggregation functions such as `mean` or `sum` will resu
  values being broadcasted to the original dimensions:

  .. ipython:: python
+
      # Sample DataFrame
      df = pd.DataFrame({
          'Category': ['A', 'A', 'B', 'B', 'B'],
@@ -184,28 +190,29 @@ values being broadcasted to the original dimensions:
  ------------------------

  The :meth:`DataFrame.filter` method is used to select subsets of the DataFrame’s
- columns or rows and accepts user-defined functions. It is useful when you want to
- extract specific columns or rows that match particular conditions.
+ columns or row. It is useful when you want to extract specific columns or rows that
+ match particular conditions.

  .. note::
-     :meth:`DataFrame.filter` expects a user-defined function that returns a boolean
-     value
+     :meth:`DataFrame.filter` does not accept user-defined functions, but can accept
+     list comprehensions that have user-defined functions applied to them.

  .. ipython:: python
+
      # Sample DataFrame
      df = pd.DataFrame({
-         'A': [1, 2, 3],
-         'B': [4, 5, 6],
+         'AA': [1, 2, 3],
+         'BB': [4, 5, 6],
          'C': [7, 8, 9],
          'D': [10, 11, 12]
      })

-     # Define a function that filters out columns where the name is longer than 1 character
-     df_filtered_func = df.filter(items=lambda x: len(x) > 1)
-     print(df_filtered_func)
+     def is_long_name(column_name):
+         return len(column_name) > 1

- Unlike the methods discussed earlier, :meth:`DataFrame.filter` does not accept
- functions that do not return boolean values, such as `mean` or `sum`.
+     # Define a function that filters out columns where the name is longer than 1 character
+     df_filtered = df[[col for col in df.columns if is_long_name(col)]]
+     print(df_filtered)
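
The note added in this hunk says that :meth:`DataFrame.filter` can take a list built with a user-defined function, while the new example uses plain column indexing. A minimal sketch of how the two relate is below. It reuses the `is_long_name` helper and the sample columns from the hunk; the names `long_names`, `via_filter`, and `via_indexing` are made up for illustration. Note that the comprehension keeps the columns whose names are longer than one character.

```python
import pandas as pd

# Sample DataFrame with the same columns as the hunk above
df = pd.DataFrame({
    "AA": [1, 2, 3],
    "BB": [4, 5, 6],
    "C": [7, 8, 9],
    "D": [10, 11, 12],
})


def is_long_name(column_name):
    """Return True when the column name has more than one character."""
    return len(column_name) > 1


# Build a plain list of labels with the user-defined function ...
long_names = [col for col in df.columns if is_long_name(col)]

# ... then pass that list (not the function itself) to DataFrame.filter
via_filter = df.filter(items=long_names)

# Plain column indexing with the same list selects the same columns
via_indexing = df[long_names]

print(via_filter)
```

Both calls receive a list of labels, so the user-defined function runs in ordinary Python before pandas is involved.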

  :meth:`DataFrame.map`
  ---------------------
@@ -214,19 +221,20 @@ The :meth:`DataFrame.map` method is used to apply a function element-wise to a p
  or Dataframe. It is particularly useful for substituting values or transforming data.

  .. ipython:: python
+
      # Sample DataFrame
-     df = pd.DataFrame({'A': ['cat', 'dog', 'bird'], 'B': ['pig', 'cow', 'lamb']})
+     s = pd.Series(['cat', 'dog', 'bird'])

      # Using map with a user-defined function
      def animal_to_length(animal):
          return len(animal)

-     df_mapped = df.map(animal_to_length)
-     print(df_mapped)
+     s_mapped = s.map(animal_to_length)
+     print(s_mapped)

      # This works with lambda functions too
-     df_lambda = df.map(lambda x: x.upper())
-     print(df_lambda)
+     s_lambda = s.map(lambda x: x.upper())
+     print(s_lambda)

  :meth:`DataFrame.pipe`
  ----------------------
@@ -235,6 +243,7 @@ The :meth:`DataFrame.pipe` method allows you to apply a function or a series of
  DataFrame in a clean and readable way. This is especially useful for building data processing pipelines.

  .. ipython:: python
+
      # Sample DataFrame
      df = pd.DataFrame({ 'A': [1, 2, 3], 'B': [4, 5, 6] })

@@ -256,7 +265,7 @@ without nested calls, promoting a cleaner and more readable code style.
  Performance Considerations
  --------------------------

- While UDFs provide flexibility, their use is currently discouraged as they can introduce
+ While user-defined functions provide flexibility, their use is currently discouraged as they can introduce
  performance issues, especially when written in pure Python. To improve efficiency,
  consider using built-in `NumPy` or `pandas` functions instead of user-defined functions
  for common operations.
@@ -270,22 +279,27 @@ Vectorized Operations

  Below is an example of vectorized operations in pandas:

- .. ipython:: python
-     # Vectorized operation:
-     df["new_col"] = 100 * (df["one"] / df["two"])
+ .. code-block:: text

      # User-defined function
      def calc_ratio(row):
          return 100 * (row["one"] / row["two"])

      df["new_col2"] = df.apply(calc_ratio, axis=1)

+     # Vectorized Operation
+     df["new_col"] = 100 * (df["one"] / df["two"])
+
  Measuring how long each operation takes:

- .. ipython:: python
+ .. code-block:: text
+
      Vectorized: 0.0043 secs
      User-defined function: 5.6435 secs

- This happens because user-defined functions loop through each row and apply its function,
- while vectorized operations are applied to underlying `Numpy` arrays, skipping inefficient
- Python code.
+ Vectorized operations in pandas are significantly faster than using :meth:`DataFrame.apply`
+ with user-defined functions because they leverage highly optimized C functions
+ via NumPy to process entire arrays at once. This approach avoids the overhead of looping
+ through rows in Python and making separate function calls for each row, which is slow and
+ inefficient. Additionally, NumPy arrays benefit from memory efficiency and CPU-level
+ optimizations, making vectorized operations the preferred choice whenever possible.
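
The timings quoted in this hunk are shown as static text, and the diff does not say what data produced them. One possible way to measure the two approaches is sketched below. The 10,000-row random DataFrame, the fixed seed, and the helper names `with_apply` and `vectorized` are assumptions made for illustration; only `calc_ratio` and the `one` and `two` columns come from the diff. Absolute timings depend on the machine and the data size, so none are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed sample data: 10,000 rows of random floats (not taken from the diff)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "one": rng.random(10_000),
    "two": rng.random(10_000) + 1.0,  # shifted so the divisor is never zero
})


def calc_ratio(row):
    """User-defined function from the diff, called once per row."""
    return 100 * (row["one"] / row["two"])


def with_apply():
    # Row-wise: pandas calls calc_ratio in Python for every row
    return df.apply(calc_ratio, axis=1)


def vectorized():
    # Vectorized: a single expression over whole columns
    return 100 * (df["one"] / df["two"])


# Time one run of each approach
apply_secs = timeit.timeit(with_apply, number=1)
vectorized_secs = timeit.timeit(vectorized, number=1)

print(f"Vectorized: {vectorized_secs:.4f} secs")
print(f"User-defined function: {apply_secs:.4f} secs")
```

Both helpers compute the same values, so the comparison isolates the cost of calling a Python function per row.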