@@ -40,7 +40,7 @@ User-Defined Functions can be applied across various pandas methods:
   aggregation functions.
 * :meth:`DataFrame.transform` - Applies a function to groups while preserving the shape of
   the original data.
-* :meth:`DataFrame.filter` - Filters groups based on a function returning a Boolean condition.
+* :meth:`DataFrame.filter` - Filters groups based on a list of Boolean conditions.
 * :meth:`DataFrame.map` - Applies an element-wise function to a Series, useful for
   transforming individual values.
 * :meth:`DataFrame.pipe` - Allows chaining custom functions to process entire DataFrames or
@@ -56,6 +56,7 @@ ways to apply user-defined functions across different pandas data structures.
 The :meth:`DataFrame.apply` allows applying a user-defined functions along either axis (rows or columns):

 .. ipython:: python
+
     import pandas as pd

     # Sample DataFrame
@@ -77,6 +78,7 @@ The :meth:`DataFrame.apply` allows applying a user-defined functions along eithe
 :meth:`DataFrame.apply` also accepts dictionaries of multiple user-defined functions:

 .. ipython:: python
+
     # Sample DataFrame
     df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3]})

@@ -98,6 +100,7 @@ The :meth:`DataFrame.apply` allows applying a user-defined functions along eithe
 :meth:`DataFrame.apply` works with Series objects as well:

 .. ipython:: python
+
     # Sample Series
     s = pd.Series([1, 2, 3])

@@ -119,6 +122,7 @@ The :meth:`DataFrame.apply` allows applying a user-defined functions along eithe
 The :meth:`DataFrame.agg` allows aggregation with a user-defined function along either axis (rows or columns):

 .. ipython:: python
+
     # Sample DataFrame
     df = pd.DataFrame({
         'Category': ['A', 'A', 'B', 'B'],
@@ -146,6 +150,7 @@ The :meth:`DataFrame.transform` allows transforms a Dataframe, Series or Grouped
 while preserving the original shape of the object.

 .. ipython:: python
+
     # Sample DataFrame
     df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

@@ -165,6 +170,7 @@ Attempting to use common aggregation functions such as `mean` or `sum` will resu
 values being broadcasted to the original dimensions:

 .. ipython:: python
+
     # Sample DataFrame
     df = pd.DataFrame({
         'Category': ['A', 'A', 'B', 'B', 'B'],
@@ -184,28 +190,29 @@ values being broadcasted to the original dimensions:
 ------------------------

 The :meth:`DataFrame.filter` method is used to select subsets of the DataFrame’s
-columns or rows and accepts user-defined functions. It is useful when you want to
-extract specific columns or rows that match particular conditions.
+columns or rows. It is useful when you want to extract specific columns or rows that
+match particular conditions.

 .. note::
-    :meth:`DataFrame.filter` expects a user-defined function that returns a boolean
-    value
+    :meth:`DataFrame.filter` does not accept user-defined functions, but can accept
+    list comprehensions that have user-defined functions applied to them.

 .. ipython:: python
+
     # Sample DataFrame
     df = pd.DataFrame({
-        'A': [1, 2, 3],
-        'B': [4, 5, 6],
+        'AA': [1, 2, 3],
+        'BB': [4, 5, 6],
         'C': [7, 8, 9],
         'D': [10, 11, 12]
     })

-    # Define a function that filters out columns where the name is longer than 1 character
-    df_filtered_func = df.filter(items=lambda x: len(x) > 1)
-    print(df_filtered_func)
+    def is_long_name(column_name):
+        return len(column_name) > 1

-Unlike the methods discussed earlier, :meth:`DataFrame.filter` does not accept
-functions that do not return boolean values, such as `mean` or `sum`.
+    # Keep only the columns whose name is longer than 1 character
+    df_filtered = df[[col for col in df.columns if is_long_name(col)]]
+    print(df_filtered)

 :meth:`DataFrame.map`
 ---------------------
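Review note on the hunk above: a minimal sketch, assuming pandas is installed, of the label-based selectors that `DataFrame.filter` does accept (`items`, `like`, `regex`), next to the list-comprehension workaround introduced in the diff. The frame and `is_long_name` are copied from the diff; the other variable names are illustrative only.

```python
import pandas as pd

# Same sample frame as in the diff.
df = pd.DataFrame({
    "AA": [1, 2, 3],
    "BB": [4, 5, 6],
    "C": [7, 8, 9],
    "D": [10, 11, 12],
})


def is_long_name(column_name):
    """Return True for column labels longer than one character."""
    return len(column_name) > 1


# Workaround from the diff: apply the user-defined function in a list
# comprehension, then index the frame with the resulting labels.
long_cols = [col for col in df.columns if is_long_name(col)]
df_filtered = df[long_cols]

# DataFrame.filter selects by label, never by callable:
df_items = df.filter(items=["AA", "C"])   # exact labels
df_like = df.filter(like="B")             # labels containing "B"
df_regex = df.filter(regex=r"^.{2,}$")    # labels of two or more characters

print(df_filtered)
print(df_regex)
```

For this frame, the `regex` call selects the same columns as the list comprehension, so a user-defined function is only needed when the condition cannot be expressed as a label pattern.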
@@ -214,19 +221,20 @@ The :meth:`DataFrame.map` method is used to apply a function element-wise to a p
 or Dataframe. It is particularly useful for substituting values or transforming data.

 .. ipython:: python
+
     # Sample DataFrame
-    df = pd.DataFrame({ 'A': ['cat', 'dog', 'bird'], 'B': ['pig', 'cow', 'lamb'] })
+    s = pd.Series(['cat', 'dog', 'bird'])

     # Using map with a user-defined function
     def animal_to_length(animal):
         return len(animal)

-    df_mapped = df.map(animal_to_length)
-    print(df_mapped)
+    s_mapped = s.map(animal_to_length)
+    print(s_mapped)

     # This works with lambda functions too
-    df_lambda = df.map(lambda x: x.upper())
-    print(df_lambda)
+    s_lambda = s.map(lambda x: x.upper())
+    print(s_lambda)

 :meth:`DataFrame.pipe`
 ----------------------
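Review note on the hunk above: the surrounding text says `map` is useful for substituting values, but the example only transforms them. A minimal sketch of substitution with `Series.map`, assuming pandas is installed; the `sounds` lookup table is hypothetical and not part of the diff.

```python
import pandas as pd

s = pd.Series(["cat", "dog", "bird"])


def animal_to_length(animal):
    """Element-wise user-defined function, as in the diff."""
    return len(animal)


# Transforming values with a user-defined function.
lengths = s.map(animal_to_length)

# Substituting values with a dict (hypothetical lookup table).
# Values without a matching key become NaN.
sounds = {"cat": "meow", "dog": "woof"}
substituted = s.map(sounds)

print(lengths)
print(substituted)
```

The dict form avoids calling a Python function per element, at the cost of producing missing values for unmapped entries such as `"bird"` here.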
@@ -235,6 +243,7 @@ The :meth:`DataFrame.pipe` method allows you to apply a function or a series of
 DataFrame in a clean and readable way. This is especially useful for building data processing pipelines.

 .. ipython:: python
+
     # Sample DataFrame
     df = pd.DataFrame({ 'A': [1, 2, 3], 'B': [4, 5, 6] })

@@ -256,7 +265,7 @@ without nested calls, promoting a cleaner and more readable code style.
 Performance Considerations
 --------------------------

-While UDFs provide flexibility, their use is currently discouraged as they can introduce
+While user-defined functions provide flexibility, their use is currently discouraged as they can introduce
 performance issues, especially when written in pure Python. To improve efficiency,
 consider using built-in `NumPy` or `pandas` functions instead of user-defined functions
 for common operations.
@@ -270,22 +279,27 @@ Vectorized Operations

 Below is an example of vectorized operations in pandas:

-.. ipython:: python
-    # Vectorized operation:
-    df["new_col"] = 100 * (df["one"] / df["two"])
+.. code-block:: text

     # User-defined function
     def calc_ratio(row):
         return 100 * (row["one"] / row["two"])

     df["new_col2"] = df.apply(calc_ratio, axis=1)

+    # Vectorized Operation
+    df["new_col"] = 100 * (df["one"] / df["two"])
+
 Measuring how long each operation takes:

-.. ipython:: python
+.. code-block:: text
+
     Vectorized: 0.0043 secs
     User-defined function: 5.6435 secs

-This happens because user-defined functions loop through each row and apply its function,
-while vectorized operations are applied to underlying `Numpy` arrays, skipping inefficient
-Python code.
+Vectorized operations in pandas are significantly faster than using :meth:`DataFrame.apply`
+with user-defined functions because they leverage highly optimized C functions
+via NumPy to process entire arrays at once. This approach avoids the overhead of looping
+through rows in Python and making separate function calls for each row, which is slow and
+inefficient. Additionally, NumPy arrays benefit from memory efficiency and CPU-level
+optimizations, making vectorized operations the preferred choice whenever possible.
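Review note on the hunk above: the snippet in the diff is rendered as plain text and never defines `df`, so the timings cannot be reproduced from the page. Below is a minimal runnable sketch of the same comparison, assuming pandas and NumPy are installed. The row count, seed, and value ranges are illustrative assumptions, and the measured times will differ from the figures quoted in the diff.

```python
import time

import numpy as np
import pandas as pd

# Illustrative data: column names follow the diff, sizes and ranges are assumed.
rng = np.random.default_rng(0)
n_rows = 10_000
df = pd.DataFrame({
    "one": rng.uniform(1.0, 2.0, n_rows),
    "two": rng.uniform(1.0, 2.0, n_rows),  # strictly positive, so no division by zero
})


def calc_ratio(row):
    """User-defined function from the diff, called once per row."""
    return 100 * (row["one"] / row["two"])


# Row-wise application of the user-defined function.
start = time.perf_counter()
df["new_col2"] = df.apply(calc_ratio, axis=1)
udf_secs = time.perf_counter() - start

# Vectorized operation on whole columns.
start = time.perf_counter()
df["new_col"] = 100 * (df["one"] / df["two"])
vectorized_secs = time.perf_counter() - start

# Both approaches should produce the same values.
assert np.allclose(df["new_col"], df["new_col2"])

print(f"Vectorized: {vectorized_secs:.4f} secs")
print(f"User-defined function: {udf_secs:.4f} secs")
```

A single `time.perf_counter` measurement is noisy; repeating each operation several times (for example with the standard-library `timeit` module) gives more stable numbers.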