@@ -11,13 +11,13 @@ functionality by allowing users to apply custom computations to their data. Whil
 pandas comes with a set of built-in functions for data manipulation, UDFs offer
 flexibility when built-in methods are not sufficient. These functions can be
 applied at different levels: element-wise, row-wise, column-wise, or group-wise,
-depending on the method used.
+and change the data differently, depending on the method used.
 
 Why Use User-Defined Functions?
 -------------------------------
 
 Pandas is designed for high-performance data processing, but sometimes your specific
-needs go beyond standard aggregation, transformation, or filtering. UDFs allow you to:
+needs go beyond standard aggregation, transformation, or filtering. User-defined functions allow you to:
 
 * **Customize Computations**: Implement logic tailored to your dataset, such as complex
   transformations, domain-specific calculations, or conditional modifications.
@@ -32,7 +32,7 @@ needs go beyond standard aggregation, transformation, or filtering. UDFs allow y
 What functions support User-Defined Functions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-User-Defined Functions can be applied across various pandas methods that work with Series and DataFrames:
+User-Defined Functions can be applied across various pandas methods:
 
 * :meth:`DataFrame.apply` - A flexible method that allows applying a function to Series,
   DataFrames, or groups of data.
@@ -46,7 +46,7 @@ User-Defined Functions can be applied across various pandas methods that work wi
 * :meth:`DataFrame.pipe` - Allows chaining custom functions to process entire DataFrames or
   Series in a clean, readable manner.
 
-Each of these methods can be used with both Series and DataFrame objects, providing versatile
+All of these pandas methods can be used with both Series and DataFrame objects, providing versatile
 ways to apply user-defined functions across different pandas data structures.
 
 
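As a rough illustration of the levels named in the hunks above (element-wise, column-wise, row-wise, and whole-frame chaining), the sketch below passes small user-defined functions to `Series.map`, `DataFrame.apply`, and `DataFrame.pipe`. The sample data and the function names (`double`, `value_range`, `row_total`, `add_total`) are made up for this note and do not appear in the PR.

```python
import pandas as pd

# Hypothetical sample data, not taken from the PR
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})


def double(x):
    """Element-wise: receives one scalar at a time."""
    return x * 2


def value_range(col):
    """Column-wise: receives one column as a Series."""
    return col.max() - col.min()


def row_total(row):
    """Row-wise: receives one row as a Series."""
    return row["A"] + row["B"]


def add_total(frame):
    """Whole-frame: receives the entire DataFrame."""
    return frame.assign(total=frame["A"] + frame["B"])


elementwise = df["A"].map(double)      # called once per element of column A
columnwise = df.apply(value_range)     # axis=0 (default): called once per column
rowwise = df.apply(row_total, axis=1)  # axis=1: called once per row
piped = df.pipe(add_total)             # called once with the whole frame

print(elementwise.tolist())   # [2, 4, 6]
print(columnwise.to_dict())   # {'A': 2, 'B': 20}
print(rowwise.tolist())       # [11, 22, 33]
print(piped)
```

The same function can behave quite differently depending on which method receives it, because each method decides what object (scalar, Series, or DataFrame) is handed to the function.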
@@ -184,10 +184,13 @@ values being broadcasted to the original dimensions:
 ------------------------
 
 The :meth:`DataFrame.filter` method is used to select subsets of the DataFrame’s
-columns or rows and accepts user-defined functions. Specifically, these functions
-return boolean values to filter columns or rows. It is useful when you want to
+columns or rows and accepts user-defined functions. It is useful when you want to
 extract specific columns or rows that match particular conditions.
 
+.. note::
+    :meth:`DataFrame.filter` expects a user-defined function that returns a boolean
+    value
+
 .. ipython:: python
     # Sample DataFrame
     df = pd.DataFrame({
@@ -204,17 +207,85 @@ extract specific columns or rows that match particular conditions.
 Unlike the methods discussed earlier, :meth:`DataFrame.filter` does not accept
 functions that do not return boolean values, such as `mean` or `sum`.
 
+:meth:`DataFrame.map`
+---------------------
+
+The :meth:`DataFrame.map` method is used to apply a function element-wise to a pandas Series
+or DataFrame. It is particularly useful for substituting values or transforming data.
+
+.. ipython:: python
+    # Sample DataFrame
+    df = pd.DataFrame({ 'A': ['cat', 'dog', 'bird'], 'B': ['pig', 'cow', 'lamb'] })
+
+    # Using map with a user-defined function
+    def animal_to_length(animal):
+        return len(animal)
+
+    df_mapped = df.map(animal_to_length)
+    print(df_mapped)
+
+    # This works with lambda functions too
+    df_lambda = df.map(lambda x: x.upper())
+    print(df_lambda)
+
+:meth:`DataFrame.pipe`
+----------------------
+
+The :meth:`DataFrame.pipe` method allows you to apply a function or a series of functions to a
+DataFrame in a clean and readable way. This is especially useful for building data processing pipelines.
+
+.. ipython:: python
+    # Sample DataFrame
+    df = pd.DataFrame({ 'A': [1, 2, 3], 'B': [4, 5, 6] })
+
+    # User-defined functions for transformation
+    def add_one(df):
+        return df + 1
+
+    def square(df):
+        return df ** 2
+
+    # Applying functions using pipe
+    df_piped = df.pipe(add_one).pipe(square)
+    print(df_piped)
+
+The advantage of using :meth:`DataFrame.pipe` is that it allows you to chain together functions
+without nested calls, promoting a cleaner and more readable code style.
+
 
 Performance Considerations
 --------------------------
 
-While UDFs provide flexibility, their use is currently discouraged as they can introduce performance issues, especially when
-written in pure Python. To improve efficiency:
-
-* Use **vectorized operations** (`NumPy` or `pandas` built-ins) when possible.
-* Leverage **Cython or Numba** to speed up computations.
-* Consider using **pandas' built-in methods** instead of UDFs for common operations.
+While UDFs provide flexibility, their use is currently discouraged as they can introduce
+performance issues, especially when written in pure Python. To improve efficiency,
+consider using built-in `NumPy` or `pandas` functions instead of user-defined functions
+for common operations.
 
 .. note::
-    If performance is critical, explore **pandas' vectorized functions** before resorting
-    to UDFs.
+    If performance is critical, explore **vectorized operations** before resorting
+    to user-defined functions.
+
+Vectorized Operations
+~~~~~~~~~~~~~~~~~~~~~
+
+Below is an example of vectorized operations in pandas:
+
+.. ipython:: python
+    # Vectorized operation:
+    df["new_col"] = 100 * (df["one"] / df["two"])
+
+    # User-defined function
+    def calc_ratio(row):
+        return 100 * (row["one"] / row["two"])
+
+    df["new_col2"] = df.apply(calc_ratio, axis=1)
+
+Measuring how long each operation takes:
+
+.. code-block:: text
+    Vectorized: 0.0043 secs
+    User-defined function: 5.6435 secs
+
+This happens because user-defined functions loop through each row and apply the function to it,
+while vectorized operations are applied to underlying `NumPy` arrays, skipping inefficient
+Python code.
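The comparison in the last hunk does not show how `df` was built or how the timings were taken. A minimal, self-contained way to reproduce that kind of measurement might look like the sketch below. The random data, the row count, the helper names `vectorized` and `with_udf`, and the use of `timeit` are all assumptions for this note, not what the PR author ran, so the absolute numbers will differ from those quoted in the diff.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: the PR does not show how its "one"/"two" columns were created
rng = np.random.default_rng(0)
n_rows = 10_000  # arbitrary size, chosen so the script finishes quickly
df = pd.DataFrame({
    "one": rng.random(n_rows),
    "two": rng.random(n_rows) + 1.0,  # offset keeps the divisor away from zero
})


def vectorized(frame):
    # Whole-column arithmetic, evaluated on the underlying NumPy arrays
    return 100 * (frame["one"] / frame["two"])


def calc_ratio(row):
    # Same formula, but called once per row from Python
    return 100 * (row["one"] / row["two"])


def with_udf(frame):
    return frame.apply(calc_ratio, axis=1)


# Total wall-clock time for 3 runs of each approach
vectorized_secs = timeit.timeit(lambda: vectorized(df), number=3)
udf_secs = timeit.timeit(lambda: with_udf(df), number=3)

# Both approaches should produce the same values
same_result = np.allclose(vectorized(df), with_udf(df))

print(f"Vectorized: {vectorized_secs:.4f} secs")
print(f"User-defined function: {udf_secs:.4f} secs")
print(f"Same result: {same_result}")
```

Checking that both approaches return the same values before comparing their timings guards against benchmarking two computations that are not actually equivalent.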
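On the `DataFrame.pipe` section of the diff: the claim that `pipe` avoids nested calls can be checked by writing the same computation both ways. `DataFrame.pipe` forwards extra positional and keyword arguments to the function, so parameterized steps chain as well. The helpers `add_value` and `raise_to` below are hypothetical variants of the PR's `add_one` and `square`, introduced only for this sketch.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})


def add_value(frame, value):
    # Hypothetical helper: adds a constant to every element
    return frame + value


def raise_to(frame, power=2):
    # Hypothetical helper: raises every element to the given power
    return frame ** power


# Nested calls read inside-out
nested = raise_to(add_value(df, 1), power=2)

# pipe reads left-to-right and passes extra arguments through to each function
piped = df.pipe(add_value, 1).pipe(raise_to, power=2)

print(piped)
print(piped.equals(nested))  # True
```

Both forms compute the same result; the difference is only in reading order, which tends to matter more as the number of steps grows.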