33{{ header }}
44
55**************************************
6- Introduction to User Defined Functions
6+ Introduction to User-Defined Functions
77**************************************
88
9- In pandas, User Defined Functions (UDFs) provide a way to extend the library’s
9+ In pandas, User-Defined Functions (UDFs) provide a way to extend the library’s
1010functionality by allowing users to apply custom computations to their data. While
1111pandas comes with a set of built-in functions for data manipulation, UDFs offer
1212flexibility when built-in methods are not sufficient. These functions can be
1313applied at different levels: element-wise, row-wise, column-wise, or group-wise,
1414depending on the method used.
1515
16- Note: User Defined Functions will be abbreviated to UDFs throughout this guide.
16+ .. note::
17+
18+     User-Defined Functions will be abbreviated to UDFs throughout this guide.
1719
18- Why Use UDFs ?
19- -------------
20+ Why Use User-Defined Functions?
21+ -------------------------------
2022
2123Pandas is designed for high-performance data processing, but sometimes your specific
2224needs go beyond standard aggregation, transformation, or filtering. UDFs allow you to:
23- * Customize Computations: Implement logic tailored to your dataset, such as complex
25+
26+ * **Customize Computations**: Implement logic tailored to your dataset, such as complex
2427 transformations, domain-specific calculations, or conditional modifications.
25- * Improve Code Readability: Encapsulate logic into functions rather than writing long,
28+ * **Improve Code Readability**: Encapsulate logic into functions rather than writing long,
2629 complex expressions.
27- * Handle Complex Grouped Operations: Perform operations on grouped data that standard
30+ * **Handle Complex Grouped Operations**: Perform operations on grouped data that standard
2831 methods do not support.
29- * Extend pandas' Functionality: Apply external libraries or advanced calculations that
32+ * **Extend pandas' Functionality**: Apply external libraries or advanced calculations that
3033 are not natively available.
3134
3235
33- Where Can UDFs Be Used?
34- -----------------------
36+ Methods that support User-Defined Functions
37+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3538
36- UDFs can be applied across various pandas methods that work with both Series and DataFrames:
39+ UDFs can be applied across various pandas methods that work with Series and DataFrames:
3740
3841* :meth:`DataFrame.apply` - A flexible method that allows applying a function to Series,
3942 DataFrames, or groups of data.
@@ -48,4 +51,110 @@ UDFs can be applied across various pandas methods that work with both Series and
4851 Series in a clean, readable manner.
4952
5053Each of these methods can be used with both Series and DataFrame objects, providing versatile
51- ways to apply user-defined functions across different pandas data structures.
54+ ways to apply user-defined functions across different pandas data structures.
55+
56+
57+ :meth:`DataFrame.apply`
58+ -----------------------
59+
60+ :meth:`DataFrame.apply` applies a user-defined function along either axis (rows or columns):
61+
62+ .. ipython:: python
63+
64+     import pandas as pd
65+
66+     # Sample DataFrame
67+     df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
68+
69+     # User-Defined Function
70+     def add_one(x):
71+         return x + 1
72+
73+     # Apply function
74+     df_transformed = df.apply(add_one)
75+     print(df_transformed)
76+
77+     # This works with lambda functions too
78+     df_lambda = df.apply(lambda x: x + 1)
79+     print(df_lambda)
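
The example above relies on the default ``axis=0``, where the function receives each column as a Series. As a minimal sketch of the other direction, the block below passes ``axis=1`` so that the function receives one row at a time. The helper names ``row_total`` and ``column_range`` are hypothetical and chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

def row_total(row):
    # With axis=1, `row` is a Series indexed by column label ("A", "B")
    return row["A"] + row["B"]

def column_range(col):
    # With the default axis=0, `col` is a Series holding one whole column
    return col.max() - col.min()

row_totals = df.apply(row_total, axis=1)   # one value per row
column_ranges = df.apply(column_range)     # one value per column

print(row_totals)
print(column_ranges)
```

Row-wise application calls the Python function once per row, so it tends to be the slowest way to use ``apply`` on large frames.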
80+
81+
82+ :meth:`DataFrame.apply` also accepts a dictionary that maps column labels to user-defined functions:
83+
84+ .. ipython:: python
85+
86+     import pandas as pd
87+
88+     # Sample DataFrame
89+     df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3]})
90+
91+     # User-Defined Function
92+     def add_one(x):
93+         return x + 1
94+
95+     def add_two(x):
96+         return x + 2
97+
98+     # Apply function
99+     df_transformed = df.apply({"A": add_one, "B": add_two})
100+     print(df_transformed)
101+
102+     # This works with lambda functions too
103+     df_lambda = df.apply({"A": lambda x: x + 1, "B": lambda x: x + 2})
104+     print(df_lambda)
105+
106+ :meth:`Series.apply` works the same way on Series objects, calling the function on each element:
107+
108+ .. ipython:: python
109+
110+     import pandas as pd
111+
112+     # Sample Series
113+     s = pd.Series([1, 2, 3])
114+
115+     # User-Defined Function
116+     def add_one(x):
117+         return x + 1
118+
119+     # Apply function
120+     s_transformed = s.apply(add_one)
121+     print(s_transformed)
122+
123+     # This works with lambda functions too
124+     s_lambda = s.apply(lambda x: x + 1)
125+     print(s_lambda)
126+
127+ :meth:`DataFrame.agg`
128+ ---------------------
129+
130+ When working with grouped data, user-defined functions can be passed to ``agg`` on the grouped object, just as with :meth:`DataFrame.agg`:
131+
132+ .. ipython:: python
133+
134+     # Sample DataFrame
135+     df = pd.DataFrame({
136+         'Category': ['A', 'A', 'B', 'B'],
137+         'Values': [10, 20, 30, 40]
138+     })
139+
140+     # Define a function for group operations
141+     def group_mean(group):
142+         return group.mean()
143+
144+     # Apply UDF to each group
145+     grouped_result = df.groupby('Category')['Values'].agg(group_mean)
146+     print(grouped_result)
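
``group_mean`` above duplicates the built-in ``"mean"`` aggregation, so a UDF is not strictly needed there. The sketch below shows a case where a custom function adds something: a per-group spread (max minus min). The function name ``value_range`` and the output column names ``spread`` and ``average`` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "A", "B", "B"],
    "Values": [10, 20, 30, 40],
})

def value_range(group):
    # `group` is a Series containing the "Values" of one category
    return group.max() - group.min()

# Single custom aggregation: one value per group
spread = df.groupby("Category")["Values"].agg(value_range)

# Named aggregation: mix the UDF with a built-in aggregation by name
summary = df.groupby("Category")["Values"].agg(
    spread=value_range,
    average="mean",
)

print(spread)
print(summary)
```

Passing built-in aggregations by name, as with ``"mean"`` here, lets pandas use its optimized implementation while the UDF handles only the custom part.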
147+
148+ Performance Considerations
149+ --------------------------
150+
151+ While UDFs provide flexibility, their use is currently discouraged as they can introduce performance issues, especially when
152+ written in pure Python. To improve efficiency:
153+
154+ * Use **vectorized operations** (NumPy or pandas built-ins) when possible.
155+ * Leverage **Cython or Numba** to speed up computations.
156+ * Consider using **pandas' built-in methods** instead of UDFs for common operations.
157+
158+ .. note::
159+     If performance is critical, explore **pandas' vectorized functions** before resorting
160+     to UDFs.
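
As a rough illustration of the first recommendation, the sketch below rewrites two of the UDF calls from this guide as vectorized expressions and checks that the results match. It does not measure timings; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

def add_one(x):
    return x + 1

# Column-wise UDF versus the equivalent vectorized expression
via_udf = df.apply(add_one)
vectorized = df + 1

# Row-wise UDF versus plain column arithmetic
row_udf = df.apply(lambda row: row["A"] + row["B"], axis=1)
row_vectorized = df["A"] + df["B"]

print(via_udf.equals(vectorized))
print(row_udf.equals(row_vectorized))
```

The vectorized forms avoid calling a Python function once per column or row, which is where most of the overhead of ``apply`` tends to come from.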