Commit bf984ca

added apply method
1 parent 3f94137 commit bf984ca

1 file changed

+122
-13
lines changed

doc/source/user_guide/user_defined_functions.rst

Lines changed: 122 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -3,37 +3,40 @@
33
{{ header }}
44

55
**************************************
6-
Introduction to User Defined Functions
6+
Introduction to User-Defined Functions
77
**************************************
88

9-
In pandas, User Defined Functions (UDFs) provide a way to extend the library’s
9+
In pandas, User-Defined Functions (UDFs) provide a way to extend the library’s
1010
functionality by allowing users to apply custom computations to their data. While
1111
pandas comes with a set of built-in functions for data manipulation, UDFs offer
1212
flexibility when built-in methods are not sufficient. These functions can be
1313
applied at different levels: element-wise, row-wise, column-wise, or group-wise,
1414
depending on the method used.
1515

16-
Note: User Defined Functions will be abbreviated to UDFs throughout this guide.
16+
.. .. note::
17+
18+
.. User-Defined Functions will be abbreviated to UDFs throughout this guide.
1719
18-
Why Use UDFs?
19-
-------------
20+
Why Use User-Defined Functions?
21+
-------------------------------
2022

2123
Pandas is designed for high-performance data processing, but sometimes your specific
2224
needs go beyond standard aggregation, transformation, or filtering. UDFs allow you to:
23-
* Customize Computations: Implement logic tailored to your dataset, such as complex
25+
26+
* **Customize Computations**: Implement logic tailored to your dataset, such as complex
2427
transformations, domain-specific calculations, or conditional modifications.
25-
* Improve Code Readability: Encapsulate logic into functions rather than writing long,
28+
* **Improve Code Readability**: Encapsulate logic into functions rather than writing long,
2629
complex expressions.
27-
* Handle Complex Grouped Operations: Perform operations on grouped data that standard
30+
* **Handle Complex Grouped Operations**: Perform operations on grouped data that standard
2831
methods do not support.
29-
* Extend pandas' Functionality: Apply external libraries or advanced calculations that
32+
* **Extend pandas' Functionality**: Apply external libraries or advanced calculations that
3033
are not natively available.
3134

3235

33-
Where Can UDFs Be Used?
34-
-----------------------
36+
Methods That Support User-Defined Functions
37+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3538

36-
UDFs can be applied across various pandas methods that work with both Series and DataFrames:
39+
UDFs can be applied across various pandas methods that work with Series and DataFrames:
3740

3841
* :meth:`DataFrame.apply` - A flexible method that allows applying a function to Series,
3942
DataFrames, or groups of data.
@@ -48,4 +51,110 @@ UDFs can be applied across various pandas methods that work with both Series and
4851
Series in a clean, readable manner.
4952

5053
Each of these methods can be used with both Series and DataFrame objects, providing versatile
51-
ways to apply user-defined functions across different pandas data structures.
54+
ways to apply user-defined functions across different pandas data structures.
55+
56+
57+
:meth:`DataFrame.apply`
58+
-----------------------
59+
60+
:meth:`DataFrame.apply` applies a user-defined function along either axis (rows or columns):
61+
.. ipython:: python

    import pandas as pd

    # Sample DataFrame
    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

    # User-Defined Function
    def add_one(x):
        return x + 1

    # Apply function
    df_transformed = df.apply(add_one)
    print(df_transformed)

    # This works with lambda functions too
    df_lambda = df.apply(lambda x: x + 1)
    print(df_lambda)
80+
81+
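The example above relies on the default ``axis=0``, which passes each column to the function. As a minimal sketch of the row-wise case, the snippet below passes ``axis=1`` so that each row arrives as a Series. The helper name ``row_total`` is hypothetical and chosen only for illustration.

```python
import pandas as pd

# Same sample DataFrame as in the example above
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})


# Hypothetical helper: receives one row at a time as a Series
def row_total(row):
    return row["A"] + row["B"]


# axis=1 passes each row to the function; the default axis=0 passes each column
row_totals = df.apply(row_total, axis=1)
print(row_totals)
```

The result is a Series with one value per row, indexed like the original DataFrame.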
82+
:meth:`DataFrame.apply` also accepts a dictionary that maps column labels to user-defined functions:
83+
.. ipython:: python

    import pandas as pd

    # Sample DataFrame
    df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3]})

    # User-Defined Functions
    def add_one(x):
        return x + 1

    def add_two(x):
        return x + 2

    # Apply a different function to each column
    df_transformed = df.apply({"A": add_one, "B": add_two})
    print(df_transformed)

    # This works with lambda functions too
    df_lambda = df.apply({"A": lambda x: x + 1, "B": lambda x: x + 2})
    print(df_lambda)
105+
106+
:meth:`Series.apply` applies a user-defined function to the values of a Series in the same way:
107+
.. ipython:: python

    import pandas as pd

    # Sample Series
    s = pd.Series([1, 2, 3])

    # User-Defined Function
    def add_one(x):
        return x + 1

    # Apply function
    s_transformed = s.apply(add_one)
    print(s_transformed)

    # This works with lambda functions too
    s_lambda = s.apply(lambda x: x + 1)
    print(s_lambda)
126+
127+
:meth:`DataFrame.agg`
128+
---------------------
129+
130+
User-defined functions can also be used for aggregation. When working with grouped data, pass the function to the ``agg`` method of the groupby object, which mirrors :meth:`DataFrame.agg`:
131+
.. ipython:: python

    # Sample DataFrame
    df = pd.DataFrame({
        'Category': ['A', 'A', 'B', 'B'],
        'Values': [10, 20, 30, 40]
    })

    # Define a function for group operations
    def group_mean(group):
        return group.mean()

    # Apply UDF to each group
    grouped_result = df.groupby('Category')['Values'].agg(group_mean)
    print(grouped_result)
147+
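The grouped example above calls ``agg`` on a groupby object. As a hedged sketch of the ungrouped case, a user-defined function can also be passed directly to :meth:`DataFrame.agg`, which then reduces each column to a single value. The helper name ``value_range`` and the sample data are assumptions made for illustration.

```python
import pandas as pd

# Illustrative numeric DataFrame (not taken from the guide)
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 40]})


# Hypothetical helper: reduces a column (a Series) to one number
def value_range(column):
    return column.max() - column.min()


# Each column is passed to the function and collapsed to a scalar
result = df.agg(value_range)
print(result)
```

The result is a Series indexed by column label, with one aggregated value per column.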
148+
Performance Considerations
149+
--------------------------
150+
151+
While UDFs provide flexibility, their use is currently discouraged as they can introduce performance issues, especially when
152+
written in pure Python. To improve efficiency:
153+
154+
* Use **vectorized operations** (NumPy or pandas built-ins) when possible.
155+
* Leverage **Cython or Numba** to speed up computations.
156+
* Consider using **pandas' built-in methods** instead of UDFs for common operations.
157+
.. note::

    If performance is critical, explore **pandas' vectorized functions** before resorting
    to UDFs.
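To illustrate the first recommendation, here is a minimal sketch comparing a UDF with its vectorized equivalent. It assumes the same ``add_one`` function used earlier in this guide, and it only checks that both approaches give the same result. Actual speed differences depend on the data size and are not measured here.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})


def add_one(x):
    return x + 1


# UDF applied value by value through Series.apply
udf_result = df["A"].apply(add_one)

# Equivalent vectorized expression, evaluated on the whole column at once
vectorized_result = df["A"] + 1

# Both approaches produce the same Series
print(udf_result.equals(vectorized_result))
```

When a built-in expression like this exists, it is usually the better choice, since it avoids calling a Python function once per value.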
