Commit 11392d7

committed
bugfix
1 parent 4ec5697 commit 11392d7

1 file changed: +39 -25 lines changed

doc/source/user_guide/user_defined_functions.rst

Lines changed: 39 additions & 25 deletions
@@ -40,7 +40,7 @@ User-Defined Functions can be applied across various pandas methods:
   aggregation functions.
 * :meth:`DataFrame.transform` - Applies a function to groups while preserving the shape of
   the original data.
-* :meth:`DataFrame.filter` - Filters groups based on a function returning a Boolean condition.
+* :meth:`DataFrame.filter` - Filters groups based on a list of Boolean conditions.
 * :meth:`DataFrame.map` - Applies an element-wise function to a Series, useful for
   transforming individual values.
 * :meth:`DataFrame.pipe` - Allows chaining custom functions to process entire DataFrames or
@@ -56,6 +56,7 @@ ways to apply user-defined functions across different pandas data structures.
 The :meth:`DataFrame.apply` allows applying a user-defined functions along either axis (rows or columns):
 
 .. ipython:: python
+
     import pandas as pd
 
     # Sample DataFrame
@@ -77,6 +78,7 @@ The :meth:`DataFrame.apply` allows applying a user-defined functions along eithe
 :meth:`DataFrame.apply` also accepts dictionaries of multiple user-defined functions:
 
 .. ipython:: python
+
     # Sample DataFrame
     df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3]})
 
@@ -98,6 +100,7 @@ The :meth:`DataFrame.apply` allows applying a user-defined functions along eithe
 :meth:`DataFrame.apply` works with Series objects as well:
 
 .. ipython:: python
+
     # Sample Series
     s = pd.Series([1, 2, 3])
 
@@ -119,6 +122,7 @@ The :meth:`DataFrame.apply` allows applying a user-defined functions along eithe
 The :meth:`DataFrame.agg` allows aggregation with a user-defined function along either axis (rows or columns):
 
 .. ipython:: python
+
     # Sample DataFrame
     df = pd.DataFrame({
         'Category': ['A', 'A', 'B', 'B'],
@@ -146,6 +150,7 @@ The :meth:`DataFrame.transform` allows transforms a Dataframe, Series or Grouped
 while preserving the original shape of the object.
 
 .. ipython:: python
+
     # Sample DataFrame
     df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
 
@@ -165,6 +170,7 @@ Attempting to use common aggregation functions such as `mean` or `sum` will resu
 values being broadcasted to the original dimensions:
 
 .. ipython:: python
+
     # Sample DataFrame
     df = pd.DataFrame({
         'Category': ['A', 'A', 'B', 'B', 'B'],
@@ -184,28 +190,29 @@ values being broadcasted to the original dimensions:
 ------------------------
 
 The :meth:`DataFrame.filter` method is used to select subsets of the DataFrame’s
-columns or rows and accepts user-defined functions. It is useful when you want to
-extract specific columns or rows that match particular conditions.
+columns or row. It is useful when you want to extract specific columns or rows that
+match particular conditions.
 
 .. note::
-    :meth:`DataFrame.filter` expects a user-defined function that returns a boolean
-    value
+    :meth:`DataFrame.filter` does not accept user-defined functions, but can accept
+    list comprehensions that have user-defined functions applied to them.
 
 .. ipython:: python
+
     # Sample DataFrame
     df = pd.DataFrame({
-        'A': [1, 2, 3],
-        'B': [4, 5, 6],
+        'AA': [1, 2, 3],
+        'BB': [4, 5, 6],
         'C': [7, 8, 9],
         'D': [10, 11, 12]
     })
 
-    # Define a function that filters out columns where the name is longer than 1 character
-    df_filtered_func = df.filter(items=lambda x: len(x) > 1)
-    print(df_filtered_func)
+    def is_long_name(column_name):
+        return len(column_name) > 1
 
-Unlike the methods discussed earlier, :meth:`DataFrame.filter` does not accept
-functions that do not return boolean values, such as `mean` or `sum`.
+    # Define a function that filters out columns where the name is longer than 1 character
+    df_filtered = df[[col for col in df.columns if is_long_name(col)]]
+    print(df_filtered)
 
 :meth:`DataFrame.map`
 ---------------------
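
For reference on the :meth:`DataFrame.filter` hunk above, the sketch below shows three ways to keep only the long-named columns, using the ``AA``/``BB``/``C``/``D`` frame from the diff. :meth:`DataFrame.filter` takes ``items``, ``like``, or ``regex`` rather than a callable, so the user-defined predicate is evaluated outside it. The names ``by_mask``, ``by_items``, and ``by_regex`` are illustrative and not part of the commit.

```python
import pandas as pd

# Same sample frame as in the diff
df = pd.DataFrame({
    "AA": [1, 2, 3],
    "BB": [4, 5, 6],
    "C": [7, 8, 9],
    "D": [10, 11, 12],
})


def is_long_name(column_name):
    """Return True for column labels longer than one character."""
    return len(column_name) > 1


# Option 1: build a boolean mask from the predicate and select with .loc
mask = [is_long_name(col) for col in df.columns]
by_mask = df.loc[:, mask]

# Option 2: pass the pre-filtered labels to DataFrame.filter via items=
by_items = df.filter(items=[col for col in df.columns if is_long_name(col)])

# Option 3: express the same condition as a regular expression
by_regex = df.filter(regex=r"^.{2,}$")

print(by_mask)
```

All three selections keep the same columns here; the regex form only works when the condition can be written as a pattern on the label.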
@@ -214,19 +221,20 @@ The :meth:`DataFrame.map` method is used to apply a function element-wise to a p
 or Dataframe. It is particularly useful for substituting values or transforming data.
 
 .. ipython:: python
+
     # Sample DataFrame
-    df = pd.DataFrame({ 'A': ['cat', 'dog', 'bird'], 'B': ['pig', 'cow', 'lamb'] })
+    s = pd.Series(['cat', 'dog', 'bird'])
 
     # Using map with a user-defined function
     def animal_to_length(animal):
         return len(animal)
 
-    df_mapped = df.map(animal_to_length)
-    print(df_mapped)
+    s_mapped = s.map(animal_to_length)
+    print(s_mapped)
 
     # This works with lambda functions too
-    df_lambda = df.map(lambda x: x.upper())
-    print(df_lambda)
+    s_lambda = s.map(lambda x: x.upper())
+    print(s_lambda)
 
 :meth:`DataFrame.pipe`
 ----------------------
@@ -235,6 +243,7 @@ The :meth:`DataFrame.pipe` method allows you to apply a function or a series of
 DataFrame in a clean and readable way. This is especially useful for building data processing pipelines.
 
 .. ipython:: python
+
     # Sample DataFrame
     df = pd.DataFrame({ 'A': [1, 2, 3], 'B': [4, 5, 6] })
 
@@ -256,7 +265,7 @@ without nested calls, promoting a cleaner and more readable code style.
 Performance Considerations
 --------------------------
 
-While UDFs provide flexibility, their use is currently discouraged as they can introduce
+While user-defined functions provide flexibility, their use is currently discouraged as they can introduce
 performance issues, especially when written in pure Python. To improve efficiency,
 consider using built-in `NumPy` or `pandas` functions instead of user-defined functions
 for common operations.
@@ -270,22 +279,27 @@ Vectorized Operations
 
 Below is an example of vectorized operations in pandas:
 
-.. ipython:: python
-    # Vectorized operation:
-    df["new_col"] = 100 * (df["one"] / df["two"])
+.. code-block:: text
 
     # User-defined function
     def calc_ratio(row):
         return 100 * (row["one"] / row["two"])
 
     df["new_col2"] = df.apply(calc_ratio, axis=1)
 
+    # Vectorized Operation
+    df["new_col"] = 100 * (df["one"] / df["two"])
+
 Measuring how long each operation takes:
 
-.. ipython:: python
+.. code-block:: text
+
     Vectorized: 0.0043 secs
     User-defined function: 5.6435 secs
 
-This happens because user-defined functions loop through each row and apply its function,
-while vectorized operations are applied to underlying `Numpy` arrays, skipping inefficient
-Python code.
+Vectorized operations in pandas are significantly faster than using :meth:`DataFrame.apply`
+with user-defined functions because they leverage highly optimized C functions
+via NumPy to process entire arrays at once. This approach avoids the overhead of looping
+through rows in Python and making separate function calls for each row, which is slow and
+inefficient. Additionally, NumPy arrays benefit from memory efficiency and CPU-level
+optimizations, making vectorized operations the preferred choice whenever possible.
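
The timing comparison in the last hunk is shown as static text, so the numbers cannot be re-run from the docs as written. A minimal sketch for reproducing it follows. The column names ``one`` and ``two`` and the ``calc_ratio`` function come from the diff; the row count, the random data, and the use of ``time.perf_counter`` are assumptions made for illustration. Absolute timings will differ by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

# Assumed sample data: the diff does not show how df was built
rng = np.random.default_rng(0)
n_rows = 20_000
df = pd.DataFrame({
    "one": rng.random(n_rows),
    "two": rng.random(n_rows) + 1.0,  # offset keeps the divisor away from zero
})


# User-defined function from the diff, called once per row
def calc_ratio(row):
    return 100 * (row["one"] / row["two"])


# Vectorized operation: a single expression over whole columns
start = time.perf_counter()
df["new_col"] = 100 * (df["one"] / df["two"])
vectorized_secs = time.perf_counter() - start

# Row-wise apply: one Python function call per row
start = time.perf_counter()
df["new_col2"] = df.apply(calc_ratio, axis=1)
udf_secs = time.perf_counter() - start

print(f"Vectorized: {vectorized_secs:.4f} secs")
print(f"User-defined function: {udf_secs:.4f} secs")
```

Both columns hold the same values, so the comparison measures only the cost of the two approaches and not a difference in results.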
