Commit 4ec5697

committed
added map, pipe and vectorized operations
1 parent fe67ec8 commit 4ec5697

1 file changed

+85
-14
lines changed

doc/source/user_guide/user_defined_functions.rst

Lines changed: 85 additions & 14 deletions
@@ -11,13 +11,13 @@ functionality by allowing users to apply custom computations to their data. Whil
1111
pandas comes with a set of built-in functions for data manipulation, UDFs offer
1212
flexibility when built-in methods are not sufficient. These functions can be
1313
applied at different levels: element-wise, row-wise, column-wise, or group-wise,
14-
depending on the method used.
14+
and they change the data in different ways depending on the method used.
1515

1616
Why Use User-Defined Functions?
1717
-------------------------------
1818

1919
Pandas is designed for high-performance data processing, but sometimes your specific
20-
needs go beyond standard aggregation, transformation, or filtering. UDFs allow you to:
20+
needs go beyond standard aggregation, transformation, or filtering. User-defined functions allow you to:
2121

2222
* **Customize Computations**: Implement logic tailored to your dataset, such as complex
2323
transformations, domain-specific calculations, or conditional modifications.
@@ -32,7 +32,7 @@ needs go beyond standard aggregation, transformation, or filtering. UDFs allow y
3232
Which methods support User-Defined Functions
3333
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3434

35-
User-Defined Functions can be applied across various pandas methods that work with Series and DataFrames:
35+
User-Defined Functions can be applied across various pandas methods:
3636

3737
* :meth:`DataFrame.apply` - A flexible method that allows applying a function to Series,
3838
DataFrames, or groups of data.
@@ -46,7 +46,7 @@ User-Defined Functions can be applied across various pandas methods that work wi
4646
* :meth:`DataFrame.pipe` - Allows chaining custom functions to process entire DataFrames or
4747
Series in a clean, readable manner.
4848

49-
Each of these methods can be used with both Series and DataFrame objects, providing versatile
49+
All of these pandas methods can be used with both Series and DataFrame objects, providing versatile
5050
ways to apply user-defined functions across different pandas data structures.
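
As a rough illustration of the different levels mentioned above, the sketch below applies the same kind of function column-wise, row-wise, and element-wise. The DataFrame contents and the helper names ``value_range`` and ``double`` are assumptions made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})


def value_range(values):
    # Receives a whole Series (one column or one row) and returns a scalar
    return values.max() - values.min()


def double(x):
    # Receives a single element
    return x * 2


# Column-wise: the function is called once per column (the default, axis=0)
col_ranges = df.apply(value_range)

# Row-wise: the function is called once per row
row_ranges = df.apply(value_range, axis=1)

# Element-wise: Series.apply calls the function once per value
doubled = df["A"].apply(double)

print(col_ranges)
print(row_ranges)
print(doubled)
```

The same function can therefore produce differently shaped results depending on the method and the ``axis`` argument it is passed to.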
5151

5252

@@ -184,10 +184,13 @@ values being broadcasted to the original dimensions:
184184
------------------------
185185

186186
The :meth:`DataFrame.filter` method is used to select subsets of the DataFrame’s
187-
columns or rows and accepts user-defined functions. Specifically, these functions
188-
return boolean values to filter columns or rows. It is useful when you want to
187+
columns or rows and accepts user-defined functions. It is useful when you want to
189188
extract specific columns or rows that match particular conditions.
190189

190+
.. note::
191+
:meth:`DataFrame.filter` expects a user-defined function that returns a boolean
192+
value.
193+
191194
.. ipython:: python
192195
# Sample DataFrame
193196
df = pd.DataFrame({
@@ -204,17 +207,85 @@ extract specific columns or rows that match particular conditions.
204207
Unlike the methods discussed earlier, :meth:`DataFrame.filter` does not accept
205208
functions that do not return boolean values, such as ``mean`` or ``sum``.
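
In current pandas releases, function-based filtering is provided by the groupby variant, :meth:`DataFrameGroupBy.filter`, whereas :meth:`DataFrame.filter` itself selects by label (``items``, ``like``, ``regex``). The following is a minimal sketch of a boolean-returning user-defined function under that assumption. The column names and the helper ``has_high_mean`` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "c"],
    "score": [10, 20, 1, 2, 50],
})


def has_high_mean(group):
    # Called once per group; must return a single boolean for the whole group
    return bool(group["score"].mean() > 5)


# Rows belonging to groups for which the function returns False are dropped
kept = df.groupby("team").filter(has_high_mean)
print(kept)
```

Here the function decides per group, not per row: either all rows of a group are kept or none are.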
206209

210+
:meth:`DataFrame.map`
211+
---------------------
212+
213+
The :meth:`DataFrame.map` method is used to apply a function element-wise to a pandas Series
214+
or DataFrame. It is particularly useful for substituting values or transforming data.
215+
216+
.. ipython:: python
217+
# Sample DataFrame
218+
df = pd.DataFrame({ 'A': ['cat', 'dog', 'bird'], 'B': ['pig', 'cow', 'lamb'] })
219+
220+
# Using map with a user-defined function
221+
def animal_to_length(animal):
222+
    return len(animal)
223+
224+
df_mapped = df.map(animal_to_length)
225+
print(df_mapped)
226+
227+
# This works with lambda functions too
228+
df_lambda = df.map(lambda x: x.upper())
229+
print(df_lambda)
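
For the value-substitution use case mentioned above, :meth:`Series.map` also accepts a dictionary in place of a function. The sketch below is only illustrative: the ``sounds`` mapping and the ``shout`` helper are assumptions, not part of the original example.

```python
import pandas as pd

s = pd.Series(["cat", "dog", "bird"])

# Substituting values with a dictionary; keys that are missing map to NaN
sounds = {"cat": "meow", "dog": "woof"}
mapped = s.map(sounds)
print(mapped)


# The same method with a user-defined function, applied to each element
def shout(value):
    return value.upper()


shouted = s.map(shout)
print(shouted)
```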
230+
231+
:meth:`DataFrame.pipe`
232+
----------------------
233+
234+
The :meth:`DataFrame.pipe` method allows you to apply a function or a series of functions to a
235+
DataFrame in a clean and readable way. This is especially useful for building data processing pipelines.
236+
237+
.. ipython:: python
238+
# Sample DataFrame
239+
df = pd.DataFrame({ 'A': [1, 2, 3], 'B': [4, 5, 6] })
240+
241+
# User-defined functions for transformation
242+
def add_one(df):
243+
    return df + 1
244+
245+
def square(df):
246+
    return df ** 2
247+
248+
# Applying functions using pipe
249+
df_piped = df.pipe(add_one).pipe(square)
250+
print(df_piped)
251+
252+
The advantage of using :meth:`DataFrame.pipe` is that it allows you to chain together functions
253+
without nested calls, promoting a cleaner and more readable code style.
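
To make the contrast with nested calls concrete, the sketch below writes the same two-step transformation both ways and also passes extra arguments through ``pipe``. The helpers ``add_value`` and ``scale`` are hypothetical names chosen for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})


def add_value(frame, value):
    return frame + value


def scale(frame, factor=1):
    return frame * factor


# Nested calls read inside-out
nested = scale(add_value(df, 1), factor=10)

# pipe reads top-to-bottom; extra positional and keyword arguments
# are forwarded to the function after the DataFrame itself
piped = df.pipe(add_value, 1).pipe(scale, factor=10)

print(piped)
print(piped.equals(nested))
```

Both forms compute the same result; the piped version simply lists the steps in the order they are executed.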
254+
207255

208256
Performance Considerations
209257
--------------------------
210258

211-
While UDFs provide flexibility, their use is currently discouraged as they can introduce performance issues, especially when
212-
written in pure Python. To improve efficiency:
213-
214-
* Use **vectorized operations** (`NumPy` or `pandas` built-ins) when possible.
215-
* Leverage **Cython or Numba** to speed up computations.
216-
* Consider using **pandas' built-in methods** instead of UDFs for common operations.
259+
While UDFs provide flexibility, their use is currently discouraged as they can introduce
260+
performance issues, especially when written in pure Python. To improve efficiency,
261+
consider using built-in `NumPy` or `pandas` functions instead of user-defined functions
262+
for common operations.
217263

218264
.. note::
219-
If performance is critical, explore **pandas' vectorized functions** before resorting
220-
to UDFs.
265+
If performance is critical, explore **vectorized operations** before resorting
266+
to user-defined functions.
267+
268+
Vectorized Operations
269+
~~~~~~~~~~~~~~~~~~~~~
270+
271+
Below is a comparison of a vectorized operation and an equivalent user-defined function:
272+
273+
.. ipython:: python
274+
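# Sample DataFrame with the columns used below (illustrative values)
df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]})

# Vectorized operation: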
275+
df["new_col"] = 100 * (df["one"] / df["two"])
276+
277+
# User-defined function
278+
def calc_ratio(row):
279+
    return 100 * (row["one"] / row["two"])
280+
281+
df["new_col2"] = df.apply(calc_ratio, axis=1)
282+
283+
Measuring how long each operation takes:
284+
285+
.. code-block:: text
286+
Vectorized: 0.0043 secs
287+
User-defined function: 5.6435 secs
288+
289+
This happens because :meth:`DataFrame.apply` with ``axis=1`` calls the user-defined function once for every row,
290+
while vectorized operations are applied to the underlying `NumPy` arrays, skipping inefficient
291+
Python code.
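
One possible way to reproduce such a comparison is sketched below using the standard-library ``timeit`` module. The DataFrame size, the random data, and the wrapper functions ``vectorized`` and ``row_wise`` are assumptions for illustration, and the measured times will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "one": rng.random(10_000),
    "two": rng.random(10_000) + 1.0,  # offset keeps the divisor away from zero
})


def vectorized():
    # Operates on whole columns at once
    return 100 * (df["one"] / df["two"])


def calc_ratio(row):
    return 100 * (row["one"] / row["two"])


def row_wise():
    # Calls calc_ratio once per row
    return df.apply(calc_ratio, axis=1)


# Total time for three runs of each approach
t_vec = timeit.timeit(vectorized, number=3)
t_udf = timeit.timeit(row_wise, number=3)

print(f"Vectorized: {t_vec:.4f} secs")
print(f"User-defined function: {t_udf:.4f} secs")
```

Both approaches return the same values, so the difference in runtime comes only from how the computation is carried out.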
