@@ -96,15 +96,15 @@ User-Defined Functions can be applied across various pandas methods:
9696+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
9797| :meth:`apply` (axis=1) | Row (Series) | Row (Series) | Apply a function to each row |
9898+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
99- | :meth: `agg ` | Series/DataFrame | Scalar or Series | Aggregate and summarizes values, e.g., sum or custom reducer |
99+ | :meth:`pipe` | Series or DataFrame | Series or DataFrame | Chain functions together to apply to Series or DataFrame |
100100+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
101- | :meth: `transform ` (axis=0) | Column (Series) | Column( Series) | Same as :meth: ` apply ` with (axis=0), but it raises an exception if the function changes the shape of the data |
101+ | :meth:`filter` | Series or DataFrame | Boolean | Only accepts UDFs in groupby operations. The function is called for each group, and the group is removed from the result if the function returns ``False`` |
102102+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
103- | :meth: `transform ` (axis=1) | Row (Series) | Row ( Series) | Same as :meth: ` apply ` with (axis=1), but it raises an exception if the function changes the shape of the data |
103+ | :meth:`agg` | Series or DataFrame | Scalar or Series | Aggregates and summarizes values, e.g., sum or custom reducer |
104104+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
105- | :meth: `filter ` | Series or DataFrame | Boolean | Only accepts UDFs in group by. Function is called for each group, and the group is removed from the result if the function returns `` False `` |
105+ | :meth:`transform` (axis=0) | Column (Series) | Column (Series) | Same as :meth:`apply` with (axis=0), but it raises an exception if the function changes the shape of the data |
106106+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
107- | :meth: `pipe ` | Series/DataFrame | Series/DataFrame | Chain functions together to apply to Series or Dataframe |
107+ | :meth:`transform` (axis=1) | Row (Series) | Row (Series) | Same as :meth:`apply` with (axis=1), but it raises an exception if the function changes the shape of the data |
108108+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
109109
110110When applying UDFs in pandas, it is essential to select the appropriate method based
@@ -118,53 +118,108 @@ decisions, ensuring more efficient and maintainable code.
118118 and :ref: `ewm()<window> ` for details.
119119
120120
121- :meth: `DataFrame.apply `
122- ~~~~~~~~~~~~~~~~~~~~~~~
121+ :meth:`Series.map` and :meth:`DataFrame.map`
122+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
123123
124- The :meth: `apply ` method allows you to apply UDFs along either rows or columns. While flexible,
125- it is slower than vectorized operations and should be used only when you need operations
126- that cannot be achieved with built-in pandas functions .
124+ The :meth:`map` method is used specifically to apply element-wise UDFs. This means the function
125+ is called once for each element in the ``Series`` or ``DataFrame``, with the individual value
126+ (the cell) as the function argument.
127127
128- When to use: :meth: `apply ` is suitable when no alternative vectorized method or UDF method is available,
129- but consider optimizing performance with vectorized operations wherever possible.
128+ .. ipython:: python
130129
131- :meth: `DataFrame.agg `
132- ~~~~~~~~~~~~~~~~~~~~~
130+     temperature_celsius = pd.DataFrame({
131+         "NYC": [14, 21, 23],
132+         "Los Angeles": [22, 28, 31],
133+     })
133134
134- If you need to aggregate data, :meth: ` agg ` is a better choice than apply because it is
135- specifically designed for aggregation operations.
135+     def to_fahrenheit(value):
136+         return value * (9 / 5) + 32
136137
137- When to use: Use :meth: `agg ` for performing custom aggregations, where the operation returns
138- a scalar value on each input.
138+     temperature_celsius.map(to_fahrenheit)
139139
140- :meth: `DataFrame.transform `
141- ~~~~~~~~~~~~~~~~~~~~~~~~~~~
140+ In this example, the function ``to_fahrenheit`` is called 6 times, once for each value
141+ in the ``DataFrame``. The result of each call is placed in the corresponding cell
142+ of the resulting ``DataFrame``.
142143
143- The :meth: `transform ` method is ideal for performing element-wise transformations while preserving the shape of the original DataFrame.
144- It is generally faster than apply because it can take advantage of pandas' internal optimizations.
144+ In general, ``map`` is slow because it does not make use of vectorization. Instead, it requires
145+ a Python function call for each value, which slows things down significantly when
146+ working with medium or large datasets.
145147
146- When to use: When you need to perform element-wise transformations that retain the original structure of the DataFrame .
148+ When to use: Use :meth:`map` for applying element-wise UDFs to DataFrames or Series.
147149
148- .. code-block :: python
150+ :meth:`Series.apply` and :meth:`DataFrame.apply`
151+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
149152
150- from sklearn.linear_model import LinearRegression
153+ The :meth:`apply` method allows you to apply UDFs to a whole column or row. This differs
154+ from :meth:`map` in that the function is called once for each column (or row), not for each individual value.
151155
152- df = pd.DataFrame({
153- ' group' : [' A' , ' A' , ' A' , ' B' , ' B' , ' B' ],
154- ' x' : [1 , 2 , 3 , 1 , 2 , 3 ],
155- ' y' : [2 , 4 , 6 , 1 , 2 , 1.5 ]
156- }).set_index(" x" )
156+ .. ipython:: python
157157
158- # Function to fit a model to each group
159- def fit_model (group ):
160- x = group.index.to_frame()
161- y = group
162- model = LinearRegression()
163- model.fit(x, y)
164- pred = model.predict(x)
165- return pred
158+     temperature_celsius = pd.DataFrame({
159+         "NYC": [14, 21, 23],
160+         "Los Angeles": [22, 28, 31],
161+     })
166162
167- result = df.groupby(' group' ).transform(fit_model)
163+     def to_fahrenheit(column):
164+         return column * (9 / 5) + 32
165+
166+     temperature_celsius.apply(to_fahrenheit)
167+
168+ In the example, ``to_fahrenheit`` is called only twice, as opposed to the 6 times with :meth:`map`.
169+ This is faster than using :meth:`map`, since the operations for each column are vectorized, and the
170+ overhead of iterating over data in Python and calling Python functions is significantly reduced.
171+
172+ In some cases, the function requires all the data in a column (or row) to compute the result. In those
173+ cases :meth:`apply` is needed, since with :meth:`map` the function can only access one element at a time.
174+
175+ .. ipython:: python
176+
177+     temperature = pd.DataFrame({
178+         "NYC": [14, 21, 23],
179+         "Los Angeles": [22, 28, 31],
180+     })
181+
182+     def normalize(column):
183+         return column / column.mean()
184+
185+     temperature.apply(normalize)
186+
187+ In the example, the ``normalize`` function needs to compute the mean of the whole column in order
188+ to divide each element by it. The function therefore cannot be called for each element separately;
189+ it has to receive the whole column.
190+
191+ :meth:`apply` can also execute a function by row, by specifying ``axis=1``.
192+
193+ .. ipython:: python
194+
195+     temperature = pd.DataFrame({
196+         "NYC": [14, 21, 23],
197+         "Los Angeles": [22, 28, 31],
198+     })
199+
200+     def hotter(row):
201+         return row["Los Angeles"] - row["NYC"]
202+
203+     temperature.apply(hotter, axis=1)
204+
205+ In the example, the function ``hotter`` is called 3 times, once for each row. Each
206+ call receives the whole row as the argument, allowing computations that require more than
207+ one value in the row.
208+
209+ ``apply`` is also available as :meth:`SeriesGroupBy.apply`, :meth:`DataFrameGroupBy.apply`,
210+ :meth:`Rolling.apply`, :meth:`Expanding.apply` and :meth:`Resampler.apply`. You can read more
211+ about ``apply`` in groupby operations in :ref:`groupby.apply`.
212+
213+ When to use: :meth:`apply` is suitable when no alternative vectorized method or UDF method is available,
214+ but consider optimizing performance with vectorized operations wherever possible.
215+
216+ :meth:`DataFrame.pipe`
217+ ~~~~~~~~~~~~~~~~~~~~~~
218+
219+ The :meth:`pipe` method is useful for chaining operations together into a clean and readable pipeline.
220+ It is a helpful tool for organizing complex data processing workflows.
221+
222+ When to use: Use :meth:`pipe` when you need to create a pipeline of operations and want to keep the code readable and maintainable.
168223
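A minimal sketch of such a pipeline, reusing the temperature data from the earlier examples. The helper functions ``to_fahrenheit`` and ``subtract_baseline`` are hypothetical and exist only for illustration; each one takes a whole ``DataFrame`` and returns a ``DataFrame``.

```python
import pandas as pd

temperature_celsius = pd.DataFrame({
    "NYC": [14, 21, 23],
    "Los Angeles": [22, 28, 31],
})


def to_fahrenheit(df):
    # Operates on the whole DataFrame at once
    return df * (9 / 5) + 32


def subtract_baseline(df, baseline):
    # Extra arguments are forwarded by pipe as keyword arguments
    return df - baseline


# Steps read top to bottom, in the order they are applied
result = (
    temperature_celsius
    .pipe(to_fahrenheit)
    .pipe(subtract_baseline, baseline=32)
)

# Equivalent nested call, which reads inside-out
nested = subtract_baseline(to_fahrenheit(temperature_celsius), baseline=32)
```

The chained and nested forms compute the same result; the chained form tends to stay readable as more steps are added.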
169224:meth: `DataFrame.filter `
170225~~~~~~~~~~~~~~~~~~~~~~~~
@@ -199,20 +254,43 @@ When to use: Use :meth:`filter` when you want to use a UDF to create a subset of
199254 Since filter does not directly accept a UDF, you have to apply the UDF indirectly,
200255for example, by using list comprehensions.
201256
202- :meth: `DataFrame.map `
257+ :meth:`DataFrame.agg`
203258~~~~~~~~~~~~~~~~~~~~~
204259
205- The :meth: `map ` method is used specifically to apply element-wise UDFs.
260+ If you need to aggregate data, :meth:`agg` is a better choice than :meth:`apply` because it is
261+ specifically designed for aggregation operations.
206262
207- When to use: Use :meth: `map ` for applying element-wise UDFs to DataFrames or Series.
263+ When to use: Use :meth:`agg` for performing custom aggregations, where the operation returns
264+ a scalar value for each input.
208265
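As a rough illustration of a custom aggregation, the sketch below passes a hypothetical reducer named ``value_range`` to ``DataFrame.agg``. The function receives each column as a ``Series`` and returns one scalar per column.

```python
import pandas as pd

temperature = pd.DataFrame({
    "NYC": [14, 21, 23],
    "Los Angeles": [22, 28, 31],
})


def value_range(column):
    # Reduces a whole column to a single scalar
    return column.max() - column.min()


# One scalar per column, returned as a Series indexed by column name
ranges = temperature.agg(value_range)
```

Here ``ranges`` holds one value for ``"NYC"`` and one for ``"Los Angeles"``, rather than a result with the original number of rows.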
209- :meth: `DataFrame.pipe `
210- ~~~~~~~~~~~~~~~~~~~~~~
266+ :meth:`DataFrame.transform`
267+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
211268
212- The :meth: `pipe ` method is useful for chaining operations together into a clean and readable pipeline .
213- It is a helpful tool for organizing complex data processing workflows .
269+ The :meth:`transform` method is ideal for performing element-wise transformations while preserving the shape of the original DataFrame.
270+ It is generally faster than :meth:`apply` because it can take advantage of pandas' internal optimizations.
214271
215- When to use: Use :meth: `pipe ` when you need to create a pipeline of operations and want to keep the code readable and maintainable.
272+ When to use: Use :meth:`transform` when you need to perform element-wise transformations that retain the original structure of the DataFrame.
273+
274+ .. code-block:: python
275+
276+     from sklearn.linear_model import LinearRegression
277+
278+     df = pd.DataFrame({
279+         'group': ['A', 'A', 'A', 'B', 'B', 'B'],
280+         'x': [1, 2, 3, 1, 2, 3],
281+         'y': [2, 4, 6, 1, 2, 1.5]
282+     }).set_index("x")
283+
284+     # Function to fit a model to each group
285+     def fit_model(group):
286+         x = group.index.to_frame()
287+         y = group
288+         model = LinearRegression()
289+         model.fit(x, y)
290+         pred = model.predict(x)
291+         return pred
292+
293+     result = df.groupby('group').transform(fit_model)
216294
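The shape-preserving behavior can also be seen without scikit-learn. The following is a small sketch on the temperature data used earlier; ``center`` and ``column_mean`` are hypothetical helper names, and the ``ValueError`` shown is what current pandas versions are expected to raise when the function reduces the data.

```python
import pandas as pd

temperature = pd.DataFrame({
    "NYC": [14, 21, 23],
    "Los Angeles": [22, 28, 31],
})


def center(column):
    # Returns a Series with the same index as the input column
    return column - column.mean()


def column_mean(column):
    # Reduces the column to a scalar, so it is not a valid transform
    return column.mean()


# Same shape as the input: one output value per input value
centered = temperature.transform(center)

# A reducing function changes the shape, so transform rejects it
try:
    temperature.transform(column_mean)
    raised = False
except ValueError:
    raised = True
```

With ``apply`` the reducing function would be accepted and return one value per column, which is the main practical difference between the two methods.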
217295
218296 Performance