web/pandas/pdeps/0008-inplace-methods-in-pandas.md
Lines changed: 38 additions & 16 deletions
Original file line number
Diff line number
Diff line change
@@ -133,7 +133,7 @@ This group encompasses both kinds of inplace: `update` can be values-inplace, wh
133
133
(for example, although ``isetitem`` operates on the original pandas object inplace,
134
134
it will not change any existing values inplace; rather it will remove the values of the column being set, and insert new values).
135
135
136
-
**Group 2: Methods that modify the underlying data of the DataFrame/Series object ("values-inplace")**
136
+
**Group 2: Methods that can modify the underlying data of the DataFrame/Series object ("values-inplace")**
137
137
138
138
| Method Name |
139
139
|:----------------|
@@ -151,7 +151,7 @@ These methods don't operate inplace by default, but can be done inplace with `in
151
151
the structure of the DataFrame or Series intact (shape, row/column labels), but can mutate some elements of the data of
152
152
the DataFrame or Series.
153
153
154
-
**Group 3: Methods that modify the DataFrame/Series object, but not the pre-existing values ("object-inplace")**
154
+
**Group 3: Methods that can modify the DataFrame/Series object, but not the pre-existing values ("object-inplace")**
155
155
156
156
| Method Name |
157
157
|:----------------------------|
@@ -197,41 +197,63 @@ implementation detail for the purpose of this PDEP.
197
197
198
198
The methods from **group 1 (always inplace, no keyword)** won't change behavior, and will remain always inplace.
199
199
200
+
For methods from **group 4 (never inplace)**, the `inplace` keyword has no actual effect
201
+
(except for re-assigning to the calling variable) and is effectively syntactic sugar for
202
+
manually re-assigning. For this group, we propose to remove the `inplace` keyword.
203
+
200
204
For methods from **group 3 (object-inplace)**, the `inplace=True` keyword can currently be
201
205
used to avoid a copy. However, with the introduction of Copy-on-Write, every operation
202
206
will potentially return a shallow copy of the input object by default (if the performed
203
207
operation does not require a copy of the data). This future default is therefore
204
208
equivalent to the behavior with `inplace=True` for those methods (minus the return
205
-
value), and therefore we propose to remove the `inplace` keyword for this group of
206
-
methods.
207
-
208
-
For methods from **group 4 (never inplace)**, the `inplace` keyword has no actual effect
209
-
(except for re-assigning to the calling variable) and is effectively syntactic sugar for
210
-
manually re-assigning. For this group, we propose to remove the `inplace` keyword.
209
+
value).
211
210
212
-
For the above reasoning, we think there is no benefit of keeping the keyword around for these methods. To emulate behavior of the `inplace` keyword, we can reassign the result of an operation to the same variable:
211
+
For the reasons above, we see no benefit in keeping the keyword around for
212
+
these methods. To emulate behavior of the `inplace` keyword, we can reassign the result
213
+
of an operation to the same variable:
213
214
214
215
:::python
215
216
df = pd.DataFrame({"foo": [1, 2, 3]})
216
217
df = df.reset_index()
217
218
df.iloc[0, 1] = ...
218
219
219
-
All references to the original object will go out of scope when the result of the `reset_index` operation is assigned
220
+
All references to the original object will go out of scope when the result of the `reset_index` operation is re-assigned
220
221
to `df`. As a consequence, `iloc` will continue to operate inplace, and the underlying data will not be copied (with Copy-on-Write).
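The re-assignment pattern can be checked against the current keyword with a small comparison. This is a minimal sketch, assuming a pandas version in which `reset_index` still accepts `inplace=True`; the variable names `with_keyword` and `reassigned` are illustrative only. It shows that both spellings end with the same result, and it does not inspect whether a copy was made.

```python
import pandas as pd

# Current spelling with the keyword (group 3, "object-inplace").
with_keyword = pd.DataFrame({"foo": [1, 2, 3]})
with_keyword.reset_index(inplace=True)

# Proposed spelling: re-assign the result to the same variable.
reassigned = pd.DataFrame({"foo": [1, 2, 3]})
reassigned = reassigned.reset_index()

# Both variables now hold the same labels and values.
print(with_keyword.equals(reassigned))
```
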
221
222
222
-
**Group 2 (values-inplace)** methods differ, though, since they only modify the underlying data, and therefore can be inplace.
223
+
**Group 2 (values-inplace)** methods differ, though, since they modify the underlying
224
+
data, and therefore can actually happen inplace:
225
+
226
+
:::python
227
+
df = pd.DataFrame({"foo": [1, 2, 3]})
228
+
df.replace(to_replace=1, value=100, inplace=True)
229
+
230
+
Currently, the above updates `df` values-inplace, without requiring a copy of the data.
231
+
For this type of method, however, we _cannot_ emulate the above usage of `inplace` by
232
+
re-assigning:
223
233
224
234
:::python
225
235
df = pd.DataFrame({"foo": [1, 2, 3]})
226
236
df = df.replace(to_replace=1, value=100)
227
237
228
-
If we follow the rules of Copy-on-Write[^1] where "any subset or returned series/dataframe always behaves as a copy of
229
-
the original, and thus never modifies the original", then there is no way of doing this operation inplace by default.
230
-
The original object would be modified before the reference goes out of scope.
238
+
If we follow the rules of Copy-on-Write[^1] where "any subset or returned
239
+
series/dataframe always behaves as a copy of the original, and thus never modifies the
240
+
original", then there is no way of doing this operation inplace by default, because the
241
+
original object `df` would be modified before the reference goes out of scope (pandas
242
+
does not know whether you will re-assign it to `df` or assign it to another variable).
243
+
That would violate the Copy-on-Write rules, and therefore the `replace()` method in the
244
+
example always needs to make a copy of the underlying data by default.
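The reasoning above can be illustrated with a toy model. This is only a sketch: `ToyColumn`, `lazy_copy` and the `refs` bookkeeping are hypothetical names invented for illustration, and the class is not how pandas implements Copy-on-Write. It assumes a plain Python list as the data buffer and a weak set that tracks which live objects share it.

```python
import weakref


class ToyColumn:
    """Hypothetical stand-in for a Series; not how pandas is implemented."""

    def __init__(self, values, refs=None):
        self.values = values  # plain list acting as the data buffer
        # Weak set of every live ToyColumn that shares this buffer.
        self.refs = weakref.WeakSet() if refs is None else refs
        self.refs.add(self)

    def lazy_copy(self):
        # New object, same buffer: what a Copy-on-Write method may return.
        return ToyColumn(self.values, self.refs)

    def replace(self, to_replace, value, inplace=False):
        if not inplace:
            # The caller still holds `self`, so the result needs its own buffer.
            result = ToyColumn(list(self.values))
            result.replace(to_replace, value, inplace=True)
            return result
        if len(self.refs) > 1:
            # Buffer is shared with another live object: copy before writing.
            self.refs.discard(self)
            self.values = list(self.values)
            self.refs = weakref.WeakSet([self])
        for i, v in enumerate(self.values):
            if v == to_replace:
                self.values[i] = value
        return None


col = ToyColumn([1, 2, 3])
buffer = col.values
col.replace(1, 100, inplace=True)
print(col.values is buffer)  # True: the existing buffer was updated

other = ToyColumn([1, 2, 3])
result = other.replace(1, 100)
print(other.values, result.values)  # [1, 2, 3] [100, 2, 3]
```

In this model the default call has to copy because it cannot tell whether the caller will keep using `other`, whereas `inplace=True` writes into the existing buffer whenever nothing else shares it.
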
231
245
232
246
For this case, an `inplace=True` option can have an actual benefit, i.e. making it possible to
233
-
avoid triggering a copy when a value would get replaced. Therefore, we propose to keep
234
-
the `inplace` argument for those methods.
247
+
avoid a data copy. Therefore, we propose to keep the `inplace` argument for this
248
+
group of methods.
249
+
250
+
Summarizing for the `inplace` keyword, we propose to:
251
+
252
+
- Keep the `inplace` keyword for this subset of methods (group 2) that can update the
253
+
underlying values inplace ("values-inplace")
254
+
- Remove the `inplace` keyword from all other methods that either can never work inplace
255
+
(group 4) or only update the object (group 3, "object-inplace", which can be emulated