web/pandas/pdeps/0008-inplace-methods-in-pandas.md
Lines changed: 29 additions & 19 deletions
@@ -46,7 +46,7 @@ Generally, we assume that people use the keyword for the following reasons:
 
 For the first reason: efficiency is an important aspect. However, in practice it is not always the case
 that `inplace=True` improves anything. Some of the methods with an `inplace` keyword can actually work inplace, but
-others still make a copy under the hood anyway. In addition, with the introduction of Copy-on-Write, there are now other
+others still make a copy under the hood anyway. In addition, with the introduction of Copy-on-Write ([PDEP-7](^1)), there are now other
 ways to avoid making unnecessary copies by default (without needing to specify a keyword). The next section gives a
 detailed overview of those different cases.
 
@@ -63,12 +63,12 @@ Finally, there are also methods that have a `copy` keyword instead of an `inplac
 the data when `copy=False`, but returns a new object referencing the same data instead of updating the calling object),
 adding to the inconsistencies. This keyword is also redundant now with the introduction of Copy-on-Write.
 
-Given the above reasons, we are convinced that there is no need for neither the `inplace` nor the `copy` keyword (except
-for a small subset of methods that can actually update data inplace). Removing those keywords will give a more
+Given the above reasons, we are convinced that there is no need for neither the `inplace` nor the `copy` keyword, except
+for a small subset of methods that can actually update data inplace. Removing those keywords will give a more
 consistent and less confusing API. Removing the `copy` keyword is covered by PDEP-7 about Copy-on-Write,
 and this PDEP will focus on the `inplace` keyword.
 
-Thus, in this PDEP, we aim to standardize behavior across methods to make control of inplace-ness of operations
+Thus, in this PDEP, we aim to standardize behavior across methods to make control of inplace-ness of methods
 consistent, and compatible with Copy-on-Write.
 
 Note: there are also operations (not methods) that work inplace in pandas, such as indexing (
@@ -101,8 +101,12 @@ for this PDEP, we can distinguish two kinds of "inplace" operations:
 As illustration, an example of such an object-inplace operation without using a method:
 
 :::python
-# we replace the Index on `df` inplace, but without actually updating any existing array
+# we replace the Index on `df` inplace, but without actually
+# updating any existing array
 df.index = pd.Index(...)
+# we update the DataFrame inplace, but by completely replacing a column,
+# not by mutating the existing column's underlying array
+df["col"] = new_values
 
 Object-inplace operations, while not actually modifying existing column values, keep
 (a subset of) those columns and thus can avoid copying the data of those existing columns.
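To make the object-inplace idea concrete, here is a small sketch in current pandas; the column names and values are illustrative and not part of the PDEP. It checks that assigning a new `index` and replacing a whole column update `df` itself, while the untouched `other` column still points at the same memory, i.e. its data was not copied. The `np.shares_memory` check relies on how pandas currently stores columns internally, so treat it as an observation about the implementation rather than an API guarantee.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [1, 2, 3], "other": [4.0, 5.0, 6.0]})

# Keep handles on what `df` currently holds.
old_index = df.index
other_before = df["other"].to_numpy()

# Object-inplace: `df` gets a brand-new Index object; the old Index
# (which is immutable) is left as it was.
df.index = pd.Index(["a", "b", "c"])

# Object-inplace: the whole "col" column is swapped for new values
# instead of writing into its existing array.
new_values = np.array([10, 20, 30])
df["col"] = new_values

# The "other" column was kept as-is: neither modified nor copied.
other_after = df["other"].to_numpy()
print(np.shares_memory(other_before, other_after))
print(df)
```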
@@ -142,7 +146,7 @@ it will not change any existing values inplace; rather it will remove the values
 |``bfill``|
 |``clip``|
 
-These methods don't operate inplace by default, but can be done inplace with `inplace=True`if the dtypes are compatible
+These methods don't operate inplace by default, but can be done inplace with `inplace=True`_if_ the dtypes are compatible
 (e.g. the values replacing the old values can be stored in the original array without an astype). All those methods leave
 the structure of the DataFrame or Series intact (shape, row/column labels), but can mutate some elements of the data of
 the DataFrame or Series.
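A quick way to see the "values-inplace" behaviour described in this hunk is to use the two methods visible in the table above, `bfill` and `clip`, on an illustrative float Series. The replacement values are floats, so they fit into the existing float64 array without an astype; only element values change, while the shape and labels stay the same. The sketch deliberately does not rely on the return value of the `inplace=True` calls.

```python
import numpy as np
import pandas as pd

s = pd.Series([-1.5, np.nan, 3.0], index=["a", "b", "c"])

# bfill: the NaN is replaced by the next valid value (3.0). A float
# value fits in the existing float64 array, so no astype is needed.
s.bfill(inplace=True)

# clip: elements below 0.0 are mutated; the dtype stays float64.
s.clip(lower=0.0, inplace=True)

print(s)
```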
@@ -180,7 +184,7 @@ return a new object referencing the same data.
 |`eval`|
 |`query`|
 
-Although these methods have the `inplace` keyword, they can never operate inplace because the nature of the
+Although these methods have the `inplace` keyword, they can never operate inplace, in neither meaning, because the nature of the
 operation requires copying (such as reordering or dropping rows). For those methods, `inplace=True` is essentially just
 syntactic sugar for reassigning the new result to the calling DataFrame/Series.
 
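The "syntactic sugar" point can be checked directly with `sort_values`, a row-reordering method of the kind this hunk describes (the frame is illustrative). Calling it with `inplace=True` and reassigning the result of the default call produce the same DataFrame; per the PDEP text, the keyword version still builds the reordered data as a new result before updating the caller.

```python
import pandas as pd

df_kw = pd.DataFrame({"a": [3, 1, 2], "b": ["x", "y", "z"]})
df_re = df_kw.copy()

# With the keyword: `df_kw` is updated, but the rows still have to be
# reordered, so this is not a true inplace update of the data.
df_kw.sort_values("a", inplace=True)

# Without the keyword: reassign the result to the same name.
df_re = df_re.sort_values("a")

# Both spellings end up with identical DataFrames.
pd.testing.assert_frame_equal(df_kw, df_re)
print(df_re)
```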
@@ -191,26 +195,31 @@ implementation detail for the purpose of this PDEP.
 
 ### Proposed changes and reasoning
 
-The methods from group 1 won't change behavior, and will remain always inplace.
+The methods from **group 1 (always inplace, no keyword)** won't change behavior, and will remain always inplace.
 
-Methods in groups 3 and 4 will lose their `inplace` keyword. Under Copy-on-Write, every operation will
-potentially return a shallow copy of the input object, if the performed operation does not require a copy of the data. This is
-equivalent to the behavior with `inplace=True` for those methods. If users want to make a hard
-copy, they can call the `copy()` method on the result of the operation.
+For methods from **group 3 (object-inplace)**, the `inplace=True` keyword can currently be
+used to avoid a copy. However, with the introduction of Copy-on-Write, every operation
+will potentially return a shallow copy of the input object by default (if the performed
+operation does not require a copy of the data). This future default is therefore
+equivalent to the behavior with `inplace=True` for those methods (minus the return
+value), and therefore we propose to remove the `inplace` keyword for this group of
+methods.
 
-Therefore, there is no benefit of keeping the keyword around for these methods.
+For methods from **group 4 (never inplace)**, the `inplace` keyword has no actual effect
+(except for re-assigning to the calling variable) and is effectively syntactic sugar for
+manually re-assigning. For this group, we propose to remove the `inplace` keyword.
 
-To emulate behavior of the `inplace` keyword, we can reassign the result of an operation to the same variable:
+For the above reasoning, we think there is no benefit of keeping the keyword around for these methods. To emulate behavior of the `inplace` keyword, we can reassign the result of an operation to the same variable:
 
 :::python
 df = pd.DataFrame({"foo": [1, 2, 3]})
 df = df.reset_index()
 df.iloc[0, 1] = ...
 
 All references to the original object will go out of scope when the result of the `reset_index` operation is assigned
-to `df`. As a consequence, `iloc` will continue to operate inplace, and the underlying data will not be copied.
+to `df`. As a consequence, `iloc` will continue to operate inplace, and the underlying data will not be copied (with Copy-on-Write).
 
-Group 2 methods differ, though, since they only modify the underlying data, and therefore can be inplace.
+**Group 2 (values-inplace)** methods differ, though, since they only modify the underlying data, and therefore can be inplace.
 
 :::python
 df = pd.DataFrame({"foo": [1, 2, 3]})
@@ -220,8 +229,9 @@ If we follow the rules of Copy-on-Write[^1] where "any subset or returned series
 the original, and thus never modifies the original", then there is no way of doing this operation inplace by default.
 The original object would be modified before the reference goes out of scope.
 
-To avoid triggering a copy when a value would actually get replaced, we will keep the `inplace` argument for those
-methods.
+For this case, an `inplace=True` option can have an actual benefit, i.e. allowing to
+avoid triggering a copy when a value would get replaced. Therefore, we propose to keep
+the `inplace` argument for those methods.
 
 ### Open Questions
 
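A minimal sketch of why group 2 is treated differently, using `clip` (listed in this group) on the `{"foo": [1, 2, 3]}` frame from the example. With the default call the caller must stay unchanged, so the clipped values end up in a new object; only with `inplace=True` is the existing object itself updated, and every name bound to it sees the change. The snippet checks observable results only; whether a copy of the underlying array is actually avoided is an internal detail that depends on Copy-on-Write and is not tested here.

```python
import pandas as pd

df = pd.DataFrame({"foo": [1, 2, 3]})
same_object = df  # a second name for the very same DataFrame

# Default (inplace=False): a new object is returned and `df` is left
# untouched, so the clipped values have to live somewhere new.
clipped = df.clip(upper=2)
after_default = df["foo"].tolist()

# inplace=True: the calling object itself is updated.
df.clip(upper=2, inplace=True)

print(clipped)
print(same_object)
```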
@@ -329,7 +339,7 @@ Removing the `inplace` keyword is a breaking change, but since the affected beha
 behaviour when not specifying the keyword (i.e. `inplace=False`) will not change and the keyword itself can first be