web/pandas/pdeps/0008-inplace-methods-in-pandas.md
Lines changed: 17 additions & 23 deletions
@@ -60,9 +60,8 @@ ways" to achieve the same result:
 subtle bugs and is harder to debug.
 
 Finally, there are also methods that have a `copy` keyword instead of an `inplace` keyword (which also avoids copying
-the data in the case of `copy=False`, but still returns a new object referencing the same data instead of updating the
-calling object), adding to the inconsistencies. This `copy=False` option also has become redundant with the introduction
-of Copy-on-Write.
+the data when `copy=False`, but returns a new object referencing the same data instead of updating the calling object),
+adding to the inconsistencies. This keyword is also redundant now with the introduction of Copy-on-Write.
 
 Given the above reasons, we are convinced that there is no need for neither the `inplace` nor the `copy` keyword (except
 for a small subset of methods that can actually update data inplace). Removing those keywords will give a more
@@ -94,7 +93,7 @@ the ``copy`` and ``inplace`` keywords, with the value of ``inplace`` overwriting
 To summarize the status quo of inplace behavior of methods, we have divided methods that can operate inplace or have
 an ``inplace``/``copy`` keyword into 4 groups:
 
-**Group 1: Methods that always operate inplace**
+**Group 1: Methods that always operate inplace (no user-control with ``inplace``/``copy`` keyword)**
 
 | Method Name |
 |:--------------|
@@ -103,8 +102,6 @@ an ``inplace``/``copy`` keyword into 4 groups:
 |``update``|
 |``isetitem``*|
 
-These methods always operate inplace and don't have the ``inplace`` or ``copy`` keyword.
-
 \* Although ``isetitem`` operates on the original pandas object inplace, it will not change any existing values
 inplace (it will remove the values of the column being set, and insert new values).
 
@@ -121,7 +118,8 @@ inplace (it will remove the values of the column being set, and insert new value
 |``bfill``|
 |``clip``|
 
-These methods don't operate inplace by default, but can be done inplace with `inplace=True` if the dtypes are compatible (e.g. the values replacing the old values can be stored in the original array without an astype). All those methods leave
+These methods don't operate inplace by default, but can be done inplace with `inplace=True` if the dtypes are compatible
+(e.g. the values replacing the old values can be stored in the original array without an astype). All those methods leave
 the structure of the DataFrame or Series intact (shape, row/column labels), but can mutate some elements of the data of
 the DataFrame or Series.
 
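As an illustration of the Group 2 behavior described in the hunk above, here is a minimal sketch. The Series `s` and its values are made up for this example; it only shows that the calling object is updated, not how pandas handles the underlying buffer.

```python
import pandas as pd

# Hypothetical data for illustration only.
s = pd.Series([1, 2, 3])

# clip is listed as a Group 2 method. The clipped values fit the existing
# integer dtype, so no astype is needed to store them.
s.clip(upper=2, inplace=True)

# The calling object itself now holds the clipped values.
print(s.tolist())  # [1, 2, 2]
```

The shape and labels of `s` are unchanged; only some element values differ, which is what distinguishes this group from the others.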
@@ -162,8 +160,8 @@ Two methods also have both keywords: `rename`, `rename_axis`, with the `inplace`
 
 **Group 4: Methods that can never operate inplace**
 
-| Method Name | Keyword|
-|:-------------------------|-------------|
+| Method Name | Keyword |
+|:-----------------------|-----------|
 |`drop` (dropping rows) |`inplace`|
 |`dropna`|`inplace`|
 |`drop_duplicates`|`inplace`|
@@ -178,9 +176,9 @@ Two methods also have both keywords: `rename`, `rename_axis`, with the `inplace`
 |`reindex_like`|`copy`|
 |`truncate`|`copy`|
 
-Although all of these methods either`inplace` or `copy`, they can never operate inplace because the nature of the
+Although these methods have the `inplace`/`copy` keywords, they can never operate inplace because the nature of the
 operation requires copying (such as reordering or dropping rows). For those methods, `inplace=True` is essentially just
-syntactic sugar for reassigning the new result to `self` (the calling DataFrame).
+syntactic sugar for reassigning the new result to the calling DataFrame/Series.
 
 Note: in the case of a "no-op" (for example when sorting an already sorted DataFrame), some of those methods might not
 need to perform a copy. This currently happens with Copy-on-Write (regardless of `inplace`), but this is considered an
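The "syntactic sugar" point made for Group 4 in the hunk above can be sketched as follows. The DataFrames `df_a` and `df_b` are hypothetical and exist only for this comparison.

```python
import pandas as pd

# Hypothetical data for illustration only.
df_a = pd.DataFrame({"foo": [1.0, None, 3.0]})
df_b = df_a.copy()

# Group 4 method with the keyword: the calling object ends up holding the result.
df_a.dropna(inplace=True)

# Same observable effect without the keyword: reassign the result to the same name.
df_b = df_b.dropna()

print(df_a.equals(df_b))  # True
```

Both spellings leave the variable pointing at a DataFrame without the missing row, which is why the keyword adds nothing for this group.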
@@ -193,14 +191,11 @@ The methods from group 1 won't change behavior, and will remain always inplace.
 Methods in groups 3 and 4 will lose their `copy` and `inplace` keywords. Under Copy-on-Write, every operation will
 potentially return a shallow copy of the input object, if the performed operation does not require a copy. This is
 equivalent to behavior with `copy=False` and/or `inplace=True` for those methods. If users want to make a hard
-copy(`copy=True`), they can do:
-
-:::python
-df = df.func().copy()
+copy(`copy=True`), they can call the `copy()` method on the result of the operation.
 
 Therefore, there is no benefit of keeping the keywords around for these methods.
 
-User can emulate behavior of the `inplace` keyword by assigning the result of an operation to the same variable:
+To emulate behavior of the `inplace` keyword, we can reassign the result of an operation to the same variable:
 
 :::python
 df = pd.DataFrame({"foo": [1, 2, 3]})
@@ -210,8 +205,7 @@ User can emulate behavior of the `inplace` keyword by assigning the result of an
 All references to the original object will go out of scope when the result of the `reset_index` operation is assigned
 to `df`. As a consequence, `iloc` will continue to operate inplace, and the underlying data will not be copied.
 
-The methods in group 2 behave different compared to the first three groups. These methods are actually able to operate
-inplace because they only modify the underlying data.
+Group 2 methods differ, though, since they only modify the underlying data, and therefore can be inplace.
 
 :::python
 df = pd.DataFrame({"foo": [1, 2, 3]})
@@ -336,19 +330,19 @@ There are some behaviour changes (for example the current `copy=False` returning
 actual" shallow copy, but protected under Copy-on-Write), but those behaviour changes are covered by the Copy-on-Write
 proposal[^1].
 
-## Alternatives
+## Rejected ideas
 
 ### Remove the `inplace` keyword altogether
 
-In the past, it was considered to remove the `inplace` keyword entirely. This was because many operations that had
+In the past, it was considered to remove the `inplace` keyword entirely. This was because many methods with
 the `inplace` keyword did not actually operate inplace, but made a copy and re-assigned the underlying values under
 the hood, causing confusion and providing no real benefit to users.
 
 Because a majority of the methods supporting `inplace` did not operate inplace, it was considered at the time to
 deprecate and remove inplace from all methods, and add back the keyword as necessary.[^3]
 
-For the subset of methods where the operation actually _can_ be done inplace (group 2), however, removing the `inplace`
-keyword for those as well could give a significant performance regression when currently using this keyword with large
+For methods where the operation actually _can_ be done inplace (group 2), however, removing the `inplace`
+keyword could give a significant performance regression when currently using this keyword with large
 DataFrames. Therefore, we decided to keep the `inplace` keyword for this small subset of methods.
 
 ### Standardize on the `copy` keyword instead of `inplace`
@@ -382,7 +376,7 @@ It may be helpful to review those discussions (see links) [^2] [^3] [^4] to bett
 Copy-on-Write is a relatively new feature (added in version 1.5) and some methods are missing the "lazy copy"
 optimization (equivalent to `copy=False`).
 
-Therefore, we will start showing deprecation warnings for the `copy` and `inplace` parameters in pandas 2.1, to
+Therefore, we propose deprecating the `copy` and `inplace` parameters in pandas 2.1, to
 allow for bugs with Copy-on-Write to be addressed and for more optimizations to be added.
 
 Hopefully, users will be able to switch to Copy-on-Write to keep the no-copy behavior and to silence the warnings.