Commit 03ace50

committed
cleanups + formatting
1 parent 6b0a91b commit 03ace50

1 file changed

+17
-23
lines changed

web/pandas/pdeps/0008-inplace-methods-in-pandas.md

Lines changed: 17 additions & 23 deletions
@@ -60,9 +60,8 @@ ways" to achieve the same result:
6060
subtle bugs and is harder to debug.
6161

6262
Finally, there are also methods that have a `copy` keyword instead of an `inplace` keyword (which also avoids copying
63-
the data in the case of `copy=False`, but still returns a new object referencing the same data instead of updating the
64-
calling object), adding to the inconsistencies. This `copy=False` option also has become redundant with the introduction
65-
of Copy-on-Write.
63+
the data when `copy=False`, but returns a new object referencing the same data instead of updating the calling object),
64+
adding to the inconsistencies. This keyword is also redundant now with the introduction of Copy-on-Write.
6665

6766
Given the above reasons, we are convinced that there is no need for either the `inplace` or the `copy` keyword (except
6867
for a small subset of methods that can actually update data inplace). Removing those keywords will give a more
@@ -94,7 +93,7 @@ the ``copy`` and ``inplace`` keywords, with the value of ``inplace`` overwriting
9493
To summarize the status quo of inplace behavior of methods, we have divided methods that can operate inplace or have
9594
an ``inplace``/``copy`` keyword into 4 groups:
9695

97-
**Group 1: Methods that always operate inplace**
96+
**Group 1: Methods that always operate inplace (no user control with ``inplace``/``copy`` keyword)**
9897

9998
| Method Name |
10099
|:--------------|
@@ -103,8 +102,6 @@ an ``inplace``/``copy`` keyword into 4 groups:
103102
| ``update`` |
104103
| ``isetitem``* |
105104

106-
These methods always operate inplace and don't have the ``inplace`` or ``copy`` keyword.
107-
108105
\* Although ``isetitem`` operates on the original pandas object inplace, it will not change any existing values
109106
inplace (it will remove the values of the column being set, and insert new values).
110107

@@ -121,7 +118,8 @@ inplace (it will remove the values of the column being set, and insert new value
121118
| ``bfill`` |
122119
| ``clip`` |
123120

124-
These methods don't operate inplace by default, but can be done inplace with `inplace=True` if the dtypes are compatible (e.g. the values replacing the old values can be stored in the original array without an astype). All those methods leave
121+
These methods don't operate inplace by default, but can be done inplace with `inplace=True` if the dtypes are compatible
122+
(e.g. the values replacing the old values can be stored in the original array without an astype). All those methods leave
125123
the structure of the DataFrame or Series intact (shape, row/column labels), but can mutate some elements of the data of
126124
the DataFrame or Series.
127125

@@ -162,8 +160,8 @@ Two methods also have both keywords: `rename`, `rename_axis`, with the `inplace`
162160

163161
**Group 4: Methods that can never operate inplace**
164162

165-
| Method Name | Keyword |
166-
|:-------------------------|-------------|
163+
| Method Name | Keyword |
164+
|:-----------------------|-----------|
167165
| `drop` (dropping rows) | `inplace` |
168166
| `dropna` | `inplace` |
169167
| `drop_duplicates` | `inplace` |
@@ -178,9 +176,9 @@ Two methods also have both keywords: `rename`, `rename_axis`, with the `inplace`
178176
| `reindex_like` | `copy` |
179177
| `truncate` | `copy` |
180178

181-
Although all of these methods either `inplace` or `copy`, they can never operate inplace because the nature of the
179+
Although these methods have the `inplace`/`copy` keywords, they can never operate inplace because the nature of the
182180
operation requires copying (such as reordering or dropping rows). For those methods, `inplace=True` is essentially just
183-
syntactic sugar for reassigning the new result to `self` (the calling DataFrame).
181+
syntactic sugar for reassigning the new result to the calling DataFrame/Series.
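
The equivalence described above can be sketched as follows. This is a minimal illustration, assuming `dropna` from the table and a small made-up frame; the variable names and data are hypothetical and not part of the proposal.

```python
import pandas as pd

# Hypothetical example data; the column name "a" is illustrative only.
original = pd.DataFrame({"a": [1.0, None, 3.0]})

# Variant 1: the ``inplace=True`` spelling.
with_keyword = original.copy()
with_keyword.dropna(inplace=True)

# Variant 2: reassigning the returned object to the same name.
reassigned = original.copy()
reassigned = reassigned.dropna()

# Both variants end up with the same rows and labels.
same_result = with_keyword.equals(reassigned)
```

Neither spelling avoids the copy that dropping rows requires, so the keyword gives no performance advantage here.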
184182

185183
Note: in the case of a "no-op" (for example when sorting an already sorted DataFrame), some of those methods might not
186184
need to perform a copy. This currently happens with Copy-on-Write (regardless of `inplace`), but this is considered an
@@ -193,14 +191,11 @@ The methods from group 1 won't change behavior, and will remain always inplace.
193191
Methods in groups 3 and 4 will lose their `copy` and `inplace` keywords. Under Copy-on-Write, every operation will
194192
potentially return a shallow copy of the input object, if the performed operation does not require a copy. This is
195193
equivalent to behavior with `copy=False` and/or `inplace=True` for those methods. If users want to make a hard
196-
copy(`copy=True`), they can do:
197-
198-
:::python
199-
df = df.func().copy()
194+
copy (`copy=True`), they can call the `copy()` method on the result of the operation.
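
As a hedged sketch of that pattern, the snippet below chains `copy()` onto `reset_index`. The method choice and the variable name `hard_copy` are illustrative assumptions; any method that loses its keyword could be substituted.

```python
import pandas as pd

df = pd.DataFrame({"foo": [1, 2, 3]})

# Chain ``copy()`` onto the result to get an independent (deep) copy.
hard_copy = df.reset_index(drop=True).copy()

# Writing to the copy leaves the original untouched.
hard_copy.iloc[0, 0] = 100
```

After the assignment, `df` still holds its original values, with or without Copy-on-Write enabled.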
200195

201196
Therefore, there is no benefit of keeping the keywords around for these methods.
202197

203-
User can emulate behavior of the `inplace` keyword by assigning the result of an operation to the same variable:
198+
To emulate the behavior of the `inplace` keyword, we can reassign the result of an operation to the same variable:
204199

205200
:::python
206201
df = pd.DataFrame({"foo": [1, 2, 3]})
@@ -210,8 +205,7 @@ User can emulate behavior of the `inplace` keyword by assigning the result of an
210205
All references to the original object will go out of scope when the result of the `reset_index` operation is assigned
211206
to `df`. As a consequence, `iloc` will continue to operate inplace, and the underlying data will not be copied.
212207

213-
The methods in group 2 behave different compared to the first three groups. These methods are actually able to operate
214-
inplace because they only modify the underlying data.
208+
Group 2 methods differ, though, since they only modify the underlying data, and therefore can be inplace.
215209

216210
:::python
217211
df = pd.DataFrame({"foo": [1, 2, 3]})
@@ -336,19 +330,19 @@ There are some behaviour changes (for example the current `copy=False` returning
336330
actual" shallow copy, but protected under Copy-on-Write), but those behaviour changes are covered by the Copy-on-Write
337331
proposal[^1].
338332

339-
## Alternatives
333+
## Rejected ideas
340334

341335
### Remove the `inplace` keyword altogether
342336

343-
In the past, it was considered to remove the `inplace` keyword entirely. This was because many operations that had
337+
In the past, it was considered to remove the `inplace` keyword entirely. This was because many methods with
344338
the `inplace` keyword did not actually operate inplace, but made a copy and re-assigned the underlying values under
345339
the hood, causing confusion and providing no real benefit to users.
346340

347341
Because a majority of the methods supporting `inplace` did not operate inplace, it was considered at the time to
348342
deprecate and remove inplace from all methods, and add back the keyword as necessary.[^3]
349343

350-
For the subset of methods where the operation actually _can_ be done inplace (group 2), however, removing the `inplace`
351-
keyword for those as well could give a significant performance regression when currently using this keyword with large
344+
For methods where the operation actually _can_ be done inplace (group 2), however, removing the `inplace`
345+
keyword could give a significant performance regression when currently using this keyword with large
352346
DataFrames. Therefore, we decided to keep the `inplace` keyword for this small subset of methods.
353347

354348
### Standardize on the `copy` keyword instead of `inplace`
@@ -382,7 +376,7 @@ It may be helpful to review those discussions (see links) [^2] [^3] [^4] to bett
382376
Copy-on-Write is a relatively new feature (added in version 1.5) and some methods are missing the "lazy copy"
383377
optimization (equivalent to `copy=False`).
384378

385-
Therefore, we will start showing deprecation warnings for the `copy` and `inplace` parameters in pandas 2.1, to
379+
Therefore, we propose deprecating the `copy` and `inplace` parameters in pandas 2.1, to
386380
allow for bugs with Copy-on-Write to be addressed and for more optimizations to be added.
387381

388382
Hopefully, users will be able to switch to Copy-on-Write to keep the no-copy behavior and to silence the warnings.
