Commit 762f4cb

Some textual edits to Proposed changes section
1 parent cf1c3c1 commit 762f4cb

1 file changed

+29
-19
lines changed

web/pandas/pdeps/0008-inplace-methods-in-pandas.md

Lines changed: 29 additions & 19 deletions
@@ -46,7 +46,7 @@ Generally, we assume that people use the keyword for the following reasons:
4646

4747
For the first reason: efficiency is an important aspect. However, in practice it is not always the case
4848
that `inplace=True` improves anything. Some of the methods with an `inplace` keyword can actually work inplace, but
49-
others still make a copy under the hood anyway. In addition, with the introduction of Copy-on-Write, there are now other
49+
others still make a copy under the hood anyway. In addition, with the introduction of Copy-on-Write (PDEP-7[^1]), there are now other
5050
ways to avoid making unnecessary copies by default (without needing to specify a keyword). The next section gives a
5151
detailed overview of those different cases.
5252

@@ -63,12 +63,12 @@ Finally, there are also methods that have a `copy` keyword instead of an `inplac
6363
the data when `copy=False`, but returns a new object referencing the same data instead of updating the calling object),
6464
adding to the inconsistencies. This keyword is also redundant now with the introduction of Copy-on-Write.
6565

66-
Given the above reasons, we are convinced that there is no need for neither the `inplace` nor the `copy` keyword (except
67-
for a small subset of methods that can actually update data inplace). Removing those keywords will give a more
66+
Given the above reasons, we are convinced that there is no need for either the `inplace` or the `copy` keyword, except
67+
for a small subset of methods that can actually update data inplace. Removing those keywords will give a more
6868
consistent and less confusing API. Removing the `copy` keyword is covered by PDEP-7 about Copy-on-Write,
6969
and this PDEP will focus on the `inplace` keyword.
7070

71-
Thus, in this PDEP, we aim to standardize behavior across methods to make control of inplace-ness of operations
71+
Thus, in this PDEP, we aim to standardize behavior across methods to make control of inplace-ness of methods
7272
consistent, and compatible with Copy-on-Write.
7373

7474
Note: there are also operations (not methods) that work inplace in pandas, such as indexing (
@@ -101,8 +101,12 @@ for this PDEP, we can distinguish two kinds of "inplace" operations:
101101
As illustration, an example of such an object-inplace operation without using a method:
102102

103103
:::python
104-
# we replace the Index on `df` inplace, but without actually updating any existing array
104+
# we replace the Index on `df` inplace, but without actually
105+
# updating any existing array
105106
df.index = pd.Index(...)
107+
# we update the DataFrame inplace, but by completely replacing a column,
108+
# not by mutating the existing column's underlying array
109+
df["col"] = new_values
106110

107111
Object-inplace operations, while not actually modifying existing column values, keep
108112
(a subset of) those columns and thus can avoid copying the data of those existing columns.
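
A minimal sketch of the object-inplace assignment described above, assuming pandas 2.x or later. The data and the names `old_col` and `new_values` are made up for illustration. It suggests that assigning a whole column updates `df` itself while a Series extracted earlier keeps its old values, because the column is swapped out rather than written into:

```python
import pandas as pd

df = pd.DataFrame({"col": [1, 2, 3]})
old_col = df["col"]          # handle on the existing column

new_values = [10, 20, 30]    # illustrative replacement data
df["col"] = new_values       # object-inplace: the column is replaced

print(df["col"].tolist())    # values now seen through `df`
print(old_col.tolist())      # values still seen through the old handle
```
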
@@ -142,7 +146,7 @@ it will not change any existing values inplace; rather it will remove the values
142146
| ``bfill`` |
143147
| ``clip`` |
144148

145-
These methods don't operate inplace by default, but can be done inplace with `inplace=True` if the dtypes are compatible
149+
These methods don't operate inplace by default, but can operate inplace with `inplace=True` _if_ the dtypes are compatible
146150
(e.g. the values replacing the old values can be stored in the original array without an astype). All those methods leave
147151
the structure of the DataFrame or Series intact (shape, row/column labels), but can mutate some elements of the data of
148152
the DataFrame or Series.
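
As a rough illustration of a values-inplace call with compatible dtypes, the sketch below fills a missing value in a `float64` Series with a float, so no cast is needed. The data is invented, and whether the original memory is actually reused is an implementation detail that this example does not check:

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0])   # float64; the None is stored as NaN

# 0.0 fits in a float64 array, so no astype is required
s.fillna(0.0, inplace=True)       # the calling object itself is updated

print(s.tolist())
print(s.dtype)
```

The shape and labels of `s` are unchanged; only one element of its data differs.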
@@ -180,7 +184,7 @@ return a new object referencing the same data.
180184
| `eval` |
181185
| `query` |
182186

183-
Although these methods have the `inplace` keyword, they can never operate inplace because the nature of the
187+
Although these methods have the `inplace` keyword, they can never operate inplace, in either sense, because the nature of the
184188
operation requires copying (such as reordering or dropping rows). For those methods, `inplace=True` is essentially just
185189
syntactic sugar for reassigning the new result to the calling DataFrame/Series.
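
The equivalence can be sketched as follows, assuming a pandas version in which `sort_values` still accepts the `inplace` keyword. The frames `df_a` and `df_b` are illustrative names, not part of the proposal:

```python
import pandas as pd

df_a = pd.DataFrame({"a": [3, 1, 2]})
df_b = df_a.copy()

df_a.sort_values("a", inplace=True)   # keyword form
df_b = df_b.sort_values("a")          # explicit reassignment

# Both variables now refer to the same sorted result
print(df_a.equals(df_b))
```
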
186190

@@ -191,26 +195,31 @@ implementation detail for the purpose of this PDEP.
191195

192196
### Proposed changes and reasoning
193197

194-
The methods from group 1 won't change behavior, and will remain always inplace.
198+
The methods from **group 1 (always inplace, no keyword)** won't change behavior, and will remain always inplace.
195199

196-
Methods in groups 3 and 4 will lose their `inplace` keyword. Under Copy-on-Write, every operation will
197-
potentially return a shallow copy of the input object, if the performed operation does not require a copy of the data. This is
198-
equivalent to the behavior with `inplace=True` for those methods. If users want to make a hard
199-
copy, they can call the `copy()` method on the result of the operation.
200+
For methods from **group 3 (object-inplace)**, the `inplace=True` keyword can currently be
201+
used to avoid a copy. However, with the introduction of Copy-on-Write, every operation
202+
will potentially return a shallow copy of the input object by default (if the performed
203+
operation does not require a copy of the data). This future default is
204+
equivalent to the behavior with `inplace=True` for those methods (minus the return
205+
value), and therefore we propose to remove the `inplace` keyword for this group of
206+
methods.
200207

201-
Therefore, there is no benefit of keeping the keyword around for these methods.
208+
For methods from **group 4 (never inplace)**, the `inplace` keyword has no actual effect
209+
(except for re-assigning to the calling variable) and is effectively syntactic sugar for
210+
manually re-assigning. For this group, we propose to remove the `inplace` keyword.
202211

203-
To emulate behavior of the `inplace` keyword, we can reassign the result of an operation to the same variable:
212+
Given the above reasoning, we think there is no benefit in keeping the keyword around for these methods. To emulate the behavior of the `inplace` keyword, we can reassign the result of an operation to the same variable:
204213

205214
:::python
206215
df = pd.DataFrame({"foo": [1, 2, 3]})
207216
df = df.reset_index()
208217
df.iloc[0, 1] = ...
209218

210219
All references to the original object will go out of scope when the result of the `reset_index` operation is assigned
211-
to `df`. As a consequence, `iloc` will continue to operate inplace, and the underlying data will not be copied.
220+
to `df`. As a consequence, `iloc` will continue to operate inplace, and the underlying data will not be copied (with Copy-on-Write).
212221

213-
Group 2 methods differ, though, since they only modify the underlying data, and therefore can be inplace.
222+
**Group 2 (values-inplace)** methods differ, though, since they only modify the underlying data, and therefore can be inplace.
214223

215224
:::python
216225
df = pd.DataFrame({"foo": [1, 2, 3]})
@@ -220,8 +229,9 @@ If we follow the rules of Copy-on-Write[^1] where "any subset or returned series
220229
the original, and thus never modifies the original", then there is no way of doing this operation inplace by default.
221230
The original object would be modified before the reference goes out of scope.
222231

223-
To avoid triggering a copy when a value would actually get replaced, we will keep the `inplace` argument for those
224-
methods.
232+
For this case, an `inplace=True` option can have an actual benefit, i.e. it avoids
233+
triggering a copy when a value would get replaced. Therefore, we propose to keep
234+
the `inplace` argument for those methods.
225235

226236
### Open Questions
227237

@@ -329,7 +339,7 @@ Removing the `inplace` keyword is a breaking change, but since the affected beha
329339
behaviour when not specifying the keyword (i.e. `inplace=False`) will not change and the keyword itself can first be
330340
deprecated before it is removed.
331341

332-
## Rejected ideas
342+
## Rejected alternatives
333343

334344
### Remove the `inplace` keyword altogether
335345
