Commit eb4f6f8

clarify example of replace method and relationship with CoW rules
1 parent 762f4cb commit eb4f6f8

1 file changed

+38
-16
lines changed

web/pandas/pdeps/0008-inplace-methods-in-pandas.md

Lines changed: 38 additions & 16 deletions
@@ -133,7 +133,7 @@ This group encompasses both kinds of inplace: `update` can be values-inplace, wh
133133
(for example, although ``isetitem`` operates on the original pandas object inplace,
134134
it will not change any existing values inplace; rather it will remove the values of the column being set, and insert new values).
135135

136-
**Group 2: Methods that modify the underlying data of the DataFrame/Series object ("values-inplace")**
136+
**Group 2: Methods that can modify the underlying data of the DataFrame/Series object ("values-inplace")**
137137

138138
| Method Name |
139139
|:----------------|
@@ -151,7 +151,7 @@ These methods don't operate inplace by default, but can be done inplace with `in
151151
the structure of the DataFrame or Series intact (shape, row/column labels), but can mutate some elements of the data of
152152
the DataFrame or Series.
153153

154-
**Group 3: Methods that modify the DataFrame/Series object, but not the pre-existing values ("object-inplace")**
154+
**Group 3: Methods that can modify the DataFrame/Series object, but not the pre-existing values ("object-inplace")**
155155

156156
| Method Name |
157157
|:----------------------------|
@@ -197,41 +197,63 @@ implementation detail for the purpose of this PDEP.
197197

198198
The methods from **group 1 (always inplace, no keyword)** won't change behavior, and will remain always inplace.
199199

200+
For methods from **group 4 (never inplace)**, the `inplace` keyword has no actual effect
201+
(except for re-assigning to the calling variable) and is effectively syntactic sugar for
202+
manually re-assigning. For this group, we propose to remove the `inplace` keyword.
203+
200204
For methods from **group 3 (object-inplace)**, the `inplace=True` keyword can currently be
201205
used to avoid a copy. However, with the introduction of Copy-on-Write, every operation
202206
will potentially return a shallow copy of the input object by default (if the performed
203207
operation does not require a copy of the data). This future default is therefore
204208
equivalent to the behavior with `inplace=True` for those methods (minus the return
205-
value), and therefore we propose to remove the `inplace` keyword for this group of
206-
methods.
207-
208-
For methods from **group 4 (never inplace)**, the `inplace` keyword has no actual effect
209-
(except for re-assigning to the calling variable) and is effectively syntactic sugar for
210-
manually re-assigning. For this group, we propose to remove the `inplace` keyword.
209+
value).
211210

212-
For the above reasoning, we think there is no benefit of keeping the keyword around for these methods. To emulate behavior of the `inplace` keyword, we can reassign the result of an operation to the same variable:
211+
For the reasons above, we think there is no benefit in keeping the keyword around for
212+
these methods. To emulate the behavior of the `inplace` keyword, we can reassign the result
213+
of an operation to the same variable:
213214

214215
:::python
215216
df = pd.DataFrame({"foo": [1, 2, 3]})
216217
df = df.reset_index()
217218
df.iloc[0, 1] = ...
218219

219-
All references to the original object will go out of scope when the result of the `reset_index` operation is assigned
220+
All references to the original object will go out of scope when the result of the `reset_index` operation is re-assigned
220221
to `df`. As a consequence, `iloc` will continue to operate inplace, and the underlying data will not be copied (with Copy-on-Write).
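As a minimal sketch of the re-assignment pattern described above (not part of the commit), the following contrasts re-assigning to the same name with keeping a second name for the original object. The names `original` and `result` and the value `100` are chosen here for illustration only. The observable results should be the same with or without Copy-on-Write; the sketch does not check whether data was copied internally.

```python
import pandas as pd

# Re-assigning: the name `df` now points to the result of reset_index,
# so no other name refers to the original object.
df = pd.DataFrame({"foo": [1, 2, 3]})
df = df.reset_index()
df.iloc[0, 1] = 100  # column 1 is "foo" after reset_index

# Keeping a second name: `original` must stay untouched when the
# returned object is modified afterwards.
original = pd.DataFrame({"foo": [1, 2, 3]})
result = original.reset_index()
result.iloc[0, 1] = 100

print(df["foo"].tolist())
print(original["foo"].tolist())
```

In the second case pandas has to protect `original`, which is why the write through `result` cannot touch the data that `original` still refers to.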
221222

222-
**Group 2 (values-inplace)** methods differ, though, since they only modify the underlying data, and therefore can be inplace.
223+
**Group 2 (values-inplace)** methods differ, though, since they modify the underlying
224+
data, and therefore can actually happen inplace:
225+
226+
:::python
227+
df = pd.DataFrame({"foo": [1, 2, 3]})
228+
df.replace(to_replace=1, value=100, inplace=True)
229+
230+
Currently, the above updates `df` values-inplace, without requiring a copy of the data.
231+
For this type of method, however, we can _not_ emulate the above usage of `inplace` by
232+
re-assigning:
223233

224234
:::python
225235
df = pd.DataFrame({"foo": [1, 2, 3]})
226236
df = df.replace(to_replace=1, value=100)
227237

228-
If we follow the rules of Copy-on-Write[^1] where "any subset or returned series/dataframe always behaves as a copy of
229-
the original, and thus never modifies the original", then there is no way of doing this operation inplace by default.
230-
The original object would be modified before the reference goes out of scope.
238+
If we follow the rules of Copy-on-Write[^1] where "any subset or returned
239+
series/dataframe always behaves as a copy of the original, and thus never modifies the
240+
original", then there is no way of doing this operation inplace by default, because the
241+
original object `df` would be modified before the reference goes out of scope (pandas
242+
does not know whether you will re-assign it to `df` or assign it to another variable).
243+
That would violate the Copy-on-Write rules, and therefore the `replace()` method in the
244+
example always needs to make a copy of the underlying data by default.
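The difference between the two `replace` calls can be observed with a short sketch (not part of the commit). The names `replaced`, `before` and `after` are illustrative assumptions. The sketch only shows which object ends up holding the new values; it does not measure whether pandas copied the data internally.

```python
import pandas as pd

df = pd.DataFrame({"foo": [1, 2, 3]})

# Default call: a new object is returned and `df` keeps its values,
# whatever name the result is later assigned to.
replaced = df.replace(to_replace=1, value=100)
before = df["foo"].tolist()

# inplace=True: the values of `df` itself are updated.
df.replace(to_replace=1, value=100, inplace=True)
after = df["foo"].tolist()

print(before, replaced["foo"].tolist(), after)
```

Because the default call must leave `df` unchanged at the moment it returns, it cannot reuse the existing data for the replaced values, which is the case that `inplace=True` addresses.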
231245

232246
For this case, an `inplace=True` option can have an actual benefit, i.e. allowing us to
233-
avoid triggering a copy when a value would get replaced. Therefore, we propose to keep
234-
the `inplace` argument for those methods.
247+
avoid a data copy. Therefore, we propose to keep the `inplace` argument for this
248+
group of methods.
249+
250+
Summarizing for the `inplace` keyword, we propose to:
251+
252+
- Keep the `inplace` keyword for this subset of methods (group 2) that can update the
253+
underlying values inplace ("values-inplace")
254+
- Remove the `inplace` keyword from all other methods that either can never work inplace
255+
(group 4) or only update the object (group 3, "object-inplace", which can be emulated
256+
with reassigning).
235257

236258
### Open Questions
237259
