Commit 92c6a0a

Update
1 parent 57390ad commit 92c6a0a

1 file changed

+66
-65
lines changed

web/pandas/pdeps/0008-inplace-methods-in-pandas.md

Lines changed: 66 additions & 65 deletions
@@ -121,93 +121,94 @@ inplace (it will remove the values of the column being set, and insert new value
121121
| ``bfill`` |
122122
| ``clip`` |
123123

124-
These methods don't operate inplace by default, but have the option to specify `inlace=True`. All those methods leave
124+
These methods don't operate inplace by default, but can be done inplace with `inplace=True`. All those methods leave
125125
the structure of the DataFrame or Series intact (shape, row/column labels), but can mutate some elements of the data of
126126
the DataFrame or Series.
127127

128128
**Group 3: Methods that modify the DataFrame/Series object, but not the pre-existing values**
129129

130-
| Method Name |
131-
|:----------------------------|
132-
| ``drop`` (dropping columns) |
133-
| ``eval`` |
134-
| ``rename`` |
135-
| ``rename_axis`` |
136-
| ``reset_index`` |
137-
| ``set_index`` |
138-
| ``astype`` |
139-
| ``infer_objects`` |
140-
| ``set_axis`` |
141-
| ``set_flags`` |
142-
| ``to_period`` |
143-
| ``to_timestamp`` |
144-
| ``tz_localize`` |
145-
| ``tz_convert`` |
146-
| ``swaplevel`` |
147-
| ``concat`` |
130+
| Method Name | Keyword |
131+
|:----------------------------|-----------------------|
132+
| ``drop`` (dropping columns) | ``inplace`` |
133+
| ``rename`` | ``inplace``, ``copy`` |
134+
| ``rename_axis`` | ``inplace``, ``copy`` |
135+
| ``reset_index`` | ``inplace`` |
136+
| ``set_index`` | ``inplace`` |
137+
| ``astype`` | ``copy`` |
138+
| ``infer_objects`` | ``copy`` |
139+
| ``set_axis`` | ``copy`` |
140+
| ``set_flags`` | ``copy`` |
141+
| ``to_period`` | ``copy`` |
142+
| ``to_timestamp`` | ``copy`` |
143+
| ``tz_localize`` | ``copy`` |
144+
| ``tz_convert`` | ``copy`` |
145+
| ``Series.swaplevel``* | ``copy`` |
146+
| ``concat`` | ``copy`` |
147+
148+
\* The `copy` keyword is only available for `Series.swaplevel` and not for `DataFrame.swaplevel`.
148149

149150
These methods can change the structure of the DataFrame or Series, such as changing the shape by adding or removing
150151
columns, or changing the row/column labels (changing the index/columns attributes), but don't modify the existing
151152
underlying data of the object.
153+
152154
All those methods (except for `set_flags`) make a copy of the full data by default, but can be performed inplace while
153155
avoiding copying all data (currently enabled with the `inplace` or `copy` keyword).
154156

155157
Some of these methods only have a `copy` keyword instead of an `inplace`
156-
keyword: `astype`, `infer_objects`, `set_axis`, `set_flags`, `to_period`, `to_timestamp`, `tz_localize`, `tz_convert`, `swaplevel`, `concat`
157-
and `merge`.
158-
These allow the user to avoid a copy, but don't update the original object inplace and instead return a new object
159-
referencing the same data.
158+
keyword. These allow the user to avoid a copy, but don't update the original object inplace and instead return a
159+
new object referencing the same data.
160160

161-
Two methods also have both keywords: `rename`, `rename_axis`.
161+
Two methods also have both keywords: `rename`, `rename_axis`, with the `inplace` keyword overriding `copy`.
162162

163163
**Group 4: Methods that can never operate inplace**
164164

165-
| Method Name |
166-
|:-------------------------|
167-
| ``drop`` (dropping rows) |
168-
| ``dropna`` |
169-
| ``drop_duplicates`` |
170-
| ``sort_values`` |
171-
| ``sort_index`` |
172-
| ``query`` |
173-
| ``transpose`` |
174-
| ``swapaxes`` |
175-
| ``align`` |
176-
| ``reindex`` |
177-
| ``reindex_like`` |
178-
| ``truncate`` |
179-
180-
These methods can never operate inplace because the nature of the operation requires copying (such as reordering or
181-
dropping rows). For those methods, `inplace=True` is essentially just synctactic sugar for reassigning the new result
182-
to `self` (the calling DataFrame).
165+
| Method Name | Keyword |
166+
|:-------------------------|-------------|
167+
| `drop` (dropping rows) | `inplace` |
168+
| `dropna` | `inplace` |
169+
| `drop_duplicates` | `inplace` |
170+
| `sort_values` | `inplace` |
171+
| `sort_index` | `inplace` |
172+
| `eval` | `inplace` |
173+
| `query` | `inplace` |
174+
| `transpose` | `copy` |
175+
| `swapaxes` | `copy` |
176+
| `align` | `copy` |
177+
| `reindex` | `copy` |
178+
| `reindex_like` | `copy` |
179+
| `truncate` | `copy` |
180+
181+
Although all of these methods have either an `inplace` or a `copy` keyword, they can never operate inplace because the nature of the
182+
operation requires copying (such as reordering or dropping rows). For those methods, `inplace=True` is essentially just
183+
syntactic sugar for reassigning the new result to `self` (the calling DataFrame).
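
The "syntactic sugar" point can be illustrated with a minimal sketch. The variable names `df_inplace` and `df_reassigned` are made up for this example, and it only compares the resulting data; it makes no claim about how pandas implements either path internally.

```python
import pandas as pd

# Two identical frames, so both styles start from the same data.
df_inplace = pd.DataFrame({"key": [3, 1, 2]})
df_reassigned = df_inplace.copy()

# Group 4 method called with inplace=True.
df_inplace.sort_values("key", inplace=True)

# The same operation, written as a plain reassignment.
df_reassigned = df_reassigned.sort_values("key")

# Both variables now refer to frames with the same values and index.
print(df_inplace.equals(df_reassigned))
```

From the caller's point of view the two spellings are interchangeable, which is why the keyword adds little for this group.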
183184

184185
Note: in the case of a "no-op" (for example when sorting an already sorted DataFrame), some of those methods might not
185-
need to perform a copy. This currently happens with Copy-on-Write (regardless of ``inplace``), but this is considered an
186+
need to perform a copy. This currently happens with Copy-on-Write (regardless of `inplace`), but this is considered an
186187
implementation detail for the purpose of this PDEP.
187188

188189
### Proposed changes and reasoning
189190

190191
The methods from group 1 won't change behavior, and will remain always inplace.
191192

192-
Methods in groups 3 and 4 will lose their ``copy`` and ``inplace`` keywords. Under Copy-on-Write, every operation will
193+
Methods in groups 3 and 4 will lose their `copy` and `inplace` keywords. Under Copy-on-Write, every operation will
193194
potentially return a shallow copy of the input object, if the performed operation does not require a copy. This is
194-
equivalent to behavior with ``copy=False`` and/or ``inplace=True`` for those methods. If users want to make a hard
195-
copy(``copy=True``), they can do:
195+
equivalent to behavior with `copy=False` and/or `inplace=True` for those methods. If users want to make a hard
196+
copy (`copy=True`), they can do:
196197

197198
:::python
198199
df = df.func().copy()
199200

200201
Therefore, there is no benefit of keeping the keywords around for these methods.
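
As a concrete instance of the `df.func().copy()` pattern, the sketch below uses `rename` as the stand-in for `func` (an arbitrary choice for illustration) and checks that writing to the hard copy leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame({"foo": [1.0, 2.0, 3.0]})

# Explicit hard copy of the result of a group 3 method.
hard = df.rename(columns={"foo": "bar"}).copy()

# Mutating the hard copy must not affect the original frame.
hard.iloc[0, 0] = 100.0

print(df["foo"].tolist())
print(hard["bar"].tolist())
```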
201202

202-
User can emulate behavior of the ``inplace`` keyword by assigning the result of an operation to the same variable:
203+
Users can emulate the behavior of the `inplace` keyword by assigning the result of an operation to the same variable:
203204

204205
:::python
205206
df = pd.DataFrame({"foo": [1, 2, 3]})
206207
df = df.reset_index()
207208
df.iloc[0, 1] = ...
208209

209-
All references to the original object will go out of scope when the result of the ``reset_index`` operation is assigned
210-
to ``df``. As a consequence, ``iloc`` will continue to operate inplace, and the underlying data will not be copied.
210+
All references to the original object will go out of scope when the result of the `reset_index` operation is assigned
211+
to `df`. As a consequence, `iloc` will continue to operate inplace, and the underlying data will not be copied.
211212

212213
The methods in group 2 behave differently from the other three groups. These methods are actually able to operate
213214
inplace because they only modify the underlying data.
@@ -220,7 +221,7 @@ If we follow the rules of Copy-on-Write[^1] where "any subset or returned series
220221
the original, and thus never modifies the original", then there is no way of doing this operation inplace by default.
221222
The original object would be modified before the reference goes out of scope.
222223

223-
To avoid triggering a copy when a value would actually get replaced, we will keep the ``inplace`` argument for those
224+
To avoid triggering a copy when a value would actually get replaced, we will keep the `inplace` argument for those
224225
methods.
225226

226227
### Open Questions
@@ -238,7 +239,7 @@ For example,
238239

239240
can be performed inplace.
240241

241-
This is only true if ``df`` does not share the values it stores with another pandas object. For example, the following
242+
This is only true if `df` does not share the values it stores with another pandas object. For example, the following
242243
operations
243244

244245
:::python
@@ -255,8 +256,8 @@ would be incompatible with the Copy-on-Write rules when actually done inplace. I
255256

256257
Raising an error here is problematic since oftentimes users do not have control over whether a method would cause a "
257258
lazy copy" to be triggered under Copy-on-Write. It is also hard to fix: adding a `copy()` before calling a method
258-
with ``inplace=True`` might actually be worse than triggering the copy under the hood. We would only copy columns that
259-
share data with another object, not the whole object like ``.copy()`` would.
259+
with `inplace=True` might actually be worse than triggering the copy under the hood. We would only copy columns that
260+
share data with another object, not the whole object like `.copy()` would.
260261

261262
There is another possible variant, which would be to trigger the copy (like the first option), but have an option to
262263
raise a warning whenever this happens.
@@ -305,13 +306,13 @@ was not inplace, since it is possible to go out of memory because of this.
305306
The downsides of keeping the `inplace=True` option for certain methods, are that the return type of those methods will
306307
now depend on the value of `inplace`, and that method chaining will no longer work.
307308

308-
One way around this is to have the method return the original object that was operated on inplace when ``inplace=True``.
309+
One way around this is to have the method return the original object that was operated on inplace when `inplace=True`.
309310

310311
Advantages:
311312

312313
- It enables the use of inplace operations in a method chain
313314
- It simplifies type annotations
314-
- It enables to change the default for ``inplace`` to True under Copy-on-Write
315+
- It enables changing the default for `inplace` to True under Copy-on-Write
315316

316317
Disadvantages:
317318

@@ -320,7 +321,7 @@ Disadvantages:
320321
returned (`df2 = df.method(inplace=True); assert df2 is df`)
321322
- It would change the behaviour of the current `inplace=True`
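
The trade-off of returning the original object can be sketched with a toy class. `TinyFrame` and its `clip` method are hypothetical stand-ins written for this illustration; they are not pandas code and do not describe how pandas would implement the idea.

```python
class TinyFrame:
    """Hypothetical stand-in for a DataFrame (illustration only)."""

    def __init__(self, values):
        self.values = list(values)

    def clip(self, lower, upper, inplace=False):
        clipped = [min(max(v, lower), upper) for v in self.values]
        if inplace:
            # Mutate the existing data and return the same object,
            # so the call can still be part of a method chain.
            self.values[:] = clipped
            return self
        # Default path: leave self untouched and return a new object.
        return TinyFrame(clipped)


frame = TinyFrame([5, -3, 12])

# Chaining works because each inplace call returns the object itself.
result = frame.clip(0, 10, inplace=True).clip(2, 8, inplace=True)

print(result is frame)
print(frame.values)
```

Under this design the return type no longer depends on `inplace`, but the returned object is the caller itself rather than a new one, which is the surprise listed among the disadvantages.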
322323

323-
Given that ``inplace`` is already widely used by the pandas community, we would like to collect feedback about what the
324+
Given that `inplace` is already widely used by the pandas community, we would like to collect feedback about what the
324325
expected return type should be. Therefore, we will defer a decision on this until a later revision of this PDEP.
325326

326327
## Backward compatibility
@@ -339,11 +340,11 @@ proposal[^1].
339340

340341
### Remove the `inplace` keyword altogether
341342

342-
In the past, it was considered to remove the ``inplace`` keyword entirely. This was because many operations that had
343-
the ``inplace`` keyword did not actually operate inplace, but made a copy and re-assigned the underlying values under
343+
In the past, it was considered to remove the `inplace` keyword entirely. This was because many operations that had
344+
the `inplace` keyword did not actually operate inplace, but made a copy and re-assigned the underlying values under
344345
the hood, causing confusion and providing no real benefit to users.
345346

346-
Because a majority of the methods supporting ``inplace`` did not operate inplace, it was considered at the time to
347+
Because a majority of the methods supporting `inplace` did not operate inplace, it was considered at the time to
347348
deprecate and remove inplace from all methods, and add back the keyword as necessary.[^3]
348349

349350
For the subset of methods where the operation actually _can_ be done inplace (group 2), however, removing the `inplace`
@@ -352,7 +353,7 @@ DataFrames. Therefore, we decided to keep the `inplace` keyword for this small s
352353

353354
### Standardize on the `copy` keyword instead of `inplace`
354355

355-
It may seem more natural to standardize on the `copy` keyword instead of the `inplace` keyword, since the ``copy``
356+
It may seem more natural to standardize on the `copy` keyword instead of the `inplace` keyword, since the `copy`
356357
keyword already returns a new object instead of None (enabling method chaining) when it is set to `True`.
357358

358359
However, the `copy` keyword is not supported in any of the values-mutating methods listed in Group 2 above
@@ -366,27 +367,27 @@ currently used.
366367

367368
Currently, for methods where it is supported, when the `copy` keyword is `False`, a new pandas object (same
368369
as `copy=True`) is returned as the result of a method call, with the values backing the object being shared when
369-
possible. With the proposed inplace behavior, current behavior of ``copy=False`` would return a new pandas object with
370+
possible. With the proposed inplace behavior, current behavior of `copy=False` would return a new pandas object with
370371
identical values as the original object (that was modified inplace), which may be confusing for users, and lead to
371372
ambiguity with Copy-on-Write rules.
372373

373374
## History
374375

375-
The future of the ``inplace`` keyword is something that has been debated a lot over the years.
376+
The future of the `inplace` keyword is something that has been debated a lot over the years.
376377

377378
It may be helpful to review those discussions (see links) [^2] [^3] [^4] to better understand this PDEP.
378379

379380
## Timeline
380381

381382
Copy-on-Write is a relatively new feature (added in version 1.5) and some methods are missing the "lazy copy"
382-
optimization (equivalent to ``copy=False``).
383+
optimization (equivalent to `copy=False`).
383384

384-
Therefore, we will start showing deprecation warnings for the ``copy`` and ``inplace`` parameters in pandas 2.1, to
385+
Therefore, we will start showing deprecation warnings for the `copy` and `inplace` parameters in pandas 2.1, to
385386
allow for bugs with Copy-on-Write to be addressed and for more optimizations to be added.
386387

387388
Hopefully, users will be able to switch to Copy-on-Write to keep the no-copy behavior and to silence the warnings.
388389

389-
The full removal of the ``copy`` parameter and ``inplace`` (where necessary) is set for pandas 3.0, which will coincide
390+
The full removal of the `copy` parameter and `inplace` (where necessary) is set for pandas 3.0, which will coincide
390391
with the enablement of Copy-on-Write for pandas by default.
391392

392393
## PDEP History
