Commit 2110b34

jorisvandenbossche authored and lithomas1 committed
simplify by focusing on inplace keyword (removing explicit listing of copy keyword)
1 parent 03ace50 commit 2110b34

1 file changed: +45 -72 lines changed

web/pandas/pdeps/0008-inplace-methods-in-pandas.md

Lines changed: 45 additions & 72 deletions
@@ -65,7 +65,8 @@ adding to the inconsistencies. This keyword is also redundant now with the intro
 
 Given the above reasons, we are convinced that there is no need for neither the `inplace` nor the `copy` keyword (except
 for a small subset of methods that can actually update data inplace). Removing those keywords will give a more
-consistent and less confusing API.
+consistent and less confusing API. Removing the `copy` keyword is covered by PDEP-7 about Copy-on-Write,
+and this PDEP will focus on the `inplace` keyword.
 
 Thus, in this PDEP, we aim to standardize behavior across methods to make control of inplace-ness of operations
 consistent, and compatible with Copy-on-Write.
@@ -79,21 +80,18 @@ the inplace behaviour of DataFrame and Series _methods_.
 ### Status Quo
 
 Many methods in pandas currently have the ability to perform an operation inplace. For example, some methods such
-as ``DataFrame.insert``, only support inplace operations, while other methods use keywords such as ``copy``
-or ``inplace`` to control whether an operation is done inplace or not.
+as ``DataFrame.insert`` only support inplace operations, while other methods use the `inplace` keyword to control
+whether an operation is done inplace or not.
 
 Unfortunately, many methods supporting the ``inplace`` keyword either cannot be done inplace, or make a copy as a
 consequence of the operations they perform, regardless of whether ``inplace`` is ``True`` or not. This, coupled with the
 fact that the ``inplace=True`` changes the return type of a method from a pandas object to ``None``, makes usage of
 the ``inplace`` keyword confusing and non-intuitive.
 
-In addition, some methods, such as ``DataFrame.rename`` and ``DataFrame.rename_axis`` confusingly support both
-the ``copy`` and ``inplace`` keywords, with the value of ``inplace`` overwriting the value of ``copy``.
-
 To summarize the status quo of inplace behavior of methods, we have divided methods that can operate inplace or have
-an ``inplace``/``copy`` keyword into 4 groups:
+an ``inplace`` keyword into 4 groups:
 
-**Group 1: Methods that always operate inplace (no user-control with ``inplace``/``copy`` keyword) **
+**Group 1: Methods that always operate inplace (no user-control with ``inplace`` keyword)**
 
 | Method Name |
 |:--------------|
@@ -125,58 +123,38 @@ the DataFrame or Series.
 
 **Group 3: Methods that modify the DataFrame/Series object, but not the pre-existing values**
 
-| Method Name | Keyword |
-|:----------------------------|-----------------------|
-| ``drop`` (dropping columns) | ``inplace`` |
-| ``rename`` | ``inplace``, ``copy`` |
-| ``rename_axis`` | ``inplace``, ``copy`` |
-| ``reset_index`` | ``inplace`` |
-| ``set_index`` | ``inplace`` |
-| ``astype`` | ``copy`` |
-| ``infer_objects`` | ``copy`` |
-| ``set_axis`` | ``copy`` |
-| ``set_flags`` | ``copy`` |
-| ``to_period`` | ``copy`` |
-| ``to_timestamp`` | ``copy`` |
-| ``tz_localize`` | ``copy`` |
-| ``tz_convert`` | ``copy`` |
-| ``Series.swaplevel``* | ``copy`` |
-| ``concat`` | ``copy`` |
-
-\* The `copy` keyword is only available for `Series.swaplevel` and not for `DataFrame.swaplevel`.
+| Method Name |
+|:----------------------------|
+| ``drop`` (dropping columns) |
+| ``rename`` |
+| ``rename_axis`` |
+| ``reset_index`` |
+| ``set_index`` |
 
 These methods can change the structure of the DataFrame or Series, such as changing the shape by adding or removing
 columns, or changing the row/column labels (changing the index/columns attributes), but don't modify the existing
-underlying data of the object.
-
-All those methods (except for `set_flags`) make a copy of the full data by default, but can be performed inplace with
-avoiding copying all data (currently enabled with the `inplace` or `copy` keyword).
+underlying column data of the object.
 
-Some of these methods only have a `copy` keyword instead of an `inplace`
-keyword. These allow the user to avoid a copy, but don't update the original object inplace and instead return a
-new object referencing the same data.
+All those methods make a copy of the full data by default, but can be performed inplace with
+avoiding copying all data (currently enabled with specifying `inplace=True`).
 
-Two methods also have both keywords: `rename`, `rename_axis`, with the `inplace` keyword overriding `copy`.
+Note: there are also methods that have a `copy` keyword instead of an `inplace` keyword (e.g. `set_axis`). This serves
+a similar purpose (avoid copying all data), but those methods don't update the original object inplace and instead
+return a new object referencing the same data.
 
 **Group 4: Methods that can never operate inplace**
 
-| Method Name | Keyword |
-|:-----------------------|-----------|
-| `drop` (dropping rows) | `inplace` |
-| `dropna` | `inplace` |
-| `drop_duplicates` | `inplace` |
-| `sort_values` | `inplace` |
-| `sort_index` | `inplace` |
-| `eval` | `inplace` |
-| `query` | `inplace` |
-| `transpose` | `copy` |
-| `swapaxes` | `copy` |
-| `align` | `copy` |
-| `reindex` | `copy` |
-| `reindex_like` | `copy` |
-| `truncate` | `copy` |
-
-Although these methods the `inplace`/`copy` keywords, they can never operate inplace because the nature of the
+| Method Name |
+|:-----------------------|
+| `drop` (dropping rows) |
+| `dropna` |
+| `drop_duplicates` |
+| `sort_values` |
+| `sort_index` |
+| `eval` |
+| `query` |
+
+Although these methods have the `inplace` keyword, they can never operate inplace because the nature of the
 operation requires copying (such as reordering or dropping rows). For those methods, `inplace=True` is essentially just
 syntactic sugar for reassigning the new result to the calling DataFrame/Series.
 

@@ -188,14 +166,14 @@ implementation detail for the purpose of this PDEP.
 
 The methods from group 1 won't change behavior, and will remain always inplace.
 
-Methods in groups 3 and 4 will lose their `copy` and `inplace` keywords. Under Copy-on-Write, every operation will
-potentially return a shallow copy of the input object, if the performed operation does not require a copy. This is
-equivalent to behavior with `copy=False` and/or `inplace=True` for those methods. If users want to make a hard
-copy(`copy=True`), they can call the `copy()` method on the result of the operation.
+Methods in groups 3 and 4 will lose their `inplace` keyword. Under Copy-on-Write, every operation will
+potentially return a shallow copy of the input object, if the performed operation does not require a copy of the data. This is
+equivalent to the behavior with `inplace=True` for those methods. If users want to make a hard
+copy, they can call the `copy()` method on the result of the operation.
 
 Therefore, there is no benefit of keeping the keywords around for these methods.
 
-To emulate behavior of the `inplace` keyword, we can reassig the result of an operation to the same variable:
+To emulate behavior of the `inplace` keyword, we can reassign the result of an operation to the same variable:
 
 :::python
 df = pd.DataFrame({"foo": [1, 2, 3]})
@@ -222,7 +200,7 @@ methods.
 
 #### With `inplace=True`, should we silently copy or raise an error if the data has references?
 
-For those methods where we would keep the `inplace=True` option, there is a complication that actually operating inplace
+For those methods where we would keep the `inplace=True` option (group 2), there is a complication that actually operating inplace
 is not always possible.
 
 For example,
@@ -324,12 +302,6 @@ Removing the `inplace` keyword is a breaking change, but since the affected beha
 behaviour when not specifying the keyword (i.e. `inplace=False`) will not change and the keyword itself can first be
 deprecated before it is removed.
 
-Similarly for the `copy` keyword, this can be deprecated before it is removed.
-
-There are some behaviour changes (for example the current `copy=False` returning a shallow copy will no longer be an "
-actual" shallow copy, but protected under Copy-on-Write), but those behaviour changes are covered by the Copy-on-Write
-proposal[^1].
-
 ## Rejected ideas
 
 ### Remove the `inplace` keyword altogether
@@ -348,21 +320,22 @@ DataFrames. Therefore, we decided to keep the `inplace` keyword for this small s
 ### Standardize on the `copy` keyword instead of `inplace`
 
 It may seem more natural to standardize on the `copy` keyword instead of the `inplace` keyword, since the `copy`
-keyword already returns a new object instead of None (enabling method chaining) when it is set to `True`.
+keyword already returns a new object instead of None (enabling method chaining) and avoids a coopy when it is set to `False`.
 
 However, the `copy` keyword is not supported in any of the values-mutating methods listed in Group 2 above
 unlike `inplace`, so semantics of future inplace mutation of values align better with the current behavior of
 the `inplace` keyword, than with the current behavior of the `copy` keyword.
 
 Furthermore, with the Copy-on-Write proposal, the `copy` keyword also has become superfluous. With Copy-on-Write
 enabled, methods that return a new pandas object will always try to avoid a copy whenever possible, regardless of
-a `copy=False` keyword. Thus, the general proposal is to actually remove the `copy` keyword from the methods where it is
-currently used.
-
-Currently, for methods where it is supported, when the `copy` keyword is `False`, a new pandas object (same
-as `copy=True`) is returned as the result of a method call, with the values backing the object being shared when
-possible. With the proposed inplace behavior, current behavior of `copy=False` would return a new pandas object with
-identical values as the original object(that was modified inplace), which may be confusing for users, and lead to
+a `copy=False` keyword. Thus, the Copy-on-Write PDEP proposes to actually remove the `copy` keyword from the methods
+where it is currently used (so it would be strange to add this as a new keyword to the Group 2 methods).
+
+Currently, when using `copy=False` in methods where it is supported, a new pandas object is returned as the result
+of a method call (same as with `copy=True`), but with the values backing this object being shared with the calling
+object when possible (but the calling object is never modified). With the proposed inplace behavior for Group 2 methods,
+a potential `copy=False` option would return a new pandas object with identical values as the original object (that
+was modified inplace, in contrast to current usage of `copy=False`), which may be confusing for users, and lead to
 ambiguity with Copy on Write rules.
 
 ## History
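
Note on the Group 4 wording in the diff above: the claim that `inplace=True` is "essentially just syntactic sugar for reassigning the new result" can be checked with a minimal sketch. It is not part of the commit, and it assumes a pandas version in which `DataFrame.sort_values` still accepts the `inplace` keyword. The variable names (`df_inplace`, `df_reassigned`, `ret`) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"foo": [3, 1, 2]})

# Current Group 4 behaviour: `inplace=True` returns None and updates the caller.
df_inplace = df.copy()
ret = df_inplace.sort_values("foo", inplace=True)

# Replacement suggested by the PDEP text: reassign the result to the same name.
df_reassigned = df.copy()
df_reassigned = df_reassigned.sort_values("foo")

print(ret is None)                       # the inplace call returns nothing
print(df_inplace.equals(df_reassigned))  # both spellings give the same frame
```

Both spellings leave the variable holding the same sorted frame. Only the reassignment form returns an object, so only that form supports method chaining.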
