@@ -65,7 +65,8 @@ adding to the inconsistencies. This keyword is also redundant now with the intro

Given the above reasons, we are convinced that there is no need for either the `inplace` or the `copy` keyword (except
for a small subset of methods that can actually update data inplace). Removing those keywords will give a more
- consistent and less confusing API.
+ consistent and less confusing API. Removing the `copy` keyword is covered by PDEP-7 about Copy-on-Write,
+ and this PDEP will focus on the `inplace` keyword.

Thus, in this PDEP, we aim to standardize behavior across methods to make control of inplace-ness of operations
consistent, and compatible with Copy-on-Write.
@@ -79,21 +80,18 @@ the inplace behaviour of DataFrame and Series _methods_.
### Status Quo

Many methods in pandas currently have the ability to perform an operation inplace. For example, some methods such
- as ``DataFrame.insert``, only support inplace operations, while other methods use keywords such as ``copy``
- or ``inplace`` to control whether an operation is done inplace or not.
+ as ``DataFrame.insert`` only support inplace operations, while other methods use the `inplace` keyword to control
+ whether an operation is done inplace or not.

Unfortunately, many methods supporting the ``inplace`` keyword either cannot be done inplace, or make a copy as a
consequence of the operations they perform, regardless of whether ``inplace`` is ``True`` or not. This, coupled with the
fact that the ``inplace=True`` changes the return type of a method from a pandas object to ``None``, makes usage of
the ``inplace`` keyword confusing and non-intuitive.

- In addition, some methods, such as ``DataFrame.rename`` and ``DataFrame.rename_axis`` confusingly support both
- the ``copy`` and ``inplace`` keywords, with the value of ``inplace`` overwriting the value of ``copy``.
-
To summarize the status quo of inplace behavior of methods, we have divided methods that can operate inplace or have
- an ``inplace``/``copy`` keyword into 4 groups:
+ an ``inplace`` keyword into 4 groups:

- **Group 1: Methods that always operate inplace (no user-control with ``inplace``/``copy`` keyword)**
+ **Group 1: Methods that always operate inplace (no user-control with ``inplace`` keyword)**

| Method Name |
| :--------------|
@@ -125,58 +123,38 @@ the DataFrame or Series.

**Group 3: Methods that modify the DataFrame/Series object, but not the pre-existing values**

- | Method Name | Keyword |
- | :----------------------------| -----------------------|
- | ``drop`` (dropping columns) | ``inplace`` |
- | ``rename`` | ``inplace``, ``copy`` |
- | ``rename_axis`` | ``inplace``, ``copy`` |
- | ``reset_index`` | ``inplace`` |
- | ``set_index`` | ``inplace`` |
- | ``astype`` | ``copy`` |
- | ``infer_objects`` | ``copy`` |
- | ``set_axis`` | ``copy`` |
- | ``set_flags`` | ``copy`` |
- | ``to_period`` | ``copy`` |
- | ``to_timestamp`` | ``copy`` |
- | ``tz_localize`` | ``copy`` |
- | ``tz_convert`` | ``copy`` |
- | ``Series.swaplevel``* | ``copy`` |
- | ``concat`` | ``copy`` |
-
- \* The `copy` keyword is only available for `Series.swaplevel` and not for `DataFrame.swaplevel`.
+ | Method Name |
+ | :----------------------------|
+ | ``drop`` (dropping columns) |
+ | ``rename`` |
+ | ``rename_axis`` |
+ | ``reset_index`` |
+ | ``set_index`` |

These methods can change the structure of the DataFrame or Series, such as changing the shape by adding or removing
columns, or changing the row/column labels (changing the index/columns attributes), but don't modify the existing
- underlying data of the object.
-
- All those methods (except for `set_flags`) make a copy of the full data by default, but can be performed inplace with
- avoiding copying all data (currently enabled with the `inplace` or `copy` keyword).
+ underlying column data of the object.

- Some of these methods only have a `copy` keyword instead of an `inplace`
- keyword. These allow the user to avoid a copy, but don't update the original object inplace and instead return a
- new object referencing the same data.
+ All those methods make a copy of the full data by default, but can be performed inplace, which
+ avoids copying all data (currently enabled by specifying `inplace=True`).

- Two methods also have both keywords: `rename`, `rename_axis`, with the `inplace` keyword overriding `copy`.
+ Note: there are also methods that have a `copy` keyword instead of an `inplace` keyword (e.g. `set_axis`). This serves
+ a similar purpose (avoid copying all data), but those methods don't update the original object inplace and instead
+ return a new object referencing the same data.

**Group 4: Methods that can never operate inplace**

- | Method Name | Keyword |
- | :-----------------------| -----------|
- | `drop` (dropping rows) | `inplace` |
- | `dropna` | `inplace` |
- | `drop_duplicates` | `inplace` |
- | `sort_values` | `inplace` |
- | `sort_index` | `inplace` |
- | `eval` | `inplace` |
- | `query` | `inplace` |
- | `transpose` | `copy` |
- | `swapaxes` | `copy` |
- | `align` | `copy` |
- | `reindex` | `copy` |
- | `reindex_like` | `copy` |
- | `truncate` | `copy` |
-
- Although these methods the `inplace`/`copy` keywords, they can never operate inplace because the nature of the
+ | Method Name |
+ | :-----------------------|
+ | `drop` (dropping rows) |
+ | `dropna` |
+ | `drop_duplicates` |
+ | `sort_values` |
+ | `sort_index` |
+ | `eval` |
+ | `query` |
+
+ Although these methods have the `inplace` keyword, they can never operate inplace because the nature of the
operation requires copying (such as reordering or dropping rows). For those methods, `inplace=True` is essentially just
syntactic sugar for reassigning the new result to the calling DataFrame/Series.

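Editorial illustration (not part of the PDEP text or of this diff): the Group 4 description above can be sketched with `sort_values`. The frame contents and the column name `a` are made up for the example, and the snippet assumes a pandas version in which `sort_values` still accepts `inplace`. It only checks that both spellings leave the variable holding the same sorted data.

```python
import pandas as pd

# Hypothetical example data; not taken from the PDEP.
df_inplace = pd.DataFrame({"a": [3, 1, 2]})
df_reassigned = pd.DataFrame({"a": [3, 1, 2]})

# Spelling 1: ask the method to update the calling object.
df_inplace.sort_values("a", inplace=True)

# Spelling 2: take the new sorted object and rebind the variable to it.
df_reassigned = df_reassigned.sort_values("a")

# Both variables now hold the same sorted values and the same index.
print(df_inplace.equals(df_reassigned))
```

One difference that reassignment does not reproduce: other names bound to the original object do not see the reassigned result, because only the variable on the left-hand side is rebound.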
@@ -188,14 +166,14 @@ implementation detail for the purpose of this PDEP.

The methods from group 1 won't change behavior, and will remain always inplace.

- Methods in groups 3 and 4 will lose their `copy` and `inplace` keywords. Under Copy-on-Write, every operation will
- potentially return a shallow copy of the input object, if the performed operation does not require a copy. This is
- equivalent to behavior with `copy=False` and/or `inplace=True` for those methods. If users want to make a hard
- copy(`copy=True`), they can call the `copy()` method on the result of the operation.
+ Methods in groups 3 and 4 will lose their `inplace` keyword. Under Copy-on-Write, every operation will
+ potentially return a shallow copy of the input object, if the performed operation does not require a copy of the data. This is
+ equivalent to the behavior with `inplace=True` for those methods. If users want to make a hard
+ copy, they can call the `copy()` method on the result of the operation.

Therefore, there is no benefit of keeping the keywords around for these methods.

- To emulate behavior of the `inplace` keyword, we can reassig the result of an operation to the same variable:
+ To emulate behavior of the `inplace` keyword, we can reassign the result of an operation to the same variable:

    :::python
    df = pd.DataFrame({"foo": [1, 2, 3]})
@@ -222,7 +200,7 @@ methods.

#### With `inplace=True`, should we silently copy or raise an error if the data has references?

- For those methods where we would keep the `inplace=True` option, there is a complication that actually operating inplace
+ For those methods where we would keep the `inplace=True` option (group 2), there is a complication that actually operating inplace
is not always possible.

For example,
@@ -324,12 +302,6 @@ Removing the `inplace` keyword is a breaking change, but since the affected beha
behaviour when not specifying the keyword (i.e. `inplace=False`) will not change and the keyword itself can first be
deprecated before it is removed.

- Similarly for the `copy` keyword, this can be deprecated before it is removed.
-
- There are some behaviour changes (for example the current `copy=False` returning a shallow copy will no longer be an "
- actual" shallow copy, but protected under Copy-on-Write), but those behaviour changes are covered by the Copy-on-Write
- proposal[^1].
-
## Rejected ideas

### Remove the `inplace` keyword altogether
@@ -348,21 +320,22 @@ DataFrames. Therefore, we decided to keep the `inplace` keyword for this small s
### Standardize on the `copy` keyword instead of `inplace`

It may seem more natural to standardize on the `copy` keyword instead of the `inplace` keyword, since the `copy`
- keyword already returns a new object instead of None (enabling method chaining) when it is set to `True`.
+ keyword already returns a new object instead of None (enabling method chaining) and avoids a copy when it is set to `False`.

However, the `copy` keyword is not supported in any of the values-mutating methods listed in Group 2 above
unlike `inplace`, so semantics of future inplace mutation of values align better with the current behavior of
the `inplace` keyword, than with the current behavior of the `copy` keyword.

Furthermore, with the Copy-on-Write proposal, the `copy` keyword also has become superfluous. With Copy-on-Write
enabled, methods that return a new pandas object will always try to avoid a copy whenever possible, regardless of
- a `copy=False` keyword. Thus, the general proposal is to actually remove the `copy` keyword from the methods where it is
- currently used.
-
- Currently, for methods where it is supported, when the `copy` keyword is `False`, a new pandas object (same
- as `copy=True`) is returned as the result of a method call, with the values backing the object being shared when
- possible. With the proposed inplace behavior, current behavior of `copy=False` would return a new pandas object with
- identical values as the original object(that was modified inplace), which may be confusing for users, and lead to
+ a `copy=False` keyword. Thus, the Copy-on-Write PDEP proposes to actually remove the `copy` keyword from the methods
+ where it is currently used (so it would be strange to add this as a new keyword to the Group 2 methods).
+
+ Currently, when using `copy=False` in methods where it is supported, a new pandas object is returned as the result
+ of a method call (same as with `copy=True`), but with the values backing this object being shared with the calling
+ object when possible (but the calling object is never modified). With the proposed inplace behavior for Group 2 methods,
+ a potential `copy=False` option would return a new pandas object with identical values as the original object (that
+ was modified inplace, in contrast to current usage of `copy=False`), which may be confusing for users, and lead to
ambiguity with Copy on Write rules.

## History