@@ -121,93 +121,94 @@ inplace (it will remove the values of the column being set, and insert new value
  | ``bfill`` |
  | ``clip`` |

- These methods don't operate inplace by default, but have the option to specify `inlace=True`. All those methods leave
+ These methods don't operate inplace by default, but can be done inplace with `inplace=True`. All those methods leave
  the structure of the DataFrame or Series intact (shape, row/column labels), but can mutate some elements of the data of
  the DataFrame or Series.
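
As an illustration of the Group 2 behaviour described above (this sketch is not part of the PDEP text), the following uses `clip`, one of the methods in the table, with `inplace=True`. The data and variable names are invented for the example; the point is only that values change while the index and shape stay the same.

```python
import pandas as pd

# Hypothetical example data; names are illustrative only.
s = pd.Series([1.0, 5.0, 3.0], index=["a", "b", "c"])
index_before = list(s.index)
shape_before = s.shape

# clip is one of the Group 2 methods: with inplace=True it mutates
# values of the existing object but leaves the labels and shape alone.
s.clip(upper=2.0, inplace=True)

values_after = list(s)
index_after = list(s.index)
shape_after = s.shape
```

The return value of the call is deliberately not used here, since the proposal itself leaves the return type of `inplace=True` calls as an open question.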

  **Group 3: Methods that modify the DataFrame/Series object, but not the pre-existing values**

- | Method Name |
- | :----------------------------|
- | ``drop`` (dropping columns) |
- | ``eval`` |
- | ``rename`` |
- | ``rename_axis`` |
- | ``reset_index`` |
- | ``set_index`` |
- | ``astype`` |
- | ``infer_objects`` |
- | ``set_axis`` |
- | ``set_flags`` |
- | ``to_period`` |
- | ``to_timestamp`` |
- | ``tz_localize`` |
- | ``tz_convert`` |
- | ``swaplevel`` |
- | ``concat`` |
+ | Method Name | Keyword |
+ | :----------------------------| -----------------------|
+ | ``drop`` (dropping columns) | ``inplace`` |
+ | ``rename`` | ``inplace``, ``copy`` |
+ | ``rename_axis`` | ``inplace``, ``copy`` |
+ | ``reset_index`` | ``inplace`` |
+ | ``set_index`` | ``inplace`` |
+ | ``astype`` | ``copy`` |
+ | ``infer_objects`` | ``copy`` |
+ | ``set_axis`` | ``copy`` |
+ | ``set_flags`` | ``copy`` |
+ | ``to_period`` | ``copy`` |
+ | ``to_timestamp`` | ``copy`` |
+ | ``tz_localize`` | ``copy`` |
+ | ``tz_convert`` | ``copy`` |
+ | ``Series.swaplevel``* | ``copy`` |
+ | ``concat`` | ``copy`` |
+
+ \* The `copy` keyword is only available for `Series.swaplevel` and not for `DataFrame.swaplevel`.

  These methods can change the structure of the DataFrame or Series, such as changing the shape by adding or removing
  columns, or changing the row/column labels (changing the index/columns attributes), but don't modify the existing
  underlying data of the object.
+
  All those methods (except for `set_flags`) make a copy of the full data by default, but can be performed inplace with
  avoiding copying all data (currently enabled with the `inplace` or `copy` keyword).

  Some of these methods only have a `copy` keyword instead of an `inplace`
- keyword: `astype`, `infer_objects`, `set_axis`, `set_flags`, `to_period`, `to_timestamp`, `tz_localize`, `tz_convert`, `swaplevel`, `concat`
- and `merge`.
- These allow the user to avoid a copy, but don't update the original object inplace and instead return a new object
- referencing the same data.
+ keyword. These allow the user to avoid a copy, but don't update the original object inplace and instead return a
+ new object referencing the same data.

- Two methods also have both keywords: `rename`, `rename_axis`.
+ Two methods also have both keywords: `rename`, `rename_axis`, with the `inplace` keyword overriding `copy`.

  **Group 4: Methods that can never operate inplace**

- | Method Name |
- | :-------------------------|
- | ``drop`` (dropping rows) |
- | ``dropna`` |
- | ``drop_duplicates`` |
- | ``sort_values`` |
- | ``sort_index`` |
- | ``query`` |
- | ``transpose`` |
- | ``swapaxes`` |
- | ``align`` |
- | ``reindex`` |
- | ``reindex_like`` |
- | ``truncate`` |
-
- These methods can never operate inplace because the nature of the operation requires copying (such as reordering or
- dropping rows). For those methods, `inplace=True` is essentially just synctactic sugar for reassigning the new result
- to `self` (the calling DataFrame).
+ | Method Name | Keyword |
+ | :-------------------------| -------------|
+ | `drop` (dropping rows) | `inplace` |
+ | `dropna` | `inplace` |
+ | `drop_duplicates` | `inplace` |
+ | `sort_values` | `inplace` |
+ | `sort_index` | `inplace` |
+ | `eval` | `inplace` |
+ | `query` | `inplace` |
+ | `transpose` | `copy` |
+ | `swapaxes` | `copy` |
+ | `align` | `copy` |
+ | `reindex` | `copy` |
+ | `reindex_like` | `copy` |
+ | `truncate` | `copy` |
+
+ Although all of these methods have either an `inplace` or a `copy` keyword, they can never operate inplace because the nature of the
+ operation requires copying (such as reordering or dropping rows). For those methods, `inplace=True` is essentially just
+ syntactic sugar for reassigning the new result to `self` (the calling DataFrame).
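
To make the "syntactic sugar" point concrete, here is a small sketch (not taken from the PDEP; the frame and variable names are invented for illustration) comparing `sort_values(..., inplace=True)` with plain reassignment. It only compares the resulting data and makes no claim about how many copies pandas performs internally.

```python
import pandas as pd

# Hypothetical example data; names are illustrative only.
df = pd.DataFrame({"key": [3, 1, 2], "value": ["c", "a", "b"]})

# Variant 1: a Group 4 method called with the inplace keyword.
via_keyword = df.copy()
via_keyword.sort_values("key", inplace=True)

# Variant 2: the same operation, reassigning the returned object.
via_reassignment = df.copy()
via_reassignment = via_reassignment.sort_values("key")

# Both variants end up holding the same rows, in the same order,
# with the same index labels.
same_result = via_keyword.equals(via_reassignment)
sorted_keys = list(via_keyword["key"])
```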

  Note: in the case of a "no-op" (for example when sorting an already sorted DataFrame), some of those methods might not
- need to perform a copy. This currently happens with Copy-on-Write (regardless of ``inplace``), but this is considered an
+ need to perform a copy. This currently happens with Copy-on-Write (regardless of `inplace`), but this is considered an
  implementation detail for the purpose of this PDEP.

  ### Proposed changes and reasoning

  The methods from group 1 won't change behavior, and will remain always inplace.

- Methods in groups 3 and 4 will lose their ``copy`` and ``inplace`` keywords. Under Copy-on-Write, every operation will
+ Methods in groups 3 and 4 will lose their `copy` and `inplace` keywords. Under Copy-on-Write, every operation will
  potentially return a shallow copy of the input object, if the performed operation does not require a copy. This is
- equivalent to behavior with ``copy=False`` and/or ``inplace=True`` for those methods. If users want to make a hard
- copy(``copy=True``), they can do:
+ equivalent to behavior with `copy=False` and/or `inplace=True` for those methods. If users want to make a hard
+ copy(`copy=True`), they can do:

  :::python
  df = df.func().copy()

  Therefore, there is no benefit of keeping the keywords around for these methods.

- User can emulate behavior of the ``inplace`` keyword by assigning the result of an operation to the same variable:
+ User can emulate behavior of the `inplace` keyword by assigning the result of an operation to the same variable:

  :::python
  df = pd.DataFrame({"foo": [1, 2, 3]})
  df = df.reset_index()
  df.iloc[0, 1] = ...

- All references to the original object will go out of scope when the result of the ``reset_index`` operation is assigned
- to ``df``. As a consequence, ``iloc`` will continue to operate inplace, and the underlying data will not be copied.
+ All references to the original object will go out of scope when the result of the `reset_index` operation is assigned
+ to `df`. As a consequence, `iloc` will continue to operate inplace, and the underlying data will not be copied.
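
The two patterns described above (reassigning the result to emulate `inplace`, and chaining `.copy()` for a hard copy) can be sketched as a runnable example. This is an illustrative sketch, not part of the PDEP; the value `100` and the name `independent` are made up, and it shows only the observable results, not whether a copy was made under the hood.

```python
import pandas as pd

# Emulate inplace=True by rebinding the same name to the result.
df = pd.DataFrame({"foo": [1, 2, 3]})
df = df.reset_index()        # columns are now "index" and "foo"
df.iloc[0, 1] = 100          # modifies the first value of "foo"

# Request an explicit hard copy by chaining .copy() on the result.
independent = df.reset_index(drop=True).copy()
independent.iloc[0, 1] = -1  # does not affect df

columns_after = list(df.columns)
df_value = df.iloc[0, 1]
independent_value = independent.iloc[0, 1]
```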

  The methods in group 2 behave different compared to the first three groups. These methods are actually able to operate
  inplace because they only modify the underlying data.
@@ -220,7 +221,7 @@ If we follow the rules of Copy-on-Write[^1] where "any subset or returned series
  the original, and thus never modifies the original", then there is no way of doing this operation inplace by default.
  The original object would be modified before the reference goes out of scope.

- To avoid triggering a copy when a value would actually get replaced, we will keep the ``inplace`` argument for those
+ To avoid triggering a copy when a value would actually get replaced, we will keep the `inplace` argument for those
  methods.

  ### Open Questions
@@ -238,7 +239,7 @@ For example,

  can be performed inplace.

- This is only true if ``df`` does not share the values it stores with another pandas object. For example, the following
+ This is only true if `df` does not share the values it stores with another pandas object. For example, the following
  operations

  :::python
@@ -255,8 +256,8 @@ would be incompatible with the Copy-on-Write rules when actually done inplace. I

  Raising an error here is problematic since oftentimes users do not have control over whether a method would cause a "
  lazy copy" to be triggered under Copy-on-Write. It is also hard to fix, adding a `copy()` before calling a method
- with ``inplace=True`` might actually be worse than triggering the copy under the hood. We would only copy columns that
- share data with another object, not the whole object like ``.copy()`` would.
+ with `inplace=True` might actually be worse than triggering the copy under the hood. We would only copy columns that
+ share data with another object, not the whole object like `.copy()` would.

  There is another possible variant, which would be to trigger the copy (like the first option), but have an option to
  raise a warning whenever this happens.
@@ -305,13 +306,13 @@ was not inplace, since it is possible to go out of memory because of this.
  The downsides of keeping the `inplace=True` option for certain methods, are that the return type of those methods will
  now depend on the value of `inplace`, and that method chaining will no longer work.

- One way around this is to have the method return the original object that was operated on inplace when ``inplace=True``.
+ One way around this is to have the method return the original object that was operated on inplace when `inplace=True`.

  Advantages:

  - It enables to use inplace operations in a method chain
  - It simplifies type annotations
- - It enables to change the default for ``inplace`` to True under Copy-on-Write
+ - It enables to change the default for `inplace` to True under Copy-on-Write

  Disadvantages:

@@ -320,7 +321,7 @@ Disadvantages:
  returned (`df2 = df.method(inplace=True); assert df2 is df`)
  - It would change the behaviour of the current `inplace=True`

- Given that ``inplace`` is already widely used by the pandas community, we would like to collect feedback about what the
+ Given that `inplace` is already widely used by the pandas community, we would like to collect feedback about what the
  expected return type should be. Therefore, we will defer a decision on this until a later revision of this PDEP.

  ## Backward compatibility
@@ -339,11 +340,11 @@ proposal[^1].

  ### Remove the `inplace` keyword altogether

- In the past, it was considered to remove the ``inplace`` keyword entirely. This was because many operations that had
- the ``inplace`` keyword did not actually operate inplace, but made a copy and re-assigned the underlying values under
+ In the past, it was considered to remove the `inplace` keyword entirely. This was because many operations that had
+ the `inplace` keyword did not actually operate inplace, but made a copy and re-assigned the underlying values under
  the hood, causing confusion and providing no real benefit to users.

- Because a majority of the methods supporting ``inplace`` did not operate inplace, it was considered at the time to
+ Because a majority of the methods supporting `inplace` did not operate inplace, it was considered at the time to
  deprecate and remove inplace from all methods, and add back the keyword as necessary.[^3]

  For the subset of methods where the operation actually _can_ be done inplace (group 2), however, removing the `inplace`
@@ -352,7 +353,7 @@ DataFrames. Therefore, we decided to keep the `inplace` keyword for this small s

  ### Standardize on the `copy` keyword instead of `inplace`

- It may seem more natural to standardize on the `copy` keyword instead of the `inplace` keyword, since the ``copy``
+ It may seem more natural to standardize on the `copy` keyword instead of the `inplace` keyword, since the `copy`
  keyword already returns a new object instead of None (enabling method chaining) when it is set to `True`.

  However, the `copy` keyword is not supported in any of the values-mutating methods listed in Group 2 above
@@ -366,27 +367,27 @@ currently used.

  Currently, for methods where it is supported, when the `copy` keyword is `False`, a new pandas object (same
  as `copy=True`) is returned as the result of a method call, with the values backing the object being shared when
- possible. With the proposed inplace behavior, current behavior of ``copy=False`` would return a new pandas object with
+ possible. With the proposed inplace behavior, current behavior of `copy=False` would return a new pandas object with
  identical values as the original object(that was modified inplace), which may be confusing for users, and lead to
  ambiguity with Copy on Write rules.

  ## History

- The future of the ``inplace`` keyword is something that has been debated a lot over the years.
+ The future of the `inplace` keyword is something that has been debated a lot over the years.

  It may be helpful to review those discussions (see links) [^2][^3][^4] to better understand this PDEP.

  ## Timeline

  Copy-on-Write is a relatively new feature (added in version 1.5) and some methods are missing the "lazy copy"
- optimization (equivalent to ``copy=False``).
+ optimization (equivalent to `copy=False`).

- Therefore, we will start showing deprecation warnings for the ``copy`` and ``inplace`` parameters in pandas 2.1, to
+ Therefore, we will start showing deprecation warnings for the `copy` and `inplace` parameters in pandas 2.1, to
  allow for bugs with Copy-on-Write to be addressed and for more optimizations to be added.

  Hopefully, users will be able to switch to Copy-on-Write to keep the no-copy behavior and to silence the warnings.

- The full removal of the ``copy`` parameter and ``inplace`` (where necessary) is set for pandas 3.0, which will coincide
+ The full removal of the `copy` parameter and `inplace` (where necessary) is set for pandas 3.0, which will coincide
  with the enablement of Copy-on-Write for pandas by default.

  ## PDEP History