@@ -65,7 +65,8 @@ adding to the inconsistencies. This keyword is also redundant now with the intro
6565
6666Given the above reasons, we are convinced that there is no need for either the ` inplace ` or the ` copy ` keyword (except
6767for a small subset of methods that can actually update data inplace). Removing those keywords will give a more
68- consistent and less confusing API.
68+ consistent and less confusing API. Removing the ` copy ` keyword is covered by PDEP-7 about Copy-on-Write,
69+ and this PDEP will focus on the ` inplace ` keyword.
6970
7071Thus, in this PDEP, we aim to standardize behavior across methods to make control of inplace-ness of operations
7172consistent, and compatible with Copy-on-Write.
@@ -79,21 +80,18 @@ the inplace behaviour of DataFrame and Series _methods_.
7980### Status Quo
8081
8182Many methods in pandas currently have the ability to perform an operation inplace. For example, some methods such
82- as `` DataFrame.insert `` , only support inplace operations, while other methods use keywords such as `` copy ``
83- or `` inplace `` to control whether an operation is done inplace or not.
83+ as `` DataFrame.insert `` only support inplace operations, while other methods use the ` inplace ` keyword to control
84+ whether an operation is done inplace or not.
8485
8586Unfortunately, many methods supporting the `` inplace `` keyword either cannot be done inplace, or make a copy as a
8687consequence of the operations they perform, regardless of whether `` inplace `` is `` True `` or not. This, coupled with the
8788fact that the `` inplace=True `` changes the return type of a method from a pandas object to `` None `` , makes usage of
8889the `` inplace `` keyword confusing and non-intuitive.
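
As an illustration of the return-type change described above, here is a minimal sketch. The DataFrame contents and the variable names (`renamed`, `result`) are made up for the example, and it assumes a pandas version in which `DataFrame.rename` still accepts `inplace`.

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2]})

# Without inplace, the method returns a new DataFrame, so calls can be chained.
renamed = df.rename(columns={"a": "b"})

# With inplace=True, the same method returns None and updates df itself,
# so the result cannot be used in a method chain.
result = df.rename(columns={"a": "b"}, inplace=True)
```

Chaining on `result` here would raise an `AttributeError`, because `result` is `None`.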
8990
90- In addition, some methods, such as `` DataFrame.rename `` and `` DataFrame.rename_axis `` confusingly support both
91- the `` copy `` and `` inplace `` keywords, with the value of `` inplace `` overwriting the value of `` copy `` .
92-
9391To summarize the status quo of inplace behavior of methods, we have divided methods that can operate inplace or have
94- an `` inplace `` / `` copy `` keyword into 4 groups:
92+ an `` inplace `` keyword into 4 groups:
9593
96- ** Group 1: Methods that always operate inplace (no user-control with `` inplace `` / `` copy `` keyword) **
94+ ** Group 1: Methods that always operate inplace (no user-control with `` inplace `` keyword)**
9795
9896| Method Name |
9997| :--------------|
@@ -125,58 +123,38 @@ the DataFrame or Series.
125123
126124** Group 3: Methods that modify the DataFrame/Series object, but not the pre-existing values**
127125
128- | Method Name | Keyword |
129- | :----------------------------| -----------------------|
130- | `` drop `` (dropping columns) | `` inplace `` |
131- | `` rename `` | `` inplace `` , `` copy `` |
132- | `` rename_axis `` | `` inplace `` , `` copy `` |
133- | `` reset_index `` | `` inplace `` |
134- | `` set_index `` | `` inplace `` |
135- | `` astype `` | `` copy `` |
136- | `` infer_objects `` | `` copy `` |
137- | `` set_axis `` | `` copy `` |
138- | `` set_flags `` | `` copy `` |
139- | `` to_period `` | `` copy `` |
140- | `` to_timestamp `` | `` copy `` |
141- | `` tz_localize `` | `` copy `` |
142- | `` tz_convert `` | `` copy `` |
143- | `` Series.swaplevel `` * | `` copy `` |
144- | `` concat `` | `` copy `` |
145-
146- \* The ` copy ` keyword is only available for ` Series.swaplevel ` and not for ` DataFrame.swaplevel ` .
126+ | Method Name |
127+ | :----------------------------|
128+ | `` drop `` (dropping columns) |
129+ | `` rename `` |
130+ | `` rename_axis `` |
131+ | `` reset_index `` |
132+ | `` set_index `` |
147133
148134These methods can change the structure of the DataFrame or Series, such as changing the shape by adding or removing
149135columns, or changing the row/column labels (changing the index/columns attributes), but don't modify the existing
150- underlying data of the object.
151-
152- All those methods (except for ` set_flags ` ) make a copy of the full data by default, but can be performed inplace with
153- avoiding copying all data (currently enabled with the ` inplace ` or ` copy ` keyword).
136+ underlying column data of the object.
154137
155- Some of these methods only have a ` copy ` keyword instead of an ` inplace `
156- keyword. These allow the user to avoid a copy, but don't update the original object inplace and instead return a
157- new object referencing the same data.
138+ All those methods make a copy of the full data by default, but can be performed inplace without
139+ copying all data (currently enabled by specifying ` inplace=True ` ).
158140
159- Two methods also have both keywords: ` rename ` , ` rename_axis ` , with the ` inplace ` keyword overriding ` copy ` .
141+ Note: there are also methods that have a ` copy ` keyword instead of an ` inplace ` keyword (e.g. ` set_axis ` ). This serves
142+ a similar purpose (avoid copying all data), but those methods don't update the original object inplace and instead
143+ return a new object referencing the same data.
160144
161145** Group 4: Methods that can never operate inplace**
162146
163- | Method Name | Keyword |
164- | :-----------------------| -----------|
165- | ` drop ` (dropping rows) | ` inplace ` |
166- | ` dropna ` | ` inplace ` |
167- | ` drop_duplicates ` | ` inplace ` |
168- | ` sort_values ` | ` inplace ` |
169- | ` sort_index ` | ` inplace ` |
170- | ` eval ` | ` inplace ` |
171- | ` query ` | ` inplace ` |
172- | ` transpose ` | ` copy ` |
173- | ` swapaxes ` | ` copy ` |
174- | ` align ` | ` copy ` |
175- | ` reindex ` | ` copy ` |
176- | ` reindex_like ` | ` copy ` |
177- | ` truncate ` | ` copy ` |
178-
179- Although these methods the ` inplace ` /` copy ` keywords, they can never operate inplace because the nature of the
147+ | Method Name |
148+ | :-----------------------|
149+ | ` drop ` (dropping rows) |
150+ | ` dropna ` |
151+ | ` drop_duplicates ` |
152+ | ` sort_values ` |
153+ | ` sort_index ` |
154+ | ` eval ` |
155+ | ` query ` |
156+
157+ Although these methods have the ` inplace ` keyword, they can never operate inplace because the nature of the
180158operation requires copying (such as reordering or dropping rows). For those methods, ` inplace=True ` is essentially just
181159syntactic sugar for reassigning the new result to the calling DataFrame/Series.
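
The "syntactic sugar" point can be sketched as follows. This is an illustrative example with made-up data and variable names, assuming a pandas version where `sort_values` still accepts `inplace`; it only compares the visible results of the two spellings, not their internals.

```python
import pandas as pd

df_inplace = pd.DataFrame({"a": [3, 1, 2]})
df_reassigned = pd.DataFrame({"a": [3, 1, 2]})

# Spelling 1: ask for an "inplace" sort (returns None).
df_inplace.sort_values("a", inplace=True)

# Spelling 2: sort into a new object and rebind the name.
df_reassigned = df_reassigned.sort_values("a")
```

Both variables end up holding the same sorted data with the same index, which is why the keyword adds nothing for this group of methods.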
182160
@@ -188,14 +166,14 @@ implementation detail for the purpose of this PDEP.
188166
189167The methods from group 1 won't change behavior, and will remain always inplace.
190168
191- Methods in groups 3 and 4 will lose their ` copy ` and ` inplace ` keywords . Under Copy-on-Write, every operation will
192- potentially return a shallow copy of the input object, if the performed operation does not require a copy. This is
193- equivalent to behavior with ` copy=False ` and/or ` inplace=True ` for those methods. If users want to make a hard
194- copy( ` copy=True ` ) , they can call the ` copy() ` method on the result of the operation.
169+ Methods in groups 3 and 4 will lose their ` inplace ` keyword . Under Copy-on-Write, every operation will
170+ potentially return a shallow copy of the input object, if the performed operation does not require a copy of the data . This is
171+ equivalent to the behavior with ` inplace=True ` for those methods. If users want to make a hard
172+ copy, they can call the ` copy() ` method on the result of the operation.
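
A minimal sketch of requesting a hard copy of a result, as described above. The data and the name `independent` are illustrative assumptions, and `reset_index` is used only as an example of a Group 3 method.

```python
import pandas as pd

df = pd.DataFrame({"foo": [1, 2, 3]})

# Explicitly request an independent (deep) copy of the result.
independent = df.reset_index(drop=True).copy()

# Modifying the copy leaves the original DataFrame untouched.
independent.loc[0, "foo"] = 100
```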
195173
196174Therefore, there is no benefit of keeping the keywords around for these methods.
197175
198- To emulate behavior of the ` inplace ` keyword, we can reassig the result of an operation to the same variable:
176+ To emulate behavior of the ` inplace ` keyword, we can reassign the result of an operation to the same variable:
199177
200178 :::python
201179 df = pd.DataFrame({"foo": [1, 2, 3]})
@@ -222,7 +200,7 @@ methods.
222200
223201#### With ` inplace=True ` , should we silently copy or raise an error if the data has references?
224202
225- For those methods where we would keep the ` inplace=True ` option, there is a complication that actually operating inplace
203+ For those methods where we would keep the ` inplace=True ` option (group 2) , there is a complication that actually operating inplace
226204is not always possible.
227205
228206For example,
@@ -324,12 +302,6 @@ Removing the `inplace` keyword is a breaking change, but since the affected beha
324302behaviour when not specifying the keyword (i.e. ` inplace=False ` ) will not change and the keyword itself can first be
325303deprecated before it is removed.
326304
327- Similarly for the ` copy ` keyword, this can be deprecated before it is removed.
328-
329- There are some behaviour changes (for example the current ` copy=False ` returning a shallow copy will no longer be an "
330- actual" shallow copy, but protected under Copy-on-Write), but those behaviour changes are covered by the Copy-on-Write
331- proposal[ ^ 1 ] .
332-
333305## Rejected ideas
334306
335307### Remove the ` inplace ` keyword altogether
@@ -348,21 +320,22 @@ DataFrames. Therefore, we decided to keep the `inplace` keyword for this small s
348320### Standardize on the ` copy ` keyword instead of ` inplace `
349321
350322It may seem more natural to standardize on the ` copy ` keyword instead of the ` inplace ` keyword, since the ` copy `
351- keyword already returns a new object instead of None (enabling method chaining) when it is set to ` True ` .
323+ keyword already returns a new object instead of None (enabling method chaining) and avoids a copy when it is set to ` False ` .
352324
353325However, the ` copy ` keyword is not supported in any of the values-mutating methods listed in Group 2 above
354326unlike ` inplace ` , so semantics of future inplace mutation of values align better with the current behavior of
355327the ` inplace ` keyword, than with the current behavior of the ` copy ` keyword.
356328
357329Furthermore, with the Copy-on-Write proposal, the ` copy ` keyword also has become superfluous. With Copy-on-Write
358330enabled, methods that return a new pandas object will always try to avoid a copy whenever possible, regardless of
359- a ` copy=False ` keyword. Thus, the general proposal is to actually remove the ` copy ` keyword from the methods where it is
360- currently used.
361-
362- Currently, for methods where it is supported, when the ` copy ` keyword is ` False ` , a new pandas object (same
363- as ` copy=True ` ) is returned as the result of a method call, with the values backing the object being shared when
364- possible. With the proposed inplace behavior, current behavior of ` copy=False ` would return a new pandas object with
365- identical values as the original object(that was modified inplace), which may be confusing for users, and lead to
331+ a ` copy=False ` keyword. Thus, the Copy-on-Write PDEP proposes to actually remove the ` copy ` keyword from the methods
332+ where it is currently used (so it would be strange to add this as a new keyword to the Group 2 methods).
333+
334+ Currently, when using ` copy=False ` in methods where it is supported, a new pandas object is returned as the result
335+ of a method call (same as with ` copy=True ` ), but with the values backing this object being shared with the calling
336+ object when possible (but the calling object is never modified). With the proposed inplace behavior for Group 2 methods,
337+ a potential ` copy=False ` option would return a new pandas object with identical values as the original object (that
338+ was modified inplace, in contrast to current usage of ` copy=False ` ), which may be confusing for users, and lead to
366339ambiguity with Copy on Write rules.
367340
368341## History