Commit 2ca875a

jorisvandenbossche authored and lithomas1 committed
explain values-inplace vs object-inplace
1 parent 2110b34 commit 2ca875a

1 file changed

+37
-10
lines changed

web/pandas/pdeps/0008-inplace-methods-in-pandas.md

Lines changed: 37 additions & 10 deletions
@@ -83,7 +83,32 @@ Many methods in pandas currently have the ability to perform an operation inplac
8383
as ``DataFrame.insert`` only support inplace operations, while other methods use the `inplace` keyword to control
8484
whether an operation is done inplace or not.
8585

86-
Unfortunately, many methods supporting the ``inplace`` keyword either cannot be done inplace, or make a copy as a
86+
While we generally speak about "inplace" operations, this term is used in various contexts. Broadly speaking,
87+
for this PDEP, we can distinguish two kinds of "inplace" operations:
88+
89+
* **"values-inplace"**: an operation that updates the underlying values of a Series or DataFrame columns inplace
90+
(without making a copy of the array).
91+
92+
As illustration, an example of such a values-inplace operation without using a method:
93+
94+
:::python
95+
# if the dtype is compatible, this setitem operation updates the underlying array inplace
96+
df.loc[0, "col"] = val
97+
98+
* **"object-inplace"**: an operation that updates a pandas DataFrame or Series _object_ inplace, but without
99+
updating existing column values inplace.
100+
101+
As illustration, an example of such an object-inplace operation without using a method:
102+
103+
:::python
104+
# we replace the Index on `df` inplace, but without actually updating any existing array
105+
df.index = pd.Index(...)
106+
107+
Object-inplace operations, while not actually modifying existing column values, keep
108+
(a subset of) those columns and thus can avoid copying the data of those existing columns.
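The distinction above can be sketched as follows. This is a minimal illustration, not part of the PDEP text: the frame, the labels and the variable names (`same_object`, `columns_reused`) are made up, and whether the `loc` assignment really writes into the existing array depends on the dtype and on the pandas version and Copy-on-Write state.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [1, 2, 3]})
same_object = df  # second name bound to the same DataFrame object

# values-inplace: with a compatible dtype, this setitem can write into the
# existing column array instead of allocating a new one
df.loc[0, "col"] = 10

# object-inplace: replace the Index on the same object; the column data
# is kept as-is rather than rewritten or copied
before = df["col"].to_numpy()
df.index = pd.Index(["a", "b", "c"])
after = df["col"].to_numpy()

# the column still refers to the same memory after the Index was replaced
columns_reused = np.shares_memory(before, after)
```

In both cases `same_object` still refers to the modified DataFrame; what differs is whether existing column values were written to.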
109+
110+
In addition, several methods supporting the ``inplace`` keyword cannot actually be done inplace (in either sense)
111+
because they make a copy as a
87112
consequence of the operations they perform, regardless of whether ``inplace`` is ``True`` or not. This, coupled with the
88113
fact that the ``inplace=True`` changes the return type of a method from a pandas object to ``None``, makes usage of
89114
the ``inplace`` keyword confusing and non-intuitive.
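The change in return type can be seen in a short sketch. The frame and variable names are hypothetical, and `sort_values` is used only as one example of a method that currently accepts `inplace`:

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2]})

# default: a new DataFrame is returned, so the call can be chained
returned_default = df.sort_values("a")

# inplace=True: `df` itself is updated and the call returns None,
# so it cannot be used in a method chain
returned_inplace = df.sort_values("a", inplace=True)
```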
@@ -98,12 +123,13 @@ an ``inplace`` keyword into 4 groups:
98123
| ``insert`` |
99124
| ``pop`` |
100125
| ``update`` |
101-
| ``isetitem``* |
126+
| ``isetitem`` |
102127

103-
\* Although ``isetitem`` operates on the original pandas object inplace, it will not change any existing values
104-
inplace (it will remove the values of the column being set, and insert new values).
128+
This group encompasses both kinds of inplace: `update` can be values-inplace, while the others are object-inplace
129+
(for example, although ``isetitem`` operates on the original pandas object inplace,
130+
it will not change any existing values inplace; rather it will remove the values of the column being set, and insert new values).
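As a rough sketch of the ``isetitem`` case (column names and values are made up for illustration): the DataFrame object is updated, while the array that previously backed the column is left untouched.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# keep a handle on the values that currently back column "a"
old_values = df["a"].to_numpy()

# object-inplace: `df` gets a new array for column 0,
# the old array is not written to
df.isetitem(0, [10, 20, 30])

new_column = df["a"].tolist()
old_column = old_values.tolist()
```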
105131

106-
**Group 2: Methods that modify the underlying data of the DataFrame/Series object and can be done inplace**
132+
**Group 2: Methods that modify the underlying data of the DataFrame/Series object ("values-inplace")**
107133

108134
| Method Name |
109135
|:----------------|
@@ -121,7 +147,7 @@ These methods don't operate inplace by default, but can be done inplace with `in
121147
the structure of the DataFrame or Series intact (shape, row/column labels), but can mutate some elements of the data of
122148
the DataFrame or Series.
123149

124-
**Group 3: Methods that modify the DataFrame/Series object, but not the pre-existing values**
150+
**Group 3: Methods that modify the DataFrame/Series object, but not the pre-existing values ("object-inplace")**
125151

126152
| Method Name |
127153
|:----------------------------|
@@ -135,7 +161,7 @@ These methods can change the structure of the DataFrame or Series, such as chang
135161
columns, or changing the row/column labels (changing the index/columns attributes), but don't modify the existing
136162
underlying column data of the object.
137163

138-
All those methods make a copy of the full data by default, but can be performed inplace with
164+
All those methods make a copy of the full data by default, but can be performed object-inplace with
139165
avoiding copying all data (currently enabled with specifying `inplace=True`).
140166

141167
Note: there are also methods that have a `copy` keyword instead of an `inplace` keyword (e.g. `set_axis`). This serves
@@ -159,7 +185,8 @@ operation requires copying (such as reordering or dropping rows). For those meth
159185
syntactic sugar for reassigning the new result to the calling DataFrame/Series.
160186

161187
Note: in the case of a "no-op" (for example when sorting an already sorted DataFrame), some of those methods might not
162-
need to perform a copy. This currently happens with Copy-on-Write (regardless of `inplace`), but this is considered an
188+
need to perform a copy and could be considered as "object-inplace" in that case.
189+
This currently happens with Copy-on-Write (regardless of `inplace`), but this is considered an
163190
implementation detail for the purpose of this PDEP.
164191

165192
### Proposed changes and reasoning
@@ -171,7 +198,7 @@ potentially return a shallow copy of the input object, if the performed operatio
171198
equivalent to the behavior with `inplace=True` for those methods. If users want to make a hard
172199
copy, they can call the `copy()` method on the result of the operation.
173200

174-
Therefore, there is no benefit of keeping the keywords around for these methods.
201+
Therefore, there is no benefit of keeping the keyword around for these methods.
175202

176203
To emulate behavior of the `inplace` keyword, we can reassign the result of an operation to the same variable:
177204

@@ -320,7 +347,7 @@ DataFrames. Therefore, we decided to keep the `inplace` keyword for this small s
320347
### Standardize on the `copy` keyword instead of `inplace`
321348

322349
It may seem more natural to standardize on the `copy` keyword instead of the `inplace` keyword, since the `copy`
323-
keyword already returns a new object instead of None (enabling method chaining) and avoids a coopy when it is set to `False`.
350+
keyword already returns a new object instead of None (enabling method chaining) and avoids a copy when it is set to `False`.
324351

325352
However, the `copy` keyword is not supported in any of the values-mutating methods listed in Group 2 above
326353
unlike `inplace`, so semantics of future inplace mutation of values align better with the current behavior of
