Commit e1194a5

Update from feedback
1 parent 6984c7e commit e1194a5

1 file changed

+35
-5
lines changed

web/pandas/pdeps/0011-dropna-default.md

Lines changed: 35 additions & 5 deletions
@@ -25,7 +25,8 @@ print(ser.value_counts())
2525
dtype: Int64
2626
```
2727

28-
users may be surprised that the Series can contain NA values. By then operating
28+
users may be surprised that the Series can contain NA values, as is argued in
29+
[#21890](https://github.com/pandas-dev/pandas/issues/21890). By then operating
2930
on data under the assumption that NA values are not present, erroneous results can
3031
arise. The same issue can occur with `groupby`, which can also be used to produce
3132
detailed summary statistics of data. We think it is not unreasonable that an
@@ -43,11 +44,35 @@ This is correct, except that NA values in the column `a` will be dropped from
4344
the computation. That pandas is taking this additional step in the computation
4445
is not apparent from the code, and can surprise users.
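
To make the extra step visible, here is a minimal sketch with made-up data (the frame and variable names are illustrative and not taken from the PDEP). It passes `dropna=True` explicitly to stand in for the current default, then compares it with `dropna=False`.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not from the PDEP):
# the last row has a missing key in column "a".
df = pd.DataFrame({"a": [1.0, 1.0, np.nan], "b": [10, 20, 30]})

# dropna=True is written out here to mirror the current default:
# the row whose key is NA is left out of the result.
default_result = df.groupby("a", dropna=True)["b"].sum()

# dropna=False keeps the NA key as its own group.
keep_na_result = df.groupby("a", dropna=False)["b"].sum()

print(default_result)
print(keep_na_result)
```

With `dropna=True` the grouped totals no longer add up to the column total, and nothing in the call signals that a row was excluded.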
4546

47+
###
48+
49+
### Keeping the default `skipna=True`
50+
51+
Many reduction methods, such as `sum`, `mean`, and `var`, have a `skipna` argument.
52+
In such operations, setting `skipna=False` would make the output of any operation
53+
NA if a single NA value is encountered.
54+
55+
```python
56+
df = pd.DataFrame({'a': [1, np.nan], 'b': [2, np.nan]})
57+
print(df.sum(skipna=False))
58+
# a NaN
59+
# b NaN
60+
# dtype: float64
61+
```
62+
63+
This makes `skipna=False` an undesirable default. In the methods with `dropna`, this phenomenon does not occur. By defaulting to `dropna=False` in these
64+
methods, the results when NA values are encountered do not obscure the results of non-NA values.
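
The contrast can be sketched with a small Series (the data and variable names below are assumptions chosen for illustration). A reduction with `skipna=False` collapses to NA, whereas `value_counts(dropna=False)` only adds a row for NA and leaves the other counts unchanged.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption): three valid values and one NA.
ser = pd.Series([1.0, 1.0, 2.0, np.nan])

# skipna=False: a single NA makes the whole reduction NA.
total = ser.sum(skipna=False)

# dropna=False: NA gets its own entry; the counts of 1.0 and 2.0 are unaffected.
counts = ser.value_counts(dropna=False)

# dropna=True for comparison: the NA entry is simply absent.
counts_dropped = ser.value_counts(dropna=True)

print(total)
print(counts)
print(counts_dropped)
```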
65+
66+
### Possible deprecation of `dropna`
67+
68+
This PDEP takes no position on whether some methods with a `dropna` argument should have said argument deprecated.
69+
However, if such a deprecation is to be pursued, then we believe that the final behavior should
70+
be that of `dropna=False` across any of the methods listed below. With this, a necessary first step
71+
in the deprecation process would be to change the default value to `dropna=False`.
72+
4673
## Detailed Description
4774

48-
We propose to deprecate the current default of `dropna` and change it to
49-
`False` across all applicable methods. The following methods have a dropna
50-
argument, those marked with a `*` already default to `False`.
75+
The following methods have a `dropna` argument; those marked with a `*` already default to `False`.
5176

5277
```python
5378
Series.groupby
@@ -68,10 +93,15 @@ DataFrameGroupBy.nunique
6893
DataFrameGroupBy.value_counts
6994
```
7095

96+
We propose to deprecate the current default of `dropna` and change it to
97+
`False` across all methods listed above.
98+
7199
## Timeline
72100

73101
If accepted, the current `dropna` default would be deprecated as part of pandas
74-
2.x and this deprecation would be enforced in pandas 3.0.
102+
2.x and this deprecation would be enforced in pandas 3.0. In pandas 2.x, `FutureWarning` messages would
103+
be emitted on any calls to these methods where the value of `dropna` is unspecified and
104+
an NA value is present.
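
One way such a warning could be wired up is sketched below. This is not the pandas implementation: the sentinel `_NO_DEFAULT`, the wrapper `value_counts_with_warning`, and the warning text are all hypothetical names invented for illustration. The sketch warns only when `dropna` is left unspecified and the data contains an NA value, matching the condition described above.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical sentinel: lets the function tell "argument not passed"
# apart from an explicit True or False.
_NO_DEFAULT = object()


def value_counts_with_warning(ser, dropna=_NO_DEFAULT):
    """Hypothetical wrapper around Series.value_counts; not part of pandas."""
    if dropna is _NO_DEFAULT:
        # Warn only if the result would actually differ, i.e. an NA is present.
        if ser.isna().any():
            warnings.warn(
                "The default of dropna will change from True to False in a "
                "future version. Pass dropna explicitly to silence this warning.",
                FutureWarning,
                stacklevel=2,
            )
        # Keep the old behavior for the duration of the deprecation.
        dropna = True
    return ser.value_counts(dropna=dropna)


ser_with_na = pd.Series([1.0, 1.0, np.nan])
value_counts_with_warning(ser_with_na)                # emits FutureWarning
value_counts_with_warning(ser_with_na, dropna=False)  # explicit, so no warning
```

Restricting the warning to calls where an NA is actually present keeps it quiet for users whose results would not change under the new default.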
75105

76106
## PDEP History
77107
