
Commit b5663cc

Apply suggestions from code review
Co-authored-by: Irv Lustig <[email protected]>
1 parent 9c5342a commit b5663cc

1 file changed: +10 -9 lines changed

web/pandas/pdeps/0014-string-dtype.md

Lines changed: 10 additions & 9 deletions
@@ -185,8 +185,8 @@ should not have a direct impact on users (except for performance).
 
 For the original variant of `StringDtype` using `pd.NA`, currently the default
 storage is `"python"` (the object-dtype based implementation). Also for this
-variant, it is proposed follow the same logic for determining the default
-storage, i.e. the default to `"pyarrow"` if available, and otherwise
+variant, it is proposed to follow the same logic for determining the default
+storage, i.e. default to `"pyarrow"` if available, and otherwise
 fall back to `"python"`.
 
 ### Naming
@@ -214,8 +214,8 @@ Currently (pandas 2.2), `StringDtype(storage="pyarrow_numpy")` is used for the n
 where the `"pyarrow_numpy"` storage was used to disambiguate from the existing
 `"pyarrow"` option using `pd.NA`. However, `"pyarrow_numpy"` is a rather confusing
 option and doesn't generalize well. Therefore, this PDEP proposes a new naming
-scheme as outlined below, and `"pyarrow_numpy"` will be deprecated and removed
-before pandas 3.0.
+scheme as outlined below, and `"pyarrow_numpy"` will be deprecated as an alias
+in pandas 2.3 and removed in pandas 3.0.
 
 The `storage` keyword of `StringDtype` is kept to disambiguate the underlying
 storage of the string data (using pyarrow or python objects), but an additional
@@ -240,7 +240,7 @@ Notes:
 
 - (1) You get "pyarrow" or "python" depending on pyarrow being installed.
 - (2) "pyarrow_numpy" is kept temporarily because this is already in a released
-version, but we can deprecate it in 2.x and have it removed for 3.0.
+version, but it will be deprecated it in 2.x and removed for 3.0.
 
 For the new default string dtype, only the `"str"` alias can be used to
 specify the dtype as a string, i.e. pandas would not provide a way to make the
@@ -304,7 +304,8 @@ when explicitly opting into this.
 An initial version of this PDEP proposed to use the `"string"` alias and the
 default `pd.StringDtype()` class constructor for the new default dtype.
 However, that caused a lot of discussion around backwards compatibility for
-existing users of the `StringDtype` using `pd.NA`.
+existing users of `dtype=pd.StringDtype()` and `dtype="string"`, that uses
+`pd.NA` to represent missing values.
 
 During the discussion, several alternatives have been brought up. Both
 alternative keyword names as using a different constructor. In the end,
@@ -340,7 +341,7 @@ backwards compatible on this aspect. When storing strings in object dtype, panda
 however did allow using `None` as the missing value indicator as well (and in
 certain cases such as the `shift` method, pandas even introduced this itself).
 For all the cases where currently `None` was used as the missing value sentinel,
-this will change to use `NaN` consistently.
+this will change to consistently use `NaN`.
 
 ### For existing users of `StringDtype`
 
@@ -350,8 +351,8 @@ the behaviour of `dtype="string"` or `dtype=pd.StringDtype()` to mean the
 `pd.NA` variant of the dtype.
 
 It does propose the change the default storage to `"pyarrow"` (if available) for
-the opt-in `pd.NA` variant as well, but this should not have much user-visible
-impact.
+the opt-in `pd.NA` variant as well, but this should have limited, if any,
+user-visible impact.
 
 ## Timeline
 
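Note on the default-storage rule touched by the first hunk ("default to `"pyarrow"` if available, and otherwise fall back to `"python"`"): the rule can be sketched roughly as below. This is only an illustration of the described behaviour, not the pandas implementation; the helper name `default_string_storage` is hypothetical, and checking availability with `importlib.util.find_spec` is an assumption about how "available" might be tested.

```python
from importlib.util import find_spec


def default_string_storage() -> str:
    """Return "pyarrow" when pyarrow is importable, otherwise "python".

    Hypothetical helper (not part of pandas): it only mirrors the
    fallback rule described in the PDEP text.
    """
    # find_spec returns None when the top-level package cannot be located.
    pyarrow_available = find_spec("pyarrow") is not None
    return "pyarrow" if pyarrow_available else "python"


# The result depends on the local environment.
print(default_string_storage())
```

The sketch ignores version requirements; a real check would presumably also need to confirm that the installed pyarrow is recent enough.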