@@ -185,8 +185,8 @@ should not have a direct impact on users (except for performance).

  For the original variant of `StringDtype` using `pd.NA`, currently the default
  storage is `"python"` (the object-dtype based implementation). Also for this
- variant, it is proposed follow the same logic for determining the default
- storage, i.e. the default to `"pyarrow"` if available, and otherwise
+ variant, it is proposed to follow the same logic for determining the default
+ storage, i.e. default to `"pyarrow"` if available, and otherwise
  fall back to `"python"`.

  ### Naming
@@ -214,8 +214,8 @@ Currently (pandas 2.2), `StringDtype(storage="pyarrow_numpy")` is used for the n
  where the `"pyarrow_numpy"` storage was used to disambiguate from the existing
  `"pyarrow"` option using `pd.NA`. However, `"pyarrow_numpy"` is a rather confusing
  option and doesn't generalize well. Therefore, this PDEP proposes a new naming
- scheme as outlined below, and `"pyarrow_numpy"` will be deprecated and removed
- before pandas 3.0.
+ scheme as outlined below, and `"pyarrow_numpy"` will be deprecated as an alias
+ in pandas 2.3 and removed in pandas 3.0.

  The `storage` keyword of `StringDtype` is kept to disambiguate the underlying
  storage of the string data (using pyarrow or python objects), but an additional
@@ -240,7 +240,7 @@ Notes:

  - (1) You get "pyarrow" or "python" depending on pyarrow being installed.
  - (2) "pyarrow_numpy" is kept temporarily because this is already in a released
- version, but we can deprecate it in 2.x and have it removed for 3.0.
+ version, but it will be deprecated in 2.x and removed for 3.0.

  For the new default string dtype, only the `"str"` alias can be used to
  specify the dtype as a string, i.e. pandas would not provide a way to make the
@@ -304,7 +304,8 @@ when explicitly opting into this.
  An initial version of this PDEP proposed to use the `"string"` alias and the
  default `pd.StringDtype()` class constructor for the new default dtype.
  However, that caused a lot of discussion around backwards compatibility for
- existing users of the `StringDtype` using `pd.NA`.
+ existing users of `dtype=pd.StringDtype()` and `dtype="string"`, which use
+ `pd.NA` to represent missing values.

  During the discussion, several alternatives have been brought up: both
  alternative keyword names and using a different constructor. In the end,
@@ -340,7 +341,7 @@ backwards compatible on this aspect. When storing strings in object dtype, panda
  however did allow using `None` as the missing value indicator as well (and in
  certain cases such as the `shift` method, pandas even introduced this itself).
  For all the cases where currently `None` was used as the missing value sentinel,
- this will change to use `NaN` consistently.
+ this will change to consistently use `NaN`.

  ### For existing users of `StringDtype`

@@ -350,8 +351,8 @@ the behaviour of `dtype="string"` or `dtype=pd.StringDtype()` to mean the
  `pd.NA` variant of the dtype.

  It does propose to change the default storage to `"pyarrow"` (if available) for
- the opt-in `pd.NA` variant as well, but this should not have much user-visible
- impact.
+ the opt-in `pd.NA` variant as well, but this should have limited, if any,
+ user-visible impact.

  ## Timeline
