@@ -179,12 +179,11 @@ needs minor changes to follow the above-mentioned missing value semantics
([GH-58451](https://github.com/pandas-dev/pandas/pull/58451)).

For pandas 3.0, this is the most realistic option given this implementation has
- already been available for a long time. Beyond 3.0, further
- improvements such as using NumPy 2.0 ([GH-58503](https://github.com/pandas-dev/pandas/issues/58503))
- or nanoarrow ([GH-58552](https://github.com/pandas-dev/pandas/issues/58552))
- can still be explored,
- but at that point that is an implementation detail that should not have a
- direct impact on users (except for performance).
+ already been available for a long time. Beyond 3.0, further improvements such as
+ using NumPy 2.0 ([GH-58503](https://github.com/pandas-dev/pandas/issues/58503))
+ or nanoarrow ([GH-58552](https://github.com/pandas-dev/pandas/issues/58552)) can
+ still be explored, but at that point they are implementation details that
+ should not have a direct impact on users (except for performance).

### Naming
@@ -203,10 +202,10 @@ dtype need a way to specify this.
Currently (pandas 2.2), `StringDtype(storage="pyarrow_numpy")` is used, where
the `"pyarrow_numpy"` storage name was chosen to disambiguate it from the existing
- `"pyarrow"` option using `pd.NA`. However, "pyarrow_numpy" is a rather
- confusing option and doesn't generalize well. Therefore, this PDEP proposes
- a new naming scheme as outlined below, and
- "pyarrow_numpy" will be deprecated and removed before pandas 3.0.
+ `"pyarrow"` option using `pd.NA`. However, "pyarrow_numpy" is a rather confusing
+ option and doesn't generalize well. Therefore, this PDEP proposes a new naming
+ scheme as outlined below, and "pyarrow_numpy" will be deprecated and removed
+ before pandas 3.0.

The `storage` keyword of `StringDtype` is kept to disambiguate the underlying
storage of the string data (using pyarrow or python objects), but an additional
@@ -249,8 +248,8 @@ sufficient (they don't need to specify the storage), and the explicit
To avoid introducing a new string dtype while other discussions and changes are
in flux (eventually making pyarrow a required dependency? adopting `pd.NA` as
the default missing value sentinel? using the new NumPy 2.0 capabilities?
- overhauling all our dtypes to use a logical data type system?),
- introducing a default string dtype could also be delayed until there is more clarity in those
+ overhauling all our dtypes to use a logical data type system?), introducing a
+ default string dtype could also be delayed until there is more clarity in those
other discussions.

However:
@@ -263,10 +262,10 @@ However:
the challenges around this will not be unique to the string dtype and
therefore not a reason to delay this.

- Making this change now for 3.0 will benefit the majority of users, while
- coming at a cost for a part of the users who already started using the
- `"string"` or `pd.StringDtype()` dtype (they will have to update their code to continue to the variant
- using `pd.NA`, see the "Backward compatibility" section below).
265
+ Making this change now for 3.0 will benefit the majority of users, while coming
266
+ at a cost for a part of the users who already started using the ` "string" ` or
267
+ ` pd.StringDtype() ` dtype (they will have to update their code to continue to the
268
+ variant using ` pd.NA ` , see the "Backward compatibility" section below).
270
269
271
270
### Why not use the existing StringDtype with ` pd.NA ` ?
272
271