Commit 82027d2

reflow after online edits
1 parent ac2d21a commit 82027d2

web/pandas/pdeps/0014-string-dtype.md

Lines changed: 15 additions & 16 deletions
@@ -179,12 +179,11 @@ needs minor changes to follow the above-mentioned missing value semantics
 ([GH-58451](https://github.com/pandas-dev/pandas/pull/58451)).

 For pandas 3.0, this is the most realistic option given this implementation has
-already been available for a long time. Beyond 3.0, further
-improvements such as using NumPy 2.0 ([GH-58503](https://github.com/pandas-dev/pandas/issues/58503))
-or nanoarrow ([GH-58552](https://github.com/pandas-dev/pandas/issues/58552))
-can still be explored,
-but at that point that is an implementation detail that should not have a
-direct impact on users (except for performance).
+already been available for a long time. Beyond 3.0, further improvements such as
+using NumPy 2.0 ([GH-58503](https://github.com/pandas-dev/pandas/issues/58503))
+or nanoarrow ([GH-58552](https://github.com/pandas-dev/pandas/issues/58552)) can
+still be explored, but at that point that is an implementation detail that
+should not have a direct impact on users (except for performance).

 ### Naming

@@ -203,10 +202,10 @@ dtype need a way to specify this.

 Currently (pandas 2.2), `StringDtype(storage="pyarrow_numpy")` is used, where
 the `"pyarrow_numpy"` storage was used to disambiguate from the existing
-`"pyarrow"` option using `pd.NA`. However, "pyarrow_numpy" is a rather
-confusing option and doesn't generalize well. Therefore, this PDEP proposes
-a new naming scheme as outlined below, and
-"pyarrow_numpy" will be deprecated and removed before pandas 3.0.
+`"pyarrow"` option using `pd.NA`. However, "pyarrow_numpy" is a rather confusing
+option and doesn't generalize well. Therefore, this PDEP proposes a new naming
+scheme as outlined below, and "pyarrow_numpy" will be deprecated and removed
+before pandas 3.0.

 The `storage` keyword of `StringDtype` is kept to disambiguate the underlying
 storage of the string data (using pyarrow or python objects), but an additional
@@ -249,8 +248,8 @@ sufficient (they don't need to specify the storage), and the explicit
 To avoid introducing a new string dtype while other discussions and changes are
 in flux (eventually making pyarrow a required dependency? adopting `pd.NA` as
 the default missing value sentinel? using the new NumPy 2.0 capabilities?
-overhauling all our dtypes to use a logical data type system?),
-introducing a default string dtype could also be delayed until there is more clarity in those
+overhauling all our dtypes to use a logical data type system?), introducing a
+default string dtype could also be delayed until there is more clarity in those
 other discussions.

 However:
@@ -263,10 +262,10 @@ However:
 the challenges around this will not be unique to the string dtype and
 therefore not a reason to delay this.

-Making this change now for 3.0 will benefit the majority of users, while
-coming at a cost for a part of the users who already started using the
-`"string"` or `pd.StringDtype()` dtype (they will have to update their code to continue to the variant
-using `pd.NA`, see the "Backward compatibility" section below).
+Making this change now for 3.0 will benefit the majority of users, while coming
+at a cost for a part of the users who already started using the `"string"` or
+`pd.StringDtype()` dtype (they will have to update their code to continue to the
+variant using `pd.NA`, see the "Backward compatibility" section below).

 ### Why not use the existing StringDtype with `pd.NA`?
