PDEP-14: Dedicated string data type for pandas 3.0 #58551
# PDEP-XX: Dedicated string data type for pandas 3.0

- Created: May 3, 2024
- Status: Under discussion
- Discussion:
- Author: [Joris Van den Bossche](https://github.com/jorisvandenbossche)
- Revision: 1

## Abstract

This PDEP proposes to introduce a dedicated string dtype that will be used by
default in pandas 3.0:

* In pandas 3.0, enable a "string" dtype by default, using PyArrow if available
  or otherwise the NumPy object-dtype alternative.

* The default string dtype will use missing value semantics (using NaN) consistent
  with the other default data types.

This will give users a long-awaited proper string dtype for 3.0, while 1) not
(yet) making PyArrow a _hard_ dependency, but only a dependency used by default,
and 2) leaving room for future improvements (different missing value semantics,
using NumPy 2.0, etc.).

## Background

Currently, pandas by default stores text data in an `object`-dtype NumPy array.
The current implementation has two primary drawbacks. First, `object` dtype is
not specific to strings: any Python object can be stored in an `object`-dtype
array, not just strings, and seeing `object` as the dtype for a column with
strings is confusing for users. Second, this is not efficient: all string
methods on a Series eventually call Python methods on the individual string
objects.

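A minimal sketch of both drawbacks, assuming a recent pandas installation. It passes `dtype=object` explicitly so that the result does not depend on which default string inference the installed pandas version applies:

```python
import pandas as pd

# Explicitly request object dtype, the current default storage for text data.
s = pd.Series(["apple", "banana"], dtype=object)

# Nothing prevents a non-string value from ending up in the same column.
s.iloc[1] = 42
print(s.dtype)  # object

# .str methods are applied element by element on Python objects; the
# non-string entry silently becomes a missing value instead of raising.
upper = s.str.upper()
print(upper.tolist())
```

Neither the dtype nor the `.str` result signals that a non-string value slipped into the column.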
To solve the first issue, a dedicated extension dtype for string data was
already
[added in pandas 1.0](https://pandas.pydata.org/docs/whatsnew/v1.0.0.html#dedicated-string-data-type).
This has so far always been opt-in, requiring users to explicitly request the
dtype (with `dtype="string"` or `dtype=pd.StringDtype()`). The array backing
this string dtype was initially almost the same as the default implementation,
i.e. an `object`-dtype NumPy array of Python strings.

To solve the second issue (performance), pandas contributed to the development
of string kernels in the PyArrow package, and a variant of the string dtype
backed by PyArrow was
[added in pandas 1.3](https://pandas.pydata.org/docs/whatsnew/v1.3.0.html#pyarrow-backed-string-data-type).
This could be specified with the `storage` keyword in the opt-in string dtype
(`pd.StringDtype(storage="pyarrow")`).

Since its introduction, the `StringDtype` has always been opt-in, and has used
the experimental `pd.NA` sentinel for missing values (which was also [introduced
in pandas 1.0](https://pandas.pydata.org/docs/whatsnew/v1.0.0.html#experimental-na-scalar-to-denote-missing-values)).
However, to date, pandas has not yet taken the step to use `pd.NA` by
default, and thus the `StringDtype` deviates in missing value behaviour from
the default data types.

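For illustration, a small sketch contrasting the two missing value sentinels. It assumes a recent pandas and uses the `"python"` storage of `StringDtype`, which does not require PyArrow:

```python
import numpy as np
import pandas as pd

# Opt-in StringDtype: missing values are represented by pd.NA.
s_string = pd.Series(["a", None], dtype=pd.StringDtype(storage="python"))
print(s_string.iloc[1] is pd.NA)  # True

# Default float64 column: missing values are represented by NaN.
s_float = pd.Series([1.0, None])
print(s_float.dtype, np.isnan(s_float.iloc[1]))  # float64 True
```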
In 2023, [PDEP-10](https://pandas.pydata.org/pdeps/0010-required-pyarrow-dependency.html)
proposed to start using a PyArrow-backed string dtype by default in pandas 3.0
(i.e. infer this type for string data instead of object dtype). To ensure we
could use the variant of `StringDtype` backed by PyArrow instead of Python
objects (for better performance), it proposed to make `pyarrow` a new required
runtime dependency of pandas.

In the meantime, NumPy has also been working on a native variable-width string
data type, which will be available [starting with NumPy
2.0](https://numpy.org/devdocs/release/2.0.0-notes.html#stringdtype-has-been-added-to-numpy).
This can provide a potential alternative to PyArrow for implementing a string
data type in pandas that is not backed by Python objects.

After acceptance of PDEP-10, two aspects of the proposal have been under
reconsideration:

- Based on user feedback, relaxing the new `pyarrow` requirement so that it is
  not a _hard_ runtime dependency has been considered. In addition, NumPy 2.0
  can potentially reduce the need to make PyArrow a required dependency
  specifically for a dedicated pandas string dtype.
- PDEP-10 did not consider that adopting one of the existing implementations of
  the `StringDtype` implies using the experimental `pd.NA` as the missing value
  sentinel.

For the second aspect, another variant of the `StringDtype` was
[introduced in pandas 2.1](https://pandas.pydata.org/docs/whatsnew/v2.1.0.html#whatsnew-210-enhancements-infer-strings)
that is still backed by PyArrow but follows the default missing value semantics
that pandas uses for all other default data types (using `NaN` as the missing
value sentinel) ([GH-54792](https://github.com/pandas-dev/pandas/issues/54792)).

At the time, the `storage` option for this new variant was called
`"pyarrow_numpy"` to disambiguate from the existing `"pyarrow"` option using `pd.NA`.

This last dtype variant is what you currently (pandas 2.2) get for string data
when enabling the `future.infer_string` option (to enable the behaviour which
is intended to become the default in pandas 3.0).

## Proposal

To be able to move forward with a string data type in pandas 3.0, this PDEP proposes:

1. For pandas 3.0, we enable a "string" dtype by default, which will use PyArrow
   if installed, and otherwise falls back to an in-house functionally-equivalent
   (but slower) version.
2. This default "string" dtype will follow the same behaviour for missing values
   as our other default data types, and use `NaN` as the missing value sentinel.
3. The version that is not backed by PyArrow can reuse the existing NumPy
   object-dtype-backed `StringArray` for its implementation.
4. We update installation guidelines to clearly encourage users to install
   PyArrow for the default user experience.

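The storage selection in points 1 and 3 could be sketched as follows. `resolve_default_string_storage` is a hypothetical helper for illustration only, not a pandas API, and an actual implementation may apply further checks (for example on the installed PyArrow version):

```python
import importlib.util
from typing import Optional


def resolve_default_string_storage(pyarrow_installed: Optional[bool] = None) -> str:
    """Hypothetical helper: choose the storage backing the default "string" dtype.

    Uses PyArrow when it is available, otherwise falls back to the
    object-dtype based ("python") implementation.
    """
    if pyarrow_installed is None:
        # Check whether pyarrow is importable without actually importing it.
        pyarrow_installed = importlib.util.find_spec("pyarrow") is not None
    return "pyarrow" if pyarrow_installed else "python"


print(resolve_default_string_storage())
```

Passing `pyarrow_installed` explicitly makes both branches easy to exercise in tests without changing the environment.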
### Default inference of a string dtype

By default, pandas will infer this new string dtype for string data (when
creating pandas objects, such as in constructors or IO functions).

The existing `future.infer_string` option can be used to opt in to the future
default behaviour:

```python
>>> import pandas as pd
>>> pd.options.future.infer_string = True
>>> pd.Series(["a", "b", None])
0      a
1      b
2    NaN
dtype: string
```

This option will be expanded to also work when PyArrow is not installed.

### Missing value semantics

Given that all other default data types use NaN semantics for missing values,
this proposal says that the new default string dtype should use the same
default semantics. Further, operations on the string column that produce a
boolean or numeric result should return default data types (e.g., methods like
`.str.startswith(..)` or `.str.len(..)`, or comparison operators like `==`,
should result in default `int64` and `bool` data types).

Because the original `StringDtype` implementations already use `pd.NA` and
return masked integer and boolean arrays in operations, a new variant of the
existing dtypes that uses `NaN` and default data types is needed.

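To make the difference concrete, a sketch comparing result dtypes, assuming a recent pandas. The first part uses explicit `object` dtype to show the default (NaN-based) semantics; the second part uses the existing opt-in `StringDtype` with `"python"` storage, which returns masked arrays:

```python
import pandas as pd

# Default semantics, shown with explicit object dtype (NaN as missing value).
obj_full = pd.Series(["ab", "c"], dtype=object)
obj_missing = pd.Series(["ab", None], dtype=object)
print(obj_full.str.len().dtype)     # int64
print(obj_missing.str.len().dtype)  # float64: NaN cannot be stored in int64
print((obj_missing == "ab").dtype)  # bool

# Existing opt-in StringDtype (pd.NA semantics): masked result dtypes.
na_missing = pd.Series(["ab", None], dtype=pd.StringDtype(storage="python"))
print(na_missing.str.len().dtype)    # Int64
print((na_missing == "ab").dtype)    # boolean
```

In the object-dtype case, a numeric result with missing values becomes `float64`, since `NaN` is a float.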
### Object-dtype "fallback" implementation

To avoid a hard dependency on PyArrow for pandas 3.0, this PDEP proposes to keep
a "fallback" option in case PyArrow is not installed. The original `StringDtype`
backed by a NumPy object-dtype array of Python strings can be used for this, and
only needs minor updates to follow the above-mentioned missing value semantics
([GH-58451](https://github.com/pandas-dev/pandas/pull/58451)).

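A quick check, assuming a recent pandas, that the existing `"python"` storage is backed by an object-dtype NumPy array and works without PyArrow. In this existing variant, missing values are still `pd.NA`; adapting them to the NaN semantics is the kind of minor update referred to above:

```python
import numpy as np
import pandas as pd

# The python-storage StringArray needs no PyArrow.
arr = pd.array(["a", "b", None], dtype=pd.StringDtype(storage="python"))
print(isinstance(arr, pd.arrays.StringArray))  # True

# Underneath, the values are Python str objects in an object-dtype ndarray.
print(np.asarray(arr).dtype)  # object

# The existing variant uses pd.NA for missing values.
print(arr[2] is pd.NA)  # True
```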
For pandas 3.0, this is the most realistic option given this implementation has
already been available for a long time. Beyond 3.0, we can still explore further
improvements such as using nanoarrow or NumPy 2.0, but at that point that is an
implementation detail that should not have a direct impact on users (except for
performance).

### Naming

Given the long history of this topic, naming the dtypes is difficult.

In the first place, we need to acknowledge that most users should not need to
use storage-specific options. Users are expected to specify `pd.StringDtype()`
or `"string"`, and that will give them their default string dtype (which
depends on whether PyArrow is installed or not).

But for testing purposes and advanced use cases that want control over this, we
need some way to specify the storage and to distinguish these variants from the
other string dtypes. Currently, `StringDtype(storage="pyarrow_numpy")` is used,
where "pyarrow_numpy" is a rather confusing option.

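For illustration, a sketch of the two usage patterns described above, assuming a recent pandas: the default `pd.StringDtype()`, whose storage depends on the pandas version, its configuration and whether PyArrow is installed, and an explicitly pinned storage for tests or advanced use:

```python
import pandas as pd

# What most users write: the default string dtype; pandas resolves the storage.
default = pd.StringDtype()
print(default.storage)

# The "string" alias also resolves to a StringDtype.
s = pd.Series(["x", "y"], dtype="string")
print(isinstance(s.dtype, pd.StringDtype))  # True

# Advanced use cases and tests can pin the storage explicitly.
explicit = pd.StringDtype(storage="python")
print(explicit.storage)  # python
```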
TODO: see if we can come up with a better naming scheme.

## Alternatives

### Why not delay introducing a default string dtype?

To avoid introducing a new string dtype while other discussions and changes are
in flux (eventually making PyArrow a required dependency? adopting `pd.NA` as
the default missing value sentinel? using the new NumPy 2.0 capabilities?), we
could also delay introducing a default string dtype until there is more clarity
in those other discussions.

However:

1. Delaying has a cost: it further postpones introducing a dedicated string
   dtype that has massive benefits for our users, both in usability and (for the
   significant part of the user base that has PyArrow installed) in performance.


---

Suggested change:

> dtype that has massive benefits for our users, both in usability and, for users that already have PyArrow installed or have no issues installing PyArrow, in performance.

---

> the challenges around this will not be unique to the string dtype and
> therefore not a reason to delay this.

I might be missing the intent but I don't understand why the larger issue of NA handling means we should be faster to implement this

---

> I don't understand why the larger issue of NA handling means we should be faster to implement this

It's not a reason to do it "faster", but I meant to say that the discussion regarding NA is not a reason to do it "slower" (to delay introducing a dedicated string dtype)

---

I think the flip side is that if we aren't careful about the NA handling we can introduce some new keywords / terminology that makes it very confusing in the long run (which is essentially one of the problems with our strings naming conventions)

As a practical example, if we decided we wanted `semantics=` as a keyword argument to `StringDtype` in this PDEP to move the NA discussion along, that might be counter-productive when we look at more data types and decide `semantics=` was not a clear way to allow datetime data types to support `pd.NaT` as the missing value.

(not saying the above is necessarily the truth, just cherry picking from conversation so far)

---

That's one reason that I personally would prefer not introducing a keyword specifically for the missing value semantics, for now (just for this PDEP / the string dtype). I just listed some options in #58613, and I think we can do without it.


---

This just retroactively clarifies the reasoning for `string[pyarrow_numpy]` to have existed in the first place right? Or is it supposed to be hinting at some other feature that the implementation details of the PDEP is proposing?

---

Yes, it's indeed explaining why we did this, which is of course "retroactively" given I was asked to write this PDEP partly for changes that have already been released. So a big part of the PDEP is retroactive in that sense (which is not necessarily helping to write it clearly ..).

---

> Or is it supposed to be hinting at some other feature that the implementation details of the PDEP is proposing?

however, more importantly, the PDEP makes this (the already added dtype) the default in 3.0. It would remain behind the future flag for the next release if enough people feel we are not ready.


---

Historically you would get this by using `dtype="string"` too right? I'm a little wary that we are underestimating the scope of how breaking this could be; I didn't even realize we considered that dtype experimental all this time

---

This has been available (as pyarrow backed) since 1.3, so almost three years (July 2, 2021). Even though considered experimental, if the new string dtype is not accepted for 3.0, then maybe a deprecation warning should be added? (We could also do this if decided a 2.3 release is needed?)

---

A deprecation warning about what exactly?

---

> I'm a little wary that we are underestimating the scope of how breaking this could be

The scope of changing NaN to NA for all users is much bigger though (essentially what was decided in PDEP-10 if we would follow it strictly to the letter).

And similarly if we would in the future change NaN/NaT semantics to NA for all dtypes, the scope will be much bigger (because once that is enabled by default, for example a user that was doing `dtype="float64"` will probably get the new NA behaviour while now it uses NaN), but we are still considering that (granted, it's exactly those details that we have to discuss a lot more in detail (elsewhere) and figure out, though).

I know that this is not necessarily a good argument to justify this breaking change (because we certainly should be wary of the scope of those breaking changes), but I do want to point out again that the choice in this PDEP to use NaN semantics is to reduce the scope of the breaking changes for most users (at the expense of increasing the scope of breaking changes for the smaller subset of users that was already using `dtype="string"`).

If we don't want to make `dtype="string"` breaking, then either we need to come up with a different name for the dtype (not using "string", like "utf8" or "text"), or either we need to delay introducing a default string dtype until after we have agreement on the NA discussions.

And personally I think "string" is by far the best name (and I find the small breakage worth it for being able to use that name), and as I argued elsewhere (and in the "Why not delay introducing a default string dtype?" section in the PDEP text), I think it is valuable for our users to not wait with adding a dedicated string dtype until we are ready with the NA discussion and implementation.


---

> at the expense of increasing the scope of breaking changes for the smaller subset of users that was already using `dtype="string"`

This is where I am a little uncomfortable - I don't know how to measure the size of that, but I am wary of assuming it is not a significant number of users. The fact that "string" returns NA as a missing value is a documented difference in our code base:
https://pandas.pydata.org/docs/dev/user_guide/text.html#behavior-differences

And its usage has been promoted for quite some time:

- https://stackoverflow.com/a/60553529/621736
- https://towardsdatascience.com/why-we-need-to-use-pandas-new-string-dtype-instead-of-object-for-textual-data-6fd419842e24
- https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.1.0.html#all-dtypes-can-now-be-converted-to-stringdtype

> If we don't want to make `dtype="string"` breaking, then either we need to come up with a different name for the dtype (not using "string", like "utf8" or "text"), or either we need to delay introducing a default string dtype until after we have agreement on the NA discussions.

Yea none of these options are great...but out of them I still would probably prefer waiting. I think right now we are marching down a path of "string" missing values:

- Returning pd.NA today
- Returning np.nan with this PDEP (granted those changes are already in main)
- Going back to returning pd.NA with the NA PDEP


---

> But personally I think `dtype="string"` meaning something different than the default string dtype you get without specifying the dtype is going to be very confusing ..)

I think we have to carefully specify what the user specifies in a `dtype` argument and how that gets interpreted, versus what we return as the `dtype` when they look at `Series.dtype`.

So we could have a mapping that says

| User specifies `dtype =` | pandas returns `Series.dtype` |
|---|---|
| Unspecified | "string[pyarrow_numpy]" OR "string[python]" |
| "string" | "string[pyarrow]" |
| StringDtype("pyarrow") | "string[pyarrow]" |
| StringDtype("python") | "string[python]" |
| StringDtype("pyarrow_numpy") | "string[pyarrow_numpy]" |

The first row depends on whether `pyarrow` is installed.

For the second, third and fifth rows, if `pyarrow` is not installed, we raise an Exception.

Separately, we can then debate what the values in the second column should look like in #58613. I personally am not a fan of "pyarrow_numpy"


---

> No, my answer to your example snippet was trying to explain how I would ensure this does not break (if we return `bool` column instead of object dtype with True/False/NaN will ensure that filtering keeps working).

Ah OK - I didn't realize you were proposing that change be a part of this PDEP, just thought it was an idea you had for the future. But that's a completely new behavior...and then begs the question of do we go back and change dtype=object to have that same behavior or just have dtype="string" exclusively have it. Ultimately we end up with the same issue

---

Yeah, I also agree with Will that it's not fair to change this without warning for people already using "string". (`pd.NA` is also a big selling point of the `dtype="string"` too)

Maybe a good compromise would be to use `string[pyarrow]` under the hood for those users (if they had it installed)?

If we were to move ahead with the move to nullable dtypes in general, I worry that this changing of the na value for `dtype="string"` from pd.NA -> np.nan -> pd.NA will cause a lot of confusion.

If we were to do 2.3 (like I suggested below), this might be addressable there (with a deprecation).


---

Still adding some deprecation warnings in 2.x for current users of `StringDtype` is something we certainly could do. I am personally ambivalent about it, but fine with adding it if others think that is better (I do think it might become quite noisy, and it also does not change the fact that 3.0 would switch from NA to NaN)

The warning message could then point people to enable `pd.options.future.infer_string = True` in case they only care about having the (faster) string dtype, or otherwise update their dtype specification if they want the NA instead of NaN version.

---

> I think we have to carefully specify what the user specifies in a `dtype` argument and how that gets interpreted, versus what we return as the `dtype` when they look at `Series.dtype`.
>
> So we could have a mapping that says

I created a variant of that table #58613 (comment) with a concrete proposal

> For the second, third and fifth rows, if `pyarrow` is not installed, we raise an Exception.

(for clarity, this "second" row referred to specifying a dtype with `"string"`)

If you explicitly ask for pyarrow, then yes raising an exception is fine and expected. But a generic `"string"` (or `StringDtype()`) has to mean "whatever string dtype that is the default" and so cannot raise an exception if pyarrow is not installed, but should return the object-dtype based fallback.


---

This part of the plan worries me a little.

Maybe it would be better to cut off a 2.3 from 2.2.x.

I think there's a significant proportion of the downloads for 2.2 that aren't on the latest patch release. I think there's ~ 1/3 of the downloads that are fetching 2.2.0.

---

Also, it would be good to mention which version of pandas is expected to have `infer_string` be able to infer to the object fallback option.

---

a 2.3 release (maybe around the same time as 3.0rc) sounds reasonable.

If the features/bugfixes added to 2.3 are limited to the string dtype then we shouldn't need many patch releases. We may not need to fix any string dtype related issues that are fixed for 3.0 as these will be behind a flag in 2.3 and so shouldn't break existing code.

On the other hand, as these features are behind a flag, maybe releasing a 2.3 would not gain the field testing we hope for.

And therefore, instead of doing a 2.3, planning for at least a couple of release candidates for 3.0 would better achieve this.

---

Thoughts on this?

---

> Maybe it would be better to cut off a 2.3 from 2.2.x.

Yes, if we still plan to add a deprecation warning and change the naming scheme in `StringDtype`, calling that 2.3.0 sounds as the best option (I had been planning to propose doing a 2.3.0 (from the 2.2.x branch) anyway to bump the warning for CoW from `DeprecationWarning` to `FutureWarning`)