@@ -86,14 +86,17 @@ It can also be specified explicitly using the ``"str"`` alias:
     2    NaN
     dtype: str

+Similarly, functions like :func:`read_csv`, :func:`read_parquet`, and otherwise
+will now use the new string dtype when reading string data.
+
 In contrast to the current object dtype, the new string dtype will only store
 strings. This also means that it will raise an error if you try to store a
 non-string value in it (see below for more details).

-Missing values with the new string dtype are always represented as ``NaN``, and
-the missing value behaviour is similar to other default dtypes.
+Missing values with the new string dtype are always represented as ``NaN`` (``np.nan``),
+and the missing value behavior is similar to other default dtypes.

-This new string dtype should work the same as how you have been
+This new string dtype should otherwise work the same as how you have been
 using pandas with string data today. For example, all string-specific methods
 through the ``str`` accessor will work the same:

@@ -112,13 +115,13 @@ through the ``str`` accessor will work the same:
 class. The dtype can be constructed as ``pd.StringDtype(na_value=np.nan)``,
 but for general usage we recommend to use the shorter ``"str"`` alias.

-Overview of behaviour differences and how to address them
+Overview of behavior differences and how to address them
 ---------------------------------------------------------

 The dtype is no longer object dtype
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-When inferring string data, the data type of the resulting DataFrame column or
+When inferring or reading string data, the data type of the resulting DataFrame column or
 Series will silently start being the new ``"str"`` dtype instead of ``"object"``
 dtype, and this can have some impact on your code.

@@ -209,7 +212,7 @@ the missing value sentinel is always NaN (``np.nan``):
     >>> print(ser[2])
     nan

-Generally this should be no problem when relying on missing value behaviour in
+Generally this should be no problem when relying on missing value behavior in
 pandas methods (for example, ``ser.isna()`` will give the same result as before).
 But when you relied on the exact value of ``None`` being present, that can
 impact your code.
@@ -227,9 +230,8 @@ the dtype and the exact missing value sentinel:
     True

 One caveat: this function works both on scalars and on array-likes, and in the
-latter case it will return an array of boolean dtype. When using it in a boolean
-context (for example, ``if pd.isna(..): ..``) be sure to only pass a scalar to
-it.
+latter case it will return an array of bools. When using it in a Boolean context
+(for example, ``if pd.isna(..): ..``) be sure to only pass a scalar to it.

 "setitem" operations will now raise an error for non-string data
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~