|
12 | 12 | " * Separate types for atomic columns (such as `int`, `bool`, and `float`) and columns of objects (such as `str`).\n", |
13 | 13 | " * No out-of-band representation of missing values. Instead, missingness must be signaled by the insertion of a value representing missingness. This causes problems for types that don't have such a representation such as `int` and `bool`.\n", |
14 | 14 | "\n", |
15 | | - "To work around the above the Pandas data frame have a number of non-avoidable column type promotion rules and cell type promotion rules. Let's take a look at a problem data frame." |
| 15 | + "To work around the above, Pandas data frames have a number of unavoidable column type promotion rules and cell type promotion rules. These promotion rules can introduce their own complexity.\n", |
| 16 | + "\n", |
| 17 | + "Let's take a look at a Pandas data frame." |
16 | 18 | ] |
17 | 19 | }, |
18 | 20 | { |
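The promotion rules mentioned in this hunk can be seen in a small sketch. This is an illustration only and not code from the notebook: the Series names are hypothetical, and the dtypes in the comments assume the default (non-nullable) NumPy-backed dtypes.

```python
import pandas as pd

# Hypothetical example data, not taken from the notebook under review.
ints = pd.Series([1, 2, 3])
ints_missing = pd.Series([1, 2, None])
bools = pd.Series([True, False, True])
bools_missing = pd.Series([True, False, None])

# Without missing values each column keeps its atomic type.
print(ints.dtype)           # an integer dtype (int64 on most platforms)
print(bools.dtype)          # bool

# A single missing value forces promotion, because int and bool
# have no in-band representation of missingness.
print(ints_missing.dtype)   # float64 (None is stored as NaN)
print(bools_missing.dtype)  # object (cells hold Python objects)
```

Pandas also offers opt-in nullable extension dtypes (for example `"Int64"` and `"boolean"`) that avoid this promotion, but they are not the default and so do not change the behavior described above.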
|
198 | 200 | "metadata": {}, |
199 | 201 | "outputs": [ |
200 | 202 | { |
201 | | - "data": { |
202 | | - "text/plain": [ |
203 | | - "{'b': {int},\n", |
204 | | - " 'q': None,\n", |
205 | | - " 'r': {float},\n", |
206 | | - " 's': {float},\n", |
207 | | - " 'x': {float},\n", |
208 | | - " 'y': {str},\n", |
209 | | - " 'z': {bool, float, int}}" |
210 | | - ] |
211 | | - }, |
212 | | - "execution_count": 5, |
213 | | - "metadata": {}, |
214 | | - "output_type": "execute_result" |
| 203 | + "name": "stdout", |
| 204 | + "output_type": "stream", |
| 205 | + "text": [ |
| 206 | + "{'b': {<class 'int'>},\n", |
| 207 | + " 'q': None,\n", |
| 208 | + " 'r': {<class 'float'>},\n", |
| 209 | + " 's': {<class 'float'>},\n", |
| 210 | + " 'x': {<class 'float'>},\n", |
| 211 | + " 'y': {<class 'str'>},\n", |
| 212 | + " 'z': {<class 'bool'>, <class 'float'>, <class 'int'>}}\n" |
| 213 | + ] |
215 | 214 | } |
216 | 215 | ], |
217 | 216 | "source": [ |
218 | 217 | "# report the types of non-null (not None, NaN, or NaT) values found in cells\n", |
219 | | - "non_null_types_in_frame(d)" |
| 218 | + "pprint(non_null_types_in_frame(d))" |
220 | 219 | ] |
221 | 220 | }, |
222 | 221 | { |
|
299 | 298 | "If you are not sure all of your code base (and its dependencies) are consistently only using columns or only using the values attribute, you may experience incompatible mixed types even on uniform data. We know one is not supposed to use \"`.values`\" [from the Pandas documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.values.html):\n", |
300 | 299 | "\n", |
301 | 300 | "\n", |
302 | | - "\n", |
| 301 | + "<dd>\n", |
| 302 | + "pandas.DataFrame.values\n", |
| 303 | + "property DataFrame.values\n", |
303 | 304 | "<dd><p>Return a Numpy representation of the DataFrame.</p>\n", |
304 | 305 | "<div class=\"admonition warning\">\n", |
305 | 306 | "<p class=\"admonition-title\">Warning</p>\n", |
|
311 | 312 | "<dt class=\"field-odd\">Returns<span class=\"colon\">:</span></dt>\n", |
312 | 313 | "<dd class=\"field-odd\"><dl class=\"simple\">\n", |
313 | 314 | "<dt>numpy.ndarray</dt><dd><p>The values of the DataFrame.</p>\n", |
314 | | - "</dd></dd></dd>\n", |
| 315 | + "</dd></dd></dd></dd>\n", |
315 | 316 | "\n", |
316 | 317 | "So, presumably, Pandas `.values` is not the attribute it syntactically presents as, but in fact a method interface.\n", |
317 | 318 | "\n", |
318 | | - "The type recommended `.to_numpy()` seems to return the same `numpy.float64`, which presumably is *not* what is inside the Pandas data frame columns are Series representations." |
| 319 | + "The recommended method `.to_numpy()` seems to return the same `numpy.float64` type, which presumably is *not* what is inside the Pandas data frame columns or Series representations. In any case, the type you see in a cell depends on what types are in related cells, and on what path you use to access the value." |
319 | 320 | ] |
320 | 321 | }, |
321 | 322 | { |
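The point about access paths can be checked with a minimal sketch. The frame and variable names below are hypothetical and not taken from the notebook; the sketch assumes default NumPy-backed dtypes.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame: one integer column, one float column.
d = pd.DataFrame({"i": [1, 2], "x": [0.5, 1.5]})

# Path 1: go through the column. The cell keeps the column's integer type.
by_column = d["i"].iloc[0]

# Path 2: go through the whole-frame NumPy array. All columns are
# coerced to one common dtype, so the same cell comes back as a float.
by_matrix = d.to_numpy()[0, 0]

# Path 3: go through the row. The row is a Series with a single dtype,
# so the integer cell is again promoted to float.
by_row = d.iloc[0]["i"]

print(type(by_column), type(by_matrix), type(by_row))
```

The same logical cell reports an integer type on the first path and `numpy.float64` on the other two, purely because of the neighboring float column.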
|
342 | 343 | "cell_type": "markdown", |
343 | 344 | "metadata": {}, |
344 | 345 | "source": [ |
345 | | - "Any and all of the above inconsistencies can be fairly hazardous to any system that tries to export Pandas to other type sensitive systems (such as databases, JSON, arrow and so on)." |
| 346 | + "Any and all of the above inconsistencies can be fairly hazardous to any insufficiently careful system that tries to export Pandas data to other type-sensitive systems (such as databases, JSON, Arrow, and so on)." |
346 | 347 | ] |
347 | 348 | }, |
348 | 349 | { |
|
351 | 352 | "metadata": {}, |
352 | 353 | "outputs": [ |
353 | 354 | { |
354 | | - "data": { |
355 | | - "text/plain": [ |
356 | | - "'2.0.3'" |
357 | | - ] |
358 | | - }, |
359 | | - "execution_count": 10, |
360 | | - "metadata": {}, |
361 | | - "output_type": "execute_result" |
362 | | - } |
363 | | - ], |
364 | | - "source": [ |
365 | | - "pd.__version__" |
366 | | - ] |
367 | | - }, |
368 | | - { |
369 | | - "cell_type": "code", |
370 | | - "execution_count": 11, |
371 | | - "metadata": {}, |
372 | | - "outputs": [ |
373 | | - { |
374 | | - "data": { |
375 | | - "text/plain": [ |
376 | | - "'1.25.2'" |
377 | | - ] |
378 | | - }, |
379 | | - "execution_count": 11, |
380 | | - "metadata": {}, |
381 | | - "output_type": "execute_result" |
| 355 | + "name": "stdout", |
| 356 | + "output_type": "stream", |
| 357 | + "text": [ |
| 358 | + "{'np': '1.25.2', 'pd': '2.0.3'}\n" |
| 359 | + ] |
382 | 360 | } |
383 | 361 | ], |
384 | 362 | "source": [ |
385 | | - "np.__version__" |
| 363 | + "pprint({'np': np.__version__, 'pd': pd.__version__})" |
386 | 364 | ] |
387 | 365 | } |
388 | 366 | ], |
|