Commit 46e8c9c

sibbiii and blink1073 authored
Add documentation on how to deal with nulls in data (#138)
* Added hint how to flatten result efficiently
* null conversion added
* Update schemas.rst
* Update schemas.rst revert
* Added review comments of @juliusgeo
* Update supported_types.rst
* lint

---------

Co-authored-by: Steven Silvester <[email protected]>
1 parent b67be9c commit 46e8c9c

1 file changed (+31, -0)

bindings/python/docs/source/supported_types.rst

Lines changed: 31 additions & 0 deletions
@@ -53,3 +53,34 @@ respectively, and '_id' that is an `ObjectId`, your schema can be defined as::

 Unsupported data types in a schema cause a ``ValueError`` identifying the
 field and its data type.
+
+Null Values and Conversion to Pandas DataFrames
+-----------------------------------------------
+
+In Arrow, all arrays are nullable.
+pandas has experimental nullable data types, such as ``Int64`` (note the capital "I").
+You can instruct Arrow to create a pandas DataFrame using nullable dtypes
+with the code below (taken from the `PyArrow pandas integration guide <https://arrow.apache.org/docs/python/pandas.html>`_):
+
+.. code-block:: pycon
+
+   >>> dtype_mapping = {
+   ...     pa.int8(): pd.Int8Dtype(),
+   ...     pa.int16(): pd.Int16Dtype(),
+   ...     pa.int32(): pd.Int32Dtype(),
+   ...     pa.int64(): pd.Int64Dtype(),
+   ...     pa.uint8(): pd.UInt8Dtype(),
+   ...     pa.uint16(): pd.UInt16Dtype(),
+   ...     pa.uint32(): pd.UInt32Dtype(),
+   ...     pa.uint64(): pd.UInt64Dtype(),
+   ...     pa.bool_(): pd.BooleanDtype(),
+   ...     pa.float32(): pd.Float32Dtype(),
+   ...     pa.float64(): pd.Float64Dtype(),
+   ...     pa.string(): pd.StringDtype(),
+   ... }
+   >>> df = arrow_table.to_pandas(
+   ...     types_mapper=dtype_mapping.get, split_blocks=True, self_destruct=True
+   ... )
+   >>> del arrow_table
+
+Defining a conversion for ``pa.string()`` additionally converts Arrow strings to the pandas ``string`` extension dtype rather than Python objects.
