Commit 2dc13fa

ARROW-151 Update documentation for extension types (#154)
1 parent c63dd94 commit 2dc13fa

4 files changed

+88
-63
lines changed

bindings/python/docs/source/supported_types.rst renamed to bindings/python/docs/source/data_types.rst

Lines changed: 83 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -1,12 +1,11 @@
11
.. _type support:
22

3-
Supported Types
4-
===============
3+
Data Types
4+
==========
55

6-
PyMongoArrow currently supports a small subset of all BSON types.
6+
PyMongoArrow supports a majority of the BSON types.
77
Support for additional types will be added in subsequent releases.
88

9-
109
.. note:: For more information about BSON types, see the
1110
`BSON specification <http://bsonspec.org/spec.html>`_.
1211

@@ -24,7 +23,7 @@ Support for additional types will be added in subsequent releases.
2423
* - Embedded document
2524
- :class:`py.dict`, an instance of :class:`pyarrow.struct`
2625
* - Embedded array
27-
- :class:`py.list`, an instance of :class:`pyarrow.list_`,
26+
- An instance of :class:`pyarrow.list_`
2827
* - ObjectId
2928
- :class:`py.bytes`, :class:`bson.ObjectId`, an instance of :class:`pymongoarrow.types.ObjectIdType`, an instance of :class:`pymongoarrow.pandas_types.PandasObjectId`
3029
* - Decimal128
@@ -39,6 +38,10 @@ Support for additional types will be added in subsequent releases.
3938
- :class:`~py.int`, :class:`bson.int64.Int64`, an instance of :meth:`pyarrow.int64`
4039
* - UTC datetime
4140
- an instance of :class:`~pyarrow.timestamp` with ``ms`` resolution, :class:`py.datetime.datetime`
41+
* - Binary data
42+
- :class:`bson.Binary`, an instance of :class:`pymongoarrow.types.BinaryType`, an instance of :class:`pymongoarrow.pandas_types.PandasBinary`.
43+
* - JavaScript code
44+
- :class:`bson.Code`, an instance of :class:`pymongoarrow.types.CodeType`, an instance of :class:`pymongoarrow.pandas_types.PandasCode`.
4245

4346
Type identifiers can be used to specify that a field is of a certain type
4447
during :class:`pymongoarrow.api.Schema` declaration. For example, if your data
@@ -54,6 +57,81 @@ respectively, and '_id' that is an `ObjectId`, your schema can be defined as::
5457
Unsupported data types in a schema cause a ``ValueError`` identifying the
5558
field and its data type.
5659

60+
61+
Embedded Array Considerations
62+
-----------------------------
63+
64+
The schema used for an Embedded Array must use the `pyarrow.list_()` type,
65+
so that the type of the array elements can be specified. For example,
66+
67+
.. code-block:: python
68+
69+
from pyarrow import list_, float64
70+
schema = Schema({'_id': ObjectId,
71+
'location': {'coordinates': list_(float64())}
72+
})
73+
74+
75+
Extension Types
76+
---------------
77+
78+
The ``ObjectId``, ``Decimal128``, ``Binary data`` and ``JavaScript code`` types
79+
are implemented as extension types for PyArrow and Pandas.
80+
For Arrow tables, values of these types will have the appropriate
81+
``pymongoarrow`` extension type (e.g. :class:`pymongoarrow.types.ObjectIdType`). The corresponding ``bson`` Python object can be obtained by calling ``.as_py()`` on a value,
82+
or by calling ``.to_pylist()`` on the table.
83+
84+
.. code-block:: pycon
85+
86+
>>> from pymongo import MongoClient
87+
>>> from bson import ObjectId
88+
>>> from pymongoarrow.api import find_arrow_all
89+
>>> client = MongoClient()
90+
>>> coll = client.test.test
91+
>>> coll.insert_many([{"_id": ObjectId(), "foo": 100}, {"_id": ObjectId(), "foo": 200}])
92+
<pymongo.results.InsertManyResult at 0x1080a72b0>
93+
>>> table = find_arrow_all(coll, {})
94+
>>> table
95+
pyarrow.Table
96+
_id: extension<arrow.py_extension_type<ObjectIdType>>
97+
foo: int32
98+
----
99+
_id: [[64408B0D5AC9E208AF220142,64408B0D5AC9E208AF220143]]
100+
foo: [[100,200]]
101+
>>> table["_id"][0]
102+
<pyarrow.ObjectIdScalar: ObjectId('64408b0d5ac9e208af220142')>
103+
>>> table["_id"][0].as_py()
104+
ObjectId('64408b0d5ac9e208af220142')
105+
>>> table.to_pylist()
106+
[{'_id': ObjectId('64408b0d5ac9e208af220142'), 'foo': 100},
107+
{'_id': ObjectId('64408b0d5ac9e208af220143'), 'foo': 200}]
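
In the output above, the table display prints the ``_id`` values as uppercase hex, while the ``ObjectId`` repr prints lowercase hex. The following is a minimal sketch, using only the Python standard library, that suggests both spellings describe the same 12 bytes. It also reads the leading 4 bytes as a big-endian Unix timestamp, which is the layout the BSON documentation describes for ObjectId. The variable names are illustrative and are not part of PyMongoArrow.

```python
from datetime import datetime, timezone

# Hex string copied from the table display above (uppercase form).
table_hex = "64408B0D5AC9E208AF220142"
# Hex string copied from the ObjectId repr above (lowercase form).
repr_hex = "64408b0d5ac9e208af220142"

raw_table = bytes.fromhex(table_hex)
raw_repr = bytes.fromhex(repr_hex)

# Both spellings decode to the same raw bytes.
same_bytes = raw_table == raw_repr
length = len(raw_table)

# Assumption from the BSON ObjectId layout: the first 4 bytes hold a
# big-endian count of seconds since the Unix epoch.
seconds = int.from_bytes(raw_table[:4], "big")
created = datetime.fromtimestamp(seconds, tz=timezone.utc)

print(same_bytes, length, created.isoformat())
```

With ``bson`` installed, ``ObjectId.generation_time`` should expose the same timestamp directly.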
108+
109+
When converting to pandas, the extension type columns will have an appropriate
110+
``pymongoarrow`` extension type (e.g. :class:`pymongoarrow.pandas_types.PandasDecimal128`). The value of the element in the
111+
dataframe will be the appropriate ``bson`` type.
112+
113+
.. code-block:: pycon
114+
115+
>>> from pymongo import MongoClient
116+
>>> from bson import Decimal128
117+
>>> from pymongoarrow.api import find_pandas_all
118+
>>> client = MongoClient()
119+
>>> coll = client.test.test
120+
>>> coll.insert_many([{"foo": Decimal128("0.1")}, {"foo": Decimal128("0.1")}])
121+
<pymongo.results.InsertManyResult at 0x1080a72b0>
122+
>>> df = find_pandas_all(coll, {})
123+
>>> df
124+
_id foo
125+
0 64408bf65ac9e208af220144 0.1
126+
1 64408bf65ac9e208af220145 0.1
127+
>>> df["foo"].dtype
128+
<pymongoarrow.pandas_types.PandasDecimal128 at 0x11fe0ae90>
129+
>>> df["foo"][0]
130+
Decimal128('0.1')
131+
>>> df["_id"][0]
132+
ObjectId('64408bf65ac9e208af220144')
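
One practical consequence of getting ``Decimal128`` values back, rather than floats, is that decimal values stay exact. The sketch below uses the standard library's ``decimal.Decimal`` as a stand-in for ``bson.Decimal128``, so it runs without a MongoDB server. It is an illustration of the difference, not PyMongoArrow code. With ``bson`` installed, ``Decimal128.to_decimal()`` converts a value to ``decimal.Decimal``.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 exactly, so repeated
# addition drifts away from the expected decimal result.
float_sum = 0.1 + 0.1 + 0.1
float_exact = float_sum == 0.3

# Decimal arithmetic keeps the base-10 digits exactly.
# (Stand-in for values obtained via bson.Decimal128.to_decimal().)
decimal_sum = Decimal("0.1") + Decimal("0.1") + Decimal("0.1")
decimal_exact = decimal_sum == Decimal("0.3")

print(float_exact, decimal_exact)
```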
133+
134+
57135
Null Values and Conversion to Pandas DataFrames
58136
-----------------------------------------------
59137

bindings/python/docs/source/extension_types.rst

Lines changed: 0 additions & 49 deletions
This file was deleted.

bindings/python/docs/source/index.rst

Lines changed: 3 additions & 7 deletions
@@ -18,11 +18,8 @@ know to use **PyMongoArrow**.
1818
:doc:`quickstart`
1919
Start here for a quick overview.
2020

21-
:doc:`supported_types`
22-
A list of BSON types that are supported by PyMongoArrow.
23-
24-
:doc:`extension_types`
25-
A more in-depth explanation of the support for extension types such as ObjectId.
21+
:doc:`data_types`
22+
Data type support with PyMongoArrow.
2623

2724
:doc:`faq`
2825
Frequently asked questions.
@@ -88,8 +85,7 @@ Indices and tables
8885

8986
installation
9087
quickstart
91-
supported_types
92-
extension_types
88+
data_types
9389
faq
9490
api/index
9591
changelog

bindings/python/docs/source/quickstart.rst

Lines changed: 2 additions & 2 deletions
@@ -68,8 +68,8 @@ to type-specifiers, e.g.::
6868
schema = Schema({'_id': int, 'amount': float, 'last_updated': datetime})
6969

7070
There are multiple permissible type-identifiers for each supported BSON type.
71-
For a full-list of supported types and associated type-identifiers see
72-
:doc:`supported_types`.
71+
For a full list of data types and associated type-identifiers see
72+
:doc:`data_types`.
7373

7474
Nested data (embedded documents) are also supported::
7575
