Commit 3e84f47

ARROW-142 Update documentation to make clear that extension types are not fully supported (#123)
1 parent 60e7706 commit 3e84f47

3 files changed: +59 -0 lines changed
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+Extension Types
+===============
+
+This tutorial is intended as an introduction to working with
+**PyMongoArrow** and its corresponding extension types. The reader is assumed to be familiar with basic
+`PyMongo <https://pymongo.readthedocs.io/en/stable/tutorial.html>`_ and
+`MongoDB <https://docs.mongodb.com>`_ concepts. For more information see the `Arrow extension type docs <https://arrow.apache.org/docs/python/extending_types.html>`_.
+
+Extension types with Arrow
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+Both extension types, :class:`pymongoarrow.types.ObjectIdType` and :class:`pymongoarrow.types.Decimal128StringType`, are only partially supported in PyArrow. They will work when used in a
+schema, but will show up in the table as a `fixed_size_binary(12)` or `string` respectively, and will be :class:`pyarrow.lib.FixedSizeBinaryScalar` or :class:`pyarrow.lib.StringScalar`
+upon accessing the values::
+
+  schema = Schema({"_id": ObjectIdType(), "data": Decimal128StringType()})
+  table = find_arrow_all(coll, {}, schema=schema)
+  print(table)
+  >>> pyarrow.Table
+  >>> _id: fixed_size_binary[12]
+  >>> data: string
+  >>> ----
+  >>> _id: [[63C003BF0A1D5281D33B0AFD,63C003BF0A1D5281D33B0AFE,63C003BF0A1D5281D33B0AFF,63C003BF0A1D5281D33B0B00]]
+  >>> data: [["0.1","1.0","0.00001",null]]
+  >>> ...
+  print(type(table["_id"][0]))
+  print(type(table["data"][0]))
+  >>> <class 'pyarrow.lib.FixedSizeBinaryScalar'>
+  >>> <class 'pyarrow.lib.StringScalar'>
+
+
+Extension types with Pandas/NumPy
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Extension types with Pandas/NumPy are only partially supported. They will work when used in a
+schema, but will show up in the table as a :class:`pandas.object`, but will be converted to either :class:`py.bytes` or
+:class:`py.str` upon accessing the values::
+
+  schema = Schema({"_id": ObjectIdType(), "data": Decimal128StringType()})
+  table = find_pandas_all(coll, {}, schema=schema)
+  print(table.info())
+  >>> RangeIndex: 4 entries, 0 to 3
+  >>> Data columns (total 2 columns):
+  >>>  #   Column  Non-Null Count  Dtype
+  >>> ---  ------  --------------  -----
+  >>>  0   _id     4 non-null      object
+  >>>  1   data    3 non-null      object
+  print(type(table["_id"][0]))
+  print(type(table["data"][0]))
+  >>> <class 'bytes'>
+  >>> <class 'str'>
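
Reviewer note: the added documentation says that ObjectId values come back as 12 raw bytes and Decimal128 values as strings, so callers have to interpret them themselves. The following is a minimal sketch of that post-processing, using only the Python standard library and the sample values printed in the documentation above. The helper names (`object_id_hex`, `object_id_timestamp`, `to_decimals`) are hypothetical and not part of PyMongoArrow, and `decimal.Decimal` is only a stand-in for BSON Decimal128. With PyMongo installed, `bson.ObjectId(raw)` and `bson.decimal128.Decimal128(text)` would likely be the closer equivalents.

```python
from datetime import datetime, timezone
from decimal import Decimal

# Sample values copied from the printed output in the documentation above.
raw_id = bytes.fromhex("63C003BF0A1D5281D33B0AFD")  # 12 raw ObjectId bytes
raw_data = ["0.1", "1.0", "0.00001", None]          # Decimal128 values as strings


def object_id_hex(raw):
    """Return the 24-character lowercase hex form of 12 raw ObjectId bytes."""
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    return raw.hex()


def object_id_timestamp(raw):
    """Decode the creation time held in the first 4 bytes (big-endian seconds)."""
    seconds = int.from_bytes(raw[:4], "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def to_decimals(values):
    """Parse decimal strings into decimal.Decimal, keeping missing values as None."""
    return [None if v is None else Decimal(v) for v in values]


print(object_id_hex(raw_id))
print(object_id_timestamp(raw_id).isoformat())
print(to_decimals(raw_data))
```

Parsing into `Decimal` rather than `float` keeps values such as `"0.00001"` exact, which is presumably why the schema type stores them as strings in the first place.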

bindings/python/docs/source/index.rst

Lines changed: 4 additions & 0 deletions
@@ -21,6 +21,9 @@ know to use **PyMongoArrow**.
 :doc:`supported_types`
   A list of BSON types that are supported by PyMongoArrow.
 
+:doc:`extension_types`
+  A more in-depth explanation of the support for extension types such as ObjectId.
+
 :doc:`faq`
   Frequently asked questions.
 
@@ -83,6 +86,7 @@ Indices and tables
    installation
    quickstart
    supported_types
+   extension_types
    faq
    api/index
    changelog

bindings/python/docs/source/supported_types.rst

Lines changed: 6 additions & 0 deletions
@@ -6,6 +6,12 @@ Supported Types
 PyMongoArrow currently supports a small subset of all BSON types.
 Support for additional types will be added in subsequent releases.
 
+.. note:: PyMongoArrow does not currently fully support extension types with Pandas/NumPy or Arrow.
+   However, they can be used in schemas.
+   This means that ObjectId and Decimal128 are not fully supported in Pandas DataFrames or Arrow Tables.
+   Instead, the schema type will be converted to a string or object representation of the type.
+   For more information see :doc:`extension_types`.
+
 .. note:: For more information about BSON types, see the
    `BSON specification <http://bsonspec.org/spec.html>`_.
 