|
| 1 | +Extension Types |
| 2 | +=============== |
| 3 | + |
| 4 | +This tutorial is intended as an introduction to working with |
| 5 | +**PyMongoArrow** and its corresponding extension types. The reader is assumed to be familiar with basic |
| 6 | +`PyMongo <https://pymongo.readthedocs.io/en/stable/tutorial.html>`_ and |
| 7 | +`MongoDB <https://docs.mongodb.com>`_ concepts. For more information, see the `Arrow extension type docs <https://arrow.apache.org/docs/python/extending_types.html>`_. |
| 8 | + |
| 9 | +Extension types with Arrow |
| 10 | +^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 11 | +Both extension types, :class:`pymongoarrow.types.ObjectIdType` and :class:`pymongoarrow.types.Decimal128StringType`, are only partially supported in PyArrow. They work when used in a |
| 12 | +schema, but appear in the table as ``fixed_size_binary[12]`` and ``string`` columns respectively, and accessing individual values yields :class:`pyarrow.lib.FixedSizeBinaryScalar` and |
| 13 | +:class:`pyarrow.lib.StringScalar` instances:: |
| 14 | + |
| 15 | + >>> schema = Schema({"_id": ObjectIdType(), "data": Decimal128StringType()}) |
| 16 | + >>> table = find_arrow_all(coll, {}, schema=schema) |
| 17 | + >>> print(table) |
| 18 | + pyarrow.Table |
| 19 | + _id: fixed_size_binary[12] |
| 20 | + data: string |
| 21 | + ---- |
| 22 | + _id: [[63C003BF0A1D5281D33B0AFD,63C003BF0A1D5281D33B0AFE,63C003BF0A1D5281D33B0AFF,63C003BF0A1D5281D33B0B00]] |
| 23 | + data: [["0.1","1.0","0.00001",null]] |
| 24 | + ... |
| 25 | + >>> print(type(table["_id"][0])) |
| 26 | + <class 'pyarrow.lib.FixedSizeBinaryScalar'> |
| 27 | + >>> print(type(table["data"][0])) |
| 28 | + <class 'pyarrow.lib.StringScalar'> |
| 29 | + |
| 30 | + |
| 31 | +Extension types with Pandas/NumPy |
| 32 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 33 | +Extension types are only partially supported with Pandas/NumPy. They work when used in a |
| 34 | +schema, but the corresponding columns appear in the DataFrame with the ``object`` dtype, and accessing individual values yields either :class:`bytes` (for ``ObjectIdType``) or |
| 35 | +:class:`str` (for ``Decimal128StringType``):: |
| 36 | + |
| 37 | + >>> schema = Schema({"_id": ObjectIdType(), "data": Decimal128StringType()}) |
| 38 | + >>> table = find_pandas_all(coll, {}, schema=schema) |
| 39 | + >>> table.info() |
| 40 | + RangeIndex: 4 entries, 0 to 3 |
| 41 | + Data columns (total 2 columns): |
| 42 | + # Column Non-Null Count Dtype |
| 43 | + --- ------ -------------- ----- |
| 44 | + 0 _id 4 non-null object |
| 45 | + 1 data 3 non-null object |
| 46 | + >>> print(type(table["_id"][0])) |
| 47 | + <class 'bytes'> |
| 48 | + >>> print(type(table["data"][0])) |
| 49 | + <class 'str'> |
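
Because the values come back as plain ``bytes`` and ``str`` objects, callers who need richer types have to convert them themselves. The following is a minimal sketch of one way to do that with only the standard library; it is not part of PyMongoArrow. The helper names ``object_id_hex`` and ``decimal_from_text`` are hypothetical, the sample values are copied from the output shown above, and missing values are assumed to arrive as ``None``.

```python
from decimal import Decimal
from typing import Optional


def object_id_hex(raw: bytes) -> str:
    """Return the 24-character hex form of a 12-byte ObjectId payload."""
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    return raw.hex()


def decimal_from_text(text: Optional[str]) -> Optional[Decimal]:
    """Parse a Decimal128 string; missing values (assumed None) stay None."""
    return None if text is None else Decimal(text)


# Stand-ins for table["_id"][0] and the "data" column in the example above.
raw_id = bytes.fromhex("63C003BF0A1D5281D33B0AFD")
data = ["0.1", "1.0", "0.00001", None]

id_hex = object_id_hex(raw_id)
parsed = [decimal_from_text(value) for value in data]

print(id_hex)
print(parsed)
```

If PyMongo's ``bson`` package is available, the same raw values can instead be passed to ``bson.ObjectId(raw_id)`` and ``bson.Decimal128(text)`` to recover the BSON types. The standard-library version is used here only so that the sketch runs without extra dependencies.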