Commit 0ea5a15

PYTHON-1819 Documentation & examples for custom type encoding/decoding
functionality
1 parent 3b29458 commit 0ea5a15

3 files changed: 201 additions and 57 deletions

bson/codec_options.py

Lines changed: 19 additions & 5 deletions
@@ -41,6 +41,8 @@ class TypeEncoder(ABC):
4141
4242
Codec classes must implement the ``python_type`` attribute, and the
4343
``transform_python`` method to support encoding.
44+
45+
See :ref:`custom-type-type-codec` documentation for an example.
4446
"""
4547
@abstractproperty
4648
def python_type(self):
@@ -59,6 +61,8 @@ class TypeDecoder(ABC):
5961
6062
Codec classes must implement the ``bson_type`` attribute, and the
6163
``transform_bson`` method to support decoding.
64+
65+
See :ref:`custom-type-type-codec` documentation for an example.
6266
"""
6367
@abstractproperty
6468
def bson_type(self):
@@ -73,13 +77,15 @@ def transform_bson(self, value):
7377

7478
class TypeCodec(TypeEncoder, TypeDecoder):
7579
"""Base class for defining type codec classes which describe how a
76-
custom type can be transformed to/from one of the types BSON already
77-
understands, and can encode/decode.
80+
custom type can be transformed to/from one of the types :mod:`bson`
81+
can already encode/decode.
7882
7983
Codec classes must implement the ``python_type`` attribute, and the
8084
``transform_python`` method to support encoding, as well as the
8185
``bson_type`` attribute, and the ``transform_bson`` method to support
8286
decoding.
87+
88+
See :ref:`custom-type-type-codec` documentation for an example.
8389
"""
8490
pass
8591

@@ -96,14 +102,19 @@ class TypeRegistry(object):
96102
>>> type_registry = TypeRegistry([Codec1, Codec2, Codec3, ...],
97103
... fallback_encoder)
98104
105+
See :ref:`custom-type-type-registry` documentation for an example.
106+
99107
:Parameters:
100108
- `type_codecs` (optional): iterable of type codec instances. If
101109
``type_codecs`` contains multiple codecs that transform a single
102110
python or BSON type, the transformation specified by the type codec
103-
occurring last prevails.
111+
occurring last prevails. A TypeError will be raised if one or more
112+
type codecs modify the encoding behavior of a built-in :mod:`bson`
113+
type.
104114
- `fallback_encoder` (optional): callable that accepts a single,
105-
unencodable python value and transforms it into a type that BSON can
106-
encode.
115+
unencodable python value and transforms it into a type that
116+
:mod:`bson` can encode. See :ref:`fallback-encoder-callable`
117+
documentation for an example.
107118
"""
108119
def __init__(self, type_codecs=None, fallback_encoder=None):
109120
self.__type_codecs = list(type_codecs or [])
@@ -217,6 +228,9 @@ class CodecOptions(_options_base):
217228
- `type_registry`: Instance of :class:`TypeRegistry` used to customize
218229
encoding and decoding behavior.
219230
231+
.. versionadded:: 3.8
232+
`type_registry` attribute.
233+
220234
.. warning:: Care must be taken when changing
221235
`unicode_decode_error_handler` from its default value ('strict').
222236
The 'replace' and 'ignore' modes should not be used when documents
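
The two constructor rules described in the ``TypeRegistry`` docstring above (the last codec for a given type prevails, and codecs that target built-in types are rejected with a ``TypeError``) can be illustrated with a small toy model. This is only a sketch that uses the standard library: ``SketchRegistry``, ``Money``, ``MoneyAsInt``, ``MoneyAsStr``, ``IntCodec``, and the ``BUILTIN_TYPES`` tuple are hypothetical names invented for illustration, and the code is not PyMongo's implementation.

```python
class SketchRegistry:
    """Toy model of the documented TypeRegistry rules (not PyMongo code)."""

    # Assumption: a small stand-in for the types bson encodes natively.
    BUILTIN_TYPES = (bool, int, float, str, bytes, list, dict, type(None))

    def __init__(self, type_codecs=None, fallback_encoder=None):
        self.encoders = {}
        self.fallback_encoder = fallback_encoder
        for codec in type_codecs or []:
            # Built-in types and their subtypes may not be customized.
            if issubclass(codec.python_type, self.BUILTIN_TYPES):
                raise TypeError(
                    "codec %r targets built-in type %r"
                    % (codec, codec.python_type))
            # A later codec overwrites an earlier one for the same type.
            self.encoders[codec.python_type] = codec.transform_python


class Money:
    """Hypothetical custom type used only for this illustration."""

    def __init__(self, cents):
        self.cents = cents


class MoneyAsInt:
    python_type = Money

    def transform_python(self, value):
        return value.cents


class MoneyAsStr:
    python_type = Money

    def transform_python(self, value):
        return "%d.%02d" % divmod(value.cents, 100)


class IntCodec:
    python_type = int  # a built-in type, so the registry should reject it

    def transform_python(self, value):
        return str(value)


# Two codecs target Money; the one listed last (MoneyAsStr) is used.
registry = SketchRegistry([MoneyAsInt(), MoneyAsStr()])
encoded = registry.encoders[Money](Money(1999))

try:
    SketchRegistry([IntCodec()])
    builtin_rejected = False
except TypeError:
    builtin_rejected = True
```

In this model ``encoded`` is produced by ``MoneyAsStr`` because it appears last in the list, and constructing a registry with ``IntCodec`` fails because ``int`` is treated as a built-in type.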

doc/examples/custom_type.rst

Lines changed: 179 additions & 52 deletions
@@ -7,11 +7,11 @@ codec, which is used to populate a :class:`~bson.codec_options.TypeRegistry`.
77
The type registry can then be used to create a custom-type-aware
88
:class:`~pymongo.collection.Collection`. Read and write operations
99
issued against the resulting collection object transparently manipulate
10-
documents as they are saved or retrieved from MongoDB.
10+
documents as they are saved to or retrieved from MongoDB.
1111

1212

13-
Setup
14-
-----
13+
Setting Up
14+
----------
1515

1616
We'll start by getting a clean database to use for the example:
1717

@@ -26,10 +26,10 @@ We'll start by getting a clean database to use for the example:
2626
Since the purpose of the example is to demonstrate working with custom types,
2727
we'll need a custom data type to use. For this example, we will be working with
2828
the :py:class:`~decimal.Decimal` type from Python's standard library. Since the
29-
BSON library has a :class:`~bson.decimal128.Decimal128` type (that implements
30-
the IEEE 754 decimal128 decimal-based floating-point numbering format) which
31-
is distinct from Python's built-in :py:class:`~decimal.Decimal` type, when we
32-
try to save an instance of ``Decimal`` with PyMongo, we get an
29+
BSON library's :class:`~bson.decimal128.Decimal128` type (that implements
30+
the IEEE 754 decimal128 decimal-based floating-point numbering format) is
31+
distinct from Python's built-in :py:class:`~decimal.Decimal` type, attempting
32+
to save an instance of ``Decimal`` with PyMongo results in an
3333
:exc:`~bson.errors.InvalidDocument` exception.
3434

3535
.. doctest::
@@ -44,13 +44,13 @@ try to save an instance of ``Decimal`` with PyMongo, we get an
4444

4545
.. _custom-type-type-codec:
4646

47-
The Type Codec
48-
--------------
47+
The :class:`~bson.codec_options.TypeCodec` Class
48+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
4949

5050
.. versionadded:: 3.8
5151

52-
In order to encode custom types, we must first define a **type codec** for our
53-
type. A type codec describes how an instance of a custom type can be
52+
In order to encode a custom type, we must first define a **type codec** for
53+
that type. A type codec describes how an instance of a custom type can be
5454
*transformed* to and/or from one of the types :mod:`~bson` already understands.
5555
Depending on the desired functionality, users must choose from the following
5656
base classes when defining type codecs:
@@ -62,7 +62,7 @@ base classes when defining type codecs:
6262
decodes a specified BSON type into a custom Python type. Users must implement
6363
the ``bson_type`` property/attribute and the ``transform_bson`` method.
6464
* :class:`~bson.codec_options.TypeCodec`: subclass this to define a codec that
65-
can both encode from and decode to a custom type. Users must implement the
65+
can both encode and decode a custom type. Users must implement the
6666
``python_type`` and ``bson_type`` properties/attributes, as well as the
6767
``transform_python`` and ``transform_bson`` methods.
6868

@@ -93,14 +93,14 @@ interested in both encoding and decoding our custom type, we use the
9393

9494
.. _custom-type-type-registry:
9595

96-
The Type Registry
97-
-----------------
96+
The :class:`~bson.codec_options.TypeRegistry` Class
97+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
9898

9999
.. versionadded:: 3.8
100100

101101
Before we can begin encoding and decoding our custom type objects, we must
102-
first inform PyMongo about our type codec. This is done by creating a
103-
:class:`~bson.codec_options.TypeRegistry` instance:
102+
first inform PyMongo about the corresponding codec. This is done by creating
103+
a :class:`~bson.codec_options.TypeRegistry` instance:
104104

105105
.. doctest::
106106

@@ -113,7 +113,7 @@ Once instantiated, registries are immutable and the only way to add codecs
113113
to a registry is to create a new one.
114114

115115

116-
Putting it together
116+
Putting It Together
117117
-------------------
118118

119119
Finally, we can define a :class:`~bson.codec_options.CodecOptions` instance
@@ -201,35 +201,79 @@ This is trivial to do since the same transformation as the one used for
201201
information, it is impossible to discern which incoming
202202
:class:`~bson.decimal128.Decimal128` value needs to be decoded as ``Decimal``
203203
and which needs to be decoded as ``DecimalInt``. This example only considers
204-
the situation where a user wants to *encode* documents containing one or both
204+
the situation where a user wants to *encode* documents containing either
205205
of these types.
206206

207-
Now, we can create a new codec options object and use it to get a collection
208-
object:
207+
After creating a new codec options object and using it to get a collection
208+
object, we can seamlessly encode instances of ``DecimalInt``:
209209

210210
.. doctest::
211211

212212
>>> type_registry = TypeRegistry([decimal_codec, decimalint_codec])
213213
>>> codec_options = CodecOptions(type_registry=type_registry)
214214
>>> collection = db.get_collection('test', codec_options=codec_options)
215215
>>> collection.drop()
216-
217-
218-
We can now seamlessly encode instances of ``DecimalInt``. Note that the
219-
``transform_bson`` method of the base codec class results in these values
220-
being decoded as ``Decimal`` (and not ``DecimalInt``):
221-
222-
.. doctest::
223-
224216
>>> collection.insert_one({'num': DecimalInt("45.321")})
225217
<pymongo.results.InsertOneResult object at ...>
226218
>>> mydoc = collection.find_one()
227219
>>> pprint.pprint(mydoc)
228220
{u'_id': ObjectId('...'), u'num': Decimal('45.321')}
229221

222+
Note that the ``transform_bson`` method of the base codec class results in
223+
these values being decoded as ``Decimal`` (and not ``DecimalInt``).
224+
225+
226+
.. _decoding-binary-types:
227+
228+
Decoding :class:`~bson.binary.Binary` Types
229+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
230+
231+
The decoding treatment of :class:`~bson.binary.Binary` types having
232+
``subtype = 0`` by the :mod:`bson` module varies slightly depending on the
233+
version of the Python runtime in use. This must be taken into account while
234+
writing a ``TypeDecoder`` that modifies how this datatype is decoded.
235+
236+
On Python 3.x, :class:`~bson.binary.Binary` data (``subtype = 0``) is decoded
237+
as a ``bytes`` instance:
238+
239+
.. code-block:: python
240+
241+
>>> # On Python 3.x.
242+
>>> from bson.binary import Binary
243+
>>> newcoll = db.get_collection('new')
244+
>>> newcoll.insert_one({'_id': 1, 'data': Binary(b"123", subtype=0)})
245+
>>> doc = newcoll.find_one()
246+
>>> type(doc['data'])
247+
bytes
248+
249+
250+
On Python 2.7.x, the same data is decoded as a :class:`~bson.binary.Binary`
251+
instance:
252+
253+
.. code-block:: python
254+
255+
>>> # On Python 2.7.x
256+
>>> newcoll = db.get_collection('new')
257+
>>> doc = newcoll.find_one()
258+
>>> type(doc['data'])
259+
bson.binary.Binary
230260
231-
The Fallback Encoder
232-
--------------------
261+
262+
As a consequence of this disparity, users must set the ``bson_type`` attribute
263+
on their :class:`~bson.codec_options.TypeDecoder` classes differently,
264+
depending on the Python version in use.
265+
266+
267+
.. note::
268+
269+
For codebases requiring compatibility with both Python 2 and 3, type
270+
decoders will have to be registered for both possible ``bson_type`` values.
271+
272+
273+
.. _fallback-encoder-callable:
274+
275+
The ``fallback_encoder`` Callable
276+
---------------------------------
233277

234278
.. versionadded:: 3.8
235279

@@ -268,27 +312,110 @@ We can now seamlessly encode instances of :py:class:`~decimal.Decimal`:
268312
>>> pprint.pprint(mydoc)
269313
{u'_id': ObjectId('...'), u'num': Decimal128('45.321')}
270314

271-
As you can tell, fallback encoders are a compelling alternative to type codecs
272-
when we only want to encode custom types due to their much simpler API.
273-
Users should note however, that fallback encoders cannot be used to modify the
274-
encoding of types that PyMongo already understands, as illustrated by the
275-
following example:
276315

277-
>>> def fallback_encoder(value):
278-
... """Encoder that converts floats to int."""
279-
... if isinstance(value, float):
280-
... return int(value)
281-
... return value
282-
>>> type_registry = TypeRegistry(fallback_encoder=fallback_encoder)
283-
>>> codec_options = CodecOptions(type_registry=type_registry)
284-
>>> collection = db.get_collection('test', codec_options=codec_options)
285-
>>> collection.drop()
286-
>>> collection.insert_one({'num': 45.321})
287-
<pymongo.results.InsertOneResult object at ...>
288-
>>> mydoc = collection.find_one()
289-
>>> pprint.pprint(mydoc)
290-
{u'_id': ObjectId('...'), u'num': 45.321}
316+
.. note::
317+
318+
Fallback encoders are invoked *after* attempts to encode the given value
319+
with standard BSON encoders and any configured type encoders have failed.
320+
Therefore, in a type registry configured with a type encoder and fallback
321+
encoder that both target the same custom type, the behavior specified in
322+
the type encoder will prevail.
323+
324+
325+
Because fallback encoders don't need to declare the types that they encode
326+
beforehand, they can be used to support interesting use-cases that cannot be
327+
serviced by ``TypeEncoder``. One such use-case is described in the next
328+
section.
329+
330+
331+
Encoding Unknown Types
332+
^^^^^^^^^^^^^^^^^^^^^^
333+
334+
In this example, we demonstrate how a fallback encoder can be used to save
335+
arbitrary objects to the database. We will use the standard library's
336+
:py:mod:`pickle` module to serialize the unknown types, so naturally this
337+
approach only works for types that are picklable.
338+
339+
We start by defining some arbitrary custom types:
340+
341+
.. code-block:: python
342+
343+
class MyStringType(object):
344+
def __init__(self, value):
345+
self.__value = value
346+
def __repr__(self):
347+
return "MyStringType('%s')" % (self.__value,)
348+
349+
class MyNumberType(object):
350+
def __init__(self, value):
351+
self.__value = value
352+
def __repr__(self):
353+
return "MyNumberType(%s)" % (self.__value,)
354+
355+
We also define a fallback encoder that pickles whatever objects it receives
356+
and returns them as :class:`~bson.binary.Binary` instances with a custom
357+
subtype. The custom subtype, in turn, allows us to write a TypeDecoder that
358+
identifies pickled artifacts upon retrieval and transparently decodes them
359+
back into Python objects:
360+
361+
.. code-block:: python
362+
363+
import pickle
364+
from bson.binary import Binary, USER_DEFINED_SUBTYPE
365+
def fallback_pickle_encoder(value):
366+
return Binary(pickle.dumps(value), USER_DEFINED_SUBTYPE)
367+
368+
class PickledBinaryDecoder(TypeDecoder):
369+
bson_type = Binary
370+
def transform_bson(self, value):
371+
if value.subtype == USER_DEFINED_SUBTYPE:
372+
return pickle.loads(value)
373+
return value
374+
375+
376+
.. note::
377+
378+
The above example is written assuming the use of Python 3. If you are using
379+
Python 2, ``bson_type`` must be set to ``Binary``. See the
380+
:ref:`decoding-binary-types` section for a detailed explanation.
381+
382+
383+
Finally, we create a ``CodecOptions`` instance:
384+
385+
.. code-block:: python
386+
387+
codec_options = CodecOptions(type_registry=TypeRegistry(
388+
[PickledBinaryDecoder()], fallback_encoder=fallback_pickle_encoder))
389+
390+
We can now round trip our custom objects to MongoDB:
391+
392+
.. code-block:: python
393+
394+
collection = db.get_collection('test_fe', codec_options=codec_options)
395+
collection.insert_one({'_id': 1, 'str': MyStringType("hello world"),
396+
'num': MyNumberType(2)})
397+
mydoc = collection.find_one()
398+
assert isinstance(mydoc['str'], MyStringType)
399+
assert isinstance(mydoc['num'], MyNumberType)
400+
401+
402+
Limitations
403+
-----------
404+
405+
PyMongo's type codec and fallback encoder features have the following
406+
limitations:
291407

292-
This is due to the fact that fallback encoders are invoked only after
293-
an attempt to encode the value with type codecs and standard BSON encoding
294-
routines has been unsuccessful.
408+
#. Users cannot customize the encoding behavior of Python types that PyMongo
409+
already understands, such as ``int`` and ``str`` (the 'built-in types').
410+
Attempting to instantiate a type registry with one or more codecs that act
411+
upon a built-in type results in a ``TypeError``. This limitation extends
412+
to all subtypes of the standard types.
413+
#. Chaining type encoders is not supported. A custom type value, once
414+
transformed by a codec's ``transform_python`` method, *must* result in a
415+
type that is either BSON-encodable by default, or can be
416+
transformed by the fallback encoder into something BSON-encodable--it
417+
*cannot* be transformed a second time by a different type codec.
418+
#. The :meth:`~pymongo.database.Database.command` method does not apply the
419+
user's TypeDecoders while decoding the command response document.
420+
#. :mod:`gridfs` does not apply custom type encoding or decoding to any
421+
documents received from or returned to the user.
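
The encoding order described in the fallback encoder note and the no-chaining limitation listed above can be sketched as a small pure-Python model. It is a simplified illustration under stated assumptions, not PyMongo's encoder: ``sketch_encode`` and the ``BSON_NATIVE`` tuple are hypothetical, and plain callables stand in for type encoders.

```python
from decimal import Decimal
from fractions import Fraction

# Assumption: a small stand-in for the types bson can encode natively.
BSON_NATIVE = (bool, int, float, str, bytes, type(None))


def sketch_encode(value, type_encoders, fallback_encoder=None):
    """Model the documented order: native types, one type encoder, fallback."""
    # 1. Natively encodable values are never passed to custom hooks.
    if isinstance(value, BSON_NATIVE):
        return value
    # 2. At most one type encoder is applied; its output is not looked up
    #    in the registry again, so encoders cannot be chained.
    transform = type_encoders.get(type(value))
    if transform is not None:
        value = transform(value)
    # 3. The fallback encoder runs only if the value is still unencodable.
    if not isinstance(value, BSON_NATIVE) and fallback_encoder is not None:
        value = fallback_encoder(value)
    if not isinstance(value, BSON_NATIVE):
        raise TypeError("cannot encode %r" % (value,))
    return value


type_encoders = {
    Decimal: lambda d: str(d),
    # Returns a Decimal, which would need a second type encoder pass.
    Fraction: lambda f: Decimal(f.numerator) / Decimal(f.denominator),
}

# A type encoder and a fallback both apply to Decimal: the type encoder wins.
direct = sketch_encode(Decimal("45.321"), type_encoders, fallback_encoder=float)

# Fraction -> Decimal is not followed by the Decimal encoder (no chaining).
try:
    sketch_encode(Fraction(1, 4), type_encoders)
    chained = True
except TypeError:
    chained = False

# The fallback encoder may still finish the job after a type encoder ran.
via_fallback = sketch_encode(Fraction(1, 4), type_encoders, fallback_encoder=str)
```

In this model the ``Decimal`` value is handled by its type encoder even though a fallback is configured, the ``Fraction`` value fails without a fallback because its intermediate ``Decimal`` result is never handed to a second type encoder, and the same value succeeds once a fallback can convert that intermediate result.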

pymongo/database.py

Lines changed: 3 additions & 0 deletions
@@ -616,6 +616,9 @@ def command(self, command, value=1, check=True,
616616
:attr:`read_preference` or :attr:`codec_options`. You must use the
617617
`read_preference` and `codec_options` parameters instead.
618618
619+
.. note:: :meth:`command` does **not** apply any custom TypeDecoders
620+
when decoding the command response.
621+
619622
.. versionchanged:: 3.6
620623
Added ``session`` parameter.
621624
