You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
:ref:`custom_operators_label` if you are considering whether they are fruits or not. In that case, you can pass a *custom_operators* for the job.
38
38
39
39
encodings: List, default = None
40
-
Character encodings to iterate through when we convert bytes into strings. You may want to pass an explicit list of encodings in your objects if you start getting UnicodeDecodeError from DeepHash. Also check out ignore_encoding_errors if you can get away with ignoring these errors and don't want to bother with an explicit list of encodings but it will come at the price of slightly less accuracy of the final results. Example: encodings=["utf-8", "latin-1"]
40
+
:ref:`encodings_label` Character encodings to iterate through when we convert bytes into strings. You may want to pass an explicit list of encodings in your objects if you start getting UnicodeDecodeError from DeepHash. Also check out :ref:`ignore_encoding_errors_label` if you can get away with ignoring these errors and don't want to bother with an explicit list of encodings but it will come at the price of slightly less accuracy of the final results. Example: encodings=["utf-8", "latin-1"]
If you want to get away with UnicodeDecodeError without passing explicit character encodings, set this option to True. If you want to make sure the encoding is done properly, keep this as False and instead pass an explicit list of character encodings to be considered via the encodings parameter.
109
+
:ref:`ignore_encoding_errors_label` If you want to get away with UnicodeDecodeError without passing explicit character encodings, set this option to True. If you want to make sure the encoding is done properly, keep this as False and instead pass an explicit list of character encodings to be considered via the :ref:`encodings_label` parameter.
Character encodings to iterate through when we convert bytes into strings. You may want to pass an explicit list of encodings in your objects if you start getting UnicodeDecodeError from DeepHash. Also check out :ref:`ignore_encoding_errors_label` if you can get away with ignoring these errors and don't want to bother with an explicit list of encodings but it will come at the price of slightly less accuracy of the final results. Example: encodings=["utf-8", "latin-1"]
15
+
16
+
The reason the decoding of bytes to string is needed is that when `ignore_order = True` we calculate the hash of the objects in order to facilitate in diffing them. In order to calculate the hash, we serialize all objects into strings. During the serialization we may encounter issues with character encodings.
17
+
18
+
**Examples:**
19
+
20
+
Comparing bytes that have non UTF-8 encoding:
21
+
>>> from deepdiff import DeepDiff
22
+
>>> item =b"\xbc cup of flour"
23
+
>>> DeepDiff([b'foo'], [item], ignore_order=True)
24
+
Traceback (most recent call last):
25
+
raise UnicodeDecodeError(
26
+
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbc in position 0: Can not produce a hash for root: invalid start byte in 'p of flo...'. Please either pass ignore_encoding_errors=True or pass the encoding via encodings=['utf-8', '...'].
27
+
28
+
Let's try to pass both 'utf-8' and 'latin-1' as encodings to be tries:
{'values_changed': {'root[0]': {'new_value': b'\xbc cup of flour', 'old_value': b'foo'}}}
31
+
32
+
33
+
.. _ignore_encoding_errors_label:
34
+
35
+
Ignore Encoding Errors
36
+
----------------------
37
+
38
+
ignore_encoding_errors: Boolean, default = False
39
+
40
+
If you want to get away with UnicodeDecodeError without passing explicit character encodings, set this option to True. If you want to make sure the encoding is done properly, keep this as False and instead pass an explicit list of character encodings to be considered via the encodings parameter.
41
+
42
+
We can generally get the same results as above example if we just pass `ignore_encoding_errors=True`. However it comes at the cost of less accuracy of the results.
43
+
>>> DeepDiff([b'foo'], [b"\xbc cup of flour"], ignore_encoding_errors=True, ignore_order=True)
44
+
{'values_changed': {'root[0]': {'new_value': b'\xbc cup of flour', 'old_value': b'foo'}}}
45
+
46
+
For example if we replace `foo` with ` cup of flour`, we have bytes that are only different in the problematic character. Ignoring that character means DeepDiff will consider these 2 strings to be equal since their hash becomes the same. Note that we only hash items when `ignore_order=True`.
47
+
>>> DeepDiff([b" cup of flour"], [b"\xbc cup of flour"], ignore_encoding_errors=True, ignore_order=True)
48
+
{}
49
+
50
+
But if we had passed the proper encoding, it would have detected that these 2 bytes are different:
51
+
>>> DeepDiff([b" cup of flour"], [b"\xbc cup of flour"], encodings=['latin-1'], ignore_order=True)
52
+
{'values_changed': {'root[0]': {'new_value': b'\xbc cup of flour', 'old_value': b' cup of flour'}}}
0 commit comments