Merged
Changes from 2 commits
Doc/library/codecs.rst (2 additions, 0 deletions)
@@ -350,6 +350,8 @@ error handling schemes by accepting the *errors* string argument:
The following error handlers can be used with all Python
:ref:`standard-encodings` codecs:

.. The following tables are reproduced on the library/functions page under open.

.. tabularcolumns:: |l|L|

+-------------------------+-----------------------------------------------+
Doc/library/functions.rst (41 additions, 28 deletions)
@@ -1423,37 +1423,50 @@ are always available. They are listed here in alphabetical order.
*errors* is an optional string that specifies how encoding and decoding
errors are to be handled—this cannot be used in binary mode.
A variety of standard error handlers are available
(listed under :ref:`error-handlers`), though any
error handling name that has been registered with
(listed under :ref:`error-handlers`, and summarized below for convenience),
though any error handling name that has been registered with
:func:`codecs.register_error` is also valid. The standard names
include:

* ``'strict'`` to raise a :exc:`ValueError` exception if there is
an encoding error. The default value of ``None`` has the same
effect.

* ``'ignore'`` ignores errors. Note that ignoring encoding errors
can lead to data loss.

* ``'replace'`` causes a replacement marker (such as ``'?'``) to be inserted
where there is malformed data.

* ``'surrogateescape'`` will represent any incorrect bytes as low
surrogate code units ranging from U+DC80 to U+DCFF.
These surrogate code units will then be turned back into
the same bytes when the ``surrogateescape`` error handler is used
when writing data. This is useful for processing files in an
unknown encoding.

* ``'xmlcharrefreplace'`` is only supported when writing to a file.
Characters not supported by the encoding are replaced with the
appropriate XML character reference :samp:`&#{nnn};`.

* ``'backslashreplace'`` replaces malformed data by Python's backslashed
escape sequences.

* ``'namereplace'`` (also only supported when writing)
replaces unsupported characters with ``\N{...}`` escape sequences.
.. list-table::
:header-rows: 1

* - Error handler
- Description
* - ``'strict'``
- Raise a :exc:`UnicodeError` exception (or a subclass) if there is
an encoding or decoding error. The default value of ``None`` has the same effect.
* - ``'ignore'``
- Ignore the malformed data and continue without further notice.
Note that ignoring encoding errors can lead to data loss.
* - ``'replace'``
- Replace malformed data with a replacement marker.
On writing, use ``?`` (ASCII character 63).
On reading, use ``�`` (U+FFFD, the official REPLACEMENT CHARACTER).
* - ``'backslashreplace'``
- Replace malformed data with backslashed escape sequences.
On writing, use the hexadecimal form of the Unicode code point, with the format
:samp:`\\x{hh}`, :samp:`\\u{xxxx}` or :samp:`\\U{xxxxxxxx}`.
On reading, use the hexadecimal form of the byte value, with the format :samp:`\\x{hh}`.
* - ``'surrogateescape'``
- On reading, represent each undecodable byte as a low
surrogate code unit in the range ``U+DC80`` to ``U+DCFF``.
These surrogate code units are turned back into
the same bytes when the ``'surrogateescape'`` error handler is used
for writing data. This is useful for processing files in an
unknown encoding.
* - ``'surrogatepass'``
- Only available for Unicode codecs.
Member

Aren't these all Unicode codecs?

Suggested change
- Only available for Unicode codecs.
- Only available for UTF-8, UTF-16 and UTF-32 codecs.

Member Author

The codecs documentation lists the little/big endian variants, though I think we can be less specific here.

Member

We can, but “Unicode codecs” sounds like a proper term, while I see no definition that would link it to the UTF-{8,16,32} codecs specifically.

Allow encoding and decoding surrogate code points
(``U+D800`` - ``U+DFFF``) as normal code points. Otherwise these codecs
treat the presence of surrogate code points in :class:`str` as an error.
* - ``'xmlcharrefreplace'``
- Only supported when writing.
Characters not supported by the encoding are replaced with the
appropriate XML character reference :samp:`&#{nnn};`.
* - ``'namereplace'``
- Only supported when writing. Replaces unsupported characters with
``\N{...}`` escape sequences.

.. index::
single: universal newlines; open() built-in function
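
The behaviour summarized in the error-handler table above can be checked with a short script. This is a minimal sketch, assuming Python 3.5 or later (needed for ``'namereplace'`` and for ``'backslashreplace'`` on reading). It calls ``str.encode()`` and ``bytes.decode()`` directly, plus one ``io.TextIOWrapper`` over an in-memory buffer to approximate reading with ``open(..., errors=...)``; the helper names (``encode_with``, ``decode_with``, ``read_like_open``) and the sample data are illustrative only.

```python
import io

# Sample data (illustrative only).
TEXT = "caf\u00e9 \u2603"  # 'é' and the snowman are not encodable as ASCII
BAD = b"caf\xe9!"          # 0xE9 starts a UTF-8 sequence, but '!' is not a continuation byte


def encode_with(handler: str) -> bytes:
    """Writing direction: str -> bytes, hitting unencodable characters."""
    return TEXT.encode("ascii", errors=handler)


def decode_with(handler: str) -> str:
    """Reading direction: bytes -> str, hitting malformed input."""
    return BAD.decode("utf-8", errors=handler)


def read_like_open(handler: str) -> str:
    """Decode through a text stream, roughly what open(path, errors=handler) does."""
    with io.TextIOWrapper(io.BytesIO(BAD), encoding="utf-8", errors=handler) as f:
        return f.read()


if __name__ == "__main__":
    for h in ("ignore", "replace", "backslashreplace",
              "xmlcharrefreplace", "namereplace"):
        print(f"encode {h:18} {encode_with(h)!a}")
    for h in ("ignore", "replace", "backslashreplace", "surrogateescape"):
        print(f"decode {h:18} {decode_with(h)!a}")
    print(f"stream replace     {read_like_open('replace')!a}")

    # 'strict' (the default, same as errors=None) raises a UnicodeError subclass.
    try:
        encode_with("strict")
    except UnicodeEncodeError as exc:
        print("strict:", type(exc).__name__, isinstance(exc, UnicodeError))

    # 'xmlcharrefreplace' and 'namereplace' only handle encoding errors.
    try:
        decode_with("xmlcharrefreplace")
    except TypeError as exc:
        print("xmlcharrefreplace on reading:", exc)

    # 'surrogateescape' round-trips the undecodable byte unchanged.
    escaped = decode_with("surrogateescape")
    print("round trip ok:", escaped.encode("utf-8", errors="surrogateescape") == BAD)

    # 'surrogatepass' lets the UTF-8 codec encode a lone surrogate code point.
    print("surrogatepass:", "\ud800".encode("utf-8", errors="surrogatepass"))
```

For example, ``'replace'`` yields ``b'caf? ?'`` on writing and ``'caf\ufffd!'`` on reading, matching the ``?`` and U+FFFD markers listed in the table.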