60 changes: 39 additions & 21 deletions peps/pep-0819.rst
@@ -118,8 +118,8 @@ Specification
JSON Format Core Metadata File
------------------------------

A new required file ``METADATA.json`` shall be introduced as a
metadata file for Python distribution packages. The ``METADATA.json`` file
MUST be placed in the same directory as the current email formatted
``METADATA`` or ``PKG-INFO`` file.
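
For illustration only, the following sketch reads ``METADATA.json`` from the
``.dist-info`` directory of a wheel, alongside ``METADATA``, using the standard
library :mod:`zipfile` module. The helper name ``read_metadata_json`` is
hypothetical. The sketch handles wheels only, not the ``PKG-INFO`` location in
source distributions. It decodes integers as strings, as suggested under
Security Implications:

.. code-block:: python

    import json
    import zipfile


    def read_metadata_json(wheel):
        """Return the parsed METADATA.json of a wheel, or None if absent.

        Hypothetical helper; ``wheel`` may be a path or a binary file object.
        """
        with zipfile.ZipFile(wheel) as archive:
            for name in archive.namelist():
                directory, _, filename = name.partition("/")
                # Only accept METADATA.json directly inside a top-level
                # ``*.dist-info`` directory, i.e. next to METADATA.
                if directory.endswith(".dist-info") and filename == "METADATA.json":
                    # Decode JSON integers as strings (see Security Implications).
                    return json.loads(archive.read(name), parse_int=str)
        return None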

@@ -200,8 +200,8 @@ encoded core metadata file MUST be served at
JSON Format Wheel Metadata File
-------------------------------

A new required file ``WHEEL.json`` shall be introduced as a
JSON encoded version of the ``WHEEL`` file. The ``WHEEL.json``
file MUST be placed in the same directory as the current key-value formatted
``WHEEL`` file, i.e. the ``.dist-info`` directory. The semantic contents of
the ``WHEEL`` and ``WHEEL.json`` files MUST be equivalent. The wheel file
@@ -235,6 +235,20 @@ JSON schema for wheel metadata has been produced.
This schema will be updated with each revision to the wheel metadata
specification. The schema is available in :ref:`0819-wheel-json-schema`.

Handling of Duplicate Keys in JSON Package Metadata
---------------------------------------------------

JSON does not define semantics for duplicate keys in a JSON document, and
different parsers treat duplicate keys differently. Tools SHOULD NOT generate
duplicate keys in JSON package metadata. Because duplicate keys may still be
generated, tools consuming JSON package metadata should handle them
gracefully. For compatibility, and to match the behavior of the Python
:mod:`!json` module, if a key is duplicated, the value of its last occurrence
should be used. Many other JSON parsers behave the same way, including those
in Rust and Go and ``JSON.parse`` as specified by the ECMAScript standard.
Tools MAY warn about duplicate keys in JSON package
metadata.
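
As a minimal sketch, assuming the Python :mod:`!json` module is used, an
``object_pairs_hook`` can keep the last value for each key while warning about
duplicates. The helper names ``keep_last_and_warn`` and
``load_package_metadata`` are hypothetical. By default, :func:`json.loads`
already keeps the last value silently, so the hook is only needed to emit the
warning:

.. code-block:: python

    import json
    import warnings


    def keep_last_and_warn(pairs):
        """object_pairs_hook that warns on duplicate keys; the last value wins."""
        result = {}
        for key, value in pairs:
            if key in result:
                warnings.warn(f"duplicate key {key!r} in JSON package metadata")
            # Assigning again overwrites the earlier value, so the last
            # occurrence of a duplicated key is the one that is kept.
            result[key] = value
        return result


    def load_package_metadata(text):
        # The hook is applied to every JSON object, including nested ones.
        return json.loads(text, object_pairs_hook=keep_last_and_warn)

For example, ``load_package_metadata('{"name": "a", "name": "b"}')`` emits a
single warning and returns ``{"name": "b"}``.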

Deprecation of the ``METADATA``, ``PKG-INFO``, and ``WHEEL`` Files
------------------------------------------------------------------

@@ -272,25 +286,20 @@ or ``WHEEL`` files.
Security Implications
=====================

Maliciously crafted JSON encoded metadata files could be used in a denial of
service attack that exploits the quadratic time complexity of converting long
strings of digits to integers, as reported in
`CVE-2020-10735 <https://github.com/advisories/GHSA-6jr7-xr67-mgxw>`__. No
package metadata fields are currently encoded as integers, so this risk can be
mitigated by decoding integer values as strings when parsing JSON package
metadata.

When using the Python :mod:`!json` module, integers can be parsed as strings by
passing :class:`str` as the ``parse_int`` keyword argument to
:func:`json.load` or :func:`json.loads`.

With this mitigation in place, concerns about denial of service attacks with
JSON encoded package metadata are considered minimal.
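
A minimal sketch of the ``parse_int`` mitigation follows. The helper name
``parse_package_metadata`` and the ``"n"`` key are hypothetical. Only JSON
integers are affected; floating-point numbers are still decoded by
``parse_float``, which defaults to :class:`float`:

.. code-block:: python

    import json


    def parse_package_metadata(text):
        # parse_int is called with the string of every JSON integer.
        # Passing str returns that string unchanged, so no potentially slow
        # string-to-int conversion is performed on untrusted input.
        return json.loads(text, parse_int=str)


    # A hypothetical payload containing a 5000-digit integer.
    payload = '{"name": "demo", "n": ' + "9" * 5000 + "}"
    data = parse_package_metadata(payload)
    print(type(data["n"]).__name__, len(data["n"]))  # prints: str 5000

On CPython releases that include the CVE-2020-10735 fix, parsing the same
payload without ``parse_int=str`` raises :exc:`ValueError`, because the integer
exceeds the default limit of 4300 digits.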


Reference Implementation
@@ -326,6 +335,15 @@ format, JSON has been chosen for a few reasons:
#. JSON is fast to parse and emit.
#. JSON schemas are JSON native and commonly used.

Make the JSON Package Metadata Files Optional
---------------------------------------------

A future major revision of the wheel format specification may make the
``METADATA.json`` and ``WHEEL.json`` files the default. Therefore, tools should
begin generating and consuming JSON package metadata files so that they are
prepared for an eventual transition in which JSON package metadata files become
the default.


Open Issues
===========