Commit fe18443

PEP 819: JSON Package Metadata (#4751)
Co-authored-by: Adam Turner <[email protected]>
1 parent 9b46975 commit fe18443

5 files changed

+656
-0
lines changed

.github/CODEOWNERS

Lines changed: 2 additions & 0 deletions
@@ -692,6 +692,8 @@ peps/pep-0814.rst @vstinner @corona10
692692
peps/pep-0815.rst @emmatyping
693693
peps/pep-0816.rst @brettcannon
694694
# ...
695+
peps/pep-0819.rst @emmatyping
696+
# ...
695697
peps/pep-2026.rst @hugovk
696698
# ...
697699
peps/pep-3000.rst @gvanrossum

peps/pep-0819.rst

Lines changed: 354 additions & 0 deletions
@@ -0,0 +1,354 @@
1+
PEP: 819
2+
Title: JSON Package Metadata
3+
Author: Emma Harper Smith <[email protected]>
4+
PEP-Delegate: Paul Moore
5+
Discussions-To: Pending
6+
Status: Draft
7+
Type: Standards Track
8+
Topic: Packaging
9+
Created: 18-Dec-2025
10+
Post-History: Pending
11+
12+
13+
Abstract
14+
========
15+
16+
This PEP proposes introducing JSON encoded core metadata and wheel file format
17+
metadata files in Python packages. Python package metadata ("core metadata")
18+
was first defined in :pep:`241` to use :rfc:`822` email headers to encode
19+
information about packages. This was reasonable in 2001; email messages
20+
were the only widely used, standardized text format that had a parser in
21+
the standard library. However, issues with handling different encodings,
22+
differing handling of line breaks, and other differences between
23+
implementations have caused numerous packaging bugs. Using the JSON format for
24+
encoding metadata files would eliminate a wide range of these potential issues.
25+
26+
27+
Motivation
28+
==========
29+
30+
The email message format has a number of complexities and limitations which
31+
reduce its utility as a portable textual interchange format for packaging
32+
metadata. Due to the :mod:`email` parser requiring configuration changes to
33+
properly generate valid core metadata, many projects do not use the
34+
:mod:`!email` module and instead generate core metadata in a custom manner.
35+
There are many pitfalls with generating email headers that can be encountered
36+
by such custom generators. First, core metadata fields may contain newlines in the
37+
value of fields. These newlines must be handled properly to "unfold" multiple
38+
lines per :rfc:`822`. One particularly difficult to encode field is the
39+
``Description`` field, which may contain newlines and indentation. To encode
40+
the field in email headers, CRLF line breaks must be followed by seven (7)
41+
spaces and a pipe ('``|``') character. While ``Description`` may now be encoded in
42+
the message body, similar escaping issues occur for the ``Author`` and
43+
``Maintainer`` fields. Improperly escaped newlines can lead to missing,
44+
partial, or invalid core metadata. Second, as discussed in the
45+
:ref:`core metadata specifications <packaging:core-metadata>`:
46+
47+
.. epigraph::
48+
49+
The standard file format for metadata (including in wheels and installed
50+
projects) is based on the format of email headers. However, email formats
51+
have been revised several times, and exactly which email RFC applies to
52+
packaging metadata is not specified. In the absence of a precise
53+
definition, the practical standard is set by what the standard library
54+
:mod:`email.parser` module can parse using the
55+
:data:`email.policy.compat32` policy.
56+
57+
Since no specific email RFC is selected, the current core metadata
58+
specification is ambiguous about whether a given core metadata document is valid.
59+
:rfc:`822` is the only email standard to be explicitly listed in a PEP.
60+
However, the core metadata specification also requires that core metadata be
61+
encoded using UTF-8 when written to a file. This de facto makes the core
62+
metadata follow :rfc:`6532`, which specifies internationalization of email
63+
headers. This has practical interoperability concerns. Until a few years ago,
64+
it was unspecified how to properly encode non-ASCII email addresses in core
65+
metadata, making parsing ambiguous. Third, the current format is difficult to
66+
properly validate and parse. Many tools do not check for issues with the output
67+
of the :mod:`!email` parser. If a document is malformed, it may still parse
68+
without error by the :mod:`!email` module as a valid email message. Furthermore,
69+
due to limitations in the email format, fields like ``Project-Url`` must create
70+
custom encodings of nested key-value items, further complicating parsing and
71+
validation. Finally, the lack of a schema makes it difficult to validate the
72+
contents of email message encoded metadata. While introducing a specification
73+
for the current format has been
74+
`discussed previously <https://discuss.python.org/t/7550>`__, no progress has
75+
been made, and converting to JSON was a suggested resolution to the issues
76+
raised.
77+
78+
The ``WHEEL`` file format is currently encoded in a custom key-value format.
79+
While this format is easy to parse and write, it requires manual parsing and
80+
validation to ensure that the contents are valid. Moving to a JSON encoded
81+
format will allow for easier parsing and validation of the contents, and
82+
simplify packaging tools and services by using a consistent format for
83+
distribution metadata.
84+
85+
86+
Rationale
87+
=========
88+
89+
Introducing a new core metadata file with a well-specified format will greatly
90+
ease generating, parsing, and validating metadata. JSON is a natural choice for
91+
storing package core metadata. It is easily machine readable and writable, is
92+
understandable to humans, and is well supported across many languages.
93+
Furthermore, :pep:`566` already specifies a canonicalization of email formatted
94+
core metadata to JSON. JSON is also a frequently used format for data
95+
interchange on the web. For discussion of other formats considered, please
96+
refer to the rejected ideas section.
97+
98+
To maintain backwards compatibility, the JSON metadata file MUST be generated
99+
alongside the existing email formatted metadata file. This ensures that tools
100+
that do not support the new format can still read package metadata for new
101+
packages.
102+
103+
The JSON formatted metadata file must be semantically equivalent to the email
104+
encoded file. This ensures that the metadata is unambiguous between the two
105+
formats, and tools may read either when both are present. To maintain
106+
performance, this equivalence is not required to be verified by installers,
107+
though other tools may do so. Some tools may choose to make the check dependent
108+
on a configuration flag.
109+
110+
Package indexes SHOULD check that the metadata files are semantically
111+
equivalent when the package is added to the index. This is a low-cost, one-time
112+
check that ensures users of the index are served valid packages.
113+
114+
115+
Specification
116+
=============
117+
118+
JSON Format Core Metadata File
119+
------------------------------
120+
121+
A new optional but recommended file ``METADATA.json`` shall be introduced as a
122+
metadata file for Python distribution packages. If generated, the ``METADATA.json`` file
123+
MUST be placed in the same directory as the current email formatted
124+
``METADATA`` or ``PKG-INFO`` file.
125+
126+
For wheels, this means that ``METADATA.json`` MUST be located in the
127+
``.dist-info`` directory.
128+
129+
If present, the ``METADATA.json`` file MUST be located in the root directory of
130+
the project sources in a source distribution package. Tools that prefer the
131+
JSON formatted metadata file MUST NOT assume the presence of the
132+
``METADATA.json`` file in the source distribution before reading the file.
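
A minimal sketch of the fallback behaviour described above, assuming a tool that
only needs the name and version. The function name ``read_name_and_version`` is
hypothetical, and the lower-case JSON keys follow the conversion rules given
later in this PEP:

```python
import json
from email.parser import HeaderParser
from pathlib import Path


def read_name_and_version(metadata_dir: Path) -> tuple[str, str]:
    """Return (name, version), preferring METADATA.json when it exists."""
    json_path = metadata_dir / "METADATA.json"
    if json_path.is_file():
        data = json.loads(json_path.read_text(encoding="utf-8"))
        return data["name"], data["version"]
    # Fall back to the email formatted file: METADATA in a .dist-info
    # directory, PKG-INFO in a source distribution.
    for legacy_name in ("METADATA", "PKG-INFO"):
        legacy_path = metadata_dir / legacy_name
        if legacy_path.is_file():
            text = legacy_path.read_text(encoding="utf-8")
            msg = HeaderParser().parsestr(text)
            return msg["Name"], msg["Version"]
    raise FileNotFoundError(f"no core metadata found in {metadata_dir}")
```

The JSON file is checked for first, but its absence is not treated as an error,
so older source distributions continue to work.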
133+
134+
The semantic contents of the ``METADATA`` and ``METADATA.json`` files MUST be
135+
equivalent if ``METADATA.json`` is present. Installers MAY verify this
136+
information. Public package indexes SHOULD verify the files are semantically
137+
equivalent.
138+
139+
The new ``METADATA.json`` file MUST be included in the
140+
:ref:`installed project metadata <packaging:recording-installed-packages>`,
141+
if present in the distribution metadata.
142+
143+
Conversion of ``METADATA`` to JSON Encoding
144+
-------------------------------------------
145+
146+
Conversion from the current email format for core metadata to JSON should
147+
follow the process described in :pep:`566`, with the following modification:
148+
the ``Project-URL`` entries should be converted into an object with keys
149+
containing the labels and values containing the URLs from the original email
150+
value. The overall process thus becomes:
151+
152+
#. The original key-value format should be read with
153+
``email.parser.HeaderParser``;
154+
#. All transformed keys should be reduced to lower case. Hyphens should be
155+
replaced with underscores, but otherwise should retain all other characters;
156+
#. The transformed value for any field marked with "(Multiple use)" should be a
157+
single list containing all the original values for the given key;
158+
#. The ``Keywords`` field should be converted to a list by splitting the
159+
original value on commas;
160+
#. The ``Project-URL`` field should be converted into a JSON object with keys
161+
containing the labels and values containing the URLs from the original email
162+
value.
163+
#. The message body, if present, should be stored as the value of the
164+
``description`` key.
165+
#. The result should be stored as a string-keyed dictionary.
166+
167+
One edge case in the above conversion is that the ``Project-URL`` label is
168+
"free text, with a maximum length of 32 characters." This presents a problem
169+
when trying to decode the label. Therefore this PEP sets the requirement that
170+
the ``Project-URL`` label be any text *except* the comma (``,``) character.
171+
This allows for unambiguous parsing of the ``Project-URL`` entries by splitting
172+
the text on the left-most comma (``,``) character.
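
The conversion steps above could be sketched roughly as follows. This is an
illustration rather than the reference implementation: the function name
``metadata_to_json_dict`` and the ``MULTIPLE_USE`` set are hypothetical, the set
is only an illustrative list of multiple-use fields, and stripping whitespace
around keywords and labels is an assumption not stated in the steps:

```python
from email.parser import HeaderParser

# Assumption: illustrative multiple-use fields, already in JSON key form.
# A real tool would take the full list from the core metadata specification.
MULTIPLE_USE = {
    "classifier", "dynamic", "license_file", "obsoletes_dist", "platform",
    "provides_dist", "provides_extra", "requires_dist", "requires_external",
    "supported_platform",
}


def metadata_to_json_dict(text: str) -> dict:
    """Convert email formatted core metadata into a string-keyed dict."""
    msg = HeaderParser().parsestr(text)
    result: dict = {}
    for key, value in msg.items():
        # Lower-case the key and replace hyphens with underscores.
        json_key = key.lower().replace("-", "_")
        if json_key == "project_url":
            # Split on the left-most comma: label first, URL second.
            label, _, url = value.partition(",")
            result.setdefault(json_key, {})[label.strip()] = url.strip()
        elif json_key == "keywords":
            result[json_key] = [word.strip() for word in value.split(",")]
        elif json_key in MULTIPLE_USE:
            result.setdefault(json_key, []).append(value)
        else:
            result[json_key] = value
    body = msg.get_payload()
    if body:
        # The message body becomes the value of the description key.
        result["description"] = body
    return result
```

Because the label cannot contain a comma, ``str.partition`` on the first comma
leaves any later commas inside the URL untouched.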
173+
174+
JSON Schema for Core Metadata
175+
-----------------------------
176+
177+
To enable verification of JSON encoded core metadata, a
178+
`JSON schema <https://json-schema.org/>`__ for core metadata has been produced.
179+
This schema will be updated with each revision to the core metadata
180+
specification. The schema is available in
181+
:ref:`0819-core-metadata-json-schema`.
182+
183+
Serving METADATA.json in the Simple Repository API
184+
--------------------------------------------------
185+
186+
:pep:`658` introduced a means of serving package metadata in the Simple
187+
Repository API. The JSON encoded version of the package metadata may also be
188+
served, via the following modifications to the Simple Repository API:
189+
190+
A new attribute ``data-dist-info-metadata-json`` may be added to anchor tags
191+
in the Simple API. This attribute should have a value containing the hash
192+
information for the ``METADATA.json`` file in the same format as
193+
``data-dist-info-metadata``. If ``data-dist-info-metadata-json`` is present,
194+
the repository MUST serve the JSON encoded metadata file at the
195+
distribution's path with ``.metadata.json`` appended to it. For example, if a
196+
distribution is served at ``/simple/foo-1.0-py3-none-any.whl``, the JSON
197+
encoded core metadata file MUST be served at
198+
``/simple/foo-1.0-py3-none-any.whl.metadata.json``.
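
As a rough sketch, a client could derive the metadata location from a
distribution URL as shown below. The helper name ``metadata_json_url`` is
hypothetical, and dropping the URL fragment (often used for hashes in Simple
API links) is an assumption about how a client would want to behave:

```python
from urllib.parse import urlsplit, urlunsplit


def metadata_json_url(distribution_url: str) -> str:
    """Append ``.metadata.json`` to the path of a distribution URL."""
    parts = urlsplit(distribution_url)
    # Only the path changes; any "#sha256=..." style fragment is dropped.
    updated = parts._replace(path=parts.path + ".metadata.json", fragment="")
    return urlunsplit(updated)
```

For the example above, ``metadata_json_url("/simple/foo-1.0-py3-none-any.whl")``
yields ``/simple/foo-1.0-py3-none-any.whl.metadata.json``.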
199+
200+
JSON Format Wheel Metadata File
201+
-------------------------------
202+
203+
A new optional but recommended file ``WHEEL.json`` shall be introduced as a
204+
JSON encoded version of the ``WHEEL`` file. If generated, the ``WHEEL.json``
205+
file MUST be placed in the same directory as the current key-value formatted
206+
``WHEEL`` file, i.e. the ``.dist-info`` directory. The semantic contents of
207+
the ``WHEEL`` and ``WHEEL.json`` files MUST be equivalent. The wheel file
208+
format version will be incremented to ``1.1`` to reflect the introduction
209+
of ``WHEEL.json``.
210+
211+
The ``WHEEL.json`` file SHOULD be preferred over the ``WHEEL`` file when both
212+
are present.
213+
214+
Conversion of ``WHEEL`` to JSON Encoding
215+
----------------------------------------
216+
217+
Conversion from the current key-value format for wheel file format metadata to
218+
JSON should proceed as follows:
219+
220+
#. The original key-value format should be read.
221+
#. All transformed keys should be reduced to lower case. Hyphens should be
222+
replaced with underscores, but otherwise should retain all other characters.
223+
#. The ``Tag`` field's entries should be converted to a list containing the
224+
original values.
225+
#. The result should be stored as a string-keyed dictionary.
226+
227+
This follows a similar process to the conversion of ``METADATA`` to JSON
228+
encoding.
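
The steps for ``WHEEL`` might look like the following sketch. The function name
``wheel_to_json_dict`` is hypothetical, and keeping every value as a string
(for example ``"true"`` for ``Root-Is-Purelib``) is an assumption; the schema
may call for other types:

```python
def wheel_to_json_dict(text: str) -> dict:
    """Convert the key-value WHEEL format into a string-keyed dict."""
    result: dict = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        key, _, value = line.partition(":")
        # Lower-case the key and replace hyphens with underscores.
        json_key = key.strip().lower().replace("-", "_")
        value = value.strip()
        if json_key == "tag":
            # Tag may appear several times; collect the values in a list.
            result.setdefault(json_key, []).append(value)
        else:
            result[json_key] = value
    return result
```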
229+
230+
JSON Schema for Wheel Metadata
231+
------------------------------
232+
233+
To enable verification of JSON encoded wheel file format metadata, a
234+
JSON schema for wheel metadata has been produced.
235+
This schema will be updated with each revision to the wheel metadata
236+
specification. The schema is available in :ref:`0819-wheel-json-schema`.
237+
238+
Deprecation of the ``METADATA``, ``PKG-INFO``, and ``WHEEL`` Files
239+
------------------------------------------------------------------
240+
241+
The ``METADATA``, ``PKG-INFO``, and ``WHEEL`` files are now deprecated. This
242+
means that a future PEP may make the ``METADATA``, ``PKG-INFO``, and ``WHEEL``
243+
files optional and require ``METADATA.json`` and ``WHEEL.json`` to be present.
244+
Please see the next section for more information on backwards compatibility
245+
caveats to that change.
246+
247+
Despite the ``METADATA`` and ``PKG-INFO`` files being deprecated, new core
248+
metadata revisions should be implemented for both JSON and email to ensure that
249+
they may remain semantically equivalent. Similarly, new ``WHEEL`` metadata keys
250+
should be implemented for both JSON and key-value formats to ensure that they
251+
may remain semantically equivalent.
252+
253+
254+
Backwards Compatibility
255+
=======================
256+
257+
The specification for ``METADATA.json`` and ``WHEEL.json`` is designed such
258+
that the new format is completely backwards compatible. Existing tools may read
259+
metadata from the existing email formatted files, and new tools may take
260+
advantage of the new format.
261+
262+
A future major revision of the wheel specification may make the ``METADATA``,
263+
``PKG-INFO``, and ``WHEEL`` files optional and make the ``METADATA.json`` and
264+
``WHEEL.json`` files required.
265+
266+
Note that tools will need to maintain parsing of email metadata and the
267+
key-value formatted ``WHEEL`` file indefinitely to support parsing metadata
268+
for old packages which only have the ``METADATA``, ``PKG-INFO``,
269+
or ``WHEEL`` files.
270+
271+
272+
Security Implications
273+
=====================
274+
275+
One attack vector with JSON encoded core metadata is a JSON payload that is
276+
designed to consume excessive memory or CPU resources in a denial of service
277+
(DoS) attack. While this attack is not likely to affect users who can cancel
278+
resource-intensive interactive operations, it may be an issue for package
279+
indexes.
280+
281+
There are several mitigations that can be applied to prevent this:
282+
283+
#. The length of the JSON payload can be restricted to a reasonable size.
284+
#. The reader may use a :class:`~json.JSONDecoder` to omit parsing :class:`int`
285+
and :class:`float` values to avoid quadratic number parsing time complexity
286+
attacks.
287+
#. I plan to contribute a change to :class:`~json.JSONDecoder` in Python
288+
3.15+ that will allow it to be configured to restrict the nesting of JSON
289+
payloads to a reasonable depth. Core metadata currently has a maximum depth
290+
of 2 to encode mapping and list fields.
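
A minimal sketch of how a reader might combine these mitigations with the
:mod:`json` module as it exists today. The names ``load_untrusted_metadata``
and ``exceeds_depth`` and the ``MAX_BYTES`` limit are hypothetical. Because the
decoder cannot yet limit nesting, the depth is checked after parsing and a
``RecursionError`` raised during parsing is treated as a rejection:

```python
import json

MAX_BYTES = 1_000_000  # hypothetical size limit, not taken from this PEP
MAX_DEPTH = 2  # top-level object plus one level of list or object values


def exceeds_depth(value, limit: int) -> bool:
    """Return True if containers nest deeper than ``limit`` (iterative)."""
    stack = [(value, 1)]
    while stack:
        node, level = stack.pop()
        if isinstance(node, dict):
            children = list(node.values())
        elif isinstance(node, list):
            children = node
        else:
            continue  # scalars do not add depth
        if level > limit:
            return True
        stack.extend((child, level + 1) for child in children)
    return False


def load_untrusted_metadata(raw: bytes) -> dict:
    """Parse JSON metadata defensively, raising ValueError on rejection."""
    if len(raw) > MAX_BYTES:
        raise ValueError("metadata payload too large")
    try:
        # Keep numbers as strings so no int or float conversion is run.
        data = json.loads(raw, parse_int=str, parse_float=str)
    except RecursionError as exc:
        raise ValueError("metadata payload nested too deeply") from exc
    if not isinstance(data, dict) or exceeds_depth(data, MAX_DEPTH):
        raise ValueError("unexpected metadata structure")
    return data
```

Malformed JSON already raises ``json.JSONDecodeError``, a subclass of
``ValueError``, so callers only need to handle one exception type.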
291+
292+
With these mitigations in place, concerns about denial of service attacks with
293+
JSON encoded core metadata are minimal.
294+
295+
296+
Reference Implementation
297+
========================
298+
299+
A reference implementation of the JSON schema for JSON core metadata is
300+
available in :ref:`0819-core-metadata-json-schema`.
301+
302+
Furthermore, a reference implementation in the ``packaging`` library `is
303+
available
304+
<https://github.com/wheelnext/packaging/tree/PEP-9999-JSON-metadata>`__.
305+
306+
A reference implementation generating both ``METADATA.json`` and ``WHEEL.json``
307+
in the ``uv`` build backend `is also available <https://github.com/astral-sh/uv/pull/15510>`__.
308+
309+
310+
Rejected Ideas
311+
==============
312+
313+
Using Another File Format (TOML, YAML, etc.)
314+
--------------------------------------------
315+
316+
While TOML or another format could be used for the new core metadata file
317+
format, JSON has been chosen for a few reasons:
318+
319+
#. Core metadata is mostly meant as a machine interchange format to be used by
320+
tools and services which wish to interoperate. Therefore the
321+
human-readability of TOML is not an important consideration in this
322+
selection.
323+
#. JSON parsers are implemented in many languages' standard libraries and the
324+
:mod:`json` module has been part of Python's standard library for a very
325+
long time.
326+
#. JSON is fast to parse and emit.
327+
#. JSON schemas are JSON native and commonly used.
328+
329+
330+
Open Issues
331+
===========
332+
333+
Where should the JSON schema be served?
334+
---------------------------------------
335+
336+
Where should the standard JSON Schema be served? Some options would be
337+
packaging.python.org, pypi.org, python.org, or pypa.org.
338+
339+
My first choice would be packaging.python.org, but I am open to other options.
340+
341+
342+
Acknowledgements
343+
================
344+
345+
Thanks to Konstantin Schütze for writing the reference implementation of
346+
this PEP in the ``uv`` build backend and for providing valuable feedback on the
347+
specification.
348+
349+
350+
Copyright
351+
=========
352+
353+
This document is placed in the public domain or under the
354+
CC0-1.0-Universal license, whichever is more permissive.
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
:orphan:
2+
3+
.. _0819-core-metadata-json-schema:
4+
5+
Appendix: JSON Schema for Core Metadata
6+
=======================================
7+
8+
.. literalinclude:: core-metadata.schema.json
9+
:language: json
10+
:linenos:
11+
:name: core-metadata-schema
12+
13+
.. _0819-wheel-json-schema:
14+
15+
Appendix: JSON Schema for Wheel Metadata
16+
========================================
17+
18+
.. literalinclude:: wheel.schema.json
19+
:language: json
20+
:linenos:
21+
:name: wheel-schema
