| 1 | +PEP: 819 |
| 2 | +Title: JSON Package Metadata |
| 3 | +Author: Emma Harper Smith < [email protected]> |
| 4 | +PEP-Delegate: Paul Moore |
| 5 | +Discussions-To: Pending |
| 6 | +Status: Draft |
| 7 | +Type: Standards Track |
| 8 | +Topic: Packaging |
| 9 | +Created: 18-Dec-2025 |
| 10 | +Post-History: Pending |
| 11 | + |
| 12 | + |
| 13 | +Abstract |
| 14 | +======== |
| 15 | + |
| 16 | +This PEP proposes introducing JSON encoded core metadata and wheel file format |
| 17 | +metadata files in Python packages. Python package metadata ("core metadata") |
| 18 | +was first defined in :pep:`241` to use :rfc:`822` email headers to encode |
| 19 | +information about packages. This was reasonable in 2001; email messages |
| 20 | +were the only widely used, standardized text format that had a parser in |
| 21 | +the standard library. However, issues with handling different encodings, |
| 22 | +differing handling of line breaks, and other differences between |
| 23 | +implementations have caused numerous packaging bugs. Using the JSON format for |
| 24 | +encoding metadata files would eliminate a wide range of these potential issues. |
| 25 | + |
| 26 | + |
| 27 | +Motivation |
| 28 | +========== |
| 29 | + |
| 30 | +The email message format has a number of complexities and limitations which |
| 31 | +reduce its utility as a portable textual interchange format for packaging |
| 32 | +metadata. Due to the :mod:`email` module requiring configuration changes to |
| 33 | +properly generate valid core metadata, many projects do not use the |
| 34 | +:mod:`!email` module and instead generate core metadata in a custom manner. |
| 35 | +There are many pitfalls with generating email headers that can be encountered |
| 36 | +by such custom generators. First, core metadata fields may contain newlines in the |
| 37 | +value of fields. These newlines must be handled properly so that values can be |
| 38 | +"unfolded" from multiple lines per :rfc:`822`. One field that is particularly difficult to encode is the |
| 39 | +``Description`` field, which may contain newlines and indentation. To encode |
| 40 | +the field in email headers, CRLF line breaks must be followed by seven (7) |
| 41 | +spaces and a pipe ('``|``') character. While ``Description`` may now be encoded in |
| 42 | +the message body, similar escaping issues occur for the ``Author`` and |
| 43 | +``Maintainer`` fields. Improperly escaped newlines can lead to missing, |
| 44 | +partial, or invalid core metadata. Second, as discussed in the |
| 45 | +:ref:`core metadata specifications <packaging:core-metadata>`: |
| 46 | + |
| 47 | +.. epigraph:: |
| 48 | + |
| 49 | + The standard file format for metadata (including in wheels and installed |
| 50 | + projects) is based on the format of email headers. However, email formats |
| 51 | + have been revised several times, and exactly which email RFC applies to |
| 52 | + packaging metadata is not specified. In the absence of a precise |
| 53 | + definition, the practical standard is set by what the standard library |
| 54 | + :mod:`email.parser` module can parse using the |
| 55 | + :data:`email.policy.compat32` policy. |
| 56 | + |
| 57 | +Since no specific email RFC is selected, the current core metadata |
| 58 | +specification is ambiguous as to whether a given core metadata document is valid. |
| 59 | +:rfc:`822` is the only email standard to be explicitly listed in a PEP. |
| 60 | +However, the core metadata specifications also require that core metadata be |
| 61 | +encoded using UTF-8 when written to a file. This de facto makes the core |
| 62 | +metadata follow :rfc:`6532`, which specifies internationalization of email |
| 63 | +headers. This has practical interoperability concerns. Until a few years ago, |
| 64 | +it was unspecified how to properly encode non-ASCII emails in core |
| 65 | +metadata, making parsing ambiguous. Third, the current format is difficult to |
| 66 | +properly validate and parse. Many tools do not check for issues with the output |
| 67 | +of the :mod:`!email` parser. If a document is malformed, it may still parse |
| 68 | +without error by the :mod:`!email` module as a valid email message. Furthermore, |
| 69 | +due to limitations in the email format, fields like ``Project-Url`` must create |
| 70 | +custom encodings of nested key-value items, further complicating parsing and |
| 71 | +validation. Finally, the lack of a schema makes it difficult to validate the |
| 72 | +contents of email message encoded metadata. While introducing a specification |
| 73 | +for the current format has been |
| 74 | +`discussed previously <https://discuss.python.org/t/7550>`__, no progress has |
| 75 | +been made, and converting to JSON was a suggested resolution to the issues |
| 76 | +raised. |
| 77 | + |
| 78 | +The ``WHEEL`` file format is currently encoded in a custom key-value format. |
| 79 | +While this format is easy to parse and write, it requires manual parsing and |
| 80 | +validation to ensure that the contents are valid. Moving to a JSON encoded |
| 81 | +format will allow for easier parsing and validation of the contents, and |
| 82 | +simplify packaging tools and services by using a consistent format for |
| 83 | +distribution metadata. |
| 84 | + |
| 85 | + |
| 86 | +Rationale |
| 87 | +========= |
| 88 | + |
| 89 | +Introducing a new core metadata file with a well-specified format will greatly |
| 90 | +ease generating, parsing, and validating metadata. JSON is a natural choice for |
| 91 | +storing package core metadata. It is easily machine readable and writable, is |
| 92 | +understandable to humans, and is well supported across many languages. |
| 93 | +Furthermore, :pep:`566` already specifies a canonicalization of email formatted |
| 94 | +core metadata to JSON. JSON is also a frequently used format for data |
| 95 | +interchange on the web. For discussion of other formats considered, please |
| 96 | +refer to the rejected ideas section. |
| 97 | + |
| 98 | +To maintain backwards compatibility, the JSON metadata file MUST be generated |
| 99 | +alongside the existing email formatted metadata file. This ensures that tools |
| 100 | +that do not support the new format can still read package metadata for new |
| 101 | +packages. |
| 102 | + |
| 103 | +The JSON formatted metadata file must be semantically equivalent to the email |
| 104 | +encoded file. This ensures that the metadata is unambiguous between the two |
| 105 | +formats, and tools may read either when both are present. To maintain |
| 106 | +performance, this equivalence is not required to be verified by installers, |
| 107 | +though other tools may do so. Some tools may choose to make the check dependent |
| 108 | +on a configuration flag. |
| 109 | + |
| 110 | +Package indexes SHOULD check that the metadata files are semantically |
| 111 | +equivalent when the package is added to the index. This is a low-cost, one-time |
| 112 | +check that ensures users of the index are served valid packages. |
| 113 | + |
| 114 | + |
| 115 | +Specification |
| 116 | +============= |
| 117 | + |
| 118 | +JSON Format Core Metadata File |
| 119 | +------------------------------ |
| 120 | + |
| 121 | +A new optional but recommended file ``METADATA.json`` shall be introduced as a |
| 122 | +metadata file for Python distribution packages. If generated, the ``METADATA.json`` file |
| 123 | +MUST be placed in the same directory as the current email formatted |
| 124 | +``METADATA`` or ``PKG-INFO`` file. |
| 125 | + |
| 126 | +For wheels, this means that ``METADATA.json`` MUST be located in the |
| 127 | +``.dist-info`` directory. |
| 128 | + |
| 129 | +If present, the ``METADATA.json`` file MUST be located in the root directory of |
| 130 | +the project sources in a source distribution package. Tools that prefer the |
| 131 | +JSON formatted metadata file MUST NOT assume the presence of the |
| 132 | +``METADATA.json`` file in the source distribution before reading the file. |
| 133 | + |
| 134 | +The semantic contents of the ``METADATA`` and ``METADATA.json`` files MUST be |
| 135 | +equivalent if ``METADATA.json`` is present. Installers MAY verify this |
| 136 | +information. Public package indexes SHOULD verify the files are semantically |
| 137 | +equivalent. |
| 138 | + |
| 139 | +The new ``METADATA.json`` file MUST be included in the |
| 140 | +:ref:`installed project metadata <packaging:recording-installed-packages>`, |
| 141 | +if present in the distribution metadata. |
| 142 | + |
| 143 | +Conversion of ``METADATA`` to JSON Encoding |
| 144 | +------------------------------------------- |
| 145 | + |
| 146 | +Conversion from the current email format for core metadata to JSON should |
| 147 | +follow the process described in :pep:`566`, with the following modification: |
| 148 | +the ``Project-URL`` entries should be converted into an object with keys |
| 149 | +containing the labels and values containing the URLs from the original email |
| 150 | +value. The overall process thus becomes: |
| 151 | + |
| 152 | +#. The original key-value format should be read with |
| 153 | + ``email.parser.HeaderParser``; |
| 154 | +#. All transformed keys should be reduced to lower case. Hyphens should be |
| 155 | + replaced with underscores, and all other characters should be retained; |
| 156 | +#. The transformed value for any field marked with "(Multiple-use)" should be a |
| 157 | + single list containing all the original values for the given key; |
| 158 | +#. The ``Keywords`` field should be converted to a list by splitting the |
| 159 | + original value on commas; |
| 160 | +#. The ``Project-URL`` field should be converted into a JSON object with keys |
| 161 | + containing the labels and values containing the URLs from the original email |
| 162 | + value; |
| 163 | +#. The message body, if present, should be stored as the value of the |
| 164 | + ``description`` key; |
| 165 | +#. The result should be stored as a string-keyed dictionary. |
| 166 | + |
| 167 | +One edge case in the above conversion is that the ``Project-URL`` label is |
| 168 | +"free text, with a maximum length of 32 characters." This presents a problem |
| 169 | +when trying to decode the label. Therefore this PEP sets the requirement that |
| 170 | +the ``Project-URL`` label be any text *except* the comma (``,``) character. |
| 171 | +This allows for unambiguous parsing of the ``Project-URL`` entries by splitting |
| 172 | +the text on the left-most comma (``,``) character. |
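The conversion steps above can be sketched in Python roughly as follows. This is a minimal illustration and not the reference implementation: the function name ``metadata_to_json_dict`` and the ``MULTIPLE_USE`` set are hypothetical, the set lists only an assumed subset of multiple-use fields, and stripping whitespace around keywords and labels is an assumption that the steps do not spell out.

```python
from email.parser import HeaderParser

# Assumption: an illustrative subset of the multiple-use fields. A real tool
# would take the full list from the core metadata specification.
MULTIPLE_USE = {
    "classifier", "dynamic", "license_file", "platform",
    "provides_dist", "provides_extra", "obsoletes_dist", "requires_dist",
    "requires_external", "supported_platform",
}


def metadata_to_json_dict(text: str) -> dict:
    """Convert email formatted core metadata into a JSON-compatible dict."""
    msg = HeaderParser().parsestr(text)
    result: dict = {}
    for raw_key in dict.fromkeys(msg.keys()):  # unique keys, original order
        # Lower-case the key and replace hyphens with underscores.
        key = raw_key.lower().replace("-", "_")
        values = msg.get_all(raw_key)
        if key == "project_url":
            # Split each entry on the left-most comma into label and URL.
            urls = {}
            for entry in values:
                label, _, url = entry.partition(",")
                urls[label.strip()] = url.strip()
            result[key] = urls
        elif key in MULTIPLE_USE:
            result[key] = list(values)
        elif key == "keywords":
            # Assumption: whitespace around each keyword is dropped.
            result[key] = [kw.strip() for kw in values[0].split(",")]
        else:
            result[key] = values[0]
    body = msg.get_payload()
    if body:
        # The message body becomes the ``description`` value.
        result["description"] = body
    return result
```

The resulting dictionary can be passed to ``json.dumps`` to produce the contents of ``METADATA.json``. Because ``str.partition`` splits on the first comma only, any commas inside the URL are preserved.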
| 173 | + |
| 174 | +JSON Schema for Core Metadata |
| 175 | +----------------------------- |
| 176 | + |
| 177 | +To enable verification of JSON encoded core metadata, a |
| 178 | +`JSON schema <https://json-schema.org/>`__ for core metadata has been produced. |
| 179 | +This schema will be updated with each revision to the core metadata |
| 180 | +specification. The schema is available in |
| 181 | +:ref:`0819-core-metadata-json-schema`. |
| 182 | + |
| 183 | +Serving METADATA.json in the Simple Repository API |
| 184 | +-------------------------------------------------- |
| 185 | + |
| 186 | +:pep:`658` introduced a means of serving package metadata in the Simple |
| 187 | +Repository API. The JSON encoded version of the package metadata may also be |
| 188 | +served, via the following modifications to the Simple Repository API: |
| 189 | + |
| 190 | +A new attribute ``data-dist-info-metadata-json`` may be added to anchor tags |
| 191 | +in the Simple API. This attribute should have a value containing the hash |
| 192 | +information for the ``METADATA.json`` file in the same format as |
| 193 | +``data-dist-info-metadata``. If ``data-dist-info-metadata-json`` is present, |
| 194 | +the repository MUST serve the JSON encoded metadata file at the |
| 195 | +distribution's path with ``.metadata.json`` appended to it. For example, if a |
| 196 | +distribution is served at ``/simple/foo-1.0-py3-none-any.whl``, the JSON |
| 197 | +encoded core metadata file MUST be served at |
| 198 | +``/simple/foo-1.0-py3-none-any.whl.metadata.json``. |
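As a small illustration of the rule above, a client might derive the JSON metadata location from a distribution link as sketched below. The helper name ``metadata_json_url`` is hypothetical, and dropping the URL fragment is an assumption (in the Simple API the fragment usually carries a hash of the distribution file, which would not apply to the metadata file).

```python
from urllib.parse import urlsplit, urlunsplit


def metadata_json_url(distribution_url: str) -> str:
    """Return the location of the JSON core metadata for a distribution."""
    parts = urlsplit(distribution_url)
    # Append the suffix to the path and drop any "#sha256=..." fragment.
    new_parts = parts._replace(path=parts.path + ".metadata.json", fragment="")
    return urlunsplit(new_parts)
```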
| 199 | + |
| 200 | +JSON Format Wheel Metadata File |
| 201 | +------------------------------- |
| 202 | + |
| 203 | +A new optional but recommended file ``WHEEL.json`` shall be introduced as a |
| 204 | +JSON encoded version of the ``WHEEL`` file. If generated, the ``WHEEL.json`` |
| 205 | +file MUST be placed in the same directory as the current key-value formatted |
| 206 | +``WHEEL`` file, i.e. the ``.dist-info`` directory. The semantic contents of |
| 207 | +the ``WHEEL`` and ``WHEEL.json`` files MUST be equivalent. The wheel file |
| 208 | +format version will be incremented to ``1.1`` to reflect the introduction |
| 209 | +of ``WHEEL.json``. |
| 210 | + |
| 211 | +The ``WHEEL.json`` file SHOULD be preferred over the ``WHEEL`` file when both |
| 212 | +are present. |
| 213 | + |
| 214 | +Conversion of ``WHEEL`` to JSON Encoding |
| 215 | +---------------------------------------- |
| 216 | + |
| 217 | +Conversion from the current key-value format for wheel file format metadata to |
| 218 | +JSON should proceed as follows: |
| 219 | + |
| 220 | +#. The original key-value format should be read. |
| 221 | +#. All transformed keys should be reduced to lower case. Hyphens should be |
| 222 | + replaced with underscores, and all other characters should be retained. |
| 223 | +#. The ``Tag`` field's entries should be converted to a list containing the |
| 224 | + original values. |
| 225 | +#. The result should be stored as a string-keyed dictionary. |
| 226 | + |
| 227 | +This follows a similar process to the conversion of ``METADATA`` to JSON |
| 228 | +encoding. |
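A minimal Python sketch of these steps is shown below. The function name ``wheel_to_json_dict`` is hypothetical, and the sketch assumes that each non-empty line of ``WHEEL`` has the form ``Key: value``. Values are kept as strings here; whether a field such as ``Root-Is-Purelib`` should become a JSON boolean is governed by the schema and is not decided by this sketch.

```python
def wheel_to_json_dict(text: str) -> dict:
    """Convert key-value formatted WHEEL contents into a JSON-compatible dict."""
    result: dict = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        raw_key, _, value = line.partition(":")
        # Lower-case the key and replace hyphens with underscores.
        key = raw_key.strip().lower().replace("-", "_")
        value = value.strip()
        if key == "tag":
            # Collect every Tag entry into a single list.
            result.setdefault("tag", []).append(value)
        else:
            result[key] = value
    return result
```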
| 229 | + |
| 230 | +JSON Schema for Wheel Metadata |
| 231 | +------------------------------ |
| 232 | + |
| 233 | +To enable verification of JSON encoded wheel file format metadata, a |
| 234 | +JSON schema for wheel metadata has been produced. |
| 235 | +This schema will be updated with each revision to the wheel metadata |
| 236 | +specification. The schema is available in :ref:`0819-wheel-json-schema`. |
| 237 | + |
| 238 | +Deprecation of the ``METADATA``, ``PKG-INFO``, and ``WHEEL`` Files |
| 239 | +------------------------------------------------------------------ |
| 240 | + |
| 241 | +The ``METADATA``, ``PKG-INFO``, and ``WHEEL`` files are now deprecated. This |
| 242 | +means that a future PEP may make the ``METADATA``, ``PKG-INFO``, and ``WHEEL`` |
| 243 | +files optional and require ``METADATA.json`` and ``WHEEL.json`` to be present. |
| 244 | +Please see the next section for more information on backwards compatibility |
| 245 | +caveats to that change. |
| 246 | + |
| 247 | +Despite the ``METADATA`` and ``PKG-INFO`` files being deprecated, new core |
| 248 | +metadata revisions should be implemented for both JSON and email to ensure that |
| 249 | +they may remain semantically equivalent. Similarly, new ``WHEEL`` metadata keys |
| 250 | +should be implemented for both JSON and key-value formats to ensure that they |
| 251 | +may remain semantically equivalent. |
| 252 | + |
| 253 | + |
| 254 | +Backwards Compatibility |
| 255 | +======================= |
| 256 | + |
| 257 | +The specification for ``METADATA.json`` and ``WHEEL.json`` is designed such |
| 258 | +that the new format is completely backwards compatible. Existing tools may read |
| 259 | +metadata from the existing email formatted files, and new tools may take |
| 260 | +advantage of the new format. |
| 261 | + |
| 262 | +A future major revision of the wheel specification may make the ``METADATA``, |
| 263 | +``PKG-INFO``, and ``WHEEL`` files optional and make the ``METADATA.json`` and |
| 264 | +``WHEEL.json`` files required. |
| 265 | + |
| 266 | +Note that tools will need to maintain parsing of email metadata and the |
| 267 | +key-value formatted ``WHEEL`` file indefinitely to support parsing metadata |
| 268 | +for old packages which only have the ``METADATA``, ``PKG-INFO``, |
| 269 | +or ``WHEEL`` files. |
| 270 | + |
| 271 | + |
| 272 | +Security Implications |
| 273 | +===================== |
| 274 | + |
| 275 | +One attack vector with JSON encoded core metadata is if the JSON payload is |
| 276 | +designed to consume excessive memory or CPU resources in a denial of service |
| 277 | +(DoS) attack. While this attack is not likely to affect users, who can cancel |
| 278 | +resource-intensive interactive operations, it may be an issue for package |
| 279 | +indexes. |
| 280 | + |
| 281 | +There are several mitigations that can be made to prevent this: |
| 282 | + |
| 283 | +#. The length of the JSON payload can be restricted to a reasonable size. |
| 284 | +#. The reader may configure a :class:`~json.JSONDecoder` to skip converting :class:`int` |
| 285 | + and :class:`float` values, avoiding attacks that exploit quadratic-time number |
| 286 | + parsing. |
| 287 | +#. I plan to contribute a change to :class:`~json.JSONDecoder` in Python |
| 288 | + 3.15+ that will allow it to be configured to restrict the nesting of JSON |
| 289 | + payloads to a reasonable depth. Core metadata currently has a maximum depth |
| 290 | + of 2 to encode mapping and list fields. |
| 291 | + |
| 292 | +With these mitigations in place, concerns about denial of service attacks with |
| 293 | +JSON encoded core metadata are minimal. |
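The first two mitigations can be sketched with the standard :mod:`json` module as follows. The function name ``load_metadata_json`` and the ``MAX_METADATA_BYTES`` limit are hypothetical; this PEP does not set a size limit. Passing ``str`` as the ``parse_int`` and ``parse_float`` hooks keeps number literals as strings, so no numeric conversion takes place.

```python
import json

MAX_METADATA_BYTES = 1_000_000  # hypothetical limit, not set by this PEP


def load_metadata_json(payload: bytes) -> dict:
    """Parse a METADATA.json payload with basic resource limits."""
    # Mitigation 1: reject oversized payloads before parsing.
    if len(payload) > MAX_METADATA_BYTES:
        raise ValueError("METADATA.json payload is too large")
    # Mitigation 2: keep number literals as strings instead of converting them.
    data = json.loads(payload, parse_int=str, parse_float=str)
    if not isinstance(data, dict):
        raise ValueError("METADATA.json must contain a JSON object")
    return data
```

A package index would likely choose its size limit based on the largest legitimate metadata files it already hosts.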
| 294 | + |
| 295 | + |
| 296 | +Reference Implementation |
| 297 | +======================== |
| 298 | + |
| 299 | +A reference implementation of the JSON schema for JSON core metadata is |
| 300 | +available in :ref:`0819-core-metadata-json-schema`. |
| 301 | + |
| 302 | +Furthermore, a reference implementation in the ``packaging`` library `is |
| 303 | +available |
| 304 | +<https://github.com/wheelnext/packaging/tree/PEP-9999-JSON-metadata>`__. |
| 305 | + |
| 306 | +A reference implementation generating both ``METADATA.json`` and ``WHEEL.json`` |
| 307 | +in the ``uv`` build backend `is also available <https://github.com/astral-sh/uv/pull/15510>`__. |
| 308 | + |
| 309 | + |
| 310 | +Rejected Ideas |
| 311 | +============== |
| 312 | + |
| 313 | +Using Another File Format (TOML, YAML, etc.) |
| 314 | +-------------------------------------------- |
| 315 | + |
| 316 | +While TOML or another format could be used for the new core metadata file |
| 317 | +format, JSON has been chosen for a few reasons: |
| 318 | + |
| 319 | +#. Core metadata is mostly meant as a machine interchange format to be used by |
| 320 | + tools and services which wish to interoperate. Therefore the |
| 321 | + human-readability of TOML is not an important consideration in this |
| 322 | + selection. |
| 323 | +#. JSON parsers are implemented in many languages' standard libraries and the |
| 324 | + :mod:`json` module has been part of Python's standard library for a very |
| 325 | + long time. |
| 326 | +#. JSON is fast to parse and emit. |
| 327 | +#. JSON schemas are JSON native and commonly used. |
| 328 | + |
| 329 | + |
| 330 | +Open Issues |
| 331 | +=========== |
| 332 | + |
| 333 | +Where should the JSON schema be served? |
| 334 | +--------------------------------------- |
| 335 | + |
| 336 | +Where should the standard JSON Schema be served? Some options would be |
| 337 | +packaging.python.org, pypi.org, python.org, or pypa.org. |
| 338 | + |
| 339 | +My first choice would be packaging.python.org, but I am open to other options. |
| 340 | + |
| 341 | + |
| 342 | +Acknowledgements |
| 343 | +================ |
| 344 | + |
| 345 | +Thanks to Konstantin Schütze for implementing the reference implementation of |
| 346 | +this PEP in the ``uv`` build backend and for providing valuable feedback on the |
| 347 | +specification. |
| 348 | + |
| 349 | + |
| 350 | +Copyright |
| 351 | +========= |
| 352 | + |
| 353 | +This document is placed in the public domain or under the |
| 354 | +CC0-1.0-Universal license, whichever is more permissive. |