diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index a48e9dbc1b0..2b5b4fc18c1 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -646,6 +646,7 @@ peps/pep-0765.rst @iritkatriel @ncoghlan peps/pep-0766.rst @warsaw peps/pep-0767.rst @carljm peps/pep-0768.rst @pablogsal +peps/pep-0770.rst @sethmlarson @brettcannon # ... peps/pep-0777.rst @warsaw # ... diff --git a/peps/pep-0770.rst b/peps/pep-0770.rst new file mode 100644 index 00000000000..8f451c14ab5 --- /dev/null +++ b/peps/pep-0770.rst @@ -0,0 +1,430 @@ +PEP: 770 +Title: Improving measurability of Python packages with Software Bill-of-Materials +Author: Seth Larson +Sponsor: Brett Cannon +PEP-Delegate: Brett Cannon +Discussions-To: https://discuss.python.org/t/70261 +Status: Draft +Type: Standards Track +Topic: Packaging +Created: 02-Jan-2025 + +Abstract +======== + +Software Bill-of-Materials (SBOM) is a technology-and-ecosystem-agnostic +method for describing software composition, provenance, heritage, and more. +SBOMs are used as inputs for software composition analysis (SCA) tools, +such as scanners for vulnerabilities and licenses, and have been gaining +traction in global software regulations and frameworks. + +This PEP proposes using SBOM documents included in Python packages as a +means to improve software measurability for Python packages. + +The changes will update the +`Core Metadata specification `__ to version 2.5. + +Motivation +========== + +Measurability and Phantom Dependencies +-------------------------------------- + +Python packages are particularly affected by the "`phantom dependency`_" +problem, where software components that aren't written in Python are included +in Python packages for many reasons, such as ease of installation and +compatibility with standards: + +* Python serves scientific, data, web, and machine-learning use-cases which + use compiled or non-Python languages like Rust, C, C++, Fortran, JavaScript, + and others. +* The Python wheel format is preferred by users due to the ease-of-installation. + No code is executed during the installation step, only extracting the archive. +* The Python wheel format requires bundling shared compiled libraries without + a method to encode metadata about these libraries. +* Packages related to Python packaging sometimes need to solve the + "bootstrapping" problem, so include pure Python projects inside their + source code. + +These software components can't be described using Python package metadata and +thus are likely to be missed by software composition analysis (SCA) software +which can mean vulnerable software components aren't reported accurately. + +`For example `__, +the Python package Pillow includes 16 shared object libraries in the wheel that +were bundled by auditwheel as a part of the build. None of those shared object +libraries are detected when using common SCA tools like Syft and Grype. +If an SBOM document is included annotating all the included shared libraries +then SCA tools can identify the included software reliably. + +Regulations +----------- + +SBOMs are required by recent software security regulations, like the +`Secure Software Development Framework`_ (SSDF) and the +`Cyber Resilience Act`_ (CRA). Due to their inclusion in these regulations, +the demand for SBOM documents of open source projects is expected to be high. +One goal is to minimize the demands on open source project maintainers by +enabling open source users that need SBOMs to self-serve using existing +tooling. + +Another goal is to enable contributions from users who need SBOMs to annotate +projects they depend on with SBOM information. Today there is no mechanism to +propagate the results of those contributions for a Python package so there is +no incentive for users to contribute this type of work. + +.. _Cyber Resilience Act: https://digital-strategy.ec.europa.eu/en/policies/cyber-resilience-act +.. _Secure Software Development Framework: https://csrc.nist.gov/Projects/ssdf + +Rationale +========= + +Using SBOM standards instead of Core Metadata fields +---------------------------------------------------- + +Attempting to add every field offered by SBOM standards into Python package +Core Metadata would result in an explosion of new Core Metadata fields, +including the need to keep up-to-date as SBOM standards continue to evolve +to suit new needs in that space. + +Instead, this proposal delegates SBOM-specific metadata to SBOM documents that +are included in Python packages and adds a new Core Metadata field for +discoverability of included SBOM documents. + +This standard also doesn't aim to replace Core Metadata with SBOMs, +instead focusing on the SBOM information being supplemental to Core Metadata. +Included SBOMs only contain information about dependencies included in the +package archive or information about the top-level software in the package that +can't be encoded into Core Metadata but is relevant for the SBOM use-case +("software identifiers", "purpose", "support level", etc). + +Zero-or-more SBOM documents per package +--------------------------------------- + +Rather than requiring at most one included SBOM document per Python package, +this PEP proposes that one or more SBOM documents may be included in a Python +package. This means that code attempting to annotate a Python package with SBOM +data may do so without being concerned about corrupting data already contained +within other SBOM documents. + +This decision also means this PEP is capable of supporting multiple SBOM +standards without favoring one, instead deferring support to SBOM consuming +tools. This decision also means that the merging of SBOM documents never needs +to be handled by Python tools, instead deferring this to SBOM consuming tools +who are better placed to handle cross-standard conversions. + +.. _770-spec: + +Specification +============= + +The changes necessary to implement this PEP include: + +* Additions to `Core Metadata <770-spec-core-metadata_>`_, as defined in the + `Core Metadata specification `__. +* Additions to the author-provided + `project source metadata <770-spec-project-source-metadata_>`_, as defined in the + `pyproject.toml specification `__. +* `Additions <770-spec-project-formats_>`_ to the source distribution (sdist), + built distribution (wheel), and installed project specifications. + +In addition to the above, an informational PEP will be created for tools +consuming included SBOM documents and other Python package metadata to +generate complete SBOM documents for Python packages. + +.. _770-spec-core-metadata: + +Core Metadata +------------- + +Add ``Sbom-File`` field +~~~~~~~~~~~~~~~~~~~~~~~ + +The ``Sbom-File`` is an optional Core Metadata field. Each instance contains a +string representation of the path of an SBOM document. The path is located +within the project source tree, relative to the project root directory. It is a +multi-use field that MAY appear zero or more times and each instance lists the +path to one such file. Files specified under this field are SBOM documents +that are distributed with the package. + +As `specified by this PEP <#770-spec-project-formats>`__, its value is also +that file's path relative to the root SBOM directory in both installed projects +and the standardized Distribution Package types. + +If an ``Sbom-File`` is listed in a +:term:`Source Distribution ` or +:term:`Built Distribution`'s Core Metadata: + +* That file MUST be included in the :term:`distribution archive` at the + specified path relative to the root license directory. +* That file MUST be installed with the :term:`project` at that same relative + path. +* Inside the root SBOM directory, packaging tools MUST reproduce the directory + structure under which the source files are located relative to the project + root. The root SBOM directory is + `specified in a later section <#770-spec-project-formats>`__. +* Path delimiters MUST be the forward slash character (``/``), and parent + directory indicators (``..``) MUST NOT be used. +* SBOM document contents MUST be UTF-8 encoded JSON according to :rfc:`8259`. +* SBOM document contents MUST use an SBOM standard, and for better + interoperability SHOULD be a well-known SBOM standard such as + `CycloneDX `_ or `SPDX `_. +* The "primary" component being described in included SBOM documents MUST be the + Python package. This is achieved in CycloneDX using the ``metadata.component`` + field and in SPDX using the ``DESCRIBES`` relationship. +* SBOM documents MUST include metadata for the timestamp when the SBOM document + was created. This information helps consuming tools understand the order that + multiple SBOM documents were created to untangle conflicts between various + stages building the Python package. +* SBOM documents SHOULD include metadata describing the tool creating the SBOM + document. This information helps users find which tool needs to be fixed in + the case of defects. + +For all newly-uploaded distribution archives that include one or more +``Sbom-File`` fields in their Core Metadata and declare a ``Metadata-Version`` +of ``2.5`` or higher, PyPI SHOULD validate that all specified files are present +in the distribution archives, are valid UTF-8 encoded JSON, and for well-known +SBOM standards provide the minimum required fields by those standards and this +PEP. + +.. _770-spec-project-source-metadata: + +Project source metadata +----------------------- + +This PEP specifies changes to the project's source metadata under a +``[project]`` table in the ``pyproject.toml`` file. + +Add ``sbom-files`` key +~~~~~~~~~~~~~~~~~~~~~~ + +A new ``sbom-files`` key is added to the ``[project]`` table for specifying +paths in the project source tree relative to ``pyproject.toml`` to file(s) +containing SBOMs to be distributed with the package. This key corresponds to the +``Sbom-File`` fields in the Core Metadata. + +Its value is an array of strings which MUST contain valid glob patterns, as +specified below: + +* Alphanumeric characters, underscores (``_``), hyphens (``-``) and dots (``.``) + MUST be matched verbatim. +* Special glob characters: ``*``, ``?``, ``**`` and character ranges: ``[]`` + containing only the verbatim matched characters MUST be supported. Within + ``[...]``, the hyphen indicates a locale-agnostic range (e.g. a-z, order based + on Unicode code points). Hyphens at the start or end are matched literally. +* Path delimiters MUST be the forward slash character (``/``). Patterns are + relative to the directory containing ``pyproject.toml``, therefore the leading + slash character MUST NOT be used. +* Parent directory indicators (``..``) MUST NOT be used. + +Any characters or character sequences not covered by this specification are +invalid. Projects MUST NOT use such values. Tools consuming this field SHOULD +reject invalid values with an error. + +Tools MUST assume that SBOM file content is valid UTF-8 encoded JSON, and SHOULD +validate this an raise an error for invalid formats and encodings. + +Literal paths (e.g. ``bom.cdx.json``) are treated as valid globs which means +they can also be defined. + +Build tools: + +* MUST treat each value as a glob pattern, and MUST raise an error if the + pattern contains invalid glob syntax. +* MUST include all files matched by a listed pattern in all distribution + archives. +* MUST list each matched file path under an ``Sbom-File`` field in the + Core Metadata. +* MUST raise an error if any individual user-specified pattern does not match + at least one file. + +If the ``sbom-files`` key is present and is set to a value of an empty array, +then tools MUST NOT include any SBOM files and MUST NOT raise an error. + +Examples of valid SBOM files declarations: + +.. code-block:: toml + + [project] + sbom-files = ["bom.json"] + + [project] + sbom-files = ["sboms/openssl.cdx.json", "licenses/openssl.spdx.json"] + + [project] + sbom-files = ["sboms/*"] + + [project] + sbom-files = [] + +Examples of invalid SBOM files declarations: + +.. code-block:: toml + + [project] + sbom-files = ["..\bom.json"] + +Reason: ``..`` must not be used. ``\\`` is an invalid path delimited, ``/`` +must be used. + +.. code-block:: toml + + [project] + sbom-files = ["bom{.json*"] + +Reason: ``bom{.json`` is not a valid glob. + +.. _770-spec-project-formats: + +SBOM files in project formats +----------------------------- + +A few additions will be made to the existing specifications. + +:term:`Project source trees ` + Per :ref:`639-spec-source-metadata` section, the + `Declaring Project Metadata specification `__ + will be updated to reflect that SBOM file paths MUST be relative to the + project root directory; i.e. the directory containing the ``pyproject.toml`` + (or equivalently, other legacy project configuration, + e.g. ``setup.py``, ``setup.cfg``, etc). + +:term:`Source distributions (sdists) ` + The sdist specification will be updated to reflect that if the + ``Metadata-Version`` is ``2.5`` or greater, the sdist MUST contain any SBOM + files specified by the ``Sbom-File`` field in the ``PKG-INFO`` at their + respective paths relative to the sdist (containing the ``pyproject.toml`` and + the ``PKG-INFO`` Core Metadata). + +:term:`Built distributions ` (:term:`wheels `) + The wheel specification will be updated to reflect that if the + ``Metadata-Version`` is ``2.5`` or greater and one or more ``Sbom-File`` + fields are specified, the ``.dist-info`` directory MUST contain an ``sboms`` + subdirectory, which MUST contain the files listed in the ``Sbom-File`` fields + in the ``METADATA`` file at their respective paths relative to the ``sboms`` + directory. + +:term:`Installed projects ` + The Recording Installed Projects specification will be updated to reflect that + if the ``Metadata-Version`` is ``2.5`` or greater and one or more + ``Sbom-File`` fields is specified, the ``.dist-info`` directory MUST contain + an ``sboms`` subdirectory which MUST contain the files listed in the + ``Sbom-File`` fields in the ``METADATA`` file at their respective paths + relative to the ``sboms`` directory, and that any files in this directory MUST + be copied from wheels by install tools. + +Backwards Compatibility +======================= + +There are no backwards compatibility concerns for this PEP. + +The changes to Python package Core Metadata and ``pyproject.toml`` are +only additive, this PEP doesn't change the behavior of any existing fields. + +Tools which are processing Python packages can use the ``Sbom-File`` core +metadata field to clearly delineate between packages which include SBOM +documents that implement this PEP (and thus have more requirements) and +packages which include SBOM documents before this PEP was authored. + +Security Implications +===================== + +SBOM documents are only as useful as the information encoded in them. +If an SBOM document contains incorrect information then this can result in +incorrect downstream analysis by SCA tools. For this reason, it's important +for tools including SBOM data into Python packages to be confident in the +information they are recording. SBOMs are capable of recording "known unknowns" +in addition to known data. This practice is recommended when not certain about +the data being recorded to allow for further analysis by users. + +Because SBOM documents can encode information about the original system +where a Python package is built (for example, the operating system name and +version, less commonly the names of paths). This information has the potential +to "leak" through the Python package to installers via SBOMs. If this +information is sensitive, then that could represent a security risk. + +How to Teach This +================= + +Most typical users of Python and Python packages won't need to know the details +of this standard. The details of this standard are most important to either +maintainers of Python packages and developers of SCA tools such as +SBOM generation tools and vulnerability scanners. + +Most Python packages don't contain code from other software components and thus +are already measurable by SCA tools without the need of this standard or +additional SBOM documents. Pure-Python packages are about `~90% `__ +of popular packages on PyPI. + +For projects that do contain other software components, documentation will be +added to the Python Packaging User Guide for how to specify and maintain +SBOM documents for Python packages in source code. + +A follow-up informational PEP will be authored to describe how to transform +Python packaging metadata, including the mechanism described in this PEP, +into an SBOM document describing Python packages. + +Reference Implementation +======================== + +`Auditwheel fork `_ +which generates CycloneDX SBOM documents to include in wheels describing +bundled shared library files. These SBOM documents worked as expected for the +Syft and Grype SBOM and vulnerability scanners. + +Rejected Ideas +============== + +Why not require a single SBOM standard? +--------------------------------------- + +Most discussion and development around SBOMs today focuses on two SBOM +standards: `CycloneDX `_ and `SPDX `_. There is no clear +"winner" between these two standards, both standards are frequently used by +projects and software ecosystems. + +Because both standards are frequently used, tools for consuming and processing +SBOM documents commonly need to support both standards. This means that this PEP +is not constrained to select a single SBOM standard by its consumers and thus +can allow tools creating SBOM documents for inclusion in Python packages to +choose which SBOM standard works best for their use-case. + +Open Issues +=========== + +Conditional project source SBOM files +------------------------------------- + +How can a project specify an SBOM file that is conditional? Under what circumstances would an SBOM document be conditional? + +References +========== + +* `Visualizing the Python package SBOM data flow `_. + This is a graphic that shows how this PEP fits into the bigger picture of + Python packaging's SBOM data story. + +* `Adding SBOMs to Python wheels with auditwheel `_. + This was some early results from a fork of auditwheel to add SBOM data to a + wheel and then use an SBOM generation tool Syft to detect the SBOM in the + installed package. + +.. _phantom dependency: https://www.endorlabs.com/learn/dependency-resolution-in-python-beware-the-phantom-dependency +.. _coremetadataspec: https://packaging.python.org/specifications/core-metadata +.. _pyprojecttoml: https://packaging.python.org/en/latest/specifications/pyproject-toml/ +.. _spdxspec: https://spdx.dev/use/specifications/ +.. _cyclonedxspec: https://cyclonedx.org/specification/overview/ +.. _pypi-data: https://github.com/sethmlarson/pypi-data + +Acknowledgements +================ + +Thanks to Karolina Surma for authoring and leading :pep:`639` to acceptance. +This PEP copies the specification from :pep:`639` for specifying files in +project source metadata, Core Metadata, and project formats is based on. + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive.