Commit cff867a

PEP 752: Address feedback, round 4 (#3955)
1 parent 4878da5 commit cff867a

1 file changed: +185 -63 lines changed

peps/pep-0752.rst

Lines changed: 185 additions & 63 deletions
@@ -24,44 +24,58 @@ Motivation
2424
==========
2525

2626
The current ecosystem lacks a way for projects with many packages to signal a
27-
verified pattern of ownership. Some examples:
27+
verified pattern of ownership. Such projects fall into two categories.
28+
29+
The first category is projects [1]_ that want complete control over their
30+
namespace. A few examples:
2831

29-
* `Typeshed <https://github.com/python/typeshed>`__ is a community effort to
30-
maintain type stubs for various packages. The stub packages they maintain
31-
mirror the package name they target and are prefixed by ``types-``. For
32-
example, the package ``requests`` has a stub that users would depend on
33-
called ``types-requests``.
3432
* Major cloud providers like Amazon, Google and Microsoft have a common prefix
35-
for each feature's corresponding package [1]_. For example, most of Google's
33+
for each feature's corresponding package [3]_. For example, most of Google's
3634
packages are prefixed by ``google-cloud-`` e.g. ``google-cloud-compute`` for
3735
`using virtual machines <https://cloud.google.com/products/compute>`__.
38-
* Many projects [2]_ support a model where some packages are officially
39-
maintained and third-party developers are encouraged to participate by
40-
creating their own. For example, `Datadog <https://www.datadoghq.com>`__
41-
offers observability as a service for organizations at any scale. The
42-
`Datadog Agent <https://docs.datadoghq.com/agent/>`__ ships out-of-the-box
43-
with
44-
`official integrations <https://github.com/DataDog/integrations-core>`__
45-
for many products, like various databases and web servers, which are
46-
distributed as Python packages that are prefixed by ``datadog-``. There is
47-
support for creating `third-party integrations`__ which customers may run.
36+
* `OpenTelemetry <https://opentelemetry.io>`__ is an open standard for
37+
observability with `official packages`__ for the core APIs and SDK with
38+
`contrib packages`__ to collect data from various sources. All packages
39+
are prefixed by ``opentelemetry-`` with child prefixes in the form
40+
``opentelemetry-<component>-<name>-``. The contrib packages live in a
41+
central repository and they are the only ones with the ability to publish.
4842

49-
__ https://docs.datadoghq.com/developers/integrations/agent_integration/
43+
__ https://github.com/open-telemetry/opentelemetry-python
44+
__ https://github.com/open-telemetry/opentelemetry-python-contrib
45+
46+
The second category is projects [2]_ that want to share their namespace such
47+
that some packages are officially maintained and third-party developers are
48+
encouraged to participate by publishing their own. Some examples:
49+
50+
* `Project Jupyter <https://jupyter.org>`__ is devoted to the development of
51+
tooling for sharing interactive documents. They support `extensions`__
52+
which in most cases (and in all cases for officially maintained
53+
extensions) are prefixed by ``jupyter-``.
54+
* `Django <https://www.djangoproject.com>`__ is one of the most widely used web
55+
frameworks in existence. They have the concept of `reusable apps`__, which
56+
are commonly installed via
57+
`third-party packages <https://djangopackages.org>`__ that implement a subset
58+
of functionality to extend Django-based websites. These packages are by
59+
convention prefixed by ``django-`` or ``dj-``.
60+
61+
__ https://jupyterlab.readthedocs.io/en/stable/user/extensions.html
62+
__ https://docs.djangoproject.com/en/5.1/intro/reusable-apps/
5063

5164
Such projects are uniquely vulnerable to name-squatting attacks
5265
which can ultimately result in `dependency confusion`__.
5366

5467
__ https://www.activestate.com/resources/quick-reads/dependency-confusion/
5568

5669
For example, say a new product is released for which monitoring would be
57-
valuable. It would be reasonable to assume that Datadog would eventually
58-
support it as an official integration. It takes a nontrivial amount of time to
59-
deliver such an integration due to roadmap prioritization and the time required
60-
for implementation. It would be impossible to reserve the name of every
61-
potential package so in the interim an attacker may create a package that
62-
appears legitimate which would execute malicious code at runtime. Not only are
63-
users more likely to install such packages but doing so taints the perception
64-
of the entire project.
70+
valuable. It would be reasonable to assume that
71+
`Datadog <https://www.datadoghq.com>`__ would eventually support it as an
72+
official integration. It takes a nontrivial amount of time to deliver such an
73+
integration due to roadmap prioritization and the time required for
74+
implementation. It would be impossible to reserve the name of every potential
75+
package so in the interim an attacker may create a package that appears
76+
legitimate which would execute malicious code at runtime. Not only are users
77+
more likely to install such packages but doing so taints the perception of the
78+
entire project.
6579

6680
Although :pep:`708` attempts to address this attack vector, it is specifically
6781
about the case of multiple repositories being considered during dependency
@@ -71,7 +85,13 @@ Namespacing also would drastically reduce the incidence of
7185
`typosquatting <https://en.wikipedia.org/wiki/Typosquatting>`__
7286
because typos would have to be in the prefix itself which is
7387
`normalized <naming_>`_ and likely to be a short, well-known identifier like
74-
``aws-``.
88+
``aws-``. In recent years, typosquatting has become a popular attack vector
89+
[4]_.
90+
91+
The `current protection`__ against typosquatting used by PyPI is to normalize
92+
similar characters but that is insufficient for these use cases.
93+
94+
__ https://github.com/pypi/warehouse/blob/8615326918a180eb2652753743eac8e74f96a90b/warehouse/migrations/versions/d18d443f89f0_ultranormalize_name_function.py#L29-L42
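
The claim that typos would have to occur in the prefix itself can be illustrated with a minimal sketch. It assumes that names are compared after the standard :pep:`503` normalization and that a namespace is matched as a simple normalized prefix. The helper ``in_namespace`` and the example package names are hypothetical and are not part of this PEP or of any repository's implementation.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name using the PEP 503 rule."""
    return re.sub(r"[-_.]+", "-", name).lower()


def in_namespace(project: str, prefix: str) -> bool:
    """Hypothetical helper: does the normalized name fall under the prefix?"""
    return normalize(project).startswith(normalize(prefix))


# A typo after the prefix still lands inside the reserved namespace,
# so an unauthorized upload of this name could be rejected.
typo_after_prefix = in_namespace("AWS_cdk.lambdda", "aws-")

# Only a typo in the short prefix itself escapes the reservation.
typo_in_prefix = in_namespace("awz-cdk-lambda", "aws-")

print(typo_after_prefix, typo_in_prefix)
```

Under these assumptions, an attacker is left with misspellings of the short, well-known prefix rather than of every package name beneath it.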
7595

7696
Rationale
7797
=========
@@ -113,6 +133,11 @@ namespace. Any solution that requires new package syntax must be built atop the
113133
existing flat namespace and therefore implicit namespaces acquired via a
114134
reservation mechanism would be a prerequisite to such explicit namespaces.
115135

136+
Although existing packages matching a reserved namespace would be untouched,
137+
preventing future unauthorized uploads and strategically applying :pep:`541`
138+
takedown requests for malicious cases would reduce risks to users to a
139+
negligible level.
140+
116141
Terminology
117142
===========
118143

@@ -219,6 +244,8 @@ other organizations to use the grant. In this case, the authorized
219244
organizations have no special permissions and are equivalent to an open grant
220245
without ownership.
221246

247+
.. _hidden-grants:
248+
222249
Hidden Grants
223250
-------------
224251

@@ -235,7 +262,7 @@ restrictions without the need to expose the namespace to the public.
235262
Repository Metadata
236263
-------------------
237264

238-
The :pep:`JSON API <691>` version will be incremented from ``1.0`` to ``1.1``.
265+
The :pep:`JSON API <691>` version will be incremented from ``1.2`` to ``1.3``.
239266
The following API changes MUST be implemented by repositories that support
240267
this PEP. Repositories that do not support this PEP MUST NOT implement these
241268
changes so that consumers of the API are able to determine whether the
@@ -295,6 +322,19 @@ When a reserved namespace becomes unclaimed, repositories MUST set the
295322
Namespaces that were previously claimed but are now not SHOULD be eligible for
296323
claiming again by any organization.
297324

325+
Community Buy-in
326+
================
327+
328+
Representatives from the following organizations have expressed support for
329+
this PEP (with a link to the discussion):
330+
331+
* `Apache Airflow <https://github.com/apache/airflow/discussions/41657#discussioncomment-10412999>`__
332+
* `Typeshed <https://discuss.python.org/t/1609/37>`__
333+
* `Project Jupyter <https://discuss.python.org/t/61227/16>`__
334+
(`expanded <https://discuss.python.org/t/61227/48>`__)
335+
* `Microsoft <https://discuss.python.org/t/63191/40>`__
336+
* `Datadog <https://discuss.python.org/t/63191/53>`__
337+
298338
Backwards Compatibility
299339
=======================
300340

@@ -358,6 +398,73 @@ packages released with the scoping would be incompatible with older tools and
358398
would cause confusion for users along with frustration from maintainers having
359399
to triage such complaints.
360400

401+
Encourage Dedicated Package Repositories
402+
----------------------------------------
403+
404+
Critically, this imposes a burden on projects to maintain their own infrastructure. This
405+
is an unrealistic expectation for the vast majority of companies and a complete
406+
non-starter for community projects.
407+
408+
This does not help in most cases because the default behavior of most package
409+
managers is to use PyPI so users attempting to perform a simple ``pip install``
410+
would already be vulnerable to malicious packages.
411+
412+
In this theoretical future every project must document how to add their
413+
repository to dependency resolution, which would be different for each package
414+
manager. Few package managers are able to download specific dependencies from
415+
specific repositories and would require users to use verbose configuration in
416+
the common case.
417+
418+
The ones that do not support this would instead find a given package using an
419+
ordered enumeration of repositories, leading to dependency confusion.
420+
For example, say a user wants two packages from two custom repositories ``X``
421+
and ``Y``. If each repository has both packages but one is malicious on ``X``
422+
and the other is malicious on ``Y`` then the user would be unable to satisfy
423+
their requirements without encountering a malicious package.
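
The ``X`` and ``Y`` scenario can be sketched as a small simulation. It assumes a resolver that takes each package from the first repository in an ordered list that offers it. The package names ``pkg-a`` and ``pkg-b`` and the helpers ``resolve`` and ``install_plan`` are hypothetical and do not model any specific package manager.

```python
from itertools import permutations

# Each repository offers both packages, but a different one is malicious.
X = {"pkg-a": "malicious", "pkg-b": "legitimate"}
Y = {"pkg-a": "legitimate", "pkg-b": "malicious"}


def resolve(package, repositories):
    """Return (repository name, artifact) from the first repository offering it."""
    for repo_name, index in repositories:
        if package in index:
            return repo_name, index[package]
    raise LookupError(package)


def install_plan(repositories):
    """Resolve both packages against an ordered list of repositories."""
    return {pkg: resolve(pkg, repositories) for pkg in ("pkg-a", "pkg-b")}


# Try every possible ordering of the two repositories.
plans = [install_plan(order) for order in permutations([("X", X), ("Y", Y)])]

for plan in plans:
    print(plan)
```

In this sketch every ordering resolves both packages from the first repository, so each plan contains exactly one malicious artifact. Avoiding that outcome would require pinning each package to a specific repository.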
424+
425+
Use Fixed Prefixes
426+
------------------
427+
428+
The idea here would be to have one or more top-level fixed prefixes that are
429+
used for namespace reservations:
430+
431+
* ``com-``: Reserved for corporate organizations.
432+
* ``org-``: Reserved for community organizations.
433+
434+
Organizations would then apply for a namespace prefixed by the type of their
435+
organization.
436+
437+
This would cause perpetual disruption because when projects begin it is unknown
438+
whether a user base will be large enough to warrant a namespace reservation.
439+
Whenever that happens the project would have to be renamed which would put a
440+
high maintenance burden on the project maintainers and would cause confusion
441+
for users who have to learn a new way to reference the project's packages.
442+
The potential for this deterring projects from reserving namespaces at all is
443+
high.
444+
445+
Another issue with this approach is that projects often have branding in mind
446+
(`example`__) and would be reluctant to change their package names.
447+
448+
__ https://github.com/apache/airflow/discussions/41657#discussioncomment-10417439
449+
450+
It's unrealistic to expect every company and project to voluntarily change
451+
their existing and future package names.
452+
453+
Use DNS
454+
-------
455+
456+
The `idea <https://discuss.python.org/t/63455>`__ here is to add a new
457+
metadata field to projects in the API called ``domain-authority``. Repositories
458+
would support a new endpoint for verifying the domain via HTTPS. Clients would
459+
then support options to allow certain domains.
460+
461+
This does not solve the problem for the target audience who do not check where
462+
their packages are coming from and is more about checking for the integrity of
463+
uploads which is already supported in a more secure way by :pep:`740`.
464+
465+
Most projects do not have a domain and could not benefit from this, unfairly
466+
favoring organizations that have the financial means to acquire one.
467+
361468
Open Issues
362469
===========
363470

@@ -366,56 +473,71 @@ None at this time.
366473
Footnotes
367474
=========
368475

369-
.. [1] The following shows the package prefixes for the major cloud providers:
370-
371-
- Amazon: `aws-cdk- <https://docs.aws.amazon.com/cdk/api/v2/python/>`__
372-
- Google: `google-cloud- <https://github.com/googleapis/google-cloud-python/tree/main/packages>`__
373-
and others based on ``google-``
374-
- Microsoft: `azure- <https://github.com/Azure/azure-sdk-for-python/tree/main/sdk>`__
476+
.. [1] Additional examples of projects with restricted namespaces:
375477
376-
.. [2] Some examples of projects that have many packages with a common prefix:
377-
378-
- `Django <https://www.djangoproject.com>`__ is one of the most widely used
379-
web frameworks in existence. They have the concept of `reusable apps`__,
380-
which are commonly installed via
381-
`third-party packages <https://djangopackages.org>`__ that implement a
382-
subset of functionality to extend Django-based websites. These packages
383-
are by convention prefixed by ``django-`` or ``dj-``.
384-
- `Project Jupyter <https://jupyter.org>`__ is devoted to the development of
385-
tooling for sharing interactive documents. They support `extensions`__
386-
which in most cases (and in all cases for officially maintained
387-
extensions) are prefixed by ``jupyter-``.
388-
- `pytest <https://docs.pytest.org>`__ is Python's most popular testing
389-
framework. They have the concept of `plugins`__ which may be developed by
390-
anyone and by convention are prefixed by ``pytest-``.
391-
- `MkDocs <https://www.mkdocs.org>`__ is a documentation framework based on
392-
Markdown files. They also have the concept of
393-
`plugins <https://www.mkdocs.org/dev-guide/plugins/>`__ which may be
394-
developed by anyone and are usually prefixed by ``mkdocs-``.
478+
- `Typeshed <https://github.com/python/typeshed>`__ is a community effort to
479+
maintain type stubs for various packages. The stub packages they maintain
480+
mirror the package name they target and are prefixed by ``types-``. For
481+
example, the package ``requests`` has a stub that users would depend on
482+
called ``types-requests``. Unofficial stubs are not supposed to use the
483+
``types-`` prefix and are expected to use a ``-stubs`` suffix instead.
395484
- `Sphinx <https://www.sphinx-doc.org>`__ is a documentation framework
396485
popular for large technical projects such as
397486
`Swift <https://www.swift.org>`__ and Python itself. They have
398487
the concept of `extensions`__ which are prefixed by ``sphinxcontrib-``,
399488
many of which are maintained within a
400489
`dedicated organization <https://github.com/sphinx-contrib>`__.
401-
- `OpenTelemetry <https://opentelemetry.io>`__ is an open standard for
402-
observability with `official packages`__ for the core APIs and SDK with
403-
`third-party packages`__ to collect data from various sources. All
404-
packages are prefixed by ``opentelemetry-`` with child prefixes in the
405-
form ``opentelemetry-<component>-<name>-``.
406490
- `Apache Airflow <https://airflow.apache.org>`__ is a platform to
407491
programmatically orchestrate tasks as directed acyclic graphs (DAGs).
408492
They have the concept of `plugins`__, and also `providers`__ which are
409493
prefixed by ``apache-airflow-providers-``.
410494
411-
__ https://docs.djangoproject.com/en/5.1/intro/reusable-apps/
412-
__ https://jupyterlab.readthedocs.io/en/stable/user/extensions.html
413-
__ https://docs.pytest.org/en/stable/how-to/writing_plugins.html
495+
.. [2] Additional examples of projects with open namespaces:
496+
497+
- `pytest <https://docs.pytest.org>`__ is Python's most popular testing
498+
framework. They have the concept of `plugins`__ which may be developed by
499+
anyone and by convention are prefixed by ``pytest-``.
500+
- `MkDocs <https://www.mkdocs.org>`__ is a documentation framework based on
501+
Markdown files. They also have the concept of
502+
`plugins <https://www.mkdocs.org/dev-guide/plugins/>`__ which may be
503+
developed by anyone and are usually prefixed by ``mkdocs-``.
504+
- `Datadog <https://www.datadoghq.com>`__ offers observability as a service
505+
for organizations at any scale. The
506+
`Datadog Agent <https://docs.datadoghq.com/agent/>`__ ships out-of-the-box
507+
with
508+
`official integrations <https://github.com/DataDog/integrations-core>`__
509+
for many products, like various databases and web servers, which are
510+
distributed as Python packages that are prefixed by ``datadog-``. There is
511+
support for creating `third-party integrations`__ which customers may run.
512+
513+
.. [3] The following shows the package prefixes for the major cloud providers:
514+
515+
- Amazon: `aws-cdk- <https://docs.aws.amazon.com/cdk/api/v2/python/>`__
516+
- Google: `google-cloud- <https://github.com/googleapis/google-cloud-python/tree/main/packages>`__
517+
and others based on ``google-``
518+
- Microsoft: `azure- <https://github.com/Azure/azure-sdk-for-python/tree/main/sdk>`__
519+
520+
.. [4] Examples of typosquatting attacks targeting Python users:
521+
522+
- ``django-`` namespace was squatted, among other packages, leading to
523+
a `postmortem <https://mail.python.org/pipermail/security-announce/2017-September/000000.html>`__
524+
by PyPI.
525+
- ``cupy-`` namespace was
526+
`squatted <https://github.com/cupy/cupy/issues/4787>`__ by a malicious
527+
actor thousands of times.
528+
- ``scikit-`` namespace was
529+
`squatted <https://blog.phylum.io/a-pypi-typosquatting-campaign-post-mortem/>`__,
530+
among other packages. Notice how packages with a known prefix are much
531+
more prone to successful attacks.
532+
- ``typing-`` namespace was
533+
`squatted <https://zero.checkmarx.com/malicious-pypi-user-strikes-again-with-typosquatting-starjacking-and-unpacks-tailor-made-malware-b12669cefaa5>`__
534+
and this would be useful to prevent as a `hidden grant <hidden-grants_>`__.
535+
414536
__ https://www.sphinx-doc.org/en/master/usage/extensions/index.html
415-
__ https://github.com/open-telemetry/opentelemetry-python
416-
__ https://github.com/open-telemetry/opentelemetry-python-contrib
417537
__ https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/plugins.html
418538
__ https://airflow.apache.org/docs/apache-airflow-providers/index.html
539+
__ https://docs.pytest.org/en/stable/how-to/writing_plugins.html
540+
__ https://docs.datadoghq.com/developers/integrations/agent_integration/
419541

420542
Copyright
421543
=========
