@@ -24,44 +24,58 @@ Motivation
 ==========

 The current ecosystem lacks a way for projects with many packages to signal a
-verified pattern of ownership. Some examples:
+verified pattern of ownership. Such projects fall into two categories.
+
+The first category is projects [1]_ that want complete control over their
+namespace. A few examples:

-* `Typeshed <https://github.com/python/typeshed>`__ is a community effort to
-  maintain type stubs for various packages. The stub packages they maintain
-  mirror the package name they target and are prefixed by ``types-``. For
-  example, the package ``requests`` has a stub that users would depend on
-  called ``types-requests``.
 * Major cloud providers like Amazon, Google and Microsoft have a common prefix
-  for each feature's corresponding package [1]_. For example, most of Google's
+  for each feature's corresponding package [3]_. For example, most of Google's
   packages are prefixed by ``google-cloud-`` e.g. ``google-cloud-compute`` for
   `using virtual machines <https://cloud.google.com/products/compute>`__.
-* Many projects [2]_ support a model where some packages are officially
-  maintained and third-party developers are encouraged to participate by
-  creating their own. For example, `Datadog <https://www.datadoghq.com>`__
-  offers observability as a service for organizations at any scale. The
-  `Datadog Agent <https://docs.datadoghq.com/agent/>`__ ships out-of-the-box
-  with
-  `official integrations <https://github.com/DataDog/integrations-core>`__
-  for many products, like various databases and web servers, which are
-  distributed as Python packages that are prefixed by ``datadog-``. There is
-  support for creating `third-party integrations`__ which customers may run.
+* `OpenTelemetry <https://opentelemetry.io>`__ is an open standard for
+  observability with `official packages`__ for the core APIs and SDK with
+  `contrib packages`__ to collect data from various sources. All packages
+  are prefixed by ``opentelemetry-`` with child prefixes in the form
+  ``opentelemetry-<component>-<name>-``. The contrib packages live in a
+  central repository and they are the only ones with the ability to publish.

-__ https://docs.datadoghq.com/developers/integrations/agent_integration/
+__ https://github.com/open-telemetry/opentelemetry-python
+__ https://github.com/open-telemetry/opentelemetry-python-contrib
+
+The second category is projects [2]_ that want to share their namespace such
+that some packages are officially maintained and third-party developers are
+encouraged to participate by publishing their own. Some examples:
+
+* `Project Jupyter <https://jupyter.org>`__ is devoted to the development of
+  tooling for sharing interactive documents. They support `extensions`__
+  which in most cases (and in all cases for officially maintained
+  extensions) are prefixed by ``jupyter-``.
+* `Django <https://www.djangoproject.com>`__ is one of the most widely used web
+  frameworks in existence. They have the concept of `reusable apps`__, which
+  are commonly installed via
+  `third-party packages <https://djangopackages.org>`__ that implement a subset
+  of functionality to extend Django-based websites. These packages are by
+  convention prefixed by ``django-`` or ``dj-``.
+
+__ https://jupyterlab.readthedocs.io/en/stable/user/extensions.html
+__ https://docs.djangoproject.com/en/5.1/intro/reusable-apps/

 Such projects are uniquely vulnerable to name-squatting attacks
 which can ultimately result in `dependency confusion`__.

 __ https://www.activestate.com/resources/quick-reads/dependency-confusion/

 For example, say a new product is released for which monitoring would be
-valuable. It would be reasonable to assume that Datadog would eventually
-support it as an official integration. It takes a nontrivial amount of time to
-deliver such an integration due to roadmap prioritization and the time required
-for implementation. It would be impossible to reserve the name of every
-potential package so in the interim an attacker may create a package that
-appears legitimate which would execute malicious code at runtime. Not only are
-users more likely to install such packages but doing so taints the perception
-of the entire project.
+valuable. It would be reasonable to assume that
+`Datadog <https://www.datadoghq.com>`__ would eventually support it as an
+official integration. It takes a nontrivial amount of time to deliver such an
+integration due to roadmap prioritization and the time required for
+implementation. It would be impossible to reserve the name of every potential
+package so in the interim an attacker may create a package that appears
+legitimate which would execute malicious code at runtime. Not only are users
+more likely to install such packages but doing so taints the perception of the
+entire project.

 Although :pep:`708` attempts to address this attack vector, it is specifically
 about the case of multiple repositories being considered during dependency
@@ -71,7 +85,13 @@ Namespacing also would drastically reduce the incidence of
 `typosquatting <https://en.wikipedia.org/wiki/Typosquatting>`__
 because typos would have to be in the prefix itself which is
 `normalized <naming_>`_ and likely to be a short, well-known identifier like
-``aws-``.
+``aws-``. In recent years, typosquatting has become a popular attack vector
+[4]_.
+
+The `current protection`__ against typosquatting used by PyPI is to normalize
+similar characters but that is insufficient for these use cases.
+
+__ https://github.com/pypi/warehouse/blob/8615326918a180eb2652753743eac8e74f96a90b/warehouse/migrations/versions/d18d443f89f0_ultranormalize_name_function.py#L29-L42

 Rationale
 =========
@@ -113,6 +133,11 @@ namespace. Any solution that requires new package syntax must be built atop the
 existing flat namespace and therefore implicit namespaces acquired via a
 reservation mechanism would be a prerequisite to such explicit namespaces.

+Although existing packages matching a reserved namespace would be untouched,
+preventing future unauthorized uploads and strategically applying :pep:`541`
+takedown requests for malicious cases would reduce risks to users to a
+negligible level.
+
 Terminology
 ===========

@@ -219,6 +244,8 @@ other organizations to use the grant. In this case, the authorized
 organizations have no special permissions and are equivalent to an open grant
 without ownership.

+.. _hidden-grants:
+
 Hidden Grants
 -------------

@@ -235,7 +262,7 @@ restrictions without the need to expose the namespace to the public.
 Repository Metadata
 -------------------

-The :pep:`JSON API <691>` version will be incremented from ``1.0`` to ``1.1``.
+The :pep:`JSON API <691>` version will be incremented from ``1.2`` to ``1.3``.
 The following API changes MUST be implemented by repositories that support
 this PEP. Repositories that do not support this PEP MUST NOT implement these
 changes so that consumers of the API are able to determine whether the
@@ -295,6 +322,19 @@ When a reserved namespace becomes unclaimed, repositories MUST set the
 Namespaces that were previously claimed but are now not SHOULD be eligible for
 claiming again by any organization.

+Community Buy-in
+================
+
+Representatives from the following organizations have expressed support for
+this PEP (with a link to the discussion):
+
+* `Apache Airflow <https://github.com/apache/airflow/discussions/41657#discussioncomment-10412999>`__
+* `Typeshed <https://discuss.python.org/t/1609/37>`__
+* `Project Jupyter <https://discuss.python.org/t/61227/16>`__
+  (`expanded <https://discuss.python.org/t/61227/48>`__)
+* `Microsoft <https://discuss.python.org/t/63191/40>`__
+* `DataDog <https://discuss.python.org/t/63191/53>`__
+
 Backwards Compatibility
 =======================

@@ -358,6 +398,73 @@ packages released with the scoping would be incompatible with older tools and
 would cause confusion for users along with frustration from maintainers having
 to triage such complaints.

+Encourage Dedicated Package Repositories
+----------------------------------------
+
+Critically, this imposes a burden on projects to maintain their own infra. This
+is an unrealistic expectation for the vast majority of companies and a complete
+non-starter for community projects.
+
+This does not help in most cases because the default behavior of most package
+managers is to use PyPI so users attempting to perform a simple ``pip install``
+would already be vulnerable to malicious packages.
+
+In this theoretical future every project must document how to add their
+repository to dependency resolution, which would be different for each package
+manager. Few package managers are able to download specific dependencies from
+specific repositories and would require users to use verbose configuration in
+the common case.
+
+The ones that do not support this would instead find a given package using an
+ordered enumeration of repositories, leading to dependency confusion.
+For example, say a user wants two packages from two custom repositories ``X``
+and ``Y``. If each repository has both packages but one is malicious on ``X``
+and the other is malicious on ``Y`` then the user would be unable to satisfy
+their requirements without encountering a malicious package.
+
+Use Fixed Prefixes
+------------------
+
+The idea here would be to have one or more top-level fixed prefixes that are
+used for namespace reservations:
+
+* ``com-``: Reserved for corporate organizations.
+* ``org-``: Reserved for community organizations.
+
+Organizations would then apply for a namespace prefixed by the type of their
+organization.
+
+This would cause perpetual disruption because when projects begin it is unknown
+whether a user base will be large enough to warrant a namespace reservation.
+Whenever that happens the project would have to be renamed which would put a
+high maintenance burden on the project maintainers and would cause confusion
+for users who have to learn a new way to reference the project's packages.
+The potential for this deterring projects from reserving namespaces at all is
+high.
+
+Another issue with this approach is that projects often have branding in mind
+(`example`__) and would be reluctant to change their package names.
+
+__ https://github.com/apache/airflow/discussions/41657#discussioncomment-10417439
+
+It's unrealistic to expect every company and project to voluntarily change
+their existing and future package names.
+
456
+ The `idea <https://discuss.python.org/t/63455 >`__ here is to add a new
457
+ metadata field to projects in the API called ``domain-authority ``. Repositories
458
+ would support a new endpoint for verifying the domain via HTTPS. Clients would
459
+ then support options to allow certain domains.
460
+
461
+ This does not solve the problem for the target audience who do not check where
462
+ their packages are coming from and is more about checking for the integrity of
463
+ uploads which is already supported in a more secure way by :pep: `740 `.
464
+
465
+ Most projects do not have a domain and could not benefit from this, unfairly
466
+ favoring organizations that have the financial means to acquire one.
467
+
361
468
Open Issues
362
469
===========
363
470
@@ -366,56 +473,71 @@ None at this time.
 Footnotes
 =========

-.. [1] The following shows the package prefixes for the major cloud providers:
-
-   - Amazon: `aws-cdk- <https://docs.aws.amazon.com/cdk/api/v2/python/>`__
-   - Google: `google-cloud- <https://github.com/googleapis/google-cloud-python/tree/main/packages>`__
-     and others based on ``google-``
-   - Microsoft: `azure- <https://github.com/Azure/azure-sdk-for-python/tree/main/sdk>`__
+.. [1] Additional examples of projects with restricted namespaces:

-.. [2] Some examples of projects that have many packages with a common prefix:
-
-   - `Django <https://www.djangoproject.com>`__ is one of the most widely used
-     web frameworks in existence. They have the concept of `reusable apps`__,
-     which are commonly installed via
-     `third-party packages <https://djangopackages.org>`__ that implement a
-     subset of functionality to extend Django-based websites. These packages
-     are by convention prefixed by ``django-`` or ``dj-``.
-   - `Project Jupyter <https://jupyter.org>`__ is devoted to the development of
-     tooling for sharing interactive documents. They support `extensions`__
-     which in most cases (and in all cases for officially maintained
-     extensions) are prefixed by ``jupyter-``.
-   - `pytest <https://docs.pytest.org>`__ is Python's most popular testing
-     framework. They have the concept of `plugins`__ which may be developed by
-     anyone and by convention are prefixed by ``pytest-``.
-   - `MkDocs <https://www.mkdocs.org>`__ is a documentation framework based on
-     Markdown files. They also have the concept of
-     `plugins <https://www.mkdocs.org/dev-guide/plugins/>`__ which may be
-     developed by anyone and are usually prefixed by ``mkdocs-``.
+   - `Typeshed <https://github.com/python/typeshed>`__ is a community effort to
+     maintain type stubs for various packages. The stub packages they maintain
+     mirror the package name they target and are prefixed by ``types-``. For
+     example, the package ``requests`` has a stub that users would depend on
+     called ``types-requests``. Unofficial stubs are not supposed to use the
+     ``types-`` prefix and are expected to use a ``-stubs`` suffix instead.
    - `Sphinx <https://www.sphinx-doc.org>`__ is a documentation framework
      popular for large technical projects such as
      `Swift <https://www.swift.org>`__ and Python itself. They have
      the concept of `extensions`__ which are prefixed by ``sphinxcontrib-``,
      many of which are maintained within a
      `dedicated organization <https://github.com/sphinx-contrib>`__.
-   - `OpenTelemetry <https://opentelemetry.io>`__ is an open standard for
-     observability with `official packages`__ for the core APIs and SDK with
-     `third-party packages`__ to collect data from various sources. All
-     packages are prefixed by ``opentelemetry-`` with child prefixes in the
-     form ``opentelemetry-<component>-<name>-``.
    - `Apache Airflow <https://airflow.apache.org>`__ is a platform to
      programmatically orchestrate tasks as directed acyclic graphs (DAGs).
      They have the concept of `plugins`__, and also `providers`__ which are
      prefixed by ``apache-airflow-providers-``.

-   __ https://docs.djangoproject.com/en/5.1/intro/reusable-apps/
-   __ https://jupyterlab.readthedocs.io/en/stable/user/extensions.html
-   __ https://docs.pytest.org/en/stable/how-to/writing_plugins.html
+.. [2] Additional examples of projects with open namespaces:
+
+   - `pytest <https://docs.pytest.org>`__ is Python's most popular testing
+     framework. They have the concept of `plugins`__ which may be developed by
+     anyone and by convention are prefixed by ``pytest-``.
+   - `MkDocs <https://www.mkdocs.org>`__ is a documentation framework based on
+     Markdown files. They also have the concept of
+     `plugins <https://www.mkdocs.org/dev-guide/plugins/>`__ which may be
+     developed by anyone and are usually prefixed by ``mkdocs-``.
+   - `Datadog <https://www.datadoghq.com>`__ offers observability as a service
+     for organizations at any scale. The
+     `Datadog Agent <https://docs.datadoghq.com/agent/>`__ ships out-of-the-box
+     with
+     `official integrations <https://github.com/DataDog/integrations-core>`__
+     for many products, like various databases and web servers, which are
+     distributed as Python packages that are prefixed by ``datadog-``. There is
+     support for creating `third-party integrations`__ which customers may run.
+
+.. [3] The following shows the package prefixes for the major cloud providers:
+
+   - Amazon: `aws-cdk- <https://docs.aws.amazon.com/cdk/api/v2/python/>`__
+   - Google: `google-cloud- <https://github.com/googleapis/google-cloud-python/tree/main/packages>`__
+     and others based on ``google-``
+   - Microsoft: `azure- <https://github.com/Azure/azure-sdk-for-python/tree/main/sdk>`__
+
+.. [4] Examples of typosquatting attacks targeting Python users:
+
+   - ``django-`` namespace was squatted, among other packages, leading to
+     a `postmortem <https://mail.python.org/pipermail/security-announce/2017-September/000000.html>`__
+     by PyPI.
+   - ``cupy-`` namespace was
+     `squatted <https://github.com/cupy/cupy/issues/4787>`__ by a malicious
+     actor thousands of times.
+   - ``scikit-`` namespace was
+     `squatted <https://blog.phylum.io/a-pypi-typosquatting-campaign-post-mortem/>`__,
+     among other packages. Notice how packages with a known prefix are much
+     more prone to successful attacks.
+   - ``typing-`` namespace was
+     `squatted <https://zero.checkmarx.com/malicious-pypi-user-strikes-again-with-typosquatting-starjacking-and-unpacks-tailor-made-malware-b12669cefaa5>`__
+     and this would be useful to prevent as a `hidden grant <hidden-grants_>`__.
+
    __ https://www.sphinx-doc.org/en/master/usage/extensions/index.html
-   __ https://github.com/open-telemetry/opentelemetry-python
-   __ https://github.com/open-telemetry/opentelemetry-python-contrib
    __ https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/plugins.html
    __ https://airflow.apache.org/docs/apache-airflow-providers/index.html
+   __ https://docs.pytest.org/en/stable/how-to/writing_plugins.html
+   __ https://docs.datadoghq.com/developers/integrations/agent_integration/

 Copyright
 =========