Commit 774b399

Merge pull request #1826 from abravalheri/reference-implementation-glob
Extract glob specification to a separated document and add reference implementation
2 parents 6c792a1 + 0a5fbef commit 774b399

3 files changed

+121
-26
lines changed

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
=================
``glob`` patterns
=================

Some PyPA specifications, e.g. :ref:`pyproject.toml's license-files
<pyproject-toml-license-files>`, accept certain types of *glob patterns*
to match a given string containing wildcards and character ranges against
files and directories. This specification defines which patterns are acceptable
and how they should be handled.


Valid glob patterns
===================

For PyPA purposes, a *valid glob pattern* MUST be a string matched against
filesystem entries as specified below:

- Alphanumeric characters, underscores (``_``), hyphens (``-``) and dots (``.``)
  MUST be matched verbatim.

- Special glob characters: ``*``, ``?``, ``**`` and character ranges: ``[]``
  containing only the verbatim matched characters MUST be supported.
  Within ``[...]``, the hyphen indicates a locale-agnostic range (e.g. ``a-z``,
  order based on Unicode code points).
  Hyphens at the start or end are matched literally.

- Path delimiters MUST be the forward slash character (``/``).

- Patterns always refer to *relative paths*,
  e.g., when used in :file:`pyproject.toml`, patterns should always be
  relative to the directory containing that file.
  Therefore the leading slash character MUST NOT be used.

- Parent directory indicators (``..``) MUST NOT be used.

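The character-range rules in the list above can be illustrated with
:func:`fnmatch.fnmatchcase` from Python's standard library. This is only a
sketch for single path segments: using ``fnmatchcase`` is an assumption made
for illustration (this specification does not name it), and unlike :mod:`glob`
it lets ``*`` match ``/``, so it is not a substitute for real path matching.

.. code-block:: python

    from fnmatch import fnmatchcase

    # Each tuple is (name, pattern, expected result).
    checks = [
        ("LICENSE", "LICEN[CS]E*", True),
        ("LICENCE.txt", "LICEN[CS]E*", True),
        ("LICENXE", "LICEN[CS]E*", False),
        ("b", "[a-c]", True),        # inside the range a..c
        ("\u00e9", "[a-z]", False),  # U+00E9 sorts after "z" by code point
        ("-", "[-a]", True),         # leading hyphen is matched literally
        ("-", "[a-]", True),         # trailing hyphen is matched literally
        ("b", "[a-]", False),        # "a-" is not a range
    ]

    for name, pattern, expected in checks:
        assert fnmatchcase(name, pattern) is expected
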
Any characters or character sequences not covered by this specification are
invalid. Projects MUST NOT use such values.
Tools consuming glob patterns SHOULD reject invalid values with an error.

Literal paths (e.g. :file:`LICENSE`) are valid globs, which means they
can also be specified.

Tools consuming glob patterns:

- MUST treat each value as a glob pattern, and MUST raise an error if the
  pattern contains invalid glob syntax.
- MUST raise an error if any individual user-specified pattern does not match
  at least one file.

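The "at least one file per pattern" requirement can be sketched as follows
for a list of patterns. The helper name ``expand_patterns`` and its ``root``
parameter are hypothetical, the ``root_dir`` argument of :func:`glob.glob`
requires Python 3.10 or later, and syntax validation is assumed to have
happened before this function is called.

.. code-block:: python

    import tempfile
    from glob import glob
    from pathlib import Path


    def expand_patterns(patterns: list[str], root: str) -> list[str]:
        """Return every path matched by ``patterns``, relative to ``root``.

        Raises ValueError as soon as one pattern matches nothing.
        """
        matched: list[str] = []
        for pattern in patterns:
            # root_dir (Python 3.10+) keeps the results relative to ``root``.
            found = glob(pattern, root_dir=root, recursive=True)
            if not found:
                raise ValueError(f"Pattern {pattern!r} did not match any files.")
            matched.extend(found)
        # Use "/" as separator and drop duplicates from overlapping patterns.
        return sorted({path.replace("\\", "/") for path in matched})


    with tempfile.TemporaryDirectory() as root:
        Path(root, "AUTHORS.md").write_text("Jane Doe")
        Path(root, "licenses").mkdir()
        Path(root, "licenses", "LICENSE.MIT").write_text("MIT")

        files = expand_patterns(["AUTHORS*", "licenses/*"], root)
        try:
            expand_patterns(["COPYING*"], root)
        except ValueError as exc:
            error = str(exc)

Note that :func:`glob.glob` also returns directories, so a tool that needs
regular files only would have to filter the result further.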
Examples of valid glob patterns:

.. code-block:: python

    "LICEN[CS]E*"
    "AUTHORS*"
    "licenses/LICENSE.MIT"
    "licenses/LICENSE.CC0"
    "LICENSE.txt"
    "licenses/*"

Examples of invalid glob patterns:

.. code-block:: python

    "..\LICENSE.MIT"
    # .. must not be used.
    # \ is an invalid path delimiter, / must be used.

    "LICEN{CSE*"
    # the { character is not allowed


Reference implementation in Python
==================================

It is possible to defer the majority of the pattern matching against the file
system to the :mod:`glob` module in Python's standard library. It is necessary,
however, to perform additional validations.

The code below serves as a simple reference implementation:

.. code-block:: python

    import os
    import re
    from glob import glob


    def find_pattern(pattern: str) -> list[str]:
        """
        >>> find_pattern("/LICENSE.MIT")
        Traceback (most recent call last):
        ...
        ValueError: Pattern '/LICENSE.MIT' should be relative...
        >>> find_pattern("../LICENSE.MIT")
        Traceback (most recent call last):
        ...
        ValueError: Pattern '../LICENSE.MIT' cannot contain '..'...
        >>> find_pattern("LICEN{CSE*")
        Traceback (most recent call last):
        ...
        ValueError: Pattern 'LICEN{CSE*' contains invalid characters...
        """
        # Parent directory indicators are forbidden.
        if ".." in pattern:
            raise ValueError(f"Pattern {pattern!r} cannot contain '..'")
        # Absolute paths (POSIX, or Windows drive letters) are forbidden.
        if pattern.startswith((os.sep, "/")) or ":\\" in pattern:
            raise ValueError(
                f"Pattern {pattern!r} should be relative and must not start with '/'"
            )
        # Only verbatim characters, '/', and the supported wildcards are allowed.
        if re.match(r'^[\w\-\.\/\*\?\[\]]+$', pattern) is None:
            raise ValueError(f"Pattern '{pattern}' contains invalid characters.")
        # Matching is relative to the current working directory.
        found = glob(pattern, recursive=True)
        if not found:
            raise ValueError(f"Pattern '{pattern}' did not match any files.")
        return found
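
To try the syntactic checks against the example patterns listed earlier
without touching the file system, the validations can be factored into a
separate helper. This is a sketch only: the names ``validate_pattern`` and
``is_valid`` are hypothetical, and the checks mirror the reference
implementation above except that :func:`re.fullmatch` is used, which also
rejects a trailing newline.

.. code-block:: python

    import os
    import re

    # Verbatim characters, "/" and the supported wildcard characters.
    _ALLOWED = re.compile(r"[\w\-./*?\[\]]+")


    def validate_pattern(pattern: str) -> None:
        """Raise ValueError if ``pattern`` breaks one of the syntactic rules."""
        if ".." in pattern:
            raise ValueError(f"Pattern {pattern!r} cannot contain '..'")
        if pattern.startswith((os.sep, "/")) or ":\\" in pattern:
            raise ValueError(f"Pattern {pattern!r} should be relative")
        if _ALLOWED.fullmatch(pattern) is None:
            raise ValueError(f"Pattern {pattern!r} contains invalid characters.")


    def is_valid(pattern: str) -> bool:
        """Return True if ``pattern`` passes every syntactic check."""
        try:
            validate_pattern(pattern)
        except ValueError:
            return False
        return True


    valid = ["LICEN[CS]E*", "AUTHORS*", "licenses/LICENSE.MIT", "licenses/*"]
    invalid = ["..\\LICENSE.MIT", "LICEN{CSE*", "/LICENSE.MIT"]

    assert all(is_valid(p) for p in valid)
    assert not any(is_valid(p) for p in invalid)
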

source/specifications/pyproject-toml.rst

Lines changed: 5 additions & 26 deletions
@@ -247,6 +247,8 @@ Tools SHOULD validate and perform case normalization of the expression.
 The table subkeys of the ``license`` key are deprecated.
 
 
+.. _pyproject-toml-license-files:
+
 ``license-files``
 -----------------
 
@@ -260,43 +262,20 @@ configuration files, e.g. :file:`setup.py`, :file:`setup.cfg`, etc.)
 to file(s) containing licenses and other legal notices to be
 distributed with the package.
 
-The strings MUST contain valid glob patterns, as specified below:
-
-- Alphanumeric characters, underscores (``_``), hyphens (``-``) and dots (``.``)
-  MUST be matched verbatim.
-
-- Special glob characters: ``*``, ``?``, ``**`` and character ranges: ``[]``
-  containing only the verbatim matched characters MUST be supported.
-  Within ``[...]``, the hyphen indicates a locale-agnostic range (e.g. ``a-z``,
-  order based on Unicode code points).
-  Hyphens at the start or end are matched literally.
+The strings MUST contain valid glob patterns, as specified in
+:doc:`/specifications/glob-patterns`.
 
-- Path delimiters MUST be the forward slash character (``/``).
-  Patterns are relative to the directory containing :file:`pyproject.toml`,
-  therefore the leading slash character MUST NOT be used.
-
-- Parent directory indicators (``..``) MUST NOT be used.
-
-Any characters or character sequences not covered by this specification are
-invalid. Projects MUST NOT use such values.
-Tools consuming this field SHOULD reject invalid values with an error.
+Patterns are relative to the directory containing :file:`pyproject.toml`,
 
 Tools MUST assume that license file content is valid UTF-8 encoded text,
 and SHOULD validate this and raise an error if it is not.
 
-Literal paths (e.g. :file:`LICENSE`) are valid globs which means they
-can also be defined.
-
 Build tools:
 
-- MUST treat each value as a glob pattern, and MUST raise an error if the
-  pattern contains invalid glob syntax.
 - MUST include all files matched by a listed pattern in all distribution
   archives.
 - MUST list each matched file path under a License-File field in the
   Core Metadata.
-- MUST raise an error if any individual user-specified pattern does not match
-  at least one file.
 
 If the ``license-files`` key is present and
 is set to a value of an empty array, then tools MUST NOT include any

source/specifications/section-distribution-metadata.rst

Lines changed: 1 addition & 0 deletions
@@ -14,3 +14,4 @@ Package Distribution Metadata
 inline-script-metadata
 platform-compatibility-tags
 well-known-project-urls
+glob-patterns
