Note greediness of PEP 723 reference parser #1960

@SnoopJ

Issue Description

While preparing a PR for PEP 723 support in pip, I noticed that the reference parser defined by the PEP and listed in the PyPA docs will collate multiple adjacent /// TYPE blocks as a single match, even when separated by a comment line (the spec refers to it as a "content line"). This greedy collation is surprising and makes distinguishing error cases a little complicated, so I think it merits a warning in the docs if it is not possible to update the specification itself.

I believe this quirk is caused by the last + in the reference regex (the quantifier on the content-line group) being greedy: it consumes content lines, including any interior # /// lines, all the way to the final closing # /// instead of stopping at the first available one. In my limited experimentation, replacing this quantifier with +? resolves the issue, producing the expected number of matches.

This shouldn't slip through anybody's code unnoticed, as the collation will produce invalid TOML (the interior /// is invalid syntax), but it is a surprising enough edge case that I thought to report it here.

Code to reproduce:
import re

script_A = """
# /// script
# data (1)
# ///
#
# /// script
# data (2)
# ///
"""

script_B = """
# /// script
# data (1)
# ///

# /// script
# data (2)
# ///
"""

# These lines adapted from PEP 723's reference parser:
# https://peps.python.org/pep-0723/#reference-implementation

REGEX = r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
name = "script"
matches_A = list(
    filter(lambda m: m.group("type") == name, re.finditer(REGEX, script_A))
)
matches_B = list(
    filter(lambda m: m.group("type") == name, re.finditer(REGEX, script_B))
)

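# Expected output: 1 for script_A (both blocks collated into one match),
# 2 for script_B (the blank line keeps the blocks separate).
print(len(matches_A))
print(len(matches_B))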
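The +? replacement described in the report can be checked with a small sketch. GREEDY_REGEX, LAZY_REGEX, and count_blocks are hypothetical names introduced here; the only change from the reference pattern is the lazy quantifier on the content group. This checks just the comment-separated input from the report and does not establish that the lazy form otherwise behaves like the reference parser.

```python
import re

# Reference pattern (greedy) and a variant with the last "+" made lazy.
GREEDY_REGEX = r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
LAZY_REGEX = r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+?)^# ///$"

# Same shape as script_A: two blocks separated by a bare "#" comment line.
script_a = """
# /// script
# data (1)
# ///
#
# /// script
# data (2)
# ///
"""


def count_blocks(pattern: str, script: str, name: str = "script") -> int:
    """Count matches of `pattern` in `script` whose block type equals `name`."""
    return sum(1 for m in re.finditer(pattern, script) if m.group("type") == name)


greedy_count = count_blocks(GREEDY_REGEX, script_a)
lazy_count = count_blocks(LAZY_REGEX, script_a)
lazy_contents = [m.group("content") for m in re.finditer(LAZY_REGEX, script_a)]

print(greedy_count)   # 1: both blocks collated into one match
print(lazy_count)     # 2: one match per block
print(lazy_contents)  # ['# data (1)\n', '# data (2)\n']
```

With the lazy quantifier, each match's content group stops at the first closing # /// line, so the interior delimiter no longer ends up inside the captured content.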
