spec: grammar #578

Open
jkowalleck wants to merge 50 commits into package-url:main from jkowalleck:spec/grammar-ABNF

Conversation

@jkowalleck
Member

@jkowalleck jkowalleck commented Aug 6, 2025

@jkowalleck jkowalleck requested review from a team and pombredanne August 6, 2025 12:38
@jkowalleck jkowalleck marked this pull request as draft August 6, 2025 12:42
@jkowalleck jkowalleck force-pushed the spec/grammar-ABNF branch 7 times, most recently from 32049cf to 0a144bd August 6, 2025 13:05
Signed-off-by: Jan Kowalleck <[email protected]>
@jkowalleck jkowalleck marked this pull request as ready for review August 6, 2025 13:13
@jkowalleck jkowalleck requested a review from mjherzog August 6, 2025 13:13
@jkowalleck
Member Author

The ABNF is done according to latest spec, ready for review.

possible followup: have a pipeline that checks the test-suite against that ABNF.

@jkowalleck jkowalleck mentioned this pull request Aug 6, 2025
Contributor

@ppkarwasz ppkarwasz left a comment

@jkowalleck, thanks for submitting this!

I'll check it in detail later, right now these two problems show up:

Signed-off-by: Jan Kowalleck <[email protected]>
Signed-off-by: Jan Kowalleck <[email protected]>
Contributor

@ppkarwasz ppkarwasz left a comment

Thanks again for the PR! 🎉

I only spotted a couple of issues, the rest looks solid.

Random thought while reading: what if we defined a common set of productions for the basic character sequences, and then split things into two separate ABNF grammars: one for canonical PURLs and one for non-canonical PURLs?

The spec is very lenient about repeated slashes /: they’re allowed not just at the start, but also smack in the middle of a namespace or subpath. Basically, you could take a nap with your finger on the / key and still be fully PURL-compliant. 😴///😴///😴

If you fall asleep, you'll need a bigger screen to display the PURL, though. 😉

@jkowalleck
Member Author

The spec is very lenient about repeated slashes /: they’re allowed not just at the start, but also smack in the middle of a namespace or subpath. Basically, you could take a nap with your finger on the / key and still be fully PURL-compliant. 😴///😴///😴

that's not what the current spec says.
unlimited slashes are allowed on the start/end:

- All leading and trailing slashes '/' are not significant and SHOULD be

but not on the middle:
- If present, the ``namespace`` MAY contain one or more segments, separated
by a single unencoded slash '/' character.

jkowalleck and others added 3 commits August 7, 2025 10:26
Co-authored-by: Piotr P. Karwasz <[email protected]>
Signed-off-by: Jan Kowalleck <[email protected]>
@ppkarwasz
Contributor

Hi @jkowalleck,

that's not what the current spec says. unlimited slashes are allowed on the start/end:

- All leading and trailing slashes '/' are not significant and SHOULD be

but not on the middle:

- If present, the ``namespace`` MAY contain one or more segments, separated
by a single unencoded slash '/' character.

The spec doesn’t mention slashes in the middle explicitly, but the spec instructs parsers to ignore empty path segments, which are precisely what repeated slashes (///) represent.

- The left side is the ``remainder``
- Strip the right side from leading and trailing '/'
- Split this on '/'
- Discard any empty string segment from that split
- Percent-decode each segment
- Discard any '.' or '..' segment from that split
- UTF-8-decode each segment if needed in your programming language
- Join segments back with a '/'
- This is the ``subpath``

So in practice, multiple slashes are allowed anywhere (start, middle, or end), since they collapse during parsing.
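
A minimal Python sketch of the quoted subpath steps, for illustration only. The function name `parse_subpath` is hypothetical and not part of the spec or of any PURL library; it assumes its input is the text to the right of the '#' separator, and it relies on `urllib.parse.unquote` for percent-decoding and UTF-8 decoding.

```python
from urllib.parse import unquote


def parse_subpath(raw: str) -> str:
    """Apply the quoted parsing steps to the text right of '#' (hypothetical helper)."""
    # Strip leading and trailing '/'.
    stripped = raw.strip("/")
    # Split on '/' and discard any empty segment (what repeated slashes produce).
    segments = [segment for segment in stripped.split("/") if segment]
    # Percent-decode each segment; unquote() decodes the bytes as UTF-8 by default.
    decoded = [unquote(segment) for segment in segments]
    # Discard any '.' or '..' segment.
    kept = [segment for segment in decoded if segment not in (".", "..")]
    # Join the remaining segments back with '/'.
    return "/".join(kept)


print(parse_subpath("//a///b/./c//"))
```

Under these steps, `//a///b/./c//` and `a/b/c` yield the same subpath, which is the collapsing behaviour described above.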

@jkowalleck
Member Author

jkowalleck commented Aug 7, 2025

So in practice, multiple slashes are allowed anywhere (start, middle, or end), since they collapse during parsing.

I understand, but this is not the right scope for your concerns. I've built the grammar based on the spec, and not on the parser-rules.
If you feel there is something off in the spec, it would be better to open a dedicated ticket for that.

Signed-off-by: Jan Kowalleck <[email protected]>
Signed-off-by: Jan Kowalleck <[email protected]>
@mjherzog mjherzog dismissed their stale review October 17, 2025 19:30

The requested change was completed.

[ "/" namespace-canonical ] "/" name
[ "@" version ] [ "?" qualifiers-canonical ] [ "#" subpath-canonical ]

scheme = %x70.6B.67 ; lowercase string "pkg"
Member Author

@jkowalleck jkowalleck Dec 16, 2025

if you don't like this original ABNF notation for a case-sensitive string, we could use RFC 7405's case-sensitive string notation:

Suggested change
- scheme = %x70.6B.67 ; lowercase string "pkg"
+ scheme = %s"pkg"
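
For context, a rough Python sketch of why the notation matters. In RFC 5234, a plain quoted literal such as `"pkg"` matches case-insensitively, while `%x70.6B.67` and RFC 7405's `%s"pkg"` both match only the lowercase characters. The names `QUOTED`, `CASE_SENSITIVE` and `matches` are made up for this illustration, and the regular expressions only approximate ABNF matching.

```python
import re

# A plain ABNF literal "pkg" (RFC 5234) is case-insensitive: PKG, Pkg, pkg all match.
QUOTED = re.compile(r"pkg", re.IGNORECASE)

# %x70.6B.67 and %s"pkg" (RFC 7405) both mean exactly the characters p, k, g.
CASE_SENSITIVE = re.compile(r"pkg")


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the whole text matches the pattern (hypothetical helper)."""
    return pattern.fullmatch(text) is not None


# The hex values in %x70.6B.67 spell the lowercase string "pkg".
assert bytes([0x70, 0x6B, 0x67]).decode("ascii") == "pkg"

for candidate in ("pkg", "PKG", "Pkg"):
    print(candidate, matches(QUOTED, candidate), matches(CASE_SENSITIVE, candidate))
```

Both spellings of the rule accept the same strings; the `%s"pkg"` form is just easier to read.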

@jkowalleck
Member Author

jkowalleck commented Dec 23, 2025

need to check whether anything in the grammar changed since the last commit here (Oct 1, 2025) (-> see this diff)
As PURL v1.0 is tagged and released, we should be able to finalize the grammar here.

PS: since the PURL core-spec is a standard now, I'd rather have the core-grammar be part of that standard, too.

Signed-off-by: Jan Kowalleck <[email protected]>
Signed-off-by: Jan Kowalleck <[email protected]>
Signed-off-by: Jan Kowalleck <[email protected]>
Signed-off-by: Jan Kowalleck <[email protected]>
Signed-off-by: Jan Kowalleck <[email protected]>
Signed-off-by: Jan Kowalleck <[email protected]>
Signed-off-by: Jan Kowalleck <[email protected]>
Signed-off-by: Jan Kowalleck <[email protected]>
Signed-off-by: Jan Kowalleck <[email protected]>
@jkowalleck
Member Author

jkowalleck commented Dec 23, 2025

re: #578 (comment)

reviewed/revisited the grammar according to the Ecma standard.
the current state of the PR reflects PURL spec v1.0


Copilot AI left a comment

Pull request overview

This PR adds a formal ABNF grammar specification for Package-URL (PURL), addressing issue #535. The grammar defines the structure and character encoding rules for both regular and canonical PURL formats.

  • Introduces comprehensive ABNF grammar rules for PURL components (scheme, type, namespace, name, version, qualifiers, and subpath)
  • Defines character encoding rules and permitted character classes following RFC5234
  • Specifies both canonical and non-canonical forms of PURL strings

Signed-off-by: Jan Kowalleck <[email protected]>
Signed-off-by: Jan Kowalleck <[email protected]>
Signed-off-by: Jan Kowalleck <[email protected]>
jkowalleck and others added 2 commits December 24, 2025 10:15
Co-authored-by: Piotr P. Karwasz <[email protected]>
Signed-off-by: Jan Kowalleck <[email protected]>
@jkowalleck jkowalleck requested a review from a team December 24, 2025 09:31
@jkowalleck
Member Author

@mjherzog @pombredanne please review

Labels

documentation
PURL core specification: Format and syntax that define PURL (excludes PURL type definitions)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

publish the grammar

4 participants