
fix: grammar according to rfc3987 #4

Open
jkowalleck wants to merge 4 commits into willynilly:main from jkowalleck:fix/grammar_according_t-rfxc3987

Conversation


@jkowalleck jkowalleck commented Jul 24, 2025

Collaborator

Reviewed the Lark grammar against the RFC 3987 ABNF: https://www.rfc-editor.org/rfc/rfc3987#section-2.2

Found some issues and:

  • fixed them
  • added tests for them

See the failing tests in #5.

Signed-off-by: Jan Kowalleck <jan.kowalleck@gmail.com>
Signed-off-by: Jan Kowalleck <jan.kowalleck@gmail.com>
| h16 ":" h16 ":" h16 ":" h16 ":" h16 "::" ls32
| h16 ":" h16 ":" h16 ":" h16 ":" h16 ":" h16 "::" h16
| h16 ":" h16 ":" h16 ":" h16 ":" h16 ":" h16 ":" h16 "::"
ipv6address: ( h16 ":" )~6 ls32

@jkowalleck jkowalleck Jul 24, 2025

Collaborator Author

This ipv6address rule still needs some optimization.
For now it is simply a direct transcription of the ABNF into Lark.
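
To cross-check a transcription like this, the nine alternatives of the IPv6address rule from RFC 3986 (which RFC 3987 reuses) can be mirrored in a plain regular expression. The following is a minimal sketch using only Python's standard `re` module. The helper names `lead`, `tail`, and `is_ipv6address` are illustrative assumptions and are not part of this PR or of the library.

```python
import re

# Building blocks from RFC 3986, section 3.2.2.
H16 = r"[0-9A-Fa-f]{1,4}"
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"
IPV4 = rf"{DEC_OCTET}(?:\.{DEC_OCTET}){{3}}"
LS32 = rf"(?:{H16}:{H16}|{IPV4})"


def lead(n: int) -> str:
    """Optional prefix [ *n( h16 ":" ) h16 ] that may precede "::"."""
    return rf"(?:(?:{H16}:){{0,{n}}}{H16})?"


def tail(n: int) -> str:
    """Exactly n repetitions of ( h16 ":" )."""
    return rf"(?:{H16}:){{{n}}}"


# One entry per ABNF alternative, in the order the RFC lists them.
ALTERNATIVES = [
    tail(6) + LS32,
    "::" + tail(5) + LS32,
    lead(0) + "::" + tail(4) + LS32,
    lead(1) + "::" + tail(3) + LS32,
    lead(2) + "::" + tail(2) + LS32,
    lead(3) + "::" + tail(1) + LS32,
    lead(4) + "::" + LS32,
    lead(5) + "::" + H16,
    lead(6) + "::",
]
IPV6_RE = re.compile("|".join(f"(?:{alt})" for alt in ALTERNATIVES))


def is_ipv6address(text: str) -> bool:
    """Return True if text matches the IPv6address production in full."""
    return IPV6_RE.fullmatch(text) is not None
```

Keeping one regex alternative per ABNF alternative makes it easy to compare the sketch line by line with both the RFC and the Lark rule, at the cost of some redundancy.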


iunreserved: alpha | digit | "-" | "." | "_" | "~" | ucschar

ucschar: /[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]/

Collaborator Author

Some ucschar character ranges were missing.
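
For reference, the full ucschar production in RFC 3987 section 2.2 also covers supplementary planes 1 through 14, which a BMP-only character class cannot match. The sketch below builds an equivalent character class with Python's standard `re` module. The names `UCSCHAR_RANGES`, `char_class`, and `UCSCHAR_RE` are illustrative assumptions, not identifiers from this PR.

```python
import re

# ucschar ranges from RFC 3987, section 2.2, as (first, last) code points.
UCSCHAR_RANGES = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
# Planes 1 through 13 each contribute xx0000-xxFFFD.
UCSCHAR_RANGES += [(p << 16, (p << 16) | 0xFFFD) for p in range(1, 0xE)]
# Plane 14 starts at E1000, not E0000.
UCSCHAR_RANGES.append((0xE1000, 0xEFFFD))


def char_class(ranges):
    """Render (first, last) code point pairs as a regex character class."""
    body = "".join(f"\\U{lo:08X}-\\U{hi:08X}" for lo, hi in ranges)
    return f"[{body}]"


UCSCHAR_RE = re.compile(char_class(UCSCHAR_RANGES))
```

A quick check such as `UCSCHAR_RE.fullmatch("\U00010000")` distinguishes a class that includes the supplementary planes from one that stops at `\uFFEF`.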

iunreserved: alpha | digit | "-" | "." | "_" | "~" | ucschar

ucschar: /[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]/
iprivate: /[\uE000-\uF8FF]/

Collaborator Author

Some iprivate character ranges were missing.
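
For reference, RFC 3987 section 2.2 defines iprivate as the BMP private-use area plus the two supplementary private-use planes (15 and 16). A minimal sketch of a code point check follows. The names `IPRIVATE_RANGES` and `is_iprivate` are illustrative assumptions, not identifiers from this PR.

```python
# iprivate ranges from RFC 3987, section 2.2, as (first, last) code points.
IPRIVATE_RANGES = [
    (0xE000, 0xF8FF),       # BMP private-use area
    (0xF0000, 0xFFFFD),     # plane 15
    (0x100000, 0x10FFFD),   # plane 16
]


def is_iprivate(ch: str) -> bool:
    """Return True if the single character ch falls in an iprivate range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in IPRIVATE_RANGES)
```

Working on code points rather than a regex keeps the range boundaries easy to compare with the ABNF.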


ipvfuture: "v" hexdig+ "." (unreserved | sub_delims | ":")+

ipv6address: h16 ":" h16 ":" h16 ":" h16 ":" h16 ":" h16 ":" ls32

Collaborator Author

ipv6address was simply wrong: it covered only a few edge cases, not every form the RFC allows.

@jkowalleck jkowalleck changed the title [WIP] fix: grammar according t rfc3987 [WIP] fix: grammar according to rfc3987 Jul 25, 2025
@jkowalleck jkowalleck force-pushed the fix/grammar_according_t-rfxc3987 branch from 2ab6465 to 448ba8c Compare August 15, 2025 15:24
@jkowalleck jkowalleck changed the title [WIP] fix: grammar according to rfc3987 fix: grammar according to rfc3987 Aug 15, 2025
@jkowalleck jkowalleck marked this pull request as ready for review August 15, 2025 15:24
@jkowalleck

Collaborator Author

ready for review

@jkowalleck

Collaborator Author

@willynilly could I ask you for a review?

@willynilly

Owner

i will do that later tonight

@jkowalleck

Collaborator Author

do you need additional documentation?

@jkowalleck

Collaborator Author

@willynilly ping

@jkowalleck

Collaborator Author

@willynilly, could I ask you for a review?

@jkowalleck

Collaborator Author

@willynilly, could I ask for a code review?

@musicinmybrain


It looks like @willynilly hasn’t been active on GitHub for most of the past year. I hope that they are well.

Given that this library is now a dependency for https://github.com/python-jsonschema/jsonschema, it’s unfortunate that it may be unmaintained.

@jkowalleck

Collaborator Author

Given that this library is now a dependency for https://github.com/python-jsonschema/jsonschema, it’s unfortunate that it may be unmaintained.

see #15

Copilot AI left a comment


Pull request overview

This PR updates the project’s RFC 3987 Lark grammar to more closely match the RFC 3987 ABNF (notably expanding ucschar/iprivate Unicode ranges and refactoring the IPv6address production), and extends the JSON-based syntax test vectors to exercise the updated grammar.

Changes:

  • Expand ucschar and iprivate grammar definitions to include supplementary-plane ranges per RFC 3987.
  • Refactor ipv6address grammar into a more compact EBNF form.
  • Add additional iri_reference test vectors covering ucschar, iprivate, and several IPv6address forms.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/rfc3987_syntax/syntax_rfc3987.lark | Updates Unicode range terminals and refactors IPv6address grammar alternatives to better match RFC ABNF. |
| tests/valid_syntax.json | Adds new valid iri_reference examples targeting the updated Unicode and IPv6 grammar behavior. |


| h16 ":" h16 ":" h16 ":" h16 ":" h16 ":" h16 "::" h16
| h16 ":" h16 ":" h16 ":" h16 ":" h16 ":" h16 ":" h16 "::"
ipv6address: ( h16 ":" )~6 ls32
| "::" ( h16 ":" )~3 ls32
Comment thread tests/valid_syntax.json
"reason": ""
},
{
"value": "ucschar/\u00A0-\uD7FF/\uF900-\uFFCF/\uFDF0-\uFFEF",
@@ -67,15 +67,15 @@ ip_literal: "[" (ipv6address | ipvfuture) "]"

ipvfuture: "v" hexdig+ "." (unreserved | sub_delims | ":")+


4 participants