fix: grammar according to rfc3987 #4
Signed-off-by: Jan Kowalleck <jan.kowalleck@gmail.com>
| | h16 ":" h16 ":" h16 ":" h16 ":" h16 "::" ls32 | ||
| | h16 ":" h16 ":" h16 ":" h16 ":" h16 ":" h16 "::" h16 | ||
| | h16 ":" h16 ":" h16 ":" h16 ":" h16 ":" h16 ":" h16 "::" | ||
| ipv6address: ( h16 ":" )~6 ls32 |
This ipv6address rule still needs some optimization ...
For now it is simply a direct transcription from ABNF to Lark.
| iunreserved: alpha | digit | "-" | "." | "_" | "~" | ucschar | ||
| ucschar: /[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]/ |
Some character ranges were missing.
| iunreserved: alpha | digit | "-" | "." | "_" | "~" | ucschar | ||
| ucschar: /[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]/ | ||
| iprivate: /[\uE000-\uF8FF]/ |
Some character ranges were missing.
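For reference, the full `ucschar` and `iprivate` ranges from the RFC 3987 ABNF (section 2.2) include supplementary-plane code points beyond the BMP ranges shown in the lines above. The following is a minimal sketch using Python's `re` module rather than Lark, so it is not the grammar from this PR; the names `char_class`, `is_ucschar`, and `is_iprivate` are hypothetical helpers chosen for illustration.

```python
import re

# Ranges copied from the ucschar and iprivate productions in RFC 3987, section 2.2.
UCSCHAR_RANGES = [
    (0xA0, 0xD7FF),
    (0xF900, 0xFDCF),
    (0xFDF0, 0xFFEF),
    # Planes 1 to 13: %x10000-1FFFD through %xD0000-DFFFD.
    *[(plane << 16, (plane << 16) | 0xFFFD) for plane in range(0x1, 0xE)],
    # Plane 14 starts at E1000, not E0000.
    (0xE1000, 0xEFFFD),
]
IPRIVATE_RANGES = [
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
]


def char_class(ranges):
    """Build a regex character class from (low, high) code point pairs."""
    body = "".join(f"\\U{lo:08X}-\\U{hi:08X}" for lo, hi in ranges)
    return f"[{body}]"


UCSCHAR_RE = re.compile(char_class(UCSCHAR_RANGES))
IPRIVATE_RE = re.compile(char_class(IPRIVATE_RANGES))


def is_ucschar(ch):
    """True if the single character ch falls in one of the ucschar ranges."""
    return UCSCHAR_RE.fullmatch(ch) is not None


def is_iprivate(ch):
    """True if the single character ch falls in one of the iprivate ranges."""
    return IPRIVATE_RE.fullmatch(ch) is not None
```

A check like this can help when choosing test vectors: a character from plane 1 should be accepted as `ucschar`, while the gap between U+FDCF and U+FDF0 should not.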
| ipvfuture: "v" hexdig+ "." (unreserved | sub_delims | ":")+ | ||
| ipv6address: h16 ":" h16 ":" h16 ":" h16 ":" h16 ":" h16 ":" ls32 |
ipv6address is just wrong: it covers only a few edge cases ... not everything from the RFC.
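To make "everything from the RFC" concrete: the IPv6address production in the RFC 3987 ABNF (section 2.2, identical to RFC 3986) has nine alternatives. The sketch below transcribes them mechanically into a Python regular expression. It uses `re` instead of Lark, so it is only a cross-check and not the grammar proposed in this PR; `is_ipv6address` and `_prefix` are hypothetical names.

```python
import re

H16 = r"[0-9A-Fa-f]{1,4}"
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"
IPV4 = rf"{DEC_OCTET}(?:\.{DEC_OCTET}){{3}}"
LS32 = rf"(?:{H16}:{H16}|{IPV4})"


def _prefix(n):
    """Regex for the optional ABNF prefix [ *n( h16 ":" ) h16 ]."""
    return rf"(?:(?:{H16}:){{0,{n}}}{H16})?"


# One entry per alternative of the IPv6address production, in ABNF order.
ALTERNATIVES = [
    rf"(?:{H16}:){{6}}{LS32}",               # 6( h16 ":" ) ls32
    rf"::(?:{H16}:){{5}}{LS32}",             # "::" 5( h16 ":" ) ls32
    rf"(?:{H16})?::(?:{H16}:){{4}}{LS32}",   # [ h16 ] "::" 4( h16 ":" ) ls32
    rf"{_prefix(1)}::(?:{H16}:){{3}}{LS32}",
    rf"{_prefix(2)}::(?:{H16}:){{2}}{LS32}",
    rf"{_prefix(3)}::{H16}:{LS32}",
    rf"{_prefix(4)}::{LS32}",
    rf"{_prefix(5)}::{H16}",
    rf"{_prefix(6)}::",
]
IPV6_RE = re.compile("|".join(f"(?:{alt})" for alt in ALTERNATIVES))


def is_ipv6address(text):
    """True if text matches one of the nine IPv6address alternatives."""
    return IPV6_RE.fullmatch(text) is not None
```

Inputs such as `2001:db8::1`, `::`, and `::ffff:192.0.2.1` each hit a different alternative, which is why a grammar listing only a handful of fixed forms rejects valid addresses.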
Signed-off-by: Jan Kowalleck <jan.kowalleck@gmail.com>
…cording_t-rfxc3987
2ab6465 to 448ba8c
ready for review

@willynilly could I ask you for a review?

I will do that later tonight.

Do you need additional documentation?

@willynilly ping

@willynilly, could I ask you for a review?

@willynilly, could I ask for a code review?

It looks like @willynilly hasn’t been active on GitHub for most of the past year. I hope that they are well. Given that this library is now a dependency for https://github.com/python-jsonschema/jsonschema, it’s unfortunate that it may be unmaintained.

see #15
Pull request overview
This PR updates the project’s RFC 3987 Lark grammar to more closely match the RFC 3987 ABNF (notably expanding ucschar/iprivate Unicode ranges and refactoring the IPv6address production), and extends the JSON-based syntax test vectors to exercise the updated grammar.
Changes:
- Expand `ucschar` and `iprivate` grammar definitions to include supplementary-plane ranges per RFC 3987.
- Refactor `ipv6address` grammar into a more compact EBNF form.
- Add additional `iri_reference` test vectors covering `ucschar`, `iprivate`, and several IPv6address forms.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/rfc3987_syntax/syntax_rfc3987.lark | Updates Unicode range terminals and refactors IPv6address grammar alternatives to better match RFC ABNF. |
| tests/valid_syntax.json | Adds new valid iri_reference examples targeting the updated Unicode and IPv6 grammar behavior. |
| | h16 ":" h16 ":" h16 ":" h16 ":" h16 ":" h16 "::" h16 | ||
| | h16 ":" h16 ":" h16 ":" h16 ":" h16 ":" h16 ":" h16 "::" | ||
| ipv6address: ( h16 ":" )~6 ls32 | ||
| | "::" ( h16 ":" )~3 ls32 |
| "reason": "" | ||
| }, | ||
| { | ||
| "value": "ucschar/\u00A0-\uD7FF/\uF900-\uFFCF/\uFDF0-\uFFEF", |
| @@ -67,15 +67,15 @@ ip_literal: "[" (ipv6address | ipvfuture) "]" | |||
| ipvfuture: "v" hexdig+ "." (unreserved | sub_delims | ":")+ | |||
Reviewed the grammar (Lark) against RFC 3987 (ABNF): https://www.rfc-editor.org/rfc/rfc3987#section-2.2
Found some issues and ...
See the failing tests in #5.