[PATCH] urllib.parse: Restrict IPv6 ZoneID characters to RFC 6874-compliant set
The current parsing logic for IPv6 addresses with Zone Identifiers (ZoneIDs)
uses the `ipaddress` module, which validates ZoneIDs according to RFC 4007,
allowing any non-null string. However, when used in URLs, ZoneIDs must follow
the percent-encoded format defined in RFC 6874.
This patch adds a check to restrict ZoneIDs to the allowed characters:
ALPHA / DIGIT / "-" / "." / "_" / "~" / "%" HEXDIG HEXDIG
RFC 6874 §2 specifies the format of an IPv6 address with a ZoneID in a URI as:
`IPv6addrz = IPv6address "%25" ZoneID`
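As a rough illustration of that grammar, the sketch below splits a bracketed host into its address and ZoneID at the `%25` delimiter. The helper name `split_ipv6addrz` is hypothetical and is not part of `urllib.parse` or of this patch; it handles only the strict `%25` form.

```py
from typing import Optional, Tuple


def split_ipv6addrz(bracketed_host: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split "[addr%25zone]" into (addr, zone).

    Only the strict RFC 6874 delimiter "%25" is recognized here.
    Returns (addr, None) when no delimiter is present.
    """
    if not (bracketed_host.startswith("[") and bracketed_host.endswith("]")):
        raise ValueError("expected a bracketed IPv6 host")
    inner = bracketed_host[1:-1]
    # partition() splits on the first occurrence of the delimiter.
    address, delimiter, zone_id = inner.partition("%25")
    return address, (zone_id if delimiter else None)
```

For example, `split_ipv6addrz("[fe80::1%25eth0]")` yields the address `fe80::1` and the ZoneID `eth0`.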
Additionally, RFC 6874 suggests that parsers accept a bare `%` (one not followed
by two hexadecimal digits) as a liberal extension, but the ZoneID content must
still be drawn from a safe character set. This patch rejects ZoneIDs that
contain characters outside the permitted set.
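The character restriction can be sketched with a regular expression. This is a minimal illustration, not the code added by this patch: the names `_ZONE_ID_RE` and `is_valid_zone_id` are hypothetical, and the sketch assumes the ZoneID has already been separated from the address and its delimiter.

```py
import re

# Hypothetical pattern, not taken from the patch itself.
# One or more of: an unreserved character (ALPHA / DIGIT / "-" / "." / "_" / "~")
# or a percent-encoded octet ("%" HEXDIG HEXDIG).
_ZONE_ID_RE = re.compile(r"(?:[A-Za-z0-9\-._~]|%[0-9A-Fa-f]{2})+")


def is_valid_zone_id(zone_id: str) -> bool:
    """Return True if zone_id uses only the permitted characters."""
    # fullmatch() requires the whole string to match, so a single
    # disallowed character anywhere makes the ZoneID invalid.
    return _ZONE_ID_RE.fullmatch(zone_id) is not None
```

Under these assumptions, `is_valid_zone_id("eth0")` is true, while `is_valid_zone_id("2|test")` is false because `|` is outside the permitted set.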
### Before the fix:
```py
>>> import urllib.parse
>>> urllib.parse.urlparse("http://[::1%2|test]/path")
ParseResult(scheme='http', netloc='[::1%2|test]', path='/path', ...)
```
Invalid characters such as `|` were incorrectly accepted in ZoneIDs.
### After the fix:
```py
>>> import urllib.parse
>>> urllib.parse.urlparse("http://[::1%2|test]/path")
Traceback (most recent call last):
...
ValueError: IPv6 ZoneID is invalid
```
This patch ensures `urllib.parse` properly rejects ZoneIDs with invalid characters,
improving compliance with the URI standards and helping prevent subtle bugs
or security vulnerabilities.