Commit 58f0d09

[PATCH] urllib.parse: Restrict IPv6 ZoneID characters to RFC 6874-compliant set

The current parsing logic for IPv6 addresses with Zone Identifiers (ZoneIDs) uses the `ipaddress` module, which validates ZoneIDs according to RFC 4007, allowing any non-null string. However, when used in URLs, ZoneIDs must follow the percent-encoded format defined in RFC 6874. This patch adds a check to restrict ZoneIDs to the allowed characters:

ALPHA / DIGIT / "-" / "." / "_" / "~" / "% HEXDIG HEXDIG"

RFC 6874 §2.1 specifies the format of an IPv6 address with a ZoneID in a URI as:

`IPv6addrz = IPv6address "%25" ZoneID`

Additionally, RFC 6874 recommends accepting a bare `%` without hex digits as a liberal extension, but that flexibility still requires ZoneID content to conform to a safe character set. This patch enforces that ZoneIDs do not include characters outside the permitted range.

### Before the fix:

```py
>>> import urllib.parse
>>> urllib.parse.urlparse("http://[::1%2|test]/path")
ParseResult(scheme='http', netloc='[::1%2|test]', path='/path', ...)
```

Invalid characters such as `|` were incorrectly accepted in ZoneIDs.

### After the fix:

```py
>>> import urllib.parse
>>> urllib.parse.urlparse("http://[::1%2|test]/path")
Traceback (most recent call last):
  ...
ValueError: IPv6 ZoneID is invalid
```

This patch ensures `urllib.parse` properly rejects ZoneIDs with invalid characters, improving compliance with the URI standards and helping prevent subtle bugs or security vulnerabilities.

1 parent 2bd4ff0 commit 58f0d09

1 file changed, +2 -0 lines changed


Lib/urllib/parse.py

Lines changed: 2 additions & 0 deletions
@@ -466,6 +466,8 @@ def _check_bracketed_host(hostname):
         ip = ipaddress.ip_address(hostname) # Throws Value Error if not IPv6 or IPv4
         if isinstance(ip, ipaddress.IPv4Address):
             raise ValueError(f"An IPv4 address cannot be in brackets")
+        if "%" in hostname and not re.match(r"\A(%[a-fA-F0-9]{2}|[\w\.~-])+\z", hostname.split("%", 1)[1]):
+            raise ValueError(f"IPv6 ZoneID is invalid")
 
 # typed=True avoids BytesWarnings being emitted during cache key
 # comparison since this API supports both bytes and str input.
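
For readers who want to experiment with the added check in isolation, the following is a minimal standalone sketch, not the code in `Lib/urllib/parse.py`. The helper name `zone_id_is_valid` and the constant `_ZONE_ID_RE` are hypothetical. The sketch differs from the diff in two deliberate ways: it spells out an ASCII-only character class, because `\w` in a `str` pattern also matches non-ASCII word characters, and it uses the `\Z` end-of-string anchor, which works on Python versions that may not accept `\z`.

```python
import re

# Hypothetical constant, not part of the patch. ASCII-only on purpose:
# ALPHA / DIGIT / "-" / "." / "_" / "~", or "%" followed by two hex digits.
_ZONE_ID_RE = re.compile(r"\A(?:%[0-9A-Fa-f]{2}|[A-Za-z0-9._~-])+\Z")


def zone_id_is_valid(hostname: str) -> bool:
    """Return True if the bracketed host has no ZoneID or a well-formed one.

    Hypothetical helper: the patch inlines an equivalent check inside
    _check_bracketed_host and raises ValueError instead of returning False.
    """
    if "%" not in hostname:
        return True  # no ZoneID present, nothing to validate
    # Everything after the first "%" is treated as the ZoneID, as in the diff.
    zone_id = hostname.split("%", 1)[1]
    return _ZONE_ID_RE.match(zone_id) is not None


if __name__ == "__main__":
    for host in ("::1", "fe80::1%eth0", "::1%2|test", "fe80::1%"):
        print(f"{host!r}: {zone_id_is_valid(host)}")
```

Under these assumptions, `::1%2|test` from the commit message is rejected because of the `|`, and an empty ZoneID (`fe80::1%`) is rejected because the pattern requires at least one character.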
