Uri does not accept percent-encoded characters in hostname #836

@tremwil

uri::authority::validate_authority_bytes (and thus Authority::from_static/from_maybe_shared) returns Err(AuthorityError::InvalidAuthority) for host names with percent-encoded characters.

I'm not sure whether this is intentional. The parsing logic explicitly allows percent-encoding in the userinfo part of the authority, and the following comment, which references RFC3986, states that a % in a host name is rejected:

// Per https://tools.ietf.org/html/rfc3986#section-3.2.1 and
// https://url.spec.whatwg.org/#authority-state
// the userinfo can have a percent-encoded username and password,
// so record that a `%` was found. If this turns out to be
// part of the userinfo, this flag will be cleared.
// Also per https://tools.ietf.org/html/rfc6874, percent-encoding can
// be used to indicate a zone identifier.
// If the flag hasn't been cleared at the end, that means this
// was part of the host name (and not part of an IPv6 address), and
// will fail with an error.

However, RFC3986 section 3.2.2 defines the host part of the authority as

host = IP-literal / IPv4address / reg-name

with reg-name being defined a few paragraphs later as

reg-name = *( unreserved / pct-encoded / sub-delims )

The WHATWG URL standard (https://url.spec.whatwg.org/#host-parsing) also agrees that the host can contain percent-encoded characters. So, from my understanding, percent-encoded characters in a host name should be allowed.
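
To make the grammar argument concrete, here is a minimal sketch of a check that follows the RFC3986 reg-name rule quoted above. It is not the crate's implementation: the function names (is_unreserved, is_sub_delim, is_valid_reg_name) are hypothetical, it uses only the standard library, and it covers only the reg-name alternative of host, not IP-literal or IPv4address.

```rust
/// RFC 3986: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
fn is_unreserved(b: u8) -> bool {
    b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~')
}

/// RFC 3986: sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
///                      / "*" / "+" / "," / ";" / "="
fn is_sub_delim(b: u8) -> bool {
    matches!(
        b,
        b'!' | b'$' | b'&' | b'\'' | b'(' | b')' | b'*' | b'+' | b',' | b';' | b'='
    )
}

/// Hypothetical helper (not part of any crate): checks `host` against
/// reg-name = *( unreserved / pct-encoded / sub-delims )
fn is_valid_reg_name(host: &str) -> bool {
    let bytes = host.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        if b == b'%' {
            // pct-encoded = "%" HEXDIG HEXDIG
            match bytes.get(i + 1..i + 3) {
                Some(hex) if hex.iter().all(|h| h.is_ascii_hexdigit()) => i += 3,
                // Truncated or non-hex escape: not a valid pct-encoded triplet.
                _ => return false,
            }
        } else if is_unreserved(b) || is_sub_delim(b) {
            i += 1;
        } else {
            // Anything else (e.g. '@', '/', '[') is outside the reg-name grammar.
            return false;
        }
    }
    true
}
```

Under this sketch, a host such as exa%6Dple.com is accepted because %6D is a well-formed pct-encoded triplet, while exa%zzple.com and a truncated ex%4 are rejected. This only illustrates what the grammar permits; whether the library should also decode or normalize such hosts is a separate question.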
