
urlparse() should decode percent-encoding in userinfo #123644

@SmartHypercube

Description

Bug report

Bug description:

RFC 3986, Section 3.2.1, defines userinfo = *( unreserved / pct-encoded / sub-delims / ":" ), so userinfo can contain percent-encoded octets, and I would expect urlparse() to decode them automatically. I also tested cURL and confirmed that it decodes percent-encoding in userinfo.
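
The userinfo grammar quoted above can be checked and decoded in a few lines. This is a minimal sketch, not urllib.parse code: the regular expression is transcribed by hand from the RFC 3986 ABNF, and is_valid_userinfo and decode_userinfo are hypothetical helper names. Like urlparse(), it splits at the first ':' before decoding, so an encoded colon (%3A) stays part of the username.

```python
import re
from typing import Optional, Tuple
from urllib.parse import unquote

# Character classes from RFC 3986:
#   unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
#   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
#   pct-encoded = "%" HEXDIG HEXDIG
# userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
USERINFO_RE = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})*")


def is_valid_userinfo(userinfo: str) -> bool:
    """Hypothetical helper: True if the string matches the userinfo grammar."""
    return USERINFO_RE.fullmatch(userinfo) is not None


def decode_userinfo(userinfo: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split at the first ':' and percent-decode each part.

    Splitting happens before decoding, so '%3A' in the username is not
    mistaken for the separator. The password is None when there is no ':'.
    """
    if not is_valid_userinfo(userinfo):
        raise ValueError(f"not valid RFC 3986 userinfo: {userinfo!r}")
    user, sep, password = userinfo.partition(":")
    return unquote(user), (unquote(password) if sep else None)


print(decode_userinfo("a%40b:c%25d"))  # ('a@b', 'c%d')
```

A raw '@' or a lone '%' fails the grammar check, which is why both have to be percent-encoded in the first place.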

This decoding matters because a username may be an email address, which contains @, and a password may contain symbols such as % or @; both must be percent-encoded to appear in a URL. When URLs are manipulated with urllib.parse, the current behavior hands back the still-encoded strings, which can lead to hard-to-notice bugs.
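
A sketch of how such a bug can arise, assuming made-up credentials: the credentials are encoded correctly when the URL is built, but reading them back through urlparse() returns the encoded form, so a naive comparison or pass-through silently uses the wrong value.

```python
from urllib.parse import quote, urlparse

# Hypothetical credentials: an email address as the username and a
# password containing '%' and '@'.
username = "alice@example.com"
password = "p%ss@word"

# quote(..., safe="") percent-encodes every reserved character, including
# '@', ':' and '/', so the credentials fit the userinfo grammar.
url = f"http://{quote(username, safe='')}:{quote(password, safe='')}@host/"
print(url)  # http://alice%40example.com:p%25ss%40word@host/

parsed = urlparse(url)
# The attributes come back still encoded (current behavior reported in
# this issue), so they no longer equal the original credentials.
print(parsed.username)  # alice%40example.com
print(parsed.password)  # p%25ss%40word
print(parsed.username == username)  # False
```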

How to reproduce

import urllib.parse
parse_result = urllib.parse.urlparse('http://a%40b:c%25d@e/')
print(parse_result.username, parse_result.password)

Expected output

a@b c%d

Actual output

a%40b c%25d
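
Until urlparse() decodes these fields itself, callers can apply urllib.parse.unquote() to the username and password attributes. This is a workaround sketch; decoded_credentials is a hypothetical helper name, not part of the standard library. Because urlparse() has already split userinfo at '@' and ':' before the values are decoded, an encoded '@' or ':' cannot be confused with a delimiter.

```python
from typing import Optional, Tuple
from urllib.parse import unquote, urlparse


def decoded_credentials(url: str) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: return (username, password) with percent-encoding
    removed. A missing username or password stays None."""
    parts = urlparse(url)
    username = unquote(parts.username) if parts.username is not None else None
    password = unquote(parts.password) if parts.password is not None else None
    return username, password


print(decoded_credentials("http://a%40b:c%25d@e/"))  # ('a@b', 'c%d')
print(decoded_credentials("http://e/"))  # (None, None)
```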

CPython versions tested on:

3.9, 3.12

Operating systems tested on:

Linux

Metadata

Assignees

No one assigned

Labels

stdlib (Standard Library Python modules in the Lib/ directory), type-bug (An unexpected behavior, bug, or error)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests